RDS Analyst Manual

From HPMRG

statistical methods.
<u>RDS Analyst</u> has an easy-to-use graphical user interface to the powerful and sophisticated capabilities of the statistical package [http://r-project.org R]. <u>RDS Analyst</u> provides a comprehensive framework for working with RDS data, including tools for sample and population estimation, testing, confidence intervals and sensitivity analysis.
Example capabilities are an easy format for entering data, the visualization of
capabilities of the [http://r-project.org R] statistical language.
== Current State ==
* It is a beta version, is under continuous development and '''should not be distributed'''. Any requests for copies should be directed to the Hard-to-Reach Population Methods Research Group (HPMRG) [Lisa, Krista, Cori, Ian, and Mark]. This is because this version contains advanced code and will have serious bugs. We want to support the program and do not want immature versions circulating that give it a bad name and that we will not be able to stamp out.
* The current version number is 0.1.
== Basic facts==
* <u>RDS Analyst</u> is written for the [http://r-project.org R] statistical environment.
* The current development version runs on Windows and Macintosh. A Linux version will be available in the installers when it is released publicly.
The purpose of the initial beta test stage is to:
* Find basic installation and running problems in real-world environments.
* See if the GUI design will work for your RDS users. How can we improve on it?
* See if the current (basic) features work as you expect.
* Suggest features that we can add in future versions.
== Installation ==
=== Installation on a Windows PC ===
The installer is at:
http://hpmrg.org/software/RDSAnalystSetup.0.1.exe

Download the installer and double-click on it to install the software. It can install all the programs and utilities needed. If you already have some elements installed, you can deselect (or cancel) them during the installation. We recommend that you install everything the first time. This installer is over 110Mb in size and will take time to download.

You need the Java Runtime Environment to use <u>RDS Analyst</u>. The latest secure version of Java is at:

http://hpmrg.org/software/jre-7u21-windows-i586.exe

If you get the message <i>A JRE has been found. Do you want to install another one anyway?</i>, Java is already installed. In this case, click <i>No</i> so as not to reinstall it.

After you install, you <b>should</b> keep your installation up to date with the latest version of the packages by downloading and running the updater:

http://hpmrg.org/software/RDSAnalystUpdater.0.1.exe

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the <u>RDS Analyst</u> application nor the R application may be running when you update.

'''Note for experienced users:'''
The installer creates a private version of R for <u>RDS Analyst</u> to use, which ensures that <u>RDS Analyst</u> has the right version of R available. If you already have R installed separately, the two versions will peacefully coexist and you can continue to use your other version of R as before.

<b>Finally, be sure to sign up for the</b> [[RDS Analyst Users Group]].
=== Installation on an Apple Macintosh ===
There is a version for Apple Macintosh computers. They must have Intel CPUs (i.e., generally those purchased after 2006). To install:
* Download and install R-3.0.1 from here: http://hpmrg.org/software/R-3.0.1.pkg
* Download the <u>RDS Analyst</u> Installer: http://hpmrg.org/software/RDSAnalystInstaller.0.1.dmg
** It should mount as a disk image. Double-click on the installer in it (i.e., "RDSAnalystInstaller") to install the software.

The <u>RDS Analyst</u> application and R will be in your Applications folder. To run <u>RDS Analyst</u>, double-click on it in the Applications folder.

After you install, you <b>should</b> use the updater to keep your installation up to date with the latest version of the packages:
* Download the <u>RDS Analyst</u> Updater: http://hpmrg.org/software/RDSAnalystUpdater.0.1.dmg
** It should mount as a disk image. Double-click on the installer in it (i.e., "RDSAnalystUpdater") to install the update.

This just installs the core packages (that is, anything that has changed since the full install was made). It will typically be a few Mb in size.

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the <u>RDS Analyst</u> application nor the R application may be running when you update.

<u>RDS Analyst</u> uses Java to work, so you need Java installed on your Mac. If you are using Mac OS X 10.6 or earlier, Apple's Java comes pre-installed. Mac OS X 10.7 (Lion), Mac OS X 10.8 (Mountain Lion, available since July 2012) and later do not come with Java pre-installed. You can check whether you have Java installed at http://javatester.org/version.html
<!-- To get the latest Java 7 from Oracle, you will need Mac OS X 10.7.3 and above. -->
<!-- If you have Java 7, you will see a Java icon under System Preferences.-->

If you need to install Java, it is available at http://hpmrg.org/software/jre-7u21-macosx-x64.dmg

To install Java version 6 instead, open the "Java Preferences.app" located in the Applications > Utilities folder on your Mac. It will ask if you want to install Java if it is not already there. Accept its invitation.
<!-- Here is an installation video (coming). -->

<b>Finally, be sure to sign up for the</b> [[RDS Analyst Users Group]].
== Starting <u>RDS Analyst</u> ==
To start <u>RDS Analyst</u>, select the ''RDS Analyst'' menu under ''Programs'', and select the ''RDS Analyst'' program there. Alternatively, if you installed the desktop icon or the ''Quick Launch'' icon (the default), you can double-click on one of these to start the program. It will start the graphical-user interface. It may take a minute to do this but it is fast once loaded.  

The package has two main windows. The primary one is the ''Console'' window. To do statistical analysis, you choose from the menus at the top of the ''Console''. The ''Console'' also records a log of all the commands and the output they produce. The other window is the ''Data Viewer''. This is where the data are displayed and can be edited. The ''Data Viewer'' provides an easy-to-use, spreadsheet-like environment to view and edit data. Copying and pasting is supported and is compatible with Excel 2003/2007, so data can be moved from Excel by simply copying it into the ''Data Viewer''. Contextual menus can also be used to insert, delete and copy rows and columns.

Once the application starts up, you may need to load the ''Data Viewer'' window if it is not visible. In the ''Console'' window, go to the ''Packages & Data'' menu, select ''Package Manager'', and check the boxes next to ''DeducerRDSAnalyst'' in both the ''load'' and ''default'' columns. This creates the ''Data Viewer''. You only need to do this the first time: afterwards, <u>RDS Analyst</u> opens with the ''Data Viewer'' window from the start.

The first thing to do is to tell <u>RDS Analyst</u> which directory contains the data and where the files associated with the project will be stored. This is typically referred to as the "<i>working</i>" directory, and the program will read and save files there by default.

To do this, select the ''Set Working Directory'' menu item from the Console ''File'' menu. Then, typically, choose the directory where the data is stored.
== Quick start demo ==
There is a video tutorial to get you started. You can run it from the button on the ''Data Viewer'' window or directly at
[http://neolab.stat.ucla.edu/cranstats/RDSAnalyst_tutorial.mov http://neolab.stat.ucla.edu/cranstats/RDSAnalyst_tutorial.mov]

Below is a step-by-step tutorial in words.
=== Reading the NY Jazz dataset from RDSAT ===
* Select the ''Open Data'' menu item from the ''File'' menu.
* Use the dialog boxes to select the file <tt>nyjazz.rdsat</tt> from the directory "<tt>RDS Analyst Example Data Sets</tt>" on your "<tt>Desktop</tt>". (<i>Hint</i>: If it does not appear, on Windows look in "<tt>C:\Program Files\RDS Analyst\R-2.15.0\library\RDSdevelopment\extdata</tt>". On a Mac, look for "<tt>~/workspace/org/bin/RDSdevelopment/inst/extdata/nyjazz.rdsat</tt>", adjusting the path for the name and location of your workspace.)
* The data are read in as an RDS data set, and its name is shown with an <tt>(rds)</tt> prefix to indicate that it is an RDS data set (rather than, say, just a regular spreadsheet).
=== Looking at and editing the data in the spreadsheet (the ''Data Viewer'') ===
* Go to the ''Data Viewer'' window and note that the ''(rds) nyjazz'' data set is selected in the ''Data Set'' menu (center of the window pane). If you load more than one data set you can select the one to view here.
* Here you can look at the data in spreadsheet format. There is more information on the ''Data Viewer'' [http://www.deducer.org/pmwiki/pmwiki.php?n=Main.TheDataViewer here].
* At the top of the spreadsheet are three tabs:
** "Data View", which is the current spreadsheet view.
** "Variable View", which summarizes the variables and their properties.
** "RDS", where you can add information about the data set (such as estimates of the population size). It is OK to go with the default values for now.
* Click on the ''Variable View'' tab. Click the value for ''Gender.MF.'' under the ''Type'' column and select ''Factor''. Repeat for ''Race.WBO.'', ''Airplay.yn.'', and ''Union.yn.''. This makes sure that the program recognizes them as categorical variables.
* Click on the ''RDS'' tab. Click the ''Mid'' box in the ''Population Size Estimate'' section and enter an estimate of the size of the NY jazz population. While this is uncertain, a possible value is 20000. It is used in the computations, but they are somewhat insensitive to it (as long as it is not close to the sample size).
=== Looking at the data (the ''Plots'' menu) ===
* Select ''Plot Recruitment Tree'' from the ''Plots'' menu. Select ''Run''. The plot will appear in a new graphics window (called "JavaGD (2)", ha ha). This plots the recruitment trees, labeling each node by its ID. There are options in the dialog box you can play with to get different views. Select the ''File'' menu of the "JavaGD (2)" window to save the plot as a PDF, JPG, etc.
* Select ''Recruitment Diagnostics'' from the ''Plots'' menu of the ''Console''. Deselect the ''Recruitment tree'' box and select the ''Network size by wave'' box. Click ''Run''. This plots the network size distributions by the wave number of the respondent.
=== Looking at the data (the ''Sample'' menu) ===
This is for exploratory data analysis.
* Select ''Contingency Tables'' from the ''Sample'' menu.
* Select ''Gender.MF.'' as the ''Row'' and ''Race.WBO.'' as the ''Column''.
* Click the ''Run'' button.
* Look in the ''Console'' window for the results. At the top of the ''Console'' window there are two tabs, ''Console view'' and ''Element View''. The ''Console view'' gives a continuous view of the commands sent to the <u>RDS Analyst</u> engine. The ''Element View'' shows the results of each command separately, and is usually better. Click on the ''Element View'' tab.
* The results are a contingency table followed by a test of independence of the two factors.
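The manual does not say which test of independence is reported. The following Python sketch assumes it is the ordinary Pearson chi-square test. It builds a two-way table from two categorical columns and computes the test statistic and its degrees of freedom. The function names are hypothetical, not part of <u>RDS Analyst</u>. The sketch treats the respondents as an ordinary sample and ignores the RDS recruitment design.

```python
def crosstab(row_values, col_values):
    """Count each (row level, column level) pair; return the levels and counts."""
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    counts = [[0] * len(col_levels) for _ in row_levels]
    for r, c in zip(row_values, col_values):
        counts[row_levels.index(r)][col_levels.index(c)] += 1
    return row_levels, col_levels, counts


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two factors were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof
```

The p-value is the upper tail of the chi-square distribution with <tt>dof</tt> degrees of freedom. If SciPy is available, you can compute it with <tt>scipy.stats.chi2.sf(statistic, dof)</tt>.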
=== Analyzing the data  (the ''Population'' menu) ===
* Select ''Frequency Estimates'' from the ''Population'' menu.
* Select ''Gender.MF.'' as the ''Variables'' using the arrow buttons.
* Click the ''Run'' button.
* Look in the ''Console'' window for the results. This uses a computationally intensive algorithm to compute the confidence interval and will take 20 seconds or more to return output to the ''Console'' window. The output is a summary table with the point estimate and the 95% confidence interval. It also reports the design effect, the standard error and the sample size (the last is adjusted for any missing values; for the ''nyjazz'' data set, 21 of the 264 respondents did not report a network size).
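The manual only describes the estimator behind ''Frequency Estimates'' as computationally intensive. The Python sketch below illustrates why respondents without a network size drop out of the reported sample size. It uses a different and simpler estimator, RDS-II (Volz-Heckathorn), which weights each respondent by the inverse of their reported network size. This is not <u>RDS Analyst</u>'s algorithm, and the function name is hypothetical. For ''nyjazz'', dropping the 21 respondents without a network size leaves 264 - 21 = 243 usable respondents.

```python
def rds_ii_proportions(categories, network_sizes):
    """RDS-II (Volz-Heckathorn) estimates of category proportions.

    Each respondent is weighted by 1 / network size. Respondents with a missing
    category or a missing/non-positive network size are dropped, so the returned
    sample size counts only usable respondents.
    """
    weight_by_category = {}
    n_used = 0
    for category, degree in zip(categories, network_sizes):
        if category is None or degree is None or degree <= 0:
            continue  # cannot weight this respondent
        weight_by_category[category] = weight_by_category.get(category, 0.0) + 1.0 / degree
        n_used += 1
    if n_used == 0:
        raise ValueError("no respondents with both a category and a network size")
    total = sum(weight_by_category.values())
    estimates = {c: w / total for c, w in weight_by_category.items()}
    return estimates, n_used
```

The inverse weighting down-weights people with large networks, because they are more likely to be recruited. This sketch produces only point estimates, not the confidence interval or design effect.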
=== Saving the data ===
* To save the data and any edits to it in a file, choose ''Save Data'' from the ''File'' menu in the ''Console''. You can save it anywhere. It will be saved with a <tt>.rdsobj</tt> extension, so that the next time you read it in, the program will know that it is an RDS data set and will not ask you to re-enter all the information.
=== Saving the results ===
* To save the results in a file, choose ''Save'' from the ''File'' menu in ''Console'', and make sure ''Results'' is selected from the ''Options:''. You will need to add an extension to the file name, and we suggest <tt>txt</tt> (e.g., the file name <tt>rdsat_simple.txt</tt>). It is a simple text file, so it can be opened with, e.g., <tt>WordPad</tt> under Windows.
* To save the commands used to create the output in a file, choose ''Save'' from the ''File'' menu in ''Console'', and make sure ''Commands'' is selected from the ''Options:''.
* To save the complete output (results interspersed with the commands that produced them) in a file, choose ''Save'' from the ''File'' menu in ''Console'', and make sure ''Complete output'' is selected from the ''Options:''.
== Getting started (seriously) ==
Once the application starts up, focus on the (top) '''Data Viewer''' window. This is where you read in the data. The other window, the '''Console''', is where most of the analysis takes place. It also records a log of all the commands and output. It can be ignored for now.

== The Windows ==

The ''Console'' window has three parts. The panel along the left side is a navigation panel. You can use it to quickly go from one statistical output to another. You can also remove output that you don't want by selecting it in the navigation panel and pressing the ''Remove'' button at the bottom of the navigation panel. The panel along the bottom is a command console, where you can type in commands in the R language. The panel in the upper right, which takes up most of the screen, is the output window. By default, <u>RDS Analyst</u> commands that are submitted for execution show up as red text and the generated output shows up as blue text. There are two tabs at the top of the output panel. ''Console view'' shows all the output, like one long ream of paper. ''Element view'' shows you just a single output element at a time, without any red <u>RDS Analyst</u> commands. You can switch between these tabs to see either the latest output or the full history of commands and output.
== Loading RDS data ==
<u>RDS Analyst</u> can read in a wide range of data formats from other packages, including [http://www.spss.com/ SPSS] (*.sav), SAS export (*.xpt), and Excel (*.xls and other native Excel formats). It can also read in comma-separated (*.csv) files.
For a general description of this, see [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.OpenData Open RDS Data Set].
It also reads RDSAT files directly (they should be plain text files to be recognized automatically; otherwise the correct variable names will not show up).
If it is not an RDSAT file, <u>RDS Analyst</u> expects the data to be in a "spread-sheet" format containing the RDS survey data with recruitment information. This should represent valid RDS survey data. The sheet must have one row for each respondent (i.e. case), and columns for each survey response variable. In addition, the recruitment information can be specified in two ways:
* '''Coupon format''': Basically like RDSAT, but without the two header lines. The first column contains serial numbers, the second the network size data, the third the participant id numbers, and the following columns (generally the fourth, fifth and sixth) the recruitment coupons.
* '''Recruiter ID format''': Here it expects columns with the following names:
** '''id''': A column of integers giving unique '''id'''s for each respondent (i.e., row of the spreadsheet).
** '''recruiter.id''': A column of '''id'''s indicating the recruiter for that respondent (that is '''id'''). Recruiters can be identified by elements of '''id''' or as 0 for seeds.
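Before loading a file in Recruiter ID format, it can help to check it against the rules above. The Python sketch below is a hypothetical helper, not part of <u>RDS Analyst</u>. It reports data that breaks those rules: non-integer or duplicated '''id'''s, and '''recruiter.id''' values that are neither 0 nor an existing '''id'''.

```python
def check_recruiter_id_format(ids, recruiter_ids):
    """List problems with id / recruiter.id columns in Recruiter ID format.

    Rules checked (as described in the manual): ids are unique integers, and
    each recruiter.id is either 0 (a seed) or one of the ids.
    """
    problems = []
    if len(ids) != len(recruiter_ids):
        problems.append("id and recruiter.id columns have different lengths")
    if not all(isinstance(i, int) for i in ids):
        problems.append("some id values are not integers")
    if len(set(ids)) != len(ids):
        problems.append("id values are not unique")
    known_ids = set(ids)
    for respondent, recruiter in zip(ids, recruiter_ids):
        if recruiter != 0 and recruiter not in known_ids:
            problems.append(f"respondent {respondent} has unknown recruiter {recruiter}")
    return problems
```

An empty list means the columns satisfy the rules listed in this section.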
When you read in an RDS data file, you will be presented with the ''Load RDS Data'' dialog. This enables you to specify the coupon variables, the subject ID, the network size variable in the data, etc. This is essential to identify the RDS data set. You can also (optionally) set the maximum number of coupons, the population size estimate and other data set characteristics before the RDS data set is ready for use. When you are done, click ''Run'' to create the data set. It will appear in the ''Data Viewer''.

The package creates a spreadsheet with additional information in it, such as '''recruiter.id''': a column of integers giving the '''id''' of the recruiter of the respondent in that row. A value of 0 means the person was a seed. It also creates a variable called ''seed'', which is the '''id''' of the seed recruiter for the respondent in that row (that is, the recruiter of the recruiter, and so on, until you reach the wave 0 seed). It also creates a variable called ''wave'', which is the wave of the respondent in that row (that is, the number of recruiters one must go back to find the seed for that person). The wave of a seed is 0. These variables are stored with the data set and can be analysed like other variables, including for subset selection in estimation.
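The ''seed'' and ''wave'' definitions above amount to walking up the recruitment chain. The following Python sketch illustrates that definition; it is not <u>RDS Analyst</u>'s code, and the function name is hypothetical. It assumes valid Recruiter ID format data. It also assumes, beyond what the manual states, that a seed's ''seed'' value is its own '''id'''.

```python
def seeds_and_waves(ids, recruiter_ids):
    """Derive each respondent's seed id and wave from id / recruiter.id columns.

    recruiter.id == 0 marks a seed. Assumption: a seed is its own seed (wave 0).
    Raises ValueError on a recruitment cycle and KeyError on an unknown recruiter.
    """
    recruiter_of = dict(zip(ids, recruiter_ids))
    seed, wave = {}, {}
    for start in ids:
        chain = []  # respondents whose wave is not known yet, bottom first
        current = start
        while current not in wave:
            if current in chain:
                raise ValueError(f"recruitment cycle involving id {current}")
            recruiter = recruiter_of[current]
            if recruiter == 0:
                seed[current], wave[current] = current, 0
                break
            chain.append(current)
            current = recruiter
        # Assign seeds and waves from the top of the chain downwards.
        for person in reversed(chain):
            recruiter = recruiter_of[person]
            seed[person] = seed[recruiter]
            wave[person] = wave[recruiter] + 1
    return seed, wave
```

Waves already computed are reused, so each chain is walked only as far as the first respondent whose wave is known.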
* If you read in an RDSAT file, the RDS data set is created automatically, as it already has the necessary information in it. If you use a CSV file (e.g., from Excel), you will need to specify the type of formatting in the file in the dialog that is presented.
* For files in Coupon format, you will need to select the coupon and network size variables. If the coupon variables follow the id and network size variables in column order, then you only need to specify the maximum number of coupons. The program computes the '''recruiter.id''' variable from this information.
* For files in Recruiter id format, you will need to select the '''id''', '''recruiter.id''' and network size variables.
* If the spreadsheet is called ''foo'' (say), the RDS data set is also called ''foo'' and will appear with the <tt>(rds)</tt> prefix to let you know it is an RDS data set.
* You can read in and analyse multiple data sets at the same time. After you have read in the data, use the ''Data Set'' option in each dialog to specify the RDS data set you will be working on.
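The manual does not describe how the program computes '''recruiter.id''' from coupons. One natural approach is to match the coupon each respondent brought in against the coupons handed out to other respondents. Whoever was given that coupon is the recruiter. A respondent whose coupon matches nobody's, or who has none, is treated as a seed with '''recruiter.id''' 0. The Python sketch below is a hypothetical illustration of that approach. Its argument layout is an assumption, not the RDSAT column order.

```python
def recruiter_ids_from_coupons(ids, own_coupons, given_coupons):
    """Infer recruiter.id values from coupon data (hypothetical sketch).

    ids           -- respondent ids, one per row
    own_coupons   -- the coupon each respondent brought in (None if none)
    given_coupons -- for each respondent, the coupons given to them to pass on
                     (None marks an empty coupon slot)
    """
    holder = {}  # coupon -> id of the respondent who was given it
    for respondent, coupons in zip(ids, given_coupons):
        for coupon in coupons:
            if coupon is None:
                continue
            if coupon in holder:
                raise ValueError(f"coupon {coupon} was given out more than once")
            holder[coupon] = respondent
    # An unmatched or missing coupon means the respondent is treated as a seed (0).
    return [0 if coupon is None else holder.get(coupon, 0) for coupon in own_coupons]
```

The result can then be checked and turned into ''seed'' and ''wave'' values like any Recruiter ID format data.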
== Saving RDS data ==
<u>RDS Analyst</u> can save the data sets it creates in a wide range of data formats. We recommend saving them in the internal data format for R (*.rda) for easy later reading.
For a description of this see [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.SaveData Save RDS Data Set].
== The ''Data Viewer'' window ==
== The ''Data Viewer'' window ==
Line 169: Line 217:
You can move between the two windows by clicking on them, or by using the
You can move between the two windows by clicking on them, or by using the
-
''Window'' menu.
+
''Window'' menu in the ''Console" to (re)open the ''Data Viewer''.
The ''Data Viewer'' provides an easy to use, spreadsheet-like environment to view and edit RDS data (or in fact, any spreadsheet data you load). Copy and pasting is supported, and is compatible with Excel 2003/2007, so data can be moved from Excel to R by simply copying it to the data viewer. Contextual menus are used to insert, delete and copy rows and columns.
The ''Data Viewer'' provides an easy to use, spreadsheet-like environment to view and edit RDS data (or in fact, any spreadsheet data you load). Copy and pasting is supported, and is compatible with Excel 2003/2007, so data can be moved from Excel to R by simply copying it to the data viewer. Contextual menus are used to insert, delete and copy rows and columns.
If any data frames are loaded in the R session, they can be viewed by selecting them from the ''Data Set'' list. Data can be loaded into the R session by clicking the ''Open RDS Data Set'' button in the top left-hand corner. The currently viewed data set can be saved using the ''Save RDS Data Set'' button directly to the right of the ''Open RDS Data Set'' button. The currently viewed data set can be removed from the R session by clicking the button in the upper right.
The ''Data Viewer'' has two modes, ''Data view'' and ''Variable view'', which you can switch between freely using the tabs. The ''Variable view'' enables you to edit the types of the variables read in. Categorical variables (including binary variables) should be set to type ''Factor'' by clicking on their entry and selecting it from the menu.
For details see [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.TheDataViewer data viewer].
== The menu structure of the ''Console'' window ==
Most of the work is done using these menus. The menus at the top of the ''Console'' window are:
* File:
* Edit:
* Data:
* Sample:
* Population:
* Plots:
* Packages & Data:
* Window:
=== File Menu ===
* Open a data set and save the current data set in a file.
* The ''Open Data'' menu item is the primary way to read in an RDS data set from a text, CSV or other data file.
* Create, open or save text files using ''New Document'', ''Open Document'' and ''Save'', respectively. This is a simple editor for text files, such as files of R commands or notes on the analysis.
* Set the default place where <u>RDS Analyst</u> looks for data and saves output using the ''Set Working Directory'' item.
* Quit the program.
=== Edit Menu ===
* Copy, Cut and Paste text
* Undo and Redo
* Search and Find
* Increase or decrease the font size
* '''Preferences''': Change the font, set the default "working" directory where files are looked for and saved, etc. There is a separate panel for <u>RDS Analyst</u>-specific features called ''Data Viewer''. This enables you to change the defaults (for example, whether you are using the basic or professional version).
=== Workspace Menu ===
Objects that you create during an <u>RDS Analyst</u> session are held in computer memory. The collection of objects that you currently have is called the '''workspace'''. This workspace is not saved on disk unless you tell <u>RDS Analyst</u> to do so. This means that your objects are lost if you close the program without saving them or, worse, if the program or your system crashes during a session.
When you exit <u>RDS Analyst</u>, you will be asked if you want to save your workspace. This will allow you to have the same data sets available the next time you start <u>RDS Analyst</u>, so you can resume work on a project at the same point.
=== Data Menu ===
This menu is for recoding and modifying the data in the ''Data Viewer'', so you do not have to go back to [http://www.spss.com/ SPSS], SAS or Excel to recode variables.
The term "factor" in the underlying R engine designates a categorical variable. In general, factors are nominal, but we use factors to represent both ordinal and nominal variables. By labeling a variable as a factor, <u>RDS Analyst</u> will treat it appropriately when analyzing it.
Click on the links below to get help on the following capabilities:
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.EditFactor Edit Factor]: Add or remove the values (levels) of a categorical variable.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.RecodeVariables Recode Variables]: You can recode variables into variables with new names. From the ''Recode Variables'' dialog, select the recode you want to re-target from the ''Variables to Recode'' list (e.g., <tt>sex -> sex</tt>). Then click on the ''Target'' button on the right. This lets you type in the name of the new variable (e.g., <tt>sexMF</tt>). The recode will then show something like <tt>sex -> sexMF</tt>. The original <tt>sex</tt> variable will be unchanged and <tt>sexMF</tt> will appear in the ''Data Viewer'' window.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.TransformVariables Transform the variables]: Transformations can be very complex, if needed.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.ResetRowNames Reset Row Names]
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Sort Sort]
* '''Edit Meta Data''': Specify the characteristics of the RDS data here, such as the maximum number of coupons, the network size variable, the missing data symbol, and population size estimates.
* '''Convert to RDS''': A direct, manual way to (re)convert a data set into an RDS data set. It is rarely needed (the program does this automatically) but can be useful if the automatic conversion missed something.
* If you read in data from a CSV or other spreadsheet file (i.e., not an RDSAT data file) and edit it to make it an RDS data set, then an RDS data set is formed from the current state of the spreadsheet. If the spreadsheet is called ''foo'' (say), the RDS data set is also titled ''foo''.
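To make the idea behind ''Recode Variables'' concrete, here is a minimal Python sketch of recoding into a new variable while leaving the original untouched. It assumes the data set is held as a dict of equal-length column lists; the function <tt>recode</tt>, the columns and the 1/2 coding of <tt>sex</tt> are hypothetical illustrations, not <u>RDS Analyst</u>'s internal code.

```python
def recode(data, source, target, mapping, default=None):
    """Return a copy of `data` (a dict of equal-length column lists) with a
    new column `target` built by passing each value of `source` through
    `mapping`. Values missing from `mapping` become `default`."""
    if source not in data:
        raise KeyError(f"no variable named {source!r}")
    # Copy every existing column so the original data set is left unchanged.
    new_data = {name: list(values) for name, values in data.items()}
    new_data[target] = [mapping.get(value, default) for value in data[source]]
    return new_data


# Hypothetical coding: 1 = male, 2 = female.
survey = {"id": [1, 2, 3, 4], "sex": [1, 2, 2, 1]}
recoded = recode(survey, "sex", "sexMF", {1: "M", 2: "F"})
print(recoded["sex"])    # [1, 2, 2, 1]  (original variable unchanged)
print(recoded["sexMF"])  # ['M', 'F', 'F', 'M']
```

Writing the result to a new column, rather than overwriting <tt>sex</tt>, mirrors the <tt>sex -> sexMF</tt> re-targeting described above.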
=== Sample Menu ===
This menu is for exploratory analysis of the RDS data. It is used to describe the characteristics of the sample.
It deals with continuous data, categorical data and descriptive data.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Frequencies Frequencies]: Tables of one or more variables, possibly stratified by others (like SPSS).
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Descriptives Descriptives]: Sample summaries such as means, medians, quantiles, standard deviations and extremes.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.ContingencyTables Contingency Tables]: Cross-tabulations. They include tests, which are dubious because RDS observations are not independent.
* Recruitment Homophily: Computes a homophily measure for the recruitment process: do respondents differentially recruit people like themselves? That is, it measures homophily on a variable within the recruitment chains. Take HIV status as an example. Here the measure is the ratio of the number of recruits who have the same HIV status as their recruiter to the number we would expect if there were no homophily on HIV status. It differs from Population Homophily (see below) in that it refers to the recruitment chains rather than the population of social ties. For example, if the recruitment homophily on HIV status is about 1, recruitment shows little homophily on HIV status (the number of homophilous pairs is close to what we would expect by chance).
<!--  * [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.OneSampleTest One Sample Test]: -->
<!--  * [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.TwoSampleTest Two Sample Test]: -->
<!--  * [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.GeneralizedLinearModel Generalized Linear Model]: -->
''Frequencies'', ''Descriptives'' and ''Contingency Tables'' do not use RDS weights: they compute sample summaries of the RDS data rather than population estimates. The other entries act as if the data were an independent sample and are deliberately not RDS-aware.

<!-- For the others, see the [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.DeducerManual Deducer manual]. -->
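The ''Recruitment Homophily'' ratio in the ''Sample'' menu can be sketched in a few lines of Python. This is a minimal illustration, not <u>RDS Analyst</u>'s code: the input is a hypothetical list of (recruiter value, recruit value) pairs, and the "expected if there were no homophily" count is '''assumed''' to be the independence expectation of the recruiter-by-recruit table, which may differ from the expectation the program actually uses.

```python
from collections import Counter


def recruitment_homophily(pairs):
    """Ratio of observed to expected same-value recruiter/recruit pairs.

    `pairs` holds one (recruiter_value, recruit_value) tuple per recruitment
    (seeds contribute no pair). The expected count assumes recruits are
    chosen independently of the recruiter's value (independence in the
    recruiter-by-recruit table); this is an assumption of the sketch.
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one recruiter-recruit pair")
    observed_same = sum(1 for recruiter, recruit in pairs if recruiter == recruit)
    recruiter_counts = Counter(recruiter for recruiter, _ in pairs)
    recruit_counts = Counter(recruit for _, recruit in pairs)
    # Expected matches under independence: sum over values of row * column / n.
    expected_same = sum(
        recruiter_counts[value] * recruit_counts[value] for value in recruiter_counts
    ) / n
    if expected_same == 0:
        raise ValueError("no value appears among both recruiters and recruits")
    return observed_same / expected_same


# Hypothetical HIV statuses for six recruiter -> recruit links.
links = [("pos", "pos"), ("pos", "pos"), ("neg", "neg"),
         ("neg", "neg"), ("pos", "neg"), ("neg", "pos")]
print(round(recruitment_homophily(links), 3))  # 1.333
```

Here 4 same-status pairs are observed against 3 expected by chance, so the ratio is above 1, indicating some recruitment homophily; a value near 1 would indicate little.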
=== The Population menu===
This menu computes estimates of '''population''' characteristics based on the RDS sample.
Specifically, the procedures use RDS weights to compute population estimates (rather than sample averages of the RDS data).
It deals with continuous data, categorical data and descriptive data, and complements the ''Sample'' menu, which describes the sample.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Frequencies Frequency Estimates]: Tables of one or more variables, possibly stratified by others (like SPSS).
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Descriptives Descriptive Estimates]: Estimates of the population means, medians, quantiles, standard deviations and extremes.
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.ContingencyTables Population Crosstabs]: Cross-tabulations of categorical variables. They include tests, which are dubious because RDS observations are not independent.
* Population Test Difference in Proportions: Tests the hypothesis that two population proportions are equal.
The basic descriptives, cross-tabulations and frequencies use the current weights. These are stored in the data and added whenever estimates are computed. The default is the Gile sequential sampling (SS) weights.
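To show why weighting changes the answer, here is a minimal Python sketch of one simple kind of RDS weight: the RDS-II (Volz-Heckathorn) inverse-network-size weight. Note that this is '''not''' Gile's SS estimator, which is the program's default and is more involved (it also uses the population size). The outcome and network-size values are hypothetical.

```python
def rds_ii_proportion(outcomes, network_sizes):
    """Inverse-network-size weighted (RDS-II / Volz-Heckathorn) estimate of
    the population proportion with outcome == 1."""
    if len(outcomes) != len(network_sizes):
        raise ValueError("outcomes and network_sizes must have the same length")
    if not outcomes:
        raise ValueError("need at least one respondent")
    if any(size <= 0 for size in network_sizes):
        raise ValueError("network sizes must be positive")
    # People with large networks are more likely to be recruited, so each
    # respondent is down-weighted in proportion to their network size.
    weights = [1.0 / size for size in network_sizes]
    weighted_yes = sum(w for w, y in zip(weights, outcomes) if y == 1)
    return weighted_yes / sum(weights)


outcomes = [1, 1, 0, 0]          # hypothetical outcome (1 = yes)
network_sizes = [10, 10, 2, 2]   # hypothetical reported network sizes
print(sum(outcomes) / len(outcomes))                         # 0.5 (sample proportion)
print(round(rds_ii_proportion(outcomes, network_sizes), 4))  # 0.1667
```

The sample proportion is 0.5, but because the "yes" respondents report much larger networks, the weighted population estimate is far lower.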
The most important entry is '''Frequency Estimates''', which computes confidence intervals for population proportions. The default method is Gile's SS estimator, and the confidence interval is computed using Gile's bootstrap method. This is a computationally intensive procedure and can take a minute or longer to complete.
* Estimate the Homophily in the Population for a given variable: Consider, for example, HIV status. Population homophily is the homophily in HIV status between two people who are tied in the underlying population social network (a "couple"). Specifically, the population homophily is the ratio of the expected number of HIV-discordant couples absent homophily to the expected number of HIV-discordant couples with the homophily. Hence larger values of population homophily indicate more homophily on HIV status. For example, a value of 1 means couples form at random with respect to HIV status. A value of 2 means there are half as many HIV-discordant couples as we would expect if there were no homophily in the population. This measure is meaningful across different levels of differential activity. As we do not see most of the population network, we estimate the population homophily from the RDS data. As an example, suppose the population homophily on HIV is 0.75, so there are about 33% more (1/0.75 ≈ 1.33 times as many) HIV-discordant couples than expected by chance; there is actually heterophily on HIV in the population. If the population homophily on sex is 1.1, there are about 9% fewer (1/1.1 ≈ 0.91 times as many) sex-discordant couples than expected by chance, so there is modest homophily on sex.
* Estimate the Differential Activity in the Population: The ratio of the mean network size of those with the outcome to the mean network size of those without it.
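The two population quantities above can be illustrated on a tiny, fully observed toy network in Python. This is a sketch only: in practice the network is unobserved and <u>RDS Analyst</u> estimates these quantities from the RDS data. The "absent homophily" baseline here is an '''assumption''': tie ends are taken to pair up at random in proportion to each group's total network size, which gives <tt>d1 * d0 / (d1 + d0)</tt> expected discordant ties.

```python
def population_measures(status, ties):
    """Differential activity and population homophily for a fully observed
    toy network. `status` maps node -> 1 (has outcome) or 0; `ties` lists
    undirected (u, v) pairs."""
    degree = {node: 0 for node in status}
    for u, v in ties:
        degree[u] += 1
        degree[v] += 1
    with_outcome = [degree[n] for n, s in status.items() if s == 1]
    without_outcome = [degree[n] for n, s in status.items() if s == 0]
    if not with_outcome or not without_outcome:
        raise ValueError("both outcome groups must be non-empty")
    mean_with = sum(with_outcome) / len(with_outcome)
    mean_without = sum(without_outcome) / len(without_outcome)
    if mean_without == 0:
        raise ValueError("group without the outcome has no ties")
    differential_activity = mean_with / mean_without

    # Assumed no-homophily baseline: random pairing of tie ends in
    # proportion to each group's total network size.
    d1, d0 = sum(with_outcome), sum(without_outcome)
    expected_discordant = d1 * d0 / (d1 + d0)
    observed_discordant = sum(1 for u, v in ties if status[u] != status[v])
    if observed_discordant == 0:
        raise ValueError("no discordant ties: homophily is unbounded")
    return differential_activity, expected_discordant / observed_discordant


status = {"A": 1, "B": 1, "C": 0, "D": 0}   # hypothetical outcome status
ties = [("A", "B"), ("C", "D"), ("A", "C")]
print(population_measures(status, ties))     # (1.0, 1.5)
```

Both groups have the same mean network size (differential activity 1.0), but only one of the three ties is discordant against 1.5 expected, so the homophily value of 1.5 indicates homophily.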
=== The Plots menu===
This menu is for looking at the RDS data with an eye to using it to estimate population characteristics. It provides diagnostics of the sampling process and of possible dependence on the seeds.

* Plot Recruitment Tree: Produces a publication-quality plot of the recruitment tree.
* Recruitment Diagnostics: Produces publication-quality plots of various diagnostics:
** Bar plot of the number of recruits by wave
** Scatter plot of network size versus wave
** Bar chart of the number of recruits from each seed
** Histogram of the number of recruits for each respondent
** Boxplots by wave, seed, etc.
* [http://www.deducer.org/pmwiki/index.php?n=Main.PlotBuilder Plot Builder]: An interface to create both simple and sophisticated plots from the data. It includes pie charts, histograms, bar plots, scatter plots, bubble plots and many options. Within Plot Builder, you can import an existing template for the plot (''Import Template''), open an existing plot (''Open Plot''), or work interactively with existing primary plot forms (''quick and interactive'').

The dialogs retain their current settings and produce publication-quality PDF plots.
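The recruitment diagnostics rest on deriving each respondent's wave and seed from who recruited whom. The following Python sketch shows one way to do that, assuming a hypothetical mapping from respondent ID to recruiter ID (with <tt>None</tt> for seeds) in which every recruiter is also a respondent; it is an illustration, not the program's implementation.

```python
from collections import Counter


def recruitment_waves(recruiter_of):
    """Map each respondent to (seed, wave), where seeds are wave 0.

    `recruiter_of` maps respondent id -> recruiter id, or None for seeds.
    """
    info = {}
    for start in recruiter_of:
        path = []
        node = start
        # Walk up the chain until we hit a seed or an already-resolved respondent.
        while node not in info and recruiter_of[node] is not None:
            path.append(node)
            node = recruiter_of[node]
            if len(path) > len(recruiter_of):
                raise ValueError("recruitment cycle detected")
        if node not in info:  # an unresolved seed
            info[node] = (node, 0)
        seed, wave = info[node]
        # Assign waves back down the chain, starting next to the known node.
        for member in reversed(path):
            wave += 1
            info[member] = (seed, wave)
    return info


# Hypothetical recruitment records: s1 and s2 are seeds.
recruiter_of = {"s1": None, "a": "s1", "b": "s1", "c": "a", "s2": None, "d": "s2"}
info = recruitment_waves(recruiter_of)
respondents_by_wave = Counter(wave for _, wave in info.values())
recruits_per_seed = Counter(seed for seed, wave in info.values() if wave > 0)
direct = Counter(r for r in recruiter_of.values() if r is not None)
recruits_per_respondent = [direct.get(rid, 0) for rid in recruiter_of]
print(sorted(respondents_by_wave.items()))  # [(0, 2), (1, 3), (2, 1)]
print(dict(recruits_per_seed))              # {'s1': 3, 's2': 1}
print(recruits_per_respondent)              # [2, 1, 0, 0, 1, 0]
```

These three tallies correspond to the wave bar plot, the per-seed bar chart and the recruits-per-respondent histogram listed above.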
=== Packages & Data Menu ===

* ''Data Viewer'' enables you to re-open the ''Data Viewer'' and so look at the RDS data sets (and other data frames).
* ''Object Browser'' allows you to view and edit any "objects" in the workspace, such as RDS data sets, spreadsheets (data.frames), functions, etc.
* ''GUI Add-ons'' are additional packages that may be helpful, such as text handling, capabilities for spatial analysis, etc.
* ''Package Manager'' enables you to load or unload additional packages from those installed, and to set which packages will be loaded at startup.
* ''Package Installer'' enables you to install additional packages for the underlying R engine over the internet. There is a vast array of packages and capabilities available for the analysis of (RDS) data.
* ''Example RDS data sets'': Example RDS data sets that come from populations with known characteristics.
** Example: ''faux'': An artificial data set of size 389, starting from a single seed. It has two categorical outcome variables.
** Example: ''fauxmadrona'': A data set of size 500 drawn from a population of size 1000. The outcome variable is disease status. The 10 seeds are randomly drawn.
** Example: ''fauxsycamore'': A data set of size 500 drawn from a population of size 715. The outcome variable is disease status. The 10 seeds are drawn from the infected population, so there is extreme dependence induced by seed selection.
To get information on the data sets, choose ''RDS Analyst Reference Manual'' from the ''Help'' menu. The first time it is used it will start a browser for help (which will take a few seconds to load). From there, you can enter any term in the "Search" box and search all the documentation for it. Try, e.g., "fauxmadrona".
=== Window Menu ===
* To get help, choose ''R help'' from the ''Help'' menu. The first time it is used it will start a browser for help (which will take a minute to load).
* You can also get help on many of the common menu items and options via ''Deducer Help''.
* You can also get an introduction to the program via ''RDS Analyst Introduction Manual'' (this manual).
* The ''RDS Analyst Reference Manual'' provides the details of all the statistical routines that underlie the program. It is searchable and detailed.
* The ''Citation information'' provides the details of how to cite the software in your papers. Please do this!

In the Macintosh version there is a ''JGR'' menu on the left-hand side. Under it there is:
* '''Preferences''': Change the font, set the default "working" directory where files are looked for and saved, etc. There is a separate panel for <u>RDS Analyst</u>-specific features called ''Data Viewer''. This enables you to change the defaults (for example, whether you are using the basic or professional version).
* Quit the program.
== Saving the results, the output and/or the batch commands that produced them! ==
When a dialog box is ''Run'', it creates output in the ''Console'' window.
To save the results in a file (typically at the end of a session), choose ''Save Console'' from the ''File'' menu in the ''Console'', and make sure ''Results'' is selected from the ''Options:''.
To save the commands used to create the output in a file (typically at the end of a session), choose ''Save Console'' from the ''File'' menu in the ''Console'', and make sure ''Commands'' is selected from the ''Options:''.
To save the complete output (results interspersed with the commands that produced them) in a file (typically at the end of a session), choose ''Save Console'' from the ''File'' menu in the ''Console'', and make sure ''Complete output'' is selected from the ''Options:''.
== Example data ==
the directory <tt>RDS Analyst Example Data Sets</tt>.
<!-- "C:\Program Files\R\R-2.11.0\library\RDSdevelopment\extdata\nyjazz.rdsat".  -->
It can be opened from the ''Open RDS Data Set'' dialog box (for example). It is the same as the RDSAT file <tt>nyjazz.txt</tt> with the extension changed to <tt>.rdsat</tt>, so the package will recognize it automatically.
== Getting Help from within the package ==
To get help, choose ''RDS Analyst Introduction Manual'' from the ''Help'' menu. The first time it is used it will start a browser with the RDS Analyst manual page on the hpmrg.org wiki (which will take a few seconds to load).

* You can enter any term in the "Search" box and search all the documentation for it. Try, e.g., "plot".

To get help on the graphical user interface, choose ''Deducer Help'' from the ''Help'' menu. You can also search from here.

To get help on R, choose ''R Help'' from the ''Help'' menu.
* Click on an item to get help about R and to get started with R. "An Introduction to R" is a particularly useful reference for beginners.
* Click on the ''packages'' tab to get specific help on packages (like ''RDS'').

For detailed help on the statistical routines, choose ''RDS Analyst Reference Manual'' from the ''Help'' menu.
* You can enter any term in the "Search" box and search all the documentation for it. Try, e.g., "RDS-II".
* Click on the "00Index" tab to list all the functions in the "RDS" package. Click on an item to get help.
* Click on the ''packages'' tab to get specific help on packages (like ''RDS'').
* Click on the ''DeducerRDSAnalyst'' package to get help with the commands underlying some of the menus (except the ''RDS'' menu).
* Click on the ''RDSdevelopment'' package to get help with the commands underlying the menus.
** This gives help on the data sets.
** Click on, e.g., RDS.I.estimates to get help on the RDS-I estimate function.
== Tips and FAQ  ==
* Many dialogs have a ''Subset'' option. Use this to analyze a subset of the data set containing only the cases of your choice. In the ''Subset'' box, enter an expression for those you want to retain (e.g., ''HIV < 2'') and click ''Run''. The analysis then uses only the retained cases. See [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.Subset Subset] for details.
* By default, <u>RDS Analyst</u> stores all your files on your Desktop. To change this, choose ''Set Working Directory'' from the ''File'' menu.
* If you want to bring the ''Data Viewer'' to the front, you can select it under the ''Window'' menu. This works for the ''Console'' or any other window.
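What a ''Subset'' expression such as ''HIV < 2'' does can be sketched in Python. The rows and the helper <tt>subset</tt> are hypothetical. Treating a missing value (<tt>None</tt>) as "not retained" is an '''assumption''' of the sketch; it matches how base R's <tt>subset()</tt> handles a condition that evaluates to <tt>NA</tt>, but the dialog's exact behaviour is not documented here.

```python
def subset(rows, variable, condition):
    """Keep rows whose `variable` is present (not None) and satisfies
    `condition`. The original list is not modified."""
    return [row for row in rows
            if row.get(variable) is not None and condition(row[variable])]


respondents = [{"id": 1, "HIV": 1}, {"id": 2, "HIV": 2},
               {"id": 3, "HIV": None}, {"id": 4, "HIV": 1}]
retained = subset(respondents, "HIV", lambda hiv: hiv < 2)
print([row["id"] for row in retained])  # [1, 4]
```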
* [http://ifellows.ucsd.edu/pmwiki/pmwiki.php?n=Main.CorrelationExample1 What is the relationship between reported height and weight and actual height and weight? (Correlation)]
== Credits for <u>RDS Analyst</u> ==

<u>RDS Analyst</u> is based on the [http://www.rforge.net/Deducer Deducer] software (written by Ian Fellows). We also use the Java-based R GUI [http://jgr.markushelbig.org/JGR.html JGR], which is closely integrated with [http://www.rforge.net/Deducer Deducer]. These guys deserve most of the credit for what we see.

== Common installation problems and bugs ==

* If the program crashes, try deleting the preferences file in your home directory. It is called <tt>.JGRprefrc</tt>.
* If the program crashes on first use on a Macintosh, Java (e.g., the Java Runtime Environment (JRE)) may not be installed. If so, see the note on Java in the Installation section above, or simply (re)install it directly from http://support.apple.com/kb/DL1515.
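If you prefer to script the first tip, the Python sketch below deletes the preferences file. It assumes, as the text states, that the file is named <tt>.JGRprefrc</tt> and sits directly in your home directory, and that Python's <tt>Path.home()</tt> resolves to the same home directory the program uses (this may differ by platform). The helper name <tt>remove_jgr_prefs</tt> is hypothetical; close <u>RDS Analyst</u> before running it.

```python
from pathlib import Path


def remove_jgr_prefs(home=None):
    """Delete the .JGRprefrc preferences file from `home` (default: the
    current user's home directory). Returns True if a file was deleted,
    False if there was no such file."""
    home_dir = Path(home) if home is not None else Path.home()
    prefs_file = home_dir / ".JGRprefrc"
    if prefs_file.is_file():
        prefs_file.unlink()
        return True
    return False
```

Calling <tt>remove_jgr_prefs()</tt> with no argument targets your own home directory; the program should recreate a fresh preferences file the next time it starts.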

== References ==
* [http://www.fort.usgs.gov/BRDScience/LearnR.htm Online course in R] (using RCmdr, but still very relevant).
* An article on [http://stat-computing.org/newsletter/issues/scgn-16-2.pdf JGR].
* A [http://hpmrg.org/files/deducer_uclaa_011210.pdf presentation] on Deducer by Ian Fellows.

Current revision as of 20:17, 2 July 2013

Introduction

RDS Analyst (RDS-A) is a software package for the analysis of Respondent-driven sampling (RDS) data that implements recent advances in statistical methods.

RDS Analyst has an easy-to-use graphical user interface to the powerful and sophisticated capabilities of the computer package R. RDS Analyst provides a comprehensive framework for working with RDS data, including tools for sample and population estimations, testing, confidence intervals and sensitivity analysis.

Example capabilities are an easy format for entering data, the visualization of recruitment chains, regression modeling, and missing data.

The interface of RDS Analyst is similar to that of SPSS. RDS Analyst is also a free, easy-to-use alternative to proprietary data analysis software such as SPSS, STATA, SAS/JMP, and Minitab. It has a menu system for common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data.

RDS Analyst is meant for users who want to use state-of-the-art techniques for estimation and quantification of uncertainty from data collected via RDS. It is advanced, comprehensive and open-source software to visualize, model and conduct sensitivity analyses for RDS data.

RDS Analyst is an intuitive, cross-platform graphical data analysis system for the analysis of RDS data. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing. It is also a front end to the powerful capabilities accessible via the R command-line interface and to the extensive capabilities of the R statistical language.

Current State

  • It is a beta version, is under continuous development and should not be distributed. Any requests for copies should be directed to the Hard-to-Reach Population Methods Research Group (HPMRG) [Lisa, Krista, Cori, Ian, and Mark]. This version contains advanced code and will have bad bugs. We want to support the program and do not want immature versions floating around that give it a bad name and that we will not be able to stamp out.
  • The current version number is 0.1.

Basic facts

  • RDS Analyst is written for the R statistical environment.
  • The current development version is for Windows and Macintosh. A Linux version will be available in the installers when the program is released publicly.

The purpose of the initial beta test stage is to:

  • Find basic installation and running problems in real-world environments.
  • See if the GUI design will work for your RDS users. How can we improve on it?
  • See if the current (basic) features work as you expect.
  • Suggest features that we can add in future versions.

Installation

Installation on a Windows PC

The installer is at:

http://hpmrg.org/software/RDSAnalystSetup.0.1.exe

Download the installer and double-click on it to install the software. To bring your installation up to the latest version of the packages, use the updater at:

http://hpmrg.org/software/RDSAnalystUpdater.0.1.exe

and, for the latest secure version of Java:

http://hpmrg.org/software/jre-7u21-windows-i586.exe

This can install all the programs and utilities needed. If you already have some elements installed, you can deselect (or cancel) them during the installs. It is recommended that you install everything the first time. This installer is over 110 MB in size and will take time to download.

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the RDS Analyst application nor R may be running when you update.

You need the Java Runtime Environment to use RDS Analyst. If you get the message "A JRE has been found. Do you want to install another one anyway?", it means that Java is already installed. In this case, click No so as not to reinstall it.

After you install, you should keep your installation up to date with the latest version of the packages by downloading the updater:

http://hpmrg.org/software/RDSAnalystUpdater.0.1.exe

Note for experienced users: This creates a private version of R for RDS Analyst to use and ensures RDS Analyst has the right version of R available for its use. If you already have R installed separately, the two versions will peacefully coexist and you can use the other version of R just as you were originally.

Finally, be sure to sign up for the RDS Analyst Users Group.

Installation on an Apple Macintosh

There is a version for Apple Macintosh computers. They must have Intel CPUs (i.e., be purchased post-2006). To install:

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the RDS Analyst application nor R may be running when you update.

RDS Analyst needs Java to work. The latest Mac OS X release, Mountain Lion (aka 10.8), available since July 2012, does not come with Java pre-installed. You can check whether you have Java installed at http://javatester.org/version.html. If you need to install Java, it is at http://hpmrg.org/software/jre-7u21-macosx-x64.dmg

The RDS Analyst application and R will be in your Applications folder. To run RDS Analyst, double-click on it in the Applications folder.


After you install, you should use the updater to keep your installation up to date with the latest version of the packages:

This installs just the core packages (that is, anything that has changed since the full install was made). It will typically be a few MB in size.

Note: To use RDS Analyst, you need Java installed on your Mac. If you are using Mac OS X 10.6 or earlier, Apple's Java comes pre-installed. If you are using Mac OS X 10.7 (Lion), Mac OS X 10.8 (Mountain Lion) or later, Java is not pre-installed. To install Java version 6, open "Java Preferences.app" in the Applications > Utilities folder on your Mac. It will ask whether you want to install Java if it is not already there. Accept its invitation.


Finally, be sure to sign up for the RDS Analyst Users Group.

Starting RDS Analyst

To start RDS Analyst, select the RDS Analyst menu under Programs, and select the RDS Analyst program there. Alternatively, if you installed the desktop icon or the Quick Launch icon (the default), you can double-click on one of these to start the program. It will start the graphical user interface. Startup may take a minute, but the program is fast once loaded.

The package has two main windows. The primary one is titled the "Console" window. To do statistical analysis you choose the menus on top of the Console. The Console also records a log of all the commands and the output produced by them. The other window is titled the Data Viewer. This is where the data are displayed and can be edited. The "Data Viewer" provides an easy-to-use, spreadsheet-like environment to view and edit data. Copying and pasting is supported and is compatible with Excel 2003/2007, so data can be moved from Excel by simply copying them into the Data Viewer. Contextual menus can also be used to insert, delete and copy rows and columns.

Once the application starts up, you may need to load the Data Viewer window if it is not visible. In the "Console" window, go to the "Packages & Data" menu, select "Package Manager", and click the boxes next to "DeducerRDSAnalyst" under the "load" and "default" columns (two boxes will need to be clicked). This creates the "Data Viewer". Subsequent launches of RDS Analyst will have the "Data Viewer" window open from the beginning, so this only needs to be done the first time.

The first thing to do is to tell RDS Analyst which directory holds the data and where the files associated with the project will be stored. This is typically referred to as the "working" directory, and the program will read and save files there by default.

To do this, select the Set Working Directory menu item from the Console File menu. Then, typically, choose the directory where the data is stored.

Quick start demo

There is a video tutorial to get you started. You can run it from the button on the "Data Viewer" window or directly at

http://neolab.stat.ucla.edu/cranstats/RDSAnalyst_tutorial.mov

Below is a step-by-step tutorial in words.

Reading the NY Jazz dataset from RDSAT

  • Select the Open Data menu item from the File menu.
  • Use the dialog boxes to select the file nyjazz.rdsat from the directory "RDS Analyst Example Data Sets", which is on your "Desktop". (Hint: If it does not appear, in Windows look in "C:\Program Files\RDS Analyst\R-2.15.0\library\RDSdevelopment\extdata". On a Mac, look for "~/workspace/org/bin/RDSdevelopment/inst/extdata/nyjazz.rdsat", with possible adjustment depending on the name and location of your workspace.)
  • The data are read in RDS format and shown with an (rds) prefix to indicate that this is an RDS data set (rather than, say, just a regular spreadsheet).

Looking at and editing the data in the spreadsheet (the Data Viewer)

  • Go to the Data Viewer window and note that the (rds) nyjazz data set is selected in the Data Set menu (center of the window pane). If you load more than one data set you can select the one to view here.
  • Here you can look at the data in "spread-sheet format". There is more information on the "Data Viewer" here
  • On top of the spread-sheet are three tabs:
    • "Data View" which is the current spread-sheet view
    • "Variable View" which summarizes the variables and their properties.
    • "RDS" where you can add information about the data set (such as estimates of the population size). It is OK to go with the default values for now.
  • Click on the Variable View tab. Click the value for Gender.MF. under the Type column and select Factor value. Repeat for Race.WBO., Airplay.yn., and Union.yn.. This makes sure that the program recognizes them as categorical variables.
  • Click on the RDS tab. Click the value for Mid box in the Population Size Estimate section. Enter an estimate of the population size for the NY jazz population. While this is unclear, a possible value is 20000. It is used in the computations, but they are somewhat insensitive to it (as long as it is not close to the sample size).

Looking at the data (the Plots menu)

  • Select Plot Recruitment Tree from the Plots menu. Select Run. The plot will appear in a new graphics window (called "JavaGD (2)", ha ha). This plots the recruitment trees, labeling each node by its ID. There are options in the dialog box you can play with to get different views. Select the "File" menu from the "JavaGD (2)" window to save the file as a PDF, JPG, etc.
  • Select Recruitment Diagnostics from the Console Plots menu. Unselect the Recruitment tree box and select the Network size by wave box. Click Run. This is a plot of the network size distributions by the wave number of the respondent.

Looking at the data (the Sample menu)

This is for exploratory data analysis.

  • Select Contingency Tables from the Sample menu.
  • Select Gender.MF. as the Row and Race.WBO. as the Column
  • Click the Run button
  • Look in the Console window for the results. On top of the Console window there are two tabs, Console view and Element View. The Console view gives a continuous view of the commands sent to the RDS Analyst engine. The Element View shows the results of each command separately, and is usually better. Click on the Element View tab.
  • The results are in a contingency table output followed by a test of independence of the two factors.

Analyzing the data (the Population menu)

  • Select Frequency Estimates from the Population menu.
  • Select Gender.MF. as the Variables using the arrow buttons.
  • Click the Run button.
  • Look in the Console window for the results. This uses a computationally intensive algorithm to compute the confidence interval and will take 20 seconds or more to return output to the Console window. The output is a summary table with the point estimate and the 95% confidence interval. It also reports the design effect, the standard error and the sample size. The sample size is adjusted for any missing values: for the nyjazz data set, 21 of the 264 respondents did not report a network size.
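The design effect compares the variance of an RDS estimate with the variance expected from a simple random sample of the same size. The minimal Python sketch below computes that ratio for a proportion. It assumes the textbook simple-random-sampling variance p(1 - p)/n and is not necessarily the exact formula RDS Analyst uses; the function name is hypothetical.

```python
def design_effect(p_hat: float, se: float, n: int) -> float:
    """Ratio of the design-based variance to the SRS variance of a proportion.

    Assumes the usual simple-random-sampling variance p(1 - p) / n.
    """
    if not 0 < p_hat < 1 or n <= 0:
        raise ValueError("need 0 < p_hat < 1 and n > 0")
    srs_variance = p_hat * (1 - p_hat) / n
    return se ** 2 / srs_variance


# Hypothetical estimate and standard error; n = 264 - 21 = 243 usable respondents.
print(design_effect(0.5, 0.05, 243))
```

With these illustrative inputs the ratio is about 2.43, meaning the RDS estimate is roughly as variable as one from a simple random sample about 2.4 times smaller.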

Saving the data

  • To save the data and any edits to them in a file, choose Save Data from the File menu in the Console. You can save it anywhere. It will be saved with a .rdsobj extension so that the next time you read it in, the program will know that it is an RDS data set and will not ask you to re-enter all the information.

Saving the results

  • To save the results in a file, choose Save from the File menu in Console, and make sure Results is selected from the Options:. You will need to add an extension to the file name, and we suggest txt (e.g., the file name rdsat_simple.txt). This can be opened with, e.g., WordPad under Windows, as it is a simple text file.
  • To save the commands used to create the output in a file, choose Save from the File menu in Console, and make sure Commands is selected from the Options:.
  • To save the complete output (results interspersed with the commands that produced them) in a file, choose Save from the File menu in Console, and make sure Complete output is selected from the Options:.

Getting started (seriously)

Once the application starts up, focus on the (top) Data Viewer window. This is where you read in the data. The other window is Console where most of the analysis takes place. It also records a log of all the commands and output. It can be ignored for now.

The Windows

The "Console" window has three parts to it. The panel along the left side is a navigation panel. You can use it to quickly go from one statistical output to another. You can also remove output that you don't want by selecting it in the navigation panel and pressing the Remove button at the bottom of the navigation panel. The panel along the bottom is a command console. You can type in commands in the R language here. The panel in the upper right, which takes up most of the screen, is the output window. By default, RDS Analyst commands that are submitted for execution show up as red text and the generated output shows up as blue text. There are two tabs at the top of the output panel. Console view shows all the output, like one long ream of paper. Element view shows you just a single output element at a time, without any red RDS Analyst commands. You can switch between these tabs to see either the most recent output or the full history of commands and output.

Loading RDS data

RDS Analyst can read in a wide range of data formats from other packages including SPSS (*.sav), SAS export (*.xpt), and Excel (via *.xls and other native Excel formats). It can also read in comma-separated (*.csv) files. For a general description of this, see Open RDS Data Set.

It also directly reads in RDSAT files (these should be plain-text files; otherwise they will not be recognized automatically and the correct variable names will not show up).

If it is not an RDSAT file, RDS Analyst expects the data to be in a "spread-sheet" format containing the RDS survey data with recruitment information. This should represent valid RDS survey data. The sheet must have one row for each respondent (i.e. case), and columns for each survey response variable. In addition, the recruitment information can be specified in two ways:

  • Coupon format: Basically like RDSAT (whereby the first column contains serial numbers, the second column the network size data, the third column the participant id numbers, and the following columns, generally the fourth, fifth and sixth, the recruitment coupons) but without the two header lines.
  • Recruiter ID format: Here it expects columns with the following names:
    • id: A column of integers giving unique ids for each respondent (i.e., row of the spreadsheet).
    • recruiter.id: A column of ids indicating the recruiter for that respondent (that is, by the recruiter's id). Recruiters can be identified by elements of id, or as 0 for seeds.
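To illustrate how a recruiter.id column can be derived from coupon-format data, here is a minimal Python sketch. It is not RDS Analyst's own code: the record type and the field names own_coupon and coupons_given are hypothetical, and it assumes each respondent's own coupon matches at most one coupon handed out by a recruiter. Respondents whose coupon matches nobody are treated as seeds and get 0.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CouponRow:
    """One respondent in (hypothetical) coupon format."""
    id: int
    network_size: Optional[int]
    own_coupon: int                                         # coupon this respondent brought in
    coupons_given: List[int] = field(default_factory=list)  # coupons handed out


def recruiter_ids(rows: List[CouponRow]) -> Dict[int, int]:
    """Map each respondent id to the id of their recruiter (0 for seeds)."""
    # Index: coupon number -> id of the respondent who handed it out.
    issued_by: Dict[int, int] = {}
    for row in rows:
        for coupon in row.coupons_given:
            issued_by[coupon] = row.id
    # A respondent whose own coupon was not handed out by anyone is a seed.
    return {row.id: issued_by.get(row.own_coupon, 0) for row in rows}


rows = [
    CouponRow(1, 10, 100, [101, 102, 103]),  # seed
    CouponRow(2, 5, 101, [201, 202]),
    CouponRow(3, 8, 102, []),
    CouponRow(4, 3, 201, []),
]
print(recruiter_ids(rows))
```

For these made-up rows the result is {1: 0, 2: 1, 3: 1, 4: 2}: respondents 2 and 3 were recruited by the seed, and respondent 4 by respondent 2.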

When you read in an RDS data file, you will be presented with the Load RDS Data dialog. This enables you to specify the coupon variables, the subject ID, the network size variable in the data, etc. This is essential to identify the RDS data set. You can also (optionally) set the maximum number of coupons, the population size estimate and other data set characteristics before the RDS data set is ready for use. When you are done, click Run to create the data set. It will appear in the Data Viewer.

The package creates a spreadsheet with other information in it, like recruiter.id: a column of integers giving the id of the recruiter for the respondent in that row. A value of 0 means the person was a seed. It also creates a variable called seed, which is the id of the seed for the respondent in that row (that is, the recruiter of the recruiter, etc., until you reach the wave 0 seed). It also creates a variable called wave, which is the wave of the respondent in that row (that is, the number of recruiters one must go back to find the seed for that person); the wave of a seed is 0. These variables are stored with the data set and can be analysed like other variables, including for subset selection in estimation.
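The seed and wave variables described above follow from recruiter.id by walking each respondent's chain back to a seed. The minimal Python sketch below shows that logic; it is an illustration rather than the package's implementation, the function name is hypothetical, and it assumes every non-zero recruiter.id also appears as an id.

```python
from typing import Dict, Set, Tuple


def seed_and_wave(recruiter: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Return {id: (seed id, wave)} from an id -> recruiter.id map (0 = seed)."""
    result: Dict[int, Tuple[int, int]] = {}
    for start in recruiter:
        node, steps = start, 0
        seen: Set[int] = set()
        # Walk back up the recruitment chain until we reach a seed.
        while recruiter[node] != 0:
            if node in seen:
                raise ValueError(f"recruitment cycle at id {node}")
            seen.add(node)
            node = recruiter[node]
            steps += 1
        result[start] = (node, steps)
    return result


print(seed_and_wave({1: 0, 2: 1, 3: 1, 4: 2, 5: 0}))
```

Here respondent 4 was recruited by 2, who was recruited by seed 1, so respondent 4 has seed 1 and wave 2. The cycle check guards against data-entry errors in which a chain never reaches a seed.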

  • If you read in a RDSAT file the RDS data set is created automatically as it already has the necessary information in it. If you use a CSV file (e.g., from Excel) you will need to specify the type of formatting in the file in the dialog that is presented.
  • For files in Coupon format, you will need to select the coupon and network variables. If the coupon variables follow the id and network size variable in column order then you only need to specify the maximum number of coupons. The program computes the recruiter.id variable from this information.
  • For files in Recruiter id format, you will need to select the id, recruiter.id and network size variables.
  • If the spreadsheet is called foo (say), the RDS data set is called foo and will appear with the (rds) prefix to let you know it is an RDS data set.
  • You can read in and analyse multiple data sets at the same time. After you have read in the data, use the Data Set menu in the dialogs to specify the RDS data set you will be working on.

Saving RDS data

RDS Analyst can save the data sets it creates in a wide range of data formats. We recommend saving them in R's internal data format (*.rda) for easy reading later.

For a description of this see Save RDS Data Set.

The Data Viewer window

RDS Analyst has a Console window, with menus at the top for the basic capabilities of the package, and a Data Viewer window for looking at your data.

You can move between the two windows by clicking on them, or by using the Window menu in the Console to (re)open the Data Viewer.

The Data Viewer provides an easy-to-use, spreadsheet-like environment to view and edit RDS data (or in fact, any spreadsheet data you load). Copying and pasting is supported and is compatible with Excel 2003/2007, so data can be moved from Excel to R by simply copying them into the Data Viewer. Contextual menus are used to insert, delete and copy rows and columns.

If there are any data frames loaded in the R session, they can be viewed by selecting them from the Data Set list. Data can be loaded into the R session by clicking the Open RDS Data Set button in the top left-hand corner. The currently viewed data set can be saved using the Save RDS Data Set button directly to the right of the Open RDS Data Set button. The currently viewed data set can be removed from the R session by clicking the button in the upper right.

The Data Viewer has two modes, Data view and Variable view, which you can switch between freely using the tabs. The Variable view enables you to edit the variable types of the data read in. Categorical variables (including binary variables) should be set to type Factor by clicking on their entry and selecting it from the menu.

For details see data viewer.

Important: When a menu item is chosen it opens a dialog box where the variables are selected, options are set and the computation is done by clicking the Run button. This creates output in the Console window (you will need to click over to it to see it). These results and output can be saved at any point (typically at the end of the session) by using the Save command (see below).

The menu structure of the Console window

Most of the work is done using these menus. The menus at the top of the Console window are:

  • File:
  • Edit:
  • Workspace:
  • Data:
  • Sample:
  • Population:
  • Plots:
  • Packages & Data:
  • Window:
  • Help:

File Menu

  • Open a data set and save the current data set in a file.
  • The Open Data menu item is the primary way to read in an RDS data set from a text, CSV or other data file.
  • Create, open or save text files using "New Document", "Open Document" and "Save", respectively. This is a simple editor for text files such as files of R commands or notes on the analysis.
  • Set the default place RDS Analyst looks for data and saves output using the Set Working Directory item.
  • Quit the program.

Edit Menu

  • Copy, Cut and Paste text
  • Undo and Redo
  • Search and Find
  • Increase or decrease the font size
  • Preferences: To change font, set the default "working" directory where files are looked for and saved, etc. There is a separate panel for RDS Analyst specific features called Data Viewer. This enables you to change the defaults (like if you are using the basic or professional version).

Workspace Menu

Objects that you create during an RDS Analyst session are held in computer memory. The collection of objects that you currently have is called the workspace. This workspace is not saved on disk unless you tell RDS Analyst to do so. This means that your objects are lost if you close the program without saving them or, worse, if the program or your system crashes during a session.

When you exit RDS Analyst, you will be asked if you want to save your workspace. This will allow you to have the same data sets available the next time you start RDS Analyst, and will help you resume work on a project at the same point.

You can open (previously saved) and save workspaces from this menu. So if you have multiple projects you can save the entire workspace for each project in a separate file. Then open them from this menu.

The Clear all item empties the workspace (that is, removes all objects). So you can Clear all items and then open a complete workspace you have saved before.

Note that the Opened workspace is added to the current one. So if you only want the original files you should Clear all first.

Data Menu

This menu is used to recode and modify the data in the Data Viewer. This means you do not have to go back to SPSS, SAS or Excel to recode, etc.

The term "factor" in the underlying R engine designates a categorical variable. In general factors are nominal but we use factors to represent both ordinal and nominal variables. By labeling a variable as a factor, RDS Analyst will treat it appropriately when analyzing it.

Click on the links below to get help on the following capabilities:

  • Edit Factor: Add or remove the levels (values) of a categorical variable.
  • Recode Variables: You can recode variables into variables with new names. From the Recode Variables dialog, select the variable you want to re-target from the Variables to Recode list (e.g. sex -> sex). Then click on the Target button on the right. That will let you type in the name of the new variable (e.g., sexMF). The recode will show something like sex -> sexMF. The original "sex" variable will be unchanged and "sexMF" will appear in the Data Viewer window.
  • Transform the variables: These can be very complex, if needed.
  • Reset Row Names
  • Sort
  • Edit Meta Data: Specify here the characteristics of the RDS data like the maximum number of coupons, the network size variable, the missing data symbol, and population size estimates.
  • Convert to RDS: This is a direct and manual way to (re) convert a data set into an RDS data set. It is rarely used (as the program does this automatically) but can be useful if the automatic method missed something.
  • If you read in data from a CSV or other spreadsheet file (i.e., not an RDSAT data file) and edit it to make it an RDS data set then an RDS data set is formed from the current state of the spread sheet. If the spreadsheet is called foo (say), the RDS data set is also titled foo.

Sample Menu

This is for exploratory analysis of the RDS data. It is used to describe the characteristics of the sample.

It deals with continuous data, categorical data and descriptive data.

  • Frequencies: Tables of one or more variables, possibly stratified by others (like SPSS)
  • Descriptives: This produces sample summaries such as means, medians, quantiles, standard deviations and extremes.
  • Contingency Tables: Cross tabs. They include tests (which are dubious because of the dependence).
  • Recruitment Homophily: Compute a homophily measure for the recruitment process. Do respondents differentially recruit people like themselves? That is, the homophily on a variable in the recruitment chains. Take as an example HIV status. In this case, it is the ratio of the number of recruits that have the same HIV status as their recruiter to the number we would expect if there were no homophily on HIV status. The difference from Population Homophily (see below) is that this is measured in the recruitment chains rather than in the population of social ties. For example, if the recruitment homophily on HIV status is about 1, we see little effect of recruitment homophily on HIV status (as the number of homophilous pairs is close to what we would expect by chance).
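The recruitment homophily ratio above can be sketched in a few lines of Python. This is an illustration under a stated simplifying assumption, not RDS Analyst's exact computation: the "expected" count assumes each recruit's value is drawn independently from the distribution of values among all recruits, whatever the recruiter's value. The function name and inputs are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable


def recruitment_homophily(recruiter: Dict[int, int],
                          value: Dict[int, Hashable]) -> float:
    """Observed / expected number of same-value recruiter-recruit pairs.

    Expected counts assume (as a simplification) that a recruit's value is
    drawn from the distribution of values among all recruits.
    """
    pairs = [(value[r], value[i]) for i, r in recruiter.items() if r != 0]
    if not pairs:
        raise ValueError("no recruiter-recruit pairs")
    n = len(pairs)
    recruit_counts = Counter(v for _, v in pairs)
    observed = sum(1 for a, b in pairs if a == b)
    # Chance that a random recruit shares the recruiter's value, summed over pairs.
    expected = sum(recruit_counts[a] / n for a, _ in pairs)
    return observed / expected


recruiter = {1: 0, 2: 1, 3: 1, 4: 2, 5: 0, 6: 5}
hiv = {1: "+", 2: "+", 3: "-", 4: "+", 5: "-", 6: "-"}
print(recruitment_homophily(recruiter, hiv))
```

In this toy data 3 of the 4 recruitments are same-status, against 2 expected by chance, giving a ratio of 1.5. A value near 1 would indicate little recruitment homophily.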

Frequencies, Descriptives and Contingency Tables do not use RDS weights to compute population estimates; rather, they report sample averages of the RDS data. They act as if the data are an independent sample and are deliberately not RDS-aware.


The Population menu

This computes estimates of the population characteristics based on the RDS sample. Specifically, the procedures use RDS weights to compute population estimates (rather than sample averages of the RDS data).

It deals with continuous data, categorical data and descriptive data and complements the Sample menu that describes the sample.

  • Frequency Estimates: Tables of one or more variables, possibly stratified by others (like SPSS).
  • Descriptive Estimates: This produces estimates of the population means, medians, quantiles, standard deviations and extremes.
  • Population Crosstabs: Cross tabulations of categorical variables. They include tests (which are dubious because of the dependence).
  • Population Test Difference in Proportions: Test the hypothesis that two population proportions are equal.

The basic descriptive, cross-tabs and frequencies use the current weights. These are stored in the data and added whenever estimates are computed. The default is the Gile sequential sampling (SS) weights.

The most important entry is the Frequency Estimates entry which computes confidence intervals for population proportions. The default method is Gile's SS estimator. The confidence interval is computed using Gile's bootstrap method. This is a computationally-intensive procedure and can take a minute or longer to complete.
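To show what "using RDS weights" means in the simplest case, the Python sketch below computes inverse-network-size weighted proportions. Note that this is the Volz-Heckathorn (RDS-II) estimator, swapped in here because it is short; it is not Gile's SS estimator that RDS Analyst uses by default, and it does not compute the bootstrap confidence interval. The function name is hypothetical, and respondents with a missing network size are simply dropped.

```python
from typing import Dict, Hashable, Optional, Sequence


def rds_ii_proportions(values: Sequence[Hashable],
                       degrees: Sequence[Optional[float]]) -> Dict[Hashable, float]:
    """Inverse-network-size weighted proportions (Volz-Heckathorn RDS-II)."""
    totals: Dict[Hashable, float] = {}
    weight_sum = 0.0
    for v, d in zip(values, degrees):
        if d is None or d <= 0:
            continue  # skip respondents with a missing or zero network size
        w = 1.0 / d   # people with larger networks are more likely to be sampled
        totals[v] = totals.get(v, 0.0) + w
        weight_sum += w
    if weight_sum == 0:
        raise ValueError("no respondents with a usable network size")
    return {v: t / weight_sum for v, t in totals.items()}


print(rds_ii_proportions(["M", "M", "F", "F"], [2, 2, 4, None]))
```

In this made-up example the weighted estimate for M is 0.8, whereas the unweighted sample proportion among respondents with a known network size would be 2/3. That gap is the difference between the Population and Sample menus.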

  • Estimate the Homophily in the Population for a given variable: Consider, for example, HIV status. Population homophily is the homophily in the HIV status of two people who are tied in the underlying population social network (a "couple"). Specifically, the population homophily is the ratio of the expected number of HIV-discordant couples absent homophily to the expected number of HIV-discordant couples with the homophily. Hence larger values of population homophily indicate more homophily on HIV status. For example, a value of 1 means couples are formed at random with respect to HIV status. A value of 2 means there are half as many HIV-discordant couples as we would expect if there were no homophily in the population. This measure is meaningful across different levels of differential activity. As we do not see most of the population network, we estimate the population homophily from the RDS data. As an example, suppose the population homophily on HIV is 0.75, so there are about 33% more HIV-discordant couples than expected by chance (1/0.75 ≈ 1.33). So there is actually heterophily on HIV in the population. If the population homophily on sex is 1.1, there are about 9% fewer mixed-sex couples than expected by chance (1/1.1 ≈ 0.91). Hence there is modest homophily on sex.
  • Estimate the Differential Activity in the Population: This is the ratio of the mean network size for those with the outcome to the mean network size of those without it.
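The two definitions above reduce to simple ratios, sketched below in Python. The first function only restates the population homophily definition (it does not estimate homophily from RDS data); the second computes differential activity from known network sizes. Both function names are hypothetical, and the inputs in the usage lines are illustrative.

```python
from statistics import mean
from typing import Sequence


def discordant_ratio(population_homophily: float) -> float:
    """Discordant couples with homophily, relative to the number expected by chance.

    Homophily = expected discordant (no homophily) / expected discordant (with
    homophily), so the ratio asked for is simply its reciprocal.
    """
    return 1.0 / population_homophily


def differential_activity(degrees: Sequence[float],
                          has_outcome: Sequence[bool]) -> float:
    """Mean network size with the outcome / mean network size without it.

    statistics.mean raises StatisticsError if either group is empty.
    """
    with_outcome = [d for d, y in zip(degrees, has_outcome) if y]
    without = [d for d, y in zip(degrees, has_outcome) if not y]
    return mean(with_outcome) / mean(without)


print(discordant_ratio(0.75))  # about 1.33: roughly a third more discordant couples
print(discordant_ratio(2.0))   # 0.5: half as many discordant couples
print(differential_activity([10, 20, 5, 5], [True, True, False, False]))  # 3.0
```

A differential activity of 3 means people with the outcome have, on average, three times as many ties as people without it.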

The Plots menu

This is to look at the RDS data with an eye to using it to estimate population characteristics. This looks at diagnostics of the sampling and possible seed dependency.

  • Plot Recruitment Tree: Produces a publication quality graphics plot of the recruitment tree.
  • Recruitment Diagnostics: Produces publication-quality plots of various diagnostics:
    • Bar plot of the number of recruits by wave
    • Scatter plot of the network size versus wave
    • Bar Chart of the number of recruits from each seed
    • Histogram of the number of recruits for each respondent
    • Boxplots by wave, seed, etc.
  • Plot Builder: An interface to create both simple and sophisticated plots from the data. It includes pie charts, histograms, bar plots, scatter plots, bubble plots, and many options. As part of Plot Builder, you can import an existing template for the plot ("Import Template"), open an existing plot ("Open Plot"), or work interactively with existing primary plot forms ("quick and interactive").

The dialogs retain their current settings and produce publication-quality PDF plots.

Packages & Data Menu

  • Data Viewer enables you to re-open the Data Viewer and so look at the RDS data sets (and other data frames).
  • Object Browser allows you to edit and view any "objects" in the workspace, such as RDS data sets, spreadsheets (data.frames), functions, etc.
  • "GUI Add-ons" are additional packages that may be helpful, such as text handling, capabilities for spatial analysis, etc.
  • "Package Manager" enables you to load or unload additional packages from those installed and set which packages will be loaded at startup.
  • "Package Installer" enables you to install additional packages for the underlying R engine over the internet. There is a vast array of packages and capabilities available for the analysis of (RDS) data.
  • Example RDS data sets: These are example RDS data sets that come from populations with known characteristics.
    • Example: faux: An artificial data set of size 389, starting from a single seed. It has two categorical outcome variables.
    • Example: fauxmadrona: A data set of size 500 drawn from a population of size 1000. The outcome variable is disease status. The 10 seeds are randomly drawn.
    • Example: fauxsycamore: A data set of size 500 drawn from a population of size 715. The outcome variable is disease status. The 10 seeds are drawn from the infected population, so there is extreme dependency induced by seed selection.

To get information on the data sets, choose RDS Analyst Reference Manual from the Help menu. The first time it is used it will start a browser for help (which will take a few seconds to load). From there, you can enter any term in the "Search" box and search all the documentation for it. Try e.g., "fauxmadrona".

Window Menu

  • We use this to go between the Console and Data Viewer windows and also to choose a graphics window.
  • Lists the currently open windows to choose from.
  • You can go here to choose a window to bring to the front to work on.

Help Menu

  • To get help, choose R help from the Help menu. The first time it is used it will start a browser for help (which will take a minute to load).
  • You can also get help on many of the common menu items and options via Deducer Help.
  • You can also get an introduction to the program via RDS Analyst Introduction Manual (this manual).
  • The RDS Analyst Reference Manual provides the details of all the statistical routines that underlie the program. It is searchable and detailed.
  • The Citation information provides the details of how to cite the software in your papers. Please do this!

On the Macintosh version there is a "JGR" menu on the left-hand side. Under it there is:

  • Preferences: To change font, set the default "working" directory where files are looked for and saved, etc. There is a separate panel for RDS Analyst specific features called Data Viewer. This enables you to change the defaults (like if you are using the basic or professional version).
  • Quit the program.

Saving the results, the output and/or the batch commands that produced them!

When you Run a dialog box, its output appears in the Console window.

To save your work to a file (typically at the end of a session), choose Save Console from the File menu in the Console, then select one of the following Options:

  • Results: saves the results only.
  • Commands: saves the commands used to create the output.
  • Complete output: saves the results interspersed with the commands that produced them.

Example data

Three example data sets are stored within R, and the example data file from RDSAT (nyjazz.rdsat) is also provided.

  • To find out about the faux, fauxmadrona, and fauxsycamore data sets, just use help (see above) and search for them by name. There are manual pages on them :-)

The nyjazz.rdsat file is stored on your Desktop in the directory RDS Analyst Example Data Sets. It can be opened from the Open RDS Data Set dialog box, for example. It is the same as the RDSAT file nyjazz.txt, with the extension changed to .rdsat so that the package recognizes it automatically.

Getting Help from within the package

To get help, choose RDS Analyst Introduction Manual from the Help menu. The first time it is used it will start a browser with the RDS Analyst manual page on the hpmrg.org wiki (which will take a few seconds to load).

  • You can enter any term in the "Search" box and search all the documentation for it. Try e.g., "plot".

To get help on the graphical user interface, choose Deducer Help from the Help menu. You can also search from here.

To get help on R, choose R Help from the Help menu.

  • Click on an item to get help about R and to get started with R. The "An Introduction to R" is a particularly useful reference for beginners.
  • Click on the packages tab to get specific help on packages (like RDS).

To get help, choose RDS Analyst Reference Manual from the Help menu.

  • You can enter any term in the "Search" box and search all the documentation for it. Try e.g., "RDS-II".
  • Click on the "00Index" tab to list all the functions in the "RDS" package. Click on an item to get help.
  • Click on the packages tab to get specific help on packages (like RDS).
  • Click on the DeducerRDSAnalyst package to get help with the commands underlying some of the menus (except the RDS menu).
  • Click on the RDSdevelopment package to get help with the commands underlying the RDS menu.
    • This gives help on the data sets.
    • Click on e.g., RDS.I.estimates to get help on the RDS-I estimate function.

Tips and FAQ

  • Many dialogs have a Subset option. Use it to analyse only the cases of your choice. In the Subset box, enter an expression selecting the cases you want to retain (e.g., HIV < 2) and click Run; the analysis then uses only the retained cases. See Subset for details.
  • By default, RDS Analyst stores all your files on your Desktop. To change this, choose Set Working Directory from the File menu.
  • To bring the Data Viewer to the front, select it under the Window menu. This works for the Console and any other window as well.
  • If you close the Data Viewer, you can reopen it under the Packages & Data menu.
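The Subset expression in the first tip above is written in R. As a rough illustration of how such an expression selects cases, here is a minimal sketch in plain R. It assumes the Subset box behaves like base R's subset() function, which may not be exactly how RDS Analyst applies it internally; the data frame dat and its HIV coding are hypothetical stand-ins for your own data.

```r
# Hypothetical stand-in for an RDS data set: the data frame `dat` and its
# HIV coding are invented for illustration only.
dat <- data.frame(id = 1:5, HIV = c(1, 2, 1, 3, 1))

# A Subset expression is an R logical expression evaluated for each case;
# subset() keeps the rows where it is TRUE (rows giving NA are dropped too).
kept <- subset(dat, HIV < 2)

print(kept)  # only the cases with HIV < 2 remain
```

In R, conditions can be combined with the & (and) and | (or) operators; whether the Subset box accepts every such expression is an assumption here.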

Tutorials

These are pages created by Ian Fellows that illustrate simple exploratory analyses.

Credits for RDS Analyst

RDS Analyst is based on the Deducer software (written by Ian Fellows). We also use the Java-based R GUI JGR, which is closely integrated with Deducer. Their authors deserve most of the credit for what you see.

Common installation problems and bugs

  • If the program crashes, try deleting the preferences file, .JGRprefrc, in your home directory.
  • If the program crashes on first use on a Macintosh, Java (e.g., the Java Runtime Environment (JRE)) may not be installed. If so, see the note on Java in the Installation section above, or simply (re)install it directly from http://support.apple.com/kb/DL1515.
