RDS Analyst Manual

From HPMRG

Revision as of 12:21, 16 March 2011 by Lsjohnston


Introduction

RDS Analyst (RDS-A) is a software package for the analysis of Respondent-driven sampling (RDS) data that implements recent advances in statistical methods.

RDS Analyst has an easy-to-use graphical user interface to the powerful and sophisticated capabilities of the computer package R. RDS Analyst provides a comprehensive framework for working with RDS data, including tools for sample and population estimation, testing, confidence intervals and sensitivity analysis.

Example capabilities include an easy format for entering data, visualization of recruitment chains, regression modeling, and handling of missing data.

The interface of RDS Analyst is similar to that of SPSS. RDS Analyst is also a free, easy-to-use alternative to proprietary data analysis software such as SPSS, STATA, SAS/JMP, and Minitab. It has a menu system for common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data.

RDS Analyst is meant for users who want to apply state-of-the-art techniques for estimation and quantification of uncertainty to data collected via RDS. It is advanced, comprehensive, open-source software for visualizing, modeling, and conducting sensitivity analyses of RDS data.

RDS Analyst is an intuitive, cross-platform graphical data analysis system for the analysis of RDS data. It uses menus and dialogs to guide the user efficiently through the data manipulation and analysis process, and has an Excel-like spreadsheet for easy data frame visualization and editing. It also serves as a front end to the R command-line interface and to the extensive capabilities of the R statistical language.

Current State

  • This is an alpha version under continuous development and should not be distributed. Any requests for copies should be directed to the Hidden Population Methods Research Group (HPMRG) [Lisa, Krista, Cori and Mark]. This version contains experimental code and is likely to have serious bugs. We want to support the program and do not want immature versions circulating that give it a bad name and that we will not be able to stamp out.
  • The current version number is 0.09.

Basic facts

  • RDS Analyst is written for the R statistical environment.
  • The current development version runs on Windows and Macintosh. A Linux installer will be available when the software is released publicly.

The purpose of the initial alpha test stage is to:

  • Find basic installation and running problems in real-world environments.
  • See if the GUI design will work for your RDS users. How can we improve on it?
  • See if the current (basic) features work as you expect.
  • Suggest features that we can add in versions after the workshop.

Installation

Installation on a Windows PC

The installer is at:

http://hpmrg.org/software/RDSAnalystSetup.0.09.exe

Download the installer and double-click on it to install the software.

The installer can install all the programs and utilities needed. If you already have some components installed, you can deselect them (or cancel their individual installs). We recommend installing everything the first time. The installer is over 84 MB in size and will take time to download.

Subsequently, you can download this updater to keep your installation up to date with the latest versions of the packages:

http://hpmrg.org/software/RDSAnalystUpdater.0.09.exe

This just installs the core packages (that is, anything that has changed since the full install was made). It will typically be a few MB in size. The very first time you install, please run both the Setup and the Updater (in that order). The public release will not require this; for now it simply saves download time.

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the RDS Analyst application nor R may be running when you update.

Note for experienced users: You should install R with this package (even if you already have R installed separately). This creates a private version of R for RDS Analyst to use and ensures that RDS Analyst has the right version of R available. The two versions will peacefully coexist, and you can use your other version of R just as before.

Installation on an Apple Macintosh

There is now a version for Apple Macintosh computers. They must have Intel CPUs (i.e., generally models purchased from 2006 onward). To install:

A reboot is not required. You do not need to uninstall any components to update (this includes R and Java). However, neither the RDS Analyst application nor R may be running when you update.

The RDS Analyst application and R version 2.12.1 will be in your Applications folder. To run RDS Analyst, double-click on it in the Applications folder.


In the future, you can use an updater to keep your installation up to date with the latest versions of the packages:

This just installs the core packages (that is, anything that has changed since the full install was made). It will typically be a few Mb in size.


Starting RDS Analyst

To start RDS Analyst, select the RDS Analyst menu under Programs and choose the RDS Analyst program there. Alternatively, if you installed the desktop icon or the Quick Launch icon (the default), you can double-click on one of these to start the program. This starts the graphical user interface. Startup may take a minute, but the program is fast once loaded.

Once the application starts up, you need to load the Data Viewer window. In the "Console" window, go to the "Packages & Data" menu, select "Package Manager", and check the boxes next to "DeducerRDSAnalyst" in both the "load" and "default" columns (two boxes in total). This creates the "Data Viewer". You only need to do this the first time; on subsequent launches of "RDS Analyst", the "Data Viewer" window will be open from the start.

Now focus on the (top) Data Viewer window. This is where the data are displayed and can be edited. To do statistical analysis, use the menus at the top of the other window (the Console). The Console also records a log of all the commands and the output they produce.

Quick start demo

There is a video tutorial to get you started. You can run it from the button on the "Data Viewer" window or directly at

http://neolab.stat.ucla.edu/cranstats/RDSAnalyst_tutorial.mov

Below is a step-by-step tutorial in words.

Reading the NY Jazz dataset from RDSAT

  • Select the Open Data menu item from the Data Viewer File menu.
  • Use the dialog boxes to select the file nyjazz.rdsat from the directory C:\Program Files\RDS Analyst\R-2.12.1\library\RDSdevelopment\extdata.
  • This will open the Edit RDS Data Set Attributes window, where you can add information about the data set (such as estimates of the population size). For now, you can just click Run to accept the default values. The data set is read in with an .rds extension to indicate that it is an RDS data set (rather than, say, a regular spreadsheet).

Viewing and editing the data in the spreadsheet (the Data menu)

  • Go to the Data Viewer window and select the nyjazz.rds data set from the Data Set menu (center of the window pane).
  • Click on the Variable View tab. Click the value for Gender.MF. in the Type column and select Factor. Repeat for Race.WBO., Airplay.yn., and Union.yn.. This ensures that the program recognizes them as categorical variables.

Running an RDS analysis (the Population menu)

  • Select Plot Recruitment Tree from the Diagnostics menu. Select Run. The plot will appear in your PDF viewer application.
  • Select Interval Estimates from the Population menu.
  • Select Gender.MF. as the Outcome Variables using the arrow buttons.
  • Click the Run button.
  • Look in the Console window for the results. This uses a computationally intensive algorithm to compute the confidence interval and will take 60 seconds or more to return output to the Console window.

Exploratory analysis (the Sample menu)

  • Select Contingency Tables from the Sample menu.
  • Select Gender.MF. as the Row and Race.WBO. as the Column.
  • Click the Run button.
  • Look in the Console window for the results.

Saving the results

  • To save the results in a file, choose Save Console from the File menu in Console, and make sure Results is selected from the Options:. You will need to add an extension to the file name; we suggest .txt (e.g., the file name rdsat_simple.txt). This should be opened with WordPad under Windows as
  • To save the commands used to create the output in a file, choose Save Console from the File menu in Console, and make sure Commands is selected from the Options:.
  • To save the complete output (results interspersed with the commands that produced them) in a file, choose Save Console from the File menu in Console, and make sure Complete output is selected from the Options:.

Getting started (seriously)

Once the application starts up, focus on the (top) Data Viewer window. This is where you read in the data. The other window, the Console, is where most of the analysis takes place. It also records a log of all the commands and output. It can be ignored for now.

Loading RDS data

RDS Analyst can read in a wide range of data formats from other packages including SPSS (*.sav), SAS export (*.xpt), and Excel (via Comma separated *.csv). For a general description of this, see Open RDS Data Set.

It also reads RDSAT files directly. These must be renamed with a *.rdsat extension to be recognized automatically; otherwise, the correct variable names will not show up.

If the file is not an RDSAT file, RDS Analyst expects the data to be in a "spreadsheet" format containing valid RDS survey data with recruitment information. The sheet must have one row for each respondent (i.e., case) and one column for each survey response variable. In addition, the recruitment information can be specified in two ways:

  • Coupon format: Essentially the RDSAT layout, but without the two header lines.
  • Recruiter ID format: Here it expects columns with the following names:
    • id: A column of integers giving unique ids for each respondent (i.e., row of the spreadsheet).
    • recruiter.id: A column giving the id of each respondent's recruiter. Recruiters are identified by their value of id, or by 0 for seeds.
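To make the Recruiter ID format rules concrete, here is a minimal sketch in Python (RDS Analyst itself runs on R; Python is used here only for illustration). It represents the spreadsheet as a list of dicts and checks that each id is an integer, that ids are unique, and that every recruiter.id is either 0 or an existing id. The function name and the row representation are hypothetical and are not part of RDS Analyst.

```python
from collections import Counter


def check_recruiter_id_format(rows):
    """Return a list of problems found in Recruiter ID format data.

    Each row is a dict holding (at least) the columns "id" and "recruiter.id".
    An empty list means no problems were found.
    """
    problems = []
    ids = [row["id"] for row in rows]

    # id must be an integer (bool is excluded because it subclasses int).
    for value in ids:
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"id {value!r} is not an integer")

    # id must be unique: one row per respondent.
    for value, count in Counter(ids).items():
        if count > 1:
            problems.append(f"id {value!r} appears {count} times")

    # recruiter.id must be 0 (a seed) or one of the ids in the data.
    known = set(ids)
    for row in rows:
        recruiter = row["recruiter.id"]
        if recruiter != 0 and recruiter not in known:
            problems.append(
                f"respondent {row['id']!r}: recruiter {recruiter!r} is not an id"
            )
    return problems


example = [
    {"id": 1, "recruiter.id": 0},  # a seed
    {"id": 2, "recruiter.id": 1},
    {"id": 3, "recruiter.id": 1},
    {"id": 4, "recruiter.id": 7},  # 7 is not an id in the data
]
print(check_recruiter_id_format(example))
```

Running it prints the single problem `respondent 4: recruiter 7 is not an id`. Collecting all problems, rather than stopping at the first, mirrors how you would want to fix a spreadsheet before reading it in.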

When you read in an RDS data file, you will be presented with a screen to Edit RDS Data Set Attributes. This enables you to set the maximum number of coupons, the network size variable and other data set characteristics before the RDS data set is ready for use. The package creates a data frame with additional information in it, such as recruiter.id: a column of integers giving the id of the recruiter for the respondent in that row. A value of 0 means the person was a seed.

  • If you read in an RDSAT file, the RDS data set is created automatically, as the file already has the necessary information in it.
  • If you use a CSV file (e.g., from Excel), you will need to specify the type of formatting in the file in the dialog that is presented.

  • For files in Coupon format, you will need to select the coupon and network variables. If the coupon variables follow the id and network size variables in column order, then you only need to specify the maximum number of coupons. The program computes the recruiter.id variable from this information.
  • For files in Recruiter id format, you will need to select the id, recruiter.id and network size variables.
  • If the spreadsheet is called foo (say), the RDS data set is called foo.rds. You should base all analysis on foo.rds.
  • Select the RDS Data Set: After you have read in the data, use the menu item under the Data menu to specify the RDS data set you will be working on. If you want to change RDS data sets, choose this menu item again to change it.
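The idea behind computing recruiter.id from Coupon format can be sketched as follows: a respondent's recruiter is whoever was given the coupon that the respondent redeemed. This Python sketch is illustrative only and is not RDS Analyst's code. The column names ("own.coupon", "coupon.1", ...) are hypothetical, and treating any respondent whose coupon was not issued by anyone in the data as a seed is an assumption made here.

```python
def recruiter_ids_from_coupons(rows, max_coupons):
    """Derive a recruiter.id value for each row of Coupon-format data.

    Hypothetical column names (for illustration only):
      "id"                                  respondent id
      "own.coupon"                          coupon the respondent redeemed
                                            (None if missing)
      "coupon.1" ... "coupon.<max_coupons>" coupons given to the respondent
                                            to pass on (None if unused)
    """
    # Map each issued coupon number to the id of the respondent who held it.
    holder = {}
    for row in rows:
        for k in range(1, max_coupons + 1):
            coupon = row.get(f"coupon.{k}")
            if coupon is not None:
                holder[coupon] = row["id"]

    # A respondent whose own coupon was not issued by anyone in the data
    # (or is missing) is treated as a seed, i.e. recruiter.id 0.
    return [holder.get(row["own.coupon"], 0) for row in rows]


rows = [
    {"id": 1, "own.coupon": None, "coupon.1": 101, "coupon.2": 102},
    {"id": 2, "own.coupon": 101, "coupon.1": 201, "coupon.2": None},
    {"id": 3, "own.coupon": 102, "coupon.1": None, "coupon.2": None},
    {"id": 4, "own.coupon": 201, "coupon.1": None, "coupon.2": None},
]
print(recruiter_ids_from_coupons(rows, max_coupons=2))  # [0, 1, 1, 2]
```

Note how the maximum number of coupons determines which columns are searched: with `max_coupons=1`, coupon 102 would never be seen as issued, and respondent 3 would wrongly appear to be a seed.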

Saving RDS data

RDS Analyst can save the data sets it creates in a wide range of data formats. We recommend saving them in R's internal data format (*.rda) for easy reading later.

For a description of this see Save RDS Data Set.

The Data Viewer window

The RDS Analyst Console window has menus (at the top) for the basic capabilities of the package, and a data viewer for looking at your data.

You can move between the two windows by clicking on them, or by using the Window menu.

The Data Viewer provides an easy-to-use, spreadsheet-like environment to view and edit RDS data (or, in fact, any spreadsheet data you load). Copying and pasting is supported and is compatible with Excel 2003/2007, so data can be moved from Excel to R simply by copying them and pasting them into the Data Viewer. Contextual menus are used to insert, delete and copy rows and columns.

If there are any data frames loaded in the R session, they can be viewed by selecting them from the Data Set list. Data can be loaded into the R session by clicking the Open RDS Data Set button in the top left-hand corner. The currently viewed data set can be saved using the Save RDS Data Set button directly to the right of the Open RDS Data Set button. The currently viewed data set can be removed from the R session by clicking the button in the upper right.

The data viewer has two modes, Data view and Variable view, between which you can switch freely using the tabs. The Variable view enables you to edit the variable types of the data read in. Categorical variables (including binary variables) should be set to type Factor by clicking on their entry and selecting it from the menu.

For details see data viewer.

Important: When a menu item is chosen it opens a dialog box where the variables are selected, options are set and the computation is done by clicking the Run button. This creates output in the Console window (you will need to click over to it to see it). These results and output can be saved at any point (typically at the end of the session) by using the Save Console command (see below).

The menu structure of the Console window

Most of the work is done using these menus. The menus at the top of the Console window are:

  • File:
  • Edit:
  • Workspace:
  • Data:
  • Sample:
  • Diagnostics:
  • Population:
  • Packages & Data:
  • Window:
  • Help:

File Menu

  • Create a new data set, open a data set and save the current data set in a file.
  • The Open RDS Data Set menu item is the primary way to read in an RDS data set from a text, CSV or other data file.
  • Run a text file of R commands directly in R using the Source ... item.
  • Print a file
  • Quit

Edit Menu

  • Copy, Cut and Paste text
  • Preferences: To change the font, set the default "working" directory where files are looked for and saved, etc. There is a separate panel for RDS Analyst-specific features called Data Viewer. This enables you to change the defaults (like the location of the graphviz binary if it is installed in a non-standard place).

Workspace Menu

Objects that you create during an RDS Analyst session are held in computer memory. The collection of objects that you currently have is called the workspace. This workspace is not saved on disk unless you tell RDS Analyst to do so. This means that your objects are lost if you close R without saving them, or, worse, if R or your system crashes during a session.

When you exit RDS Analyst, you will be asked if you want to save your workspace. This will allow you to have the same data sets available the next time you start RDS Analyst, so you can resume work on a project where you left off.

You can open (previously saved) workspaces and save workspaces from this menu. If you have multiple projects, you can save the entire workspace for each project in a separate file and then open each one from this menu.

The Clear all item empties the workspace (that is, removes all objects). You can therefore choose Clear all and then open a complete workspace you saved earlier.

Note that an opened workspace is added to the current one. So if you want only the objects from the saved workspace, choose Clear all first.

Data Menu

This menu is for recoding and modifying the data in the Data Viewer, so you do not have to go back to SPSS, SAS or Excel to recode variables, etc.

The term "factor" in R designates a categorical variable. In general, factors are nominal, but we use factors to represent both ordinal and nominal variables. If a variable is labeled as a factor, RDS Analyst will treat it appropriately when analyzing it.

Click on the links below to get help on the following capabilities:

  • Edit Factor: Add or remove values (levels) of a categorical variable.
  • Recode Variables: You can recode variables into variables with new names. From the Recode Variables dialog, select the recode you want to re-target from the Variables to Recode list (e.g., "sex -> sex"). Then click on the Target button on the right. That will let you type in the name of the new variable (e.g., "sexMF"). The recode will show something like "sex -> sexMF". The original "sex" variable will be unchanged, and "sexMF" will appear in the Data Viewer window.
  • Transform the variables: These can be very complex, if needed.
  • Reset Row Names
  • Sort
  • Merge Data
  • Transpose
  • Subset: Use this to create a version of the data set with the cases of your choice excluded. In the Subset Expression box enter an expression for those you want to retain (e.g., HIV < 2) and click OK. This will create a data set with a name with the suffix .sub. Now select this in the Data Viewer and run any procedure (e.g. Point Estimates) to see results only for the retained cases.
  • Select the RDS Data Set: Use this to specify the RDS data set you will be working on. If you want to change RDS data sets, choose this again to change it.
  • Edit RDS Data Set Attributes: Specify here the characteristics of the RDS data like the maximum number of coupons, the network size variable, the missing data symbol, and population size estimates.
  • If you read in data from a CSV or other spreadsheet file (i.e., not an RDSAT data file) and edit it to make it an RDS data set, then an RDS data set is formed from the current state of the spreadsheet. If the spreadsheet is called foo (say), the RDS data set is called foo.rds.
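The Recode Variables and Subset items described above follow two simple rules: a recode writes a new column and leaves the original untouched, and a subset keeps only the rows matching an expression and stores them under a name ending in .sub. The following Python sketch is an illustration of those rules under stated assumptions, not RDS Analyst's implementation. The helper names `recode` and `subset`, the coding 1 = male and 2 = female, and passing through values missing from the mapping are all hypothetical choices made here.

```python
def recode(rows, source, target, mapping):
    """Return copies of rows with a new column `target` recoded from `source`.

    The `source` column is left unchanged (as in "sex -> sexMF"). Values
    missing from `mapping` are copied through as-is (a choice made here).
    """
    recoded = []
    for row in rows:
        new_row = dict(row)
        new_row[target] = mapping.get(row[source], row[source])
        recoded.append(new_row)
    return recoded


def subset(datasets, name, keep):
    """Store the rows of datasets[name] for which keep(row) is true as name + ".sub"."""
    datasets[name + ".sub"] = [row for row in datasets[name] if keep(row)]
    return datasets[name + ".sub"]


datasets = {
    "foo": [
        {"id": 1, "sex": 1, "HIV": 1},
        {"id": 2, "sex": 2, "HIV": 2},
        {"id": 3, "sex": 2, "HIV": 1},
    ]
}
# Hypothetical coding: 1 = male, 2 = female.
datasets["foo"] = recode(datasets["foo"], "sex", "sexMF", {1: "M", 2: "F"})
kept = subset(datasets, "foo", lambda row: row["HIV"] < 2)  # like "HIV < 2"
print([row["id"] for row in kept])  # [1, 3]
```

Because the subset is stored as a separate data set, the full data set remains available; you choose which one to analyze, just as you select foo.sub in the Data Viewer.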

Sample Menu

This is for exploratory analysis of the RDS data. It is used to describe the characteristics of the sample.

It deals with continuous data, categorical data and descriptive data.

  • Frequencies: Tables of one or more variables, possibly stratified by others (like SPSS)
  • Descriptives: This uses RDS weights to compute population estimates (rather than sample averages).
  • Contingency Tables: Cross tabs. They include tests, which are questionable because RDS observations are not independent.
  • Scatter plots:
  • Homophily in Recruitment: Compute a homophily measure for recruitment.
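One simple way to think about homophily in recruitment is to compare how often recruiters and their recruits share a category with how often they would if recruits' categories were unrelated to their recruiters' categories. The Python sketch below computes that observed/expected ratio. It is an illustrative measure chosen here and may differ from the definition RDS Analyst uses; the function name and the "Gender" column are hypothetical.

```python
from collections import Counter


def recruitment_homophily(rows, var):
    """Illustrative recruitment homophily for a categorical column `var`.

    Returns (observed, expected, ratio): the share of recruiter-recruit pairs
    with the same value of `var`; the share expected if recruits' values were
    unrelated to their recruiters' values (given the observed margins); and
    observed / expected. Rows are dicts with "id", "recruiter.id" and `var`.
    """
    value_of = {row["id"]: row[var] for row in rows}
    pairs = [
        (value_of[row["recruiter.id"]], row[var])
        for row in rows
        if row["recruiter.id"] != 0  # seeds have no recruiter
    ]
    n = len(pairs)
    if n == 0:
        raise ValueError("no recruitments in the data")
    observed = sum(a == b for a, b in pairs) / n
    recruiter_counts = Counter(a for a, _ in pairs)
    recruit_counts = Counter(b for _, b in pairs)
    expected = sum(recruiter_counts[c] * recruit_counts[c] for c in recruiter_counts) / n**2
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


rows = [
    {"id": 1, "recruiter.id": 0, "Gender": "M"},
    {"id": 2, "recruiter.id": 1, "Gender": "M"},
    {"id": 3, "recruiter.id": 1, "Gender": "F"},
    {"id": 4, "recruiter.id": 2, "Gender": "M"},
    {"id": 5, "recruiter.id": 3, "Gender": "F"},
]
print(recruitment_homophily(rows, "Gender"))  # (0.75, 0.5, 1.5)
```

A ratio above 1 suggests that people tend to recruit others like themselves; a ratio near 1 suggests little homophily on that variable.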

Frequencies, Descriptives and Contingency Tables use RDS weights to compute population estimates (rather than sample averages of the RDS data). The other entries treat the data as an independent sample and are not (yet) RDS-aware.

The Descriptives, Contingency Tables and Frequencies entries use the RDS weights. These are stored in the data.frame and added whenever estimates are computed. The default is the Gile sequential sampling (SS) weights.
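To show how weights turn a sample proportion into a population estimate, here is a Python sketch that uses the simpler RDS-II (Volz-Heckathorn) weights, which are proportional to 1 / network size. These are swapped in only for illustration: they are not the Gile SS weights that RDS Analyst uses by default, which also depend on the population size. The function names are hypothetical.

```python
def rds2_weights(network_sizes):
    """RDS-II (Volz-Heckathorn) weights: proportional to 1 / network size."""
    if any(d <= 0 for d in network_sizes):
        raise ValueError("network sizes must be positive")
    return [1.0 / d for d in network_sizes]


def weighted_proportion(values, weights, category):
    """Weighted share of respondents whose value equals `category`."""
    total = sum(weights)
    in_category = sum(w for v, w in zip(values, weights) if v == category)
    return in_category / total


values = ["M", "F", "F", "F"]
network_sizes = [1, 2, 4, 4]
sample_share = values.count("M") / len(values)
weighted_share = weighted_proportion(values, rds2_weights(network_sizes), "M")
print(sample_share, weighted_share)  # 0.25 0.5
```

The single "M" respondent reports the smallest network, so under inverse-degree weighting that respondent represents more of the population, and the estimate rises from the sample share of 0.25 to 0.5.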

The Diagnostics menu

This menu is for examining the RDS data with an eye to using it to estimate population characteristics. It provides diagnostics of the sampling process and of possible dependence on the seeds.

  • Plot Recruitment Tree: Produces a publication quality graphics plot of the recruitment tree.
  • Bar plot of the number of recruits by wave
  • Scatter plot of the respondent degree versus wave
  • Histogram of the number of recruits for each respondent
  • Bar Chart of the number of recruits from each seed
  • Make all Diagnostics: Does all of the above
  • Boxplots by wave, seed, etc:
  • Estimate the Homophily in the Population:
  • Estimate the Differential Activity in the Population:
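Several of these diagnostics are built from quantities that can be derived from recruiter.id alone: each respondent's wave (seeds are wave 0), the number of direct recruits of each respondent, the number of recruits descending from each seed, and the number of respondents in each wave. The Python sketch below computes them under the assumption that every non-zero recruiter.id is a valid id; it is illustrative only and does not reproduce RDS Analyst's plotting code. The function name is hypothetical.

```python
from collections import Counter


def recruitment_summary(rows):
    """Wave, recruits per respondent, recruits per seed, and counts by wave.

    Rows are dicts with "id" and "recruiter.id" (0 for seeds); every non-zero
    recruiter.id is assumed to be an id in the data.
    """
    recruiter = {row["id"]: row["recruiter.id"] for row in rows}
    wave, seed_of = {}, {}
    for rid in recruiter:
        # Follow recruiter links back to a seed; seeds are wave 0.
        current, steps = rid, 0
        while recruiter[current] != 0:
            current = recruiter[current]
            steps += 1
            if steps > len(recruiter):
                raise ValueError("recruitment chain contains a cycle")
        wave[rid], seed_of[rid] = steps, current

    direct = Counter(r for r in recruiter.values() if r != 0)
    recruits = {rid: direct[rid] for rid in recruiter}
    per_seed = {rid: 0 for rid, rec in recruiter.items() if rec == 0}
    for rid in recruiter:
        if wave[rid] > 0:
            per_seed[seed_of[rid]] += 1
    by_wave = Counter(wave.values())
    return wave, recruits, per_seed, by_wave


rows = [
    {"id": 1, "recruiter.id": 0},
    {"id": 2, "recruiter.id": 1},
    {"id": 3, "recruiter.id": 1},
    {"id": 4, "recruiter.id": 2},
    {"id": 5, "recruiter.id": 3},
    {"id": 6, "recruiter.id": 0},  # a seed with no recruits
]
wave, recruits, per_seed, by_wave = recruitment_summary(rows)
print(per_seed)  # {1: 4, 6: 0}
```

In this example one seed accounts for every recruit, which is the kind of imbalance the bar chart of recruits per seed is meant to reveal.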

The dialogs retain their current settings and produce publication-quality PDF plots.

The Population menu

This computes estimates of the population characteristics based on the RDS sample.

It deals with continuous data, categorical data and descriptive data and complements the Sample menu that describes the sample.

Frequencies, Descriptives and Contingency Tables use RDS weights to compute population estimates (rather than sample averages of the RDS data). The other entries treat the data as an independent sample and are not (yet) RDS-aware.

The Descriptives, Contingency Tables and Frequencies entries use the current weights. These are stored in the data.frame and added whenever estimates are computed. The default is the Gile SS weights.

The most important entry is Interval Estimates, which computes confidence intervals for population proportions. The default method is Gile's SS estimator. The confidence interval is computed using Gile's bootstrap method. This is a computationally intensive procedure and can take a minute or longer to complete.

Packages & Data Menu

  • This enables you to re-open the Data Viewer and Console windows.
  • This enables you to install additional packages for R and look at the packages currently loaded.
  • This also allows you to edit and view any "objects" in the workspace, such as RDS data sets (spreadsheets with the extension .rds), spreadsheets (data.frames), functions, etc.

Window Menu

  • We use this to go between the Console and Data Viewer windows and also to choose a graphics window.
  • Lists the currently open windows to choose from.
  • You can go here to choose a window to bring to the front to work on.

Help Menu

  • To get help, choose R help from the Help menu. The first time it is used it will start a browser for help (which will take a minute to load).
  • You can also get help on the RDS Analyst package here.

Saving the results, the output and/or the batch commands that produced them!

When a dialog box is Run it creates output in the Console window.

To save the results in a file (typically at the end of a session), choose Save Console from the File menu in Console, and make sure Results is selected from the Options:.

To save the commands used to create the output in a file (typically at the end of a session), choose Save Console from the File menu in Console, and make sure Commands is selected from the Options:.

To save the complete output (results interspersed with the commands that produced them) in a file (typically at the end of a session), choose Save Console from the File menu in Console, and make sure Complete output is selected from the Options:.

Example data

Three example data sets are stored within R, and the example data file from RDSAT (nyjazz.rdsat) is also provided.

  • To find out about the faux, fauxmadrona, and fauxsycamore data sets, just use help (see above) and search for them by name. There are manual pages on them :-)

The nyjazz.rdsat file is stored on your desktop in the directory RDS Analyst Example Data Sets. It can be opened from, for example, the Open RDS Data Set dialog box. It is the same as the RDSAT file nyjazz.txt, with the extension changed to .rdsat so that the package will recognize it automatically.

Getting Help from within the package

To get help, choose R help from the Help menu. The first time it is used it will start a browser for help (which will take a few seconds to load).

  • Click on an item to get help about R and to get started with R. The "An Introduction to R" is a particularly useful reference for beginners.
  • Click on the packages tab to get specific help on packages (like RDS).
  • Click on the RDSgui package to get help with the commands underlying the menus (except the RDS menu).
  • Click on the RDSdevelopment package to get help with the commands underlying the RDS menu.
    • This gives help on the data sets.
    • Click on, e.g., RDS.I.estimates to get help on the RDS-I estimate function.
  • To get the online manual for the package, choose RDS help from the Help menu in the Data Viewer window.

Tips and FAQ

  • By default, RDS Analyst stores all your files on your Desktop. To change this, just choose Set Working Directory from the File menu.
  • To bring the Data Viewer to the front, select it under the Window menu. This works for the Console or any other window.
  • If you close the Data Viewer, you can reopen it under the Packages & Data menu.

Tutorials

These are pages created by Ian Fellows that illustrate simple exploratory analyses.

Credits for RDS Analyst

RDS Analyst is based on the Deducer software (written by Ian Fellows). We also use the Java-based R GUI JGR, which is closely integrated with Deducer. These guys deserve most of the credit for what we see.

Common installation problems and bugs

  • Graphviz does not install or run? Check the location of the binary as specified in the Preferences -> Data Viewer menu.
