Workshop



Revision as of 17:49, 16 June 2010


Workshop on Respondent-driven Sampling Analyst Software

JUNE 15 AND 16, 2010

Venue: University of California, San Francisco. 50 Beale Street, Suite 1200 (12th Floor)

Sponsor: Centers for Disease Control and Prevention Global AIDS Program, Surveillance Branch

Description

Respondent-driven sampling (RDS) is a relatively new methodology used worldwide to gather data on HIV prevalence and risk factors from hard-to-reach populations. In this workshop, the Hidden Population Methods Research Group (HPMRG) is pleased to introduce a new comprehensive, user-friendly, open-source software package for the analysis of RDS data. The new software, RDS Analyst (RDS-A), includes a point-and-click graphical user interface that allows for the computation of new and existing estimators and standard errors, visualization of recruitment chains, and diagnostic analysis. It allows for the analysis of multiple variables at once, and the saving and re-use of syntax. For more technical users, the package may also be accessed through a command line interface to the open-source R programming language (http://www.r-project.org/).

The purpose of this 2-day workshop is to introduce RDS-A to researchers already experienced in RDS methodology and statistics. Participants will receive training on the RDS-A state-of-the-art analysis and graphic functions and will be asked to provide feedback in the interest of improving the software prior to more widespread distribution among users of RDS.

This workshop is designed as an introduction to the analysis of RDS data using RDS-A.

It will cover the full RDS-A suite of functions. This begins with data entry and loading data, coding missing data, and re-coding variables. It then treats descriptive and diagnostic methods including visualization methods, followed by existing and new tools for estimation, testing models, confidence intervals and sensitivity analysis. The workshop concludes with an introduction to the re-usable syntax and R command line capabilities of the software.

Workshop participants currently working with RDS data will be encouraged to bring these data, and evaluate them using RDS-A.

The workshop will be open to researchers in epidemiology and the social and behavioral sciences with experience using RDS methodology, theory and statistics. You will need to bring a laptop.

The workshop is free; however, all travel and other expenses are covered by the participant. We will forward an agenda and any other pertinent information once you register.

Outline

Outline as PDF

Presentations

RDS Analyst Software

RDS Analyst manual, including installation instructions.

The data is at:

C:\Program Files\RDS Analyst\R-2.11.1\library\RDSdevelopment\extdata

Papers

Notes taken during the day

Sampling: A Review

Sampling M&Ms

  • Screen side of the table: The goal is to determine the proportion of red in each bag.
  • Kitchen side of the table: The goal is to determine the proportion of orange in each bag.

The four bags have: 10%, 20%, 25%, 30%

Four bags (Screen = Kitchen):

  • A = E
  • C =
  • B =
  • H =

Repeated sampling without replacement. Sample

Screen: 0%, 75%, 0%, 25%

Kitchen: 0%, 75%, 0%, 25%
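The M&M exercise can be imitated in a few lines of code. The sketch below is illustrative only: the bag size (40 candies), the 25% share of the target color, and the sample size of 4 are assumptions chosen to match the kind of results listed above (0%, 25%, 75%), not values recorded at the workshop.

```python
import random

def sample_proportions(bag_size, n_target, sample_size, n_repeats, seed=0):
    """Repeatedly draw a sample without replacement from a bag and
    return the proportion of the target color in each sample."""
    rng = random.Random(seed)
    # 1 = candy of the target color, 0 = any other color
    bag = [1] * n_target + [0] * (bag_size - n_target)
    proportions = []
    for _ in range(n_repeats):
        # random.sample draws without replacement within one sample;
        # the bag is "refilled" before the next repeat.
        draw = rng.sample(bag, sample_size)
        proportions.append(sum(draw) / sample_size)
    return proportions

# Hypothetical bag: 40 candies, 10 of the target color (25%).
props = sample_proportions(bag_size=40, n_target=10, sample_size=4, n_repeats=1000)
mean_prop = sum(props) / len(props)
print("first four samples:", props[:4])
print("mean over all samples:", round(mean_prop, 3))
```

Individual small samples vary widely, while the average over many repeats settles near the true proportion in the bag.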

Fundamentals of Sampling

Sample sizes: 300-500

Population sizes: SFO: 60K; Africa: 2-3K

In the USA, cities with large at-risk populations are usually chosen. On key behaviors the sample proportion can be 50%.
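To see why these sizes matter, it can help to compute the sample fraction and the textbook standard error of a proportion with and without the finite population correction. This is a sketch for simple random sampling, not for RDS; the sample size of 400 and the population of 2,500 are assumed mid-range values taken from the figures above.

```python
import math

def srs_standard_error(p, n, N=None):
    """Standard error of a sample proportion under simple random sampling.
    If a population size N is given, apply the finite population
    correction (N - n) / (N - 1) for sampling without replacement."""
    variance = p * (1 - p) / n
    if N is not None:
        variance *= (N - n) / (N - 1)
    return math.sqrt(variance)

n = 400    # assumed sample size, within the 300-500 range
p = 0.5    # sample proportion on a key behavior
for label, N in [("SFO", 60_000), ("Africa", 2_500)]:
    fraction = n / N
    se_with = srs_standard_error(p, n)          # ignores population size
    se_without = srs_standard_error(p, n, N)    # without replacement
    print(f"{label}: sample fraction {fraction:.1%}, "
          f"SE with replacement {se_with:.4f}, without {se_without:.4f}")
```

With a population of 60K the correction is negligible; with a population of a few thousand the sample fraction is large enough that ignoring it noticeably overstates the standard error.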

Discussion

  • How to define homophily? Homophily is a very general term (like "cluster", or "dependence"). There is a definition in RDSAT, Gile's thesis and in RDS-A. How many do we need? Which is best for which circumstances?
  • For summary statistics, homophily should be measured on readily observable characteristics, even though the key homophily measure is that on the outcome variable (e.g., disease status).
  • How many coupons? One rule is 3-5 per respondent. How should this be determined?
  • Importance of a simulation study of confidence intervals based on Salganik and Gile's SS bootstrap procedures
  • How to determine the number of friends a respondent knows that are HIV positive?
  • Non-preferential distribution of coupons is an assumption of current estimation methods. What diagnostics can we compute or develop for it?
  • In RDSAT, dual-component computes individualized weights for export to another program to allow use in more advanced statistics.
    • The multiplicity estimator is the Salganik-Heckathorn (RDS-I) estimator.
  • What are the needs and issues with regression methods for RDS data?
  • Suppose the sample proportion was low, but there is differential activity. How will the various estimates perform?
  • How does sampling with or without replacement influence variance? Differing results between Salganik & Goel and Gile's results. Why is this?
  • Real-world issues that might affect simulations. For example, simulations assume that people always respond and that all coupons are returned.
    • In the real-world the coupon return rate is about 30%.
  • Is there information in the secondary interview, in the question about whether your coupon was refused because others were already in the study?
    • Krista and Lisa are working on related issues
  • What about very small population sizes (relative to the sample size)? When the sample fraction is large, the SS estimator is still appropriate as it is primarily developed to address large sample fraction effects.
    • If it is very small, then RDS can be used as a data collection method rather than using the sampling mechanism as a basis for inference.
  • For the model-assisted method, how are the standard errors computed? How do they compare to those of the design-based estimators?
    • The true standard errors appear to be smaller (as shown in the presentation).
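The question above about a low sample proportion combined with differential activity can be illustrated with a small worked example. The sketch below compares the raw sample proportion with an inverse-degree weighted proportion, the usual form of the Volz-Heckathorn (RDS-II) estimator. The outcomes and degrees are made up for illustration, and this is not the RDS-A implementation.

```python
def sample_proportion(outcomes):
    """Unweighted proportion of respondents with the outcome."""
    return sum(outcomes) / len(outcomes)

def vh_estimate(outcomes, degrees):
    """Inverse-degree weighted proportion:
    sum(y_i / d_i) / sum(1 / d_i)."""
    if any(d <= 0 for d in degrees):
        raise ValueError("all degrees must be positive")
    weights = [1.0 / d for d in degrees]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)

# Hypothetical sample: the two respondents with the outcome (1)
# report larger network sizes than most of the others.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0]
degrees = [20, 10, 5, 5, 4, 5, 10, 5]

naive = sample_proportion(outcomes)
weighted = vh_estimate(outcomes, degrees)
print("sample proportion:", round(naive, 3))
print("inverse-degree weighted estimate:", round(weighted, 3))
```

When the group with the outcome is more active (higher degree), the weighted estimate falls below the sample proportion, because high-degree respondents are assumed to be over-represented in the sample.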

Issues noted during practice

  • Loading data
    • When you load data and when you Run, the "edit RDS data set attributes" window closes automatically: this is confusing for new users. Suggestion to allow people to close the window themselves.
    • Suggestion to change the default # of bootstraps from 100 to 2000 in Gile's SS.
    • Suggestion to add random seed procedure
    • When you resize the Deducer window, the text does not line up in the boxes
    • Specify that the 95% CI is two-tailed next to the box
  • Saving working dataset from original
    • Always assign the .rds - the program assigns the .robj itself
    • To save dataset, make sure it is done from the data window
    • When re-opening saved data, a message comes up and asks if you want to save; say Cancel.
    • When re-opening, check to make sure it saved attributes (in variable view)
  • RDSAT (5.6+) v RDS R
    • Add an option to impute degree for similar estimators (the Heckathorn dual component estimator sets degree to the average for that partition if it is missing or zero.)
    • Add an evaluation of degree: distribution, median, mean, number of missing or zero. (There is usually more than one degree question. Sometimes you have to evaluate which one is best, because relative degree is more important than using the one that was meant to be used: implementation is commonly imperfect.)
  • Commands
    • For descriptives, add a button that will allow you to add all stats at once instead of adding one at a time
    • Contingency tables: the %s in the bottom line are confusing. See SAS or SPSS to see how they are formatted.
    • Some people's computers froze when they jittered: the suspicion is that the problem is resizing the window, not the jittering
    • We talked about how much jittering to apply, but don't know how to set the option (low priority)
    • option in jittering "pairs" or "pairsplot"
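The two degree-related suggestions above (an evaluation of reported degree, and imputing missing or zero degrees with the partition average) could look roughly like the following. This is a sketch of the idea as described in the notes, not code from RDSAT or RDS-A; the function names and the example data are hypothetical.

```python
from statistics import mean, median

def degree_summary(degrees):
    """Summarize reported degrees; None marks a missing value."""
    valid = [d for d in degrees if d is not None and d > 0]
    return {
        "n": len(degrees),
        "n_missing": sum(d is None for d in degrees),
        "n_zero": sum(d == 0 for d in degrees if d is not None),
        "mean": mean(valid) if valid else None,
        "median": median(valid) if valid else None,
    }

def impute_degree_by_group(degrees, groups):
    """Replace missing or zero degrees with the mean of the valid
    degrees reported in the same group (partition)."""
    valid_by_group = {}
    for d, g in zip(degrees, groups):
        if d is not None and d > 0:
            valid_by_group.setdefault(g, []).append(d)
    group_means = {g: mean(v) for g, v in valid_by_group.items()}
    # A group with no valid degrees at all stays None.
    return [d if (d is not None and d > 0) else group_means.get(g)
            for d, g in zip(degrees, groups)]

# Hypothetical data: two missing degrees and one zero.
degrees = [10, None, 0, 4, 6, None]
groups = ["A", "A", "B", "B", "B", "A"]

summary = degree_summary(degrees)
imputed = impute_degree_by_group(degrees, groups)
print(summary)
print(imputed)
```

Running the summary first shows how many values the imputation would touch, which is useful when choosing among several degree questions.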


  • Homophily
    • Currently homophily works for dichotomous variables only
    • Output is confusing: suggestion to add a table with row %s (similar to transition probabilities) as in RDSAT 5.6+
    • Estimated population homophily v sample (or recruitment) homophily
  • Plots
    • Backslash - forward slash reversed when specifying file
    • Add legend to output (include coloring for missing)
    • There is a bug in Windows 7
    • Add homophily and/or efficacy in output?
  • Unique ID
    • Make sure ID does not have to be in order
    • Explain option to specify unique ID
    • Does "own coupon" have to be missing for seeds (they may have a survey number whether they were seeds or not)? It makes it easier for quality control in recruits to make sure the original numbers are retained.
  • Which estimators (VH, SS, MA, etc) work better under which circumstances. What rules-of-thumb can we have about when each estimator works well and when it does not?
  • A common phenomenon is to see a few seeds produce long chains while most die out. How should this be adjusted for? (Should it be adjusted for?)
  • Big question: What diagnostics should we run to believe that we can compute valid population estimates from a particular population data set?
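The suggested homophily table with row percentages (similar to transition probabilities) can be sketched as follows. The recruiter-recruit pairs are invented for illustration, and the function name is hypothetical; this shows only the sample (recruitment) table, not an estimate of population homophily.

```python
from collections import Counter

def recruitment_row_percentages(pairs):
    """Build a recruiter-by-recruit table of row percentages.
    pairs: list of (recruiter_group, recruit_group) tuples.
    Returns {recruiter_group: {recruit_group: percent}}."""
    counts = Counter(pairs)
    groups = sorted({g for pair in counts for g in pair})
    table = {}
    for recruiter in groups:
        row_total = sum(counts[(recruiter, recruit)] for recruit in groups)
        table[recruiter] = {
            recruit: (100.0 * counts[(recruiter, recruit)] / row_total
                      if row_total else 0.0)
            for recruit in groups
        }
    return table

# Hypothetical recruitments for a dichotomous variable.
pairs = ([("pos", "pos")] * 6 + [("pos", "neg")] * 4 +
         [("neg", "pos")] * 2 + [("neg", "neg")] * 8)

table = recruitment_row_percentages(pairs)
for recruiter, row in table.items():
    cells = ", ".join(f"{recruit}: {pct:.0f}%" for recruit, pct in row.items())
    print(f"recruiter {recruiter} -> {cells}")
```

Each row sums to 100%, so a large diagonal entry reads directly as "recruiters in this group mostly recruit their own group".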

Long-term software goals

  • Add population simulation capability to provide a "virtual laboratory" within which to assess changes in the sample design
  • Add a RDS simulation capability to allow virtual or real populations to be sampled repeatedly so as to assess different sample designs and estimation methods
  • With this you can, for example:
    • Do "power computations"
    • What is the best number of coupons to use?
  • Add an option to set a random number "seed" value. Where to put it?
  • Issues with the format of the Sample -> Tables
  • Jitter on Scatterplots sometimes causes a crash when the window is resized. It is likely a problem with JavaGD
  • In sample homophily table, use the nice RDSAT format for it. Especially the seeds. See "Recruitment tab".
  • In plot recruitment, make node sizes constant. Also an option for node size to be proportional to degree.
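The random number seed request, together with the earlier suggestion of 2000 bootstrap replicates, is easiest to motivate with an example. The sketch below uses a plain percentile bootstrap that resamples respondents with replacement; it is a stand-in for illustration and is not the Salganik or Gile's SS bootstrap procedure, which resample in ways that respect the recruitment process. The data and function name are hypothetical.

```python
import random

def bootstrap_ci(values, n_boot=2000, level=0.95, seed=None):
    """Two-tailed percentile bootstrap interval for the mean.
    Passing the same seed reproduces the same interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample n values with replacement, n_boot times, and sort the means.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper

# Hypothetical binary outcome: 30 of 100 respondents have the trait.
values = [1] * 30 + [0] * 70

first = bootstrap_ci(values, seed=2010)
second = bootstrap_ci(values, seed=2010)
print("95% interval:", first)
print("same seed reproduces the interval:", first == second)
```

Without a fixed seed, two runs on the same data give slightly different intervals, which is confusing when results are compared across users or sessions.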


  • Data smoothing. For the Salganik estimator? For reported degrees, as measurement error? (Amy Drake: don't do it automatically. In surveillance, data may end up being treated differently over time.)
  • Carl Kendall: in the future, will move toward asking people for people they can name.
  • Carl Kendall: letting people report a number asks for unreliability. But proportionality to degree a bit easier to trust.
  • Some folks have seen differences in point estimates with different network size measures. Not differences outside confidence intervals.
  • Carl Kendall: reality needs to be much smaller than some of huge numbers we see.
  • Secondary incentive, how were coupons distributed. How many people could have recruited them.
  • Overseas, high secondary incentives returned
  • Domestic low return for secondary incentives
  • Liz: cognitively test network size questions. Input? IDU, MSM, het.