Workshop

From HPMRG

Revision as of 00:06, 16 June 2010 by Handcock

Workshop on Respondent-driven Sampling Analyst Software

JUNE 15 AND 16, 2010

Venue: University of California, San Francisco. 50 Beale Street, Suite 1200 (12th Floor)

Sponsor: Centers for Disease Control and Prevention Global AIDS Program, Surveillance Branch

Description

RDS is a relatively new methodology used worldwide to gather HIV prevalence and risk-factor data from hard-to-reach populations. In this workshop, the Hidden Population Methods Research Group (HPMRG) is pleased to introduce a new comprehensive, user-friendly, open-source software package for the analysis of RDS data. The new software, RDS Analyst (RDS-A), includes a point-and-click graphical user interface for computing new and existing estimators and standard errors, visualizing recruitment chains, and running diagnostic analyses. It allows for the analysis of multiple variables at once, and the saving and re-use of syntax. For more technical users, the package may also be accessed through a command line interface to the open-source R programming language (http://www.r-project.org/).

The purpose of this 2-day workshop is to introduce RDS-A to researchers already experienced in RDS methodology and statistics. Participants will receive training on the RDS-A state-of-the-art analysis and graphic functions and will be asked to provide feedback in the interest of improving the software prior to more widespread distribution among users of RDS.

This workshop is designed as an introduction to the analysis of RDS data using RDS-A.

It will cover the full RDS-A suite of functions. This begins with data entry and loading data, coding missing data, and re-coding variables. It then treats descriptive and diagnostic methods including visualization methods, followed by existing and new tools for estimation, testing models, confidence intervals and sensitivity analysis. The workshop concludes with an introduction to the re-usable syntax and R command line capabilities of the software.

Workshop participants currently working with RDS data will be encouraged to bring these data, and evaluate them using RDS-A.

The workshop will be open to researchers in epidemiology and the social and behavioral sciences with experience using RDS methodology, theory and statistics. You will need to bring a laptop.

The workshop is free; however, all travel and other expenses are the participant's responsibility. We will forward an agenda and any other pertinent information once you register.

Outline

Outline as PDF

Presentations

RDS Analyst Software

RDS Analyst manual (RDS_Analyst_Manual), including installation instructions.

Papers

Notes taken during the day

Sampling: A Review

Sampling M&Ms

  • Screen side of the table: the goal is to determine the proportion of red in each bag.
  • Kitchen side of the table: the goal is to determine the proportion of orange in each bag.

The four bags have: 10%, 20%, 25%, 30%

Four bags (Screen = Kitchen):

  • A = E
  • C =
  • B =
  • H =

Repeated sampling without replacement. Sample:

  • Screen: 0%, 75%, 0%, 25%
  • Kitchen: 0%, 75%, 0%, 25%
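The M&M exercise can be replayed on a computer to show how much small samples drawn without replacement vary around the true share. The following is a minimal sketch only: the bag size of 100 and the draw size of 4 are assumptions (the notes record neither, although sample shares that are multiples of 25% are consistent with draws of four), and the function name is hypothetical.

```python
import random

def sample_proportions(true_share, bag_size=100, draw=4, repeats=1000, seed=1):
    """Repeatedly draw `draw` candies without replacement from one bag and
    return the observed share of the target colour in each draw.
    bag_size and draw are assumed values, not taken from the workshop notes."""
    rng = random.Random(seed)
    n_target = round(true_share * bag_size)
    bag = [1] * n_target + [0] * (bag_size - n_target)
    # rng.sample draws without replacement within a single draw;
    # the bag is refilled between draws.
    return [sum(rng.sample(bag, draw)) / draw for _ in range(repeats)]

# The four true shares listed in the notes.
for share in (0.10, 0.20, 0.25, 0.30):
    props = sample_proportions(share)
    mean = sum(props) / len(props)
    print(f"true {share:.0%}: mean of samples {mean:.1%}, "
          f"min {min(props):.0%}, max {max(props):.0%}")
```

Individual draws can land far from the true share, while the average over many draws stays close to it.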

Fundamentals of Sampling

Sample sizes: 300-500

Population sizes:

  • SFO: 60K
  • Africa: 2-3K

In the USA, studies usually choose cities with large at-risk populations. On key behaviors the sample proportion can be 50%.
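To see why these sample and population sizes matter, the sketch below compares the standard error of a proportion with and without the finite population correction. It assumes simple random sampling, which RDS is not, so it only illustrates the effect of the sampling fraction. The Africa population size of 2,500 is an assumed midpoint of the 2-3K range, and the function name is hypothetical.

```python
import math

def se_proportion(p, n, N=None):
    """Standard error of a sample proportion under simple random sampling.
    If the population size N is given, apply the finite population
    correction (N - n) / (N - 1) for sampling without replacement."""
    var = p * (1.0 - p) / n
    if N is not None:
        var *= (N - n) / (N - 1)
    return math.sqrt(var)

p = 0.5  # a key behavior at 50%, as in the notes
populations = {"SFO": 60_000, "Africa": 2_500}  # 2,500 is an assumed midpoint
for place, N in populations.items():
    for n in (300, 500):
        with_repl = se_proportion(p, n)
        without = se_proportion(p, n, N)
        print(f"{place}: N={N}, n={n}, fraction={n / N:.1%}, "
              f"SE with replacement={with_repl:.4f}, without={without:.4f}")
```

With a population of 60K the correction is negligible. With a population of a few thousand and a sample of 500, the sampling fraction is large enough that the two standard errors differ noticeably.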

Discussion

  • How to define homophily? Homophily is a very general term (like "cluster", or "dependence"). There is a definition in RDSAT, Gile's thesis and in RDS-A. How many do we need? Which is best for which circumstances?
  • For summary statistics, homophily should be measured on readily observable characteristics, even though the key homophily measure is that on the outcome variable (e.g., disease status).
  • How many coupons? One rule is 3-5 per respondent. How should this be determined?
  • Importance of a simulation study of confidence intervals based on Salganik and Gile's SS bootstrap procedures
  • How to determine the number of friends a respondent knows that are HIV positive?
  • Non-preferential distribution of coupons is an assumption of current estimation methods. What diagnostics can we compute or develop for it?
  • In RDSAT, dual-component computes individualized weights for export to another program to allow use in more advanced statistics.
    • The multiplicity estimator is the Salganik-Heckathorn (RDS-I) estimator.
  • What are the needs and issues with regression methods for RDS data?
  • Suppose the sample proportion was low, but there is differential activity. How will the various estimates perform?
  • How does sampling with or without replacement influence variance? Salganik & Goel and Gile report differing results. Why is this?
  • Real-world issues that might affect simulations. For example, simulations assume that people always respond and that all coupons are returned.
    • In the real world the coupon return rate is about 30%.
  • Is there information in the secondary interview, in the question about whether your coupon was refused because others were already in the study?
    • Krista and Lisa are working on related issues
  • What about very small population sizes (relative to the sample size)? When the sample fraction is large, the SS estimator is still appropriate as it is primarily developed to address large sample fraction effects.
    • If it is very small, then RDS can be used as a data collection method rather than using the sampling mechanism as a basis for inference.
  • For the model-assisted method, how are the standard errors computed? How do the standard errors compare to those of the design-based estimators?
    • The true standard errors appear to be smaller (as shown in the presentation).
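The question above about differential activity can be illustrated with a toy calculation. The sketch below compares the naive sample proportion with an inverse-degree weighted proportion, which is the form of the Volz-Heckathorn (RDS-II) estimator; that estimator is not named in these notes and is used here only as an illustration. The population values and the idealized assumption that respondents are drawn with probability proportional to degree, with replacement, are made up for the example.

```python
import random

def naive_proportion(outcomes):
    """Unweighted sample proportion."""
    return sum(outcomes) / len(outcomes)

def inverse_degree_proportion(outcomes, degrees):
    """Proportion weighted by 1/degree (the form of the RDS-II estimator)."""
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)

# Toy population (assumed values): 20% positive, and positives have twice
# the degree of negatives, i.e. differential activity.
population = [(1, 20)] * 200 + [(0, 10)] * 800  # (outcome, degree)

rng = random.Random(2010)
# Idealized draw: probability proportional to degree, with replacement.
sample = rng.choices(population, weights=[d for _, d in population], k=2000)
ys = [y for y, _ in sample]
ds = [d for _, d in sample]

print(f"true proportion:       {200 / 1000:.3f}")
print(f"naive sample estimate: {naive_proportion(ys):.3f}")
print(f"inverse-degree weight: {inverse_degree_proportion(ys, ds):.3f}")
```

In this setup the naive estimate is pulled toward the more active group, while the weighted estimate is close to the true 20%. Real RDS samples depart from the idealized draw, which is part of what the discussion points above are asking about.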

Long-term software goals

  • Add population simulation capability to provide a "virtual laboratory" within which to assess changes in the sample design
  • Add a RDS simulation capability to allow virtual or real populations to be sampled repeatedly so as to assess different sample designs and estimation methods
  • With this you can, for example:
    • Do "power computations"
    • What is the best number of coupons to use?
  • Add an option to set a random number "seed" value. Where to put it?
  • Issues with the format of the Sample -> Tables
  • Jitter on Scatterplots sometimes causes a crash when the window is resized. It is likely a problem with JavaGD
  • In sample homophily table, use the nice RDSAT format for it. Especially the seeds. See "Recruitment tab".
  • In plot recruitment, make node sizes constant. Also an option for node size to be proportional to degree.
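The simulation goals at the top of this list (a "virtual laboratory", repeated sampling, comparing numbers of coupons, and a random number seed) can be sketched in a few lines. This is not RDS-A code: the random-graph population, the function names and all parameter values are assumptions chosen for illustration, and the sketch idealizes recruitment in the way noted in the discussion (everyone responds and every coupon is returned).

```python
import random
import statistics

def make_population(n, mean_degree, prevalence, rng):
    """Simulated population on a random graph: (neighbor sets, outcomes)."""
    outcome = [1 if rng.random() < prevalence else 0 for _ in range(n)]
    neighbors = [set() for _ in range(n)]
    p_edge = mean_degree / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p_edge:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors, outcome

def rds_sample(neighbors, n_seeds, coupons, target, rng):
    """Recruit without replacement until `target` people are sampled or
    recruitment dies out. Each respondent passes up to `coupons` coupons
    to randomly chosen contacts who are not yet in the sample."""
    sampled = rng.sample(range(len(neighbors)), n_seeds)
    seen = set(sampled)
    queue = list(sampled)
    while queue and len(sampled) < target:
        recruiter = queue.pop(0)
        eligible = sorted(neighbors[recruiter] - seen)
        for recruit in rng.sample(eligible, min(coupons, len(eligible))):
            if len(sampled) >= target:
                break
            seen.add(recruit)
            sampled.append(recruit)
            queue.append(recruit)
    return sampled

def simulate(coupons, seed, reps=20):
    """Sample one simulated population repeatedly; return the naive
    sample proportion from each replicate."""
    rng = random.Random(seed)  # the seed makes the whole run reproducible
    neighbors, outcome = make_population(500, 8, 0.2, rng)
    estimates = []
    for _ in range(reps):
        s = rds_sample(neighbors, 5, coupons, 100, rng)
        estimates.append(sum(outcome[i] for i in s) / len(s))
    return estimates

for coupons in (2, 3, 5):
    est = simulate(coupons, seed=42)
    print(f"coupons={coupons}: mean={statistics.mean(est):.3f}, "
          f"sd={statistics.stdev(est):.3f}")
```

Because the seed is passed in explicitly, rerunning with the same seed reproduces the same samples, which is one possible answer to where the seed option could go.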