Workshop

From HPMRG

(Difference between revisions)
(Long-term software goals)
(39 intermediate revisions not shown)
-
=== Workshop on Respondent-driven Sampling Analyst Software ===
+
'''UPCOMING WORKSHOPS'''
-
JUNE 15 AND 16, 2010
+
-
'''Venue''':  University of California, San Francisco.  50 Beale Street, Suite 1200 (12th Floor)
 
-
'''Sponsor''':  Centers for Disease Control and Prevention Global AIDS Program, Surveillance Branch
+
'''RDS Analyst'''
 +
October 26-30, 2015 in Zagreb, Croatia: http://www.whohub-zagreb.org/902
-
==Description==
+
'''RDS Analyst'''
 +
November 30 to December 4, 2015 in Kiev, Ukraine
-
RDS is a relatively new methodology used worldwide to gather HIV prevalence and risk factors data from hard to reach populations.  In this workshop, the Hidden Population Methods Research Group (HPMRG) is pleased to introduce a new comprehensive, user friendly and open-source software package for the analysis of RDS Data.  The new software, RDS Analyst (RDS-A), includes a user friendly point-and-click graphical user interface allowing for the computation of new and existing estimators and standard errors, visualization of recruitment chains, and diagnostic analysis. It allows for the analysis of multiple variables at once, and the saving and re-use of syntax.  For more technical users, the package may also be accessed through a command line interface to the open-source R programming language (http://www.r-project.org/).
+
'''RDS Methods and RDS Analyst'''
 +
January 4-8, 2016 in New Orleans, Louisiana, USA.
 +
Contact: lsjohnston.global@gmail.com
-
The purpose of this 2-day workshop is to introduce RDS-A to researchers already experienced in RDS methodology and statistics.  Participants will receive training on the RDS-A state-of-the-art analysis and graphic functions and will be asked to provide feedback in the interest of improving the software prior to more widespread distribution among users of RDS.  
+
'''RDS Methods and RDS Analyst'''
 +
February 29-March 4, 2016 at ECPR Winter School Bamberg, Germany: http://ecpr.eu/
-
This workshop is designed as an introduction to the analysis of RDS data using RDS-A.
 
-
It will cover the full RDS-A suite of functions.  This begins with data entry and loading data, coding missing data, and re-coding variables.  It then treats descriptive and diagnostic methods including visualization methods, followed by existing and new tools for estimation, testing models, confidence intervals and sensitivity analysis. The workshop concludes with an introduction to the re-usable syntax and R command line capabilities of the software. 
 
-
Workshop participants currently working with RDS data will be encouraged to bring these data, and evaluate them using RDS-A.
+
=== Past Workshops on RDS Analyst Software ===
 +
March 17 - 21, 2014
-
The workshop will be open to researchers in epidemiology, social and behavioral sciences with experience using RDS methodology, theory and statistics.  You will need to bring a laptop.
+
'''Venue''':  WHO Collaborating Centre for HIV Surveillance
 +
School of Public Health “Andrija Stampar”
 +
Rockefellerova 4
 +
10 000 Zagreb, Croatia
-
The workshop is free; however, all travel and other expenses are covered by the participant. We will forward an agenda and any other pertinent information once you register.
+
'''Sponsor''': WHO Collaborating Centre for HIV Surveillance
-
== Outline ==
+
==Description==
-
[http://hpmrg.org/files/rdsaoutline.pdf Outline as PDF]
+
-
== Presentations ==
+
RDS is a relatively new methodology used worldwide to gather HIV prevalence and risk factors data from hard to reach populations. In this workshop, the Hard-to-Reach Population Methods Research Group (HPMRG) is pleased to introduce a new comprehensive, user friendly and open-source software package for the analysis of RDS Data. The new software, RDS Analyst (RDS-A), includes a user friendly point-and-click graphical user interface allowing for the computation of new and existing estimators and standard errors, visualization of recruitment chains, and diagnostic analysis. It allows for the analysis of multiple variables at once, and the saving and re-use of syntax. For more technical users, the package may also be accessed through a command line interface to the open-source R programming language (http://www.r-project.org/).
-
* [http://hpmrg.org/files/RDSAIntroduction.pdf Introduction]
+
-
* [http://hpmrg.org/files/SamplingFundamentalsPresentation.pdf Sampling: A Brief Review], including the [http://hpmrg.org/files/FiguresforSamplingFundamentals.pdf figures].
+
-
== RDS Analyst Software ==
+
The purpose of this three-day workshop is to introduce RDS-A to researchers already experienced in RDS methodology and statistics.  Participants will receive training on the RDS-A state-of-the-art analysis and graphic functions and will be asked to provide feedback in the interest of improving the software prior to more widespread distribution among users of RDS.   
-
[[RDS_Analyst_Manual RDS Analyst manual]], including installation instructions.
+
This workshop is designed as an introduction to the analysis of RDS data using RDS-A.
-
The data is at:
+
It will cover the full RDS-A suite of functions.  This begins with data entry and loading data, coding missing data, and re-coding variables.  It then treats descriptive and diagnostic methods including visualization methods, followed by existing and new tools for estimation, testing models, confidence intervals and sensitivity analysis. The workshop concludes with an introduction to the re-usable syntax and R command line capabilities of the software. 
-
C:\Program Files\RDS Analyst\R-2.11.1\library\RDSdevelopment\extdata
+
Workshop participants currently working with RDS data will be encouraged to bring these data, and evaluate them using RDS-A.
-
== Papers ==
+
The workshop will be open to researchers in epidemiology, social and behavioral sciences with experience using RDS methodology, theory and statistics.  If possible, please bring a laptop. If you cannot, you should be able to share with someone who does.
-
* [http://hpmrg.org/files/gilehandcockSM2010.pdf ''Respondent-Driven Sampling: An Assessment of Current Methodology''] by Krista J. Gile and Mark S. Handcock. To appear in ''Sociological Methodology'', 2010.
+
The workshop is free; however, all travel and other expenses are covered by the participant. We will forward an agenda and any other pertinent information once you register.
-
= Notes taken during the day =
+
== Outline of the Workshop ==
 +
[http://hpmrg.org/files/rdsaoutline2013.pdf Outline as PDF]
-
==Sampling: A Review ==
+
== Presentations ==
 +
* [http://hpmrg.org/files/I.pdf I. Introduction and Logistics]
 +
* [http://hpmrg.org/files/II.pdf II. Overview of RDS from a Statistical Perspective]
 +
* [http://hpmrg.org/files/III.pdf III. Structure of the RDS-A program]
 +
* [http://hpmrg.org/files/IV.pdf IV. Basic Analysis]
 +
* [http://hpmrg.org/files/V.pdf V. Preparing data for uploading to RDS-A]
 +
* [http://hpmrg.org/files/VI.pdf VI. Exploratory Data Analysis and Diagnostics]
 +
* [http://hpmrg.org/files/VII.pdf VII. Estimation of population proportions and means]
 +
* [http://hpmrg.org/files/VIII.pdf VIII. Estimation of uncertainty: confidence intervals and standard errors]
 +
* [http://hpmrg.org/files/IX.pdf IX. Data Issues: Recoding, transformations, missing data]
 +
* Step-by-step anatomy of a full RDS analysis
-
=== Sampling M&Ms ===
+
Background Materials
 +
* [http://hpmrg.org/files/SamplingFundamentalsPresentation.pdf Sampling: A Brief Review], including the  [http://hpmrg.org/files/FiguresforSamplingFundamentals.pdf figures].
-
* Screen side of the table: The goal is to determine the proportion of red in each bag.
+
== RDS Analyst Software ==
-
 
+
-
* Kitchen side of the table: The goal is to determine the proportion of orange in each bag.
+
-
 
+
-
The four bags have: 10%, 20%, 25%, 30%
+
-
 
+
-
Four bags:
+
-
{|
+
-
|    || Screen||  Kitchen  ||
+
-
-
+
-
|A = ||      ||  E      ||
+
-
-
+
-
|C = ||      ||          ||
+
-
-
+
-
|B = ||      ||          ||
+
-
-                 
+
-
|H = ||      ||          || 
+
-
|}
+
-
 
+
-
Repeated sampling without replacement. Sample
+
-
 
+
-
Screen: 0%, 75%, 0%, 25%
+
-
Kitchen:  0%, 75%, 0%, 25%
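
The M&M exercise can be mimicked with a short simulation. This is a minimal sketch only: the bag size (100 candies), the 25% share of the target colour, and the sample size of 4 are illustrative assumptions, not values recorded in these notes.

```python
import random

def sample_proportions(bag, sample_size, n_samples, seed=0):
    """Draw repeated samples without replacement; return the target-colour proportion in each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    props = []
    for _ in range(n_samples):
        draw = rng.sample(bag, sample_size)  # no candy is drawn twice within one sample
        props.append(sum(draw) / sample_size)
    return props

# Hypothetical bag: 100 candies, 25 of the target colour (1 = target colour, 0 = other).
bag = [1] * 25 + [0] * 75
props = sample_proportions(bag, sample_size=4, n_samples=2000)
mean_prop = sum(props) / len(props)
print(f"true proportion 0.25, mean of sample proportions {mean_prop:.3f}")
```

With samples this small, individual sample proportions can only take the values 0%, 25%, 50%, 75% or 100%, so any single sample can be far from the truth even though the average over many samples is close to it.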
+
-
 
+
-
=== Fundamentals of Sampling ===
+
-
 
+
-
Sampling sizes: 300-500
+
-
Population sizes:
+
-
SFO: 60K
+
-
Africa: 2-3K
+
-
 
+
-
In USA usually choose cities with large at risk populations
+
-
On key behaviors the sample proportion can be 50%
+
-
 
+
-
= Discussion =
+
-
 
+
-
* How to define homophily? Homophily is a very general term (like "cluster", or "dependence"). There is a definition in RDSAT, Gile's thesis and in RDS-A. How many do we need? Which is best for which circumstances?
+
-
* For summary statistics, homophily should be measured on readily observable characteristics, even though the key homophily measure is that on the outcome variable (e.g., disease status).
+
-
 
+
-
* How many coupons? One rule is 3-5 per respondent. How should this be determined?
+
-
 
+
-
* Importance of a simulation study of confidence intervals based on Salganik and Gile's SS bootstrap procedures
+
-
 
+
-
* How to determine the number of friends a respondent knows that are HIV positive?
+
-
**
+
-
 
+
-
* Non-preferential distribution of coupons is an assumption of current estimation methods. What diagnostics can we compute or develop for it?
+
-
 
+
-
* In RDSAT, dual-component computes individualized weights for export to another program to allow use in more advanced statistics.
+
-
** The multiplicity estimator is the Salganik-Heckathorn (RDS-I) estimator.
+
-
 
+
-
* What are the needs and issues with regression methods for RDS data?
+
-
 
+
-
* Suppose the sample proportion was low, but there is differential activity. How will the various estimates perform?
+
-
 
+
-
* How does sampling with or without replacement influence variance? Differing results between Salganik & Goel and Gile's results. Why is this?
+
-
 
+
-
* Real-world issues that might affect simulations. For example, simulations assume that people always respond and that all coupons are returned.
+
-
** In the real-world the coupon return rate is about 30%.
+
-
 
+
-
* Is there information in the secondary-interview question about whether your coupon was refused because others were already in the study?
+
-
** Krista and Lisa are working on related issues
+
-
 
+
-
* What about very small population sizes (relative to the sample size)? When the sample fraction is large, the SS estimator is still appropriate as it is primarily developed to address large sample fraction effects.
+
-
** If it is very small, then RDS can be used as a data collection method rather than using the sampling mechanism as a basis for inference.
+
-
 
+
-
* For the model-assisted method, how are the standard errors computed? How do the standard errors compare to those of the design-based estimators?
+
-
** The ''true'' standard errors appear to be smaller (as shown in the presentation).
+
-
 
+
-
Issues noted during practice
+
-
*Loading data
+
-
**When you load data and when you Run, the "edit RDS data set attributes" window closes automatically: this is confusing for new users.  Suggestion to allow people to close the window themselves
+
-
**Suggestion to change the default # of bootstraps from 100 to 2000 in Gile's SS.
+
-
**Suggestion to add random seed procedure
+
-
**When you resize the Deducer window, the text does not line up in the boxes
+
-
**Specify that the 95% CI is two tailed next to box
+
-
 
+
-
*Saving working dataset from original
+
-
**Always assign the .rds - the program assigns the .robj itself
+
-
**To save dataset, make sure it is done from the data window
+
-
**When re-opening saved data, a message comes up and asks if you want to save; say cancel.
+
-
**When re-opening, check to make sure it saved attributes (in variable view in)
+
-
 
+
-
*RDSAT (5.6+) v RDS R
+
-
**Add an option to impute degree for similar estimators (the Heckathorn dual component estimator sets degree to the average for that partition if it is missing or zero.)
+
-
**Add an eval of degree: distribution, median, mean, number of missing or zero.  (there is usually more than one degree question--sometimes you have to evaluate which one is best b/c relative degree is more important than using the one that was meant to be used: implementation is commonly imperfect)
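
The degree-imputation rule mentioned above (set a missing or zero degree to the average for that partition) could look roughly like the following. This is a sketch under stated assumptions: the function name `impute_degree` is hypothetical, and the use of a plain arithmetic mean of the valid reported degrees is an assumption, since the notes do not say which average RDSAT uses.

```python
def impute_degree(degrees, groups):
    """Replace missing (None) or zero degrees with the mean valid degree of the same group."""
    totals, counts = {}, {}
    # First pass: accumulate valid (positive) degrees per group.
    for d, g in zip(degrees, groups):
        if d is not None and d > 0:
            totals[g] = totals.get(g, 0) + d
            counts[g] = counts.get(g, 0) + 1
    # Second pass: fill in missing or zero degrees with the group mean.
    imputed = []
    for d, g in zip(degrees, groups):
        if d is None or d <= 0:
            if g not in counts:
                raise ValueError(f"no valid degree reported in group {g!r}")
            imputed.append(totals[g] / counts[g])
        else:
            imputed.append(d)
    return imputed

# Illustrative data only: two partitions, one zero and one missing degree.
degrees = [10, 0, 20, None, 5, 15]
groups = ["A", "A", "A", "B", "B", "B"]
result = impute_degree(degrees, groups)
print(result)
```

Counting how many values were imputed per group, as suggested in the degree evaluation item, would be a natural companion check.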
+
-
 
+
-
*Commands
+
-
**for descriptives, add a button that will allow you to add all stats at once instead of adding one at a time
+
-
**Contingency tables: the %s in the bottom line are confusing.  See SAS or SPSS to see how they are formatted.
+
-
**Some people's computers froze when they jittered: the suspicion is that the problem is resizing the window, not the jittering
+
-
**we talked about how much jittering to get, but don't know how to set the option (low priority)
+
-
**option in jittering "pairs" or "pairsplot"
+
-
 
+
-
 
+
-
*Homophily
+
-
**Currently homophily works for dichotomous variables only
+
-
**Output is confusing: suggestion to add a table with row %s (similar to transition probabilities) in RDSAT 5.6+
+
-
**Estimated Population Homophily v sample (or recruitment) homophily
+
-
 
+
-
*Plots
+
-
**Backslash - forward slash reversed when specifying file
+
-
**Add legend to output (include coloring for missing)
+
-
**There is a bug in Windows 7
+
-
**Add homophily and/or efficacy in output?
+
-
 
+
-
*Unique ID
+
-
**Make sure ID does not have to be in order
+
-
**Explain option to specify unique ID
+
-
**Does "own coupon" have to be missing for seeds (they may have a survey number whether they were seeds or not)?  It makes it easier for quality control in recruits to make sure the original numbers are retained.
+
-
 
+
-
*Which estimators (VH, SS, MA, etc) work better under which circumstances. What rules-of-thumb can we have about when each estimator works well and when it does not?
+
-
 
+
-
*A common phenomenon is to see a few seeds produce long chains while most die out. How should this be adjusted for?  (Should it be adjusted for?)
+
-
*Big question: What diagnostics should we run to believe that we can compute valid population estimates from a particular population data set?
+
[http://www.deducer.org/pmwiki/pmwiki.php?n=Main.RDSAnalyst RDS Analyst manual and download], including installation instructions.
-
= Long-term software goals =
+
== Background Papers ==
-
* Add population simulation capability to provide a "virtual laboratory" within which to assess changes in the sample design
+
* [http://arxiv.org/abs/0904.1855 ''Respondent-Driven Sampling: An Assessment of Current Methodology''] by Krista J. Gile and Mark S. Handcock. Pre-print of ''Sociological Methodology'', 40, p 285-327, 2010.
-
* Add a RDS simulation capability to allow virtual or real populations to be sampled repeatedly so as to assess different sample designs and estimation methods
+
-
* With this you can, for example,:
+
-
** Do "power computations"
+
-
** What is the best number of coupons to use?
+
-
* Add an option to set a random number "seed" value. Where to put it?
+
-
* issues with the format of the Sample -> Tables
+
-
* Jitter on Scatterplots sometimes causes a crash when the window is resized. It is likely a problem with JavaGD
+
-
* In sample homophily table, use the nice RDSAT format for it. Especially the seeds. See "Recruitment tab".
+
-
* In plot recruitment, make node sizes constant. Also an option for node size to be proportional to degree.
+
 +
== Announcements and Discussion Forum ==
-
* Data smoothing.  For Salganik estimator?  For reported degrees - as measurement error?  (Amy Drake - don't do it automatically.  In surveillance: may end up treating differently over time.)
+
[http://lists.stat.ucla.edu/mailman/listinfo/rdsanalyst_help/ Subscribe to the list]
-
* Carl Kendall:  in the future, will move toward asking people for people they can name.
+
-
* Carl Kendall:  letting people report a number asks for unreliability. But proportionality to degree a bit easier to trust. 
+
-
* Some folks have seen differences in point estimates with different network size measures.  Not differences outside confidence intervals.
+
-
* Carl Kendall:  reality needs to be much smaller than some of huge numbers we see. 
+
-
* Secondary incentive, how were coupons distributed.  How many people could have recruited them. 
+
-
* Overseas, high secondary incentives returned
+
-
* In Interval Estimates, the values are recoded to 0 and 1 rather than using the original values (bug)
+
-
* Domestic low return for secondary incentives
+
-
* Liz: cognitively test network size questions.  Input?  IDU, MSM, het.
+
-
* RDS-II weights seem to be the inverse of what is needed.
+
== Notes taken during the workshop ==
-
* Remove scientific notation in descriptive tables, and elsewhere
+
-
* Read Rds to lowercase
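
For reference when checking the RDS-II weights item above: in the standard RDS-II (Volz-Heckathorn) estimator each respondent is weighted in proportion to the inverse of their reported degree. The sketch below shows that textbook form only; it is not RDS-A's implementation, the function names are hypothetical, and the data are made up for illustration.

```python
def rds2_weights(degrees):
    """Weights proportional to 1/degree, normalised to sum to 1."""
    inv = [1.0 / d for d in degrees]
    total = sum(inv)
    return [w / total for w in inv]

def rds2_proportion(outcomes, degrees):
    """Inverse-degree weighted proportion of respondents with outcome 1."""
    weights = rds2_weights(degrees)
    return sum(w for w, y in zip(weights, outcomes) if y == 1)

# Illustrative data only: the two respondents with the outcome have lower degrees.
outcomes = [1, 0, 1, 0]
degrees = [2, 4, 4, 8]
weights = rds2_weights(degrees)
estimate = rds2_proportion(outcomes, degrees)
print(f"sample proportion 0.500, RDS-II estimate {estimate:.3f}")
```

If the weights were inverted (proportional to degree rather than to 1/degree), high-degree respondents would be weighted up instead of down, and the estimate would move in the opposite direction from the sample proportion.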
+

Revision as of 02:17, 17 October 2015

UPCOMING WORKSHOPS


RDS Analyst: October 26-30, 2015 in Zagreb, Croatia: http://www.whohub-zagreb.org/902

RDS Analyst: November 30 to December 4, 2015 in Kiev, Ukraine

RDS Methods and RDS Analyst: January 4-8, 2016 in New Orleans, Louisiana, USA. Contact: lsjohnston.global@gmail.com

RDS Methods and RDS Analyst: February 29-March 4, 2016 at ECPR Winter School Bamberg, Germany: http://ecpr.eu/



Past Workshops on RDS Analyst Software

March 17 - 21, 2014

Venue: WHO Collaborating Centre for HIV Surveillance, School of Public Health “Andrija Stampar”, Rockefellerova 4, 10 000 Zagreb, Croatia

Sponsor: WHO Collaborating Centre for HIV Surveillance

Description

RDS is a relatively new methodology used worldwide to gather HIV prevalence and risk factors data from hard to reach populations. In this workshop, the Hard-to-Reach Population Methods Research Group (HPMRG) is pleased to introduce a new comprehensive, user friendly and open-source software package for the analysis of RDS Data. The new software, RDS Analyst (RDS-A), includes a user friendly point-and-click graphical user interface allowing for the computation of new and existing estimators and standard errors, visualization of recruitment chains, and diagnostic analysis. It allows for the analysis of multiple variables at once, and the saving and re-use of syntax. For more technical users, the package may also be accessed through a command line interface to the open-source R programming language (http://www.r-project.org/).

The purpose of this three-day workshop is to introduce RDS-A to researchers already experienced in RDS methodology and statistics. Participants will receive training on the RDS-A state-of-the-art analysis and graphic functions and will be asked to provide feedback in the interest of improving the software prior to more widespread distribution among users of RDS.

This workshop is designed as an introduction to the analysis of RDS data using RDS-A.

It will cover the full RDS-A suite of functions. This begins with data entry and loading data, coding missing data, and re-coding variables. It then treats descriptive and diagnostic methods including visualization methods, followed by existing and new tools for estimation, testing models, confidence intervals and sensitivity analysis. The workshop concludes with an introduction to the re-usable syntax and R command line capabilities of the software.

Workshop participants currently working with RDS data will be encouraged to bring these data, and evaluate them using RDS-A.

The workshop will be open to researchers in epidemiology, social and behavioral sciences with experience using RDS methodology, theory and statistics. If possible, please bring a laptop. If you cannot, you should be able to share with someone who does.

The workshop is free; however, all travel and other expenses are covered by the participant. We will forward an agenda and any other pertinent information once you register.

Outline of the Workshop

Outline as PDF

Presentations

Background Materials

RDS Analyst Software

RDS Analyst manual and download, including installation instructions.

Background Papers

Announcements and Discussion Forum

Subscribe to the list

Notes taken during the workshop
