UNAIDS Reference Group Consultation
UNAIDS Reference Group Consultation on Population Size Estimation Based on Respondent-driven Sampling
Dates: June 9-10, 2014
Venue: University of Massachusetts Amherst, hosted by the Statistics Department and Krista Gile
Who: The consultation is called by the UNAIDS Reference Group on Estimates, Modelling and Projections. We are inviting a diverse group of field implementers, end data users, and statisticians/mathematicians.
Goal: Identify methods that may yield valid population size estimates from chain-referral sampling data. Current methods are widely considered "good enough" at best; we are looking for additions to the toolbox, preferably better ones.
Key objectives:
- Determine whether the successive sampling population size estimation (SS-PSE) method developed by Handcock, Gile and Mar yields reasonable size estimates compared with existing methods, with appropriate attention to its assumptions and potential biases.
- Compare SS-PSE with, and use it alongside, methods that require additional data sources and data collection (capture-recapture, multiplier, mapping).
- Discuss how SS-PSE and other social network-based approaches may provide size estimates for populations that otherwise remain hidden from routine census efforts.
Background Issues
The epidemiologic and programmatic need for reasonably accurate size estimates of key populations remains strong in the third decade of the HIV epidemic and response. UNAIDS recently supported work elucidating the Network Scale-Up Method, which is not yet widely implemented for a variety of reasons. The Global Fund, UNAIDS and others are currently working with MEASURE and the University of Manitoba to develop a prototype protocol for programmatic mapping and size estimation. New work by Handcock and Gile develops a size estimator, based on social network statistics, that estimates the size of a network from incomplete network data (the manuscript is listed under Background Papers below). The estimator is straightforward to use for nearly anyone who has implemented a respondent-driven sampling (RDS) survey.
Their methodology is implemented in the new RDS Analyst tool developed by Handcock, Fellows and Gile. Johnston and others have analysed field results of this estimator in comparison with multiplier and capture-recapture results from the same surveys. The estimator should be reviewed independently by the Reference Group on Estimates, Modelling and Projections to get a sense of its validity and of how surveillance teams globally might use it.
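For readers less familiar with the comparison methods mentioned above, here is a minimal R sketch of the multiplier and two-sample capture-recapture (Lincoln-Petersen) estimators in their simplest textbook forms. The function names and all numbers are hypothetical, and the formulas assume, among other things, a closed population and independence between the data sources; this is not the analysis carried out by Johnston and others.

```r
# Hypothetical helper functions; inputs below are made-up illustrations.

# Multiplier method: M = count of population members recorded by an
# independent source (e.g. a service or a unique-object distribution);
# p = proportion of survey respondents who report appearing in that source.
multiplier_estimate <- function(M, p) {
  stopifnot(M >= 0, p > 0, p <= 1)
  M / p
}

# Two-sample capture-recapture (Lincoln-Petersen): n1 people "captured" in
# round one, n2 in round two, and m of the n2 already seen in round one.
lincoln_petersen <- function(n1, n2, m) {
  stopifnot(m > 0, m <= n1, m <= n2)
  n1 * n2 / m
}

multiplier_estimate(M = 1200, p = 0.3)        # returns 4000
lincoln_petersen(n1 = 500, n2 = 400, m = 50)  # returns 4000
```

In practice both estimates are reported with confidence intervals, and small-sample corrections such as Chapman's variant of the Lincoln-Petersen estimator are common; they are omitted here to keep the arithmetic visible.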
Preliminary Agenda
The preliminary agenda is here
Primary Presentations on SS-PSE
These are the presentations by Mark S. Handcock, Krista J. Gile and Corinne M. Mar describing the SS-PSE approach:
- Motivations and Objectives of the SS-Size method
- Overview of the SS-Size methodology
- Skepticism, Sensitivity and Policy Relevance
Software Used in the Presentations
R package for SS-PSE
- The size package for R, by Handcock and Gile, is available by emailing handcock@ucla.edu. This source code has not been publicly released and is made available for evaluation and verification purposes.
- The authors of the size package believe it is essential that software for statistical methods be made available with published papers, for several reasons. Most modern statistical methods are too complex to be fully specified in a research paper, so the software is a primary means of completing the specification of the methodology to the level where it can be used in practice. The software thus enables the methodology to be assessed, understood, improved and expanded in ways important for scientific progress. In particular, reproducible research is very difficult to achieve without publicly available software. This is why we provide it here.
- The software will be publicly announced following final peer review of the methodology and possible revision. As a condition of using this version, please do not redistribute it: it is important that users see the final version rather than a possibly inconsistent one.
- To install the binary version of the size package directly, use
install.packages("size", repos = "http://www.stat.ucla.edu/~handcock")
- The size package requires the locfit package, which can be installed via
install.packages("locfit")
- Please make sure that you are running the latest R version (3.1.0). To check which version of R you are using, type in R:
R.version.string
- Introduction to the size package:
- The primary function to call is "posteriorsize". To get information on it try:
help(posteriorsize)
- There are three example code files below. Each fits a simulated RDS degree sequence from a known (simulated) population.
- exampleposteriorsize.simple.R: tries the method on a simulated network
- testnets1000.52.RData: simulated network with known network size and statistical properties
- exampleposteriorsize.fauxmadrona.R: uses the "fauxmadrona" population
- exampleposteriorsize.fauxmadrona.flat.R: same as above, with a flat prior
- The "fauxmadrona" network is in the RDS package. It has a known population size (N=1000) and complex structure. For details, see
help(fauxmadrona)
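The example files fit degree sequences simulated from a known population. To give a feel for what such a sequence looks like, the sketch below simulates one under successive sampling, the approximation to RDS used by SS-PSE as described in the Handcock, Gile and Mar papers under Background Papers: units are drawn without replacement with probability proportional to degree, so high-degree units tend to be drawn early and later recruits have lower degrees as the population is depleted. This is a simplified illustration, not the size package code and not the SS-PSE estimator; the function names, the geometric degree distribution and the population sizes are hypothetical choices.

```r
# Sketch only: simulates an RDS-like degree sequence under the
# successive-sampling approximation. Names and settings are hypothetical.
simulate_ss_degrees <- function(degrees, n) {
  stopifnot(n <= length(degrees), all(degrees > 0))
  remaining <- seq_along(degrees)
  drawn <- integer(n)
  for (k in seq_len(n)) {
    # Draw one remaining unit, without replacement, with probability
    # proportional to its degree. sample.int() indexes 'remaining', which
    # avoids sample()'s special case when a single value is left.
    pick <- sample.int(length(remaining), 1, prob = degrees[remaining])
    drawn[k] <- remaining[pick]
    remaining <- remaining[-pick]
  }
  degrees[drawn]  # degrees in the order the units were sampled
}

# Mean degree of the first k draws divided by that of the last k draws
early_late_ratio <- function(d, k = 100) mean(head(d, k)) / mean(tail(d, k))

set.seed(2014)
n <- 400
# Hypothetical populations: degree = 1 + geometric draw (mean degree 6)
small_pop <- 1 + rgeom(500, prob = 1 / 6)
large_pop <- 1 + rgeom(5000, prob = 1 / 6)

ratio_small <- early_late_ratio(simulate_ss_degrees(small_pop, n))
ratio_large <- early_late_ratio(simulate_ss_degrees(large_pop, n))
print(c(small = ratio_small, large = ratio_large))
```

With 400 draws from 500 units the early-to-late ratio should be well above 1, while with 5000 units it should be much closer to 1. How quickly degrees fall off relative to the sample size is the kind of information about population size that SS-PSE combines with a prior on the population size.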
User-friendly software for the analysis of RDS data, including SS-PSE
RDS Analyst (RDS-A) is a software package for the analysis of Respondent-driven sampling (RDS) data that implements recent advances in statistical methods.
RDS Analyst has an easy-to-use graphical user interface to the powerful and sophisticated capabilities of the computer package R. RDS Analyst has been developed to provide a comprehensive framework for working with RDS data, including tools for sample and population estimation, testing, confidence intervals and sensitivity analysis.
Example capabilities include an easy format for entering data, visualization of recruitment chains, regression modeling, and handling of missing data.
The interface of RDS Analyst is similar to SPSS. RDS Analyst is also a free, easy-to-use alternative to proprietary data analysis software such as SPSS, STATA, SAS/JMP, and Minitab. It has a menu system for common data manipulation and analysis tasks, and an Excel-like spreadsheet in which to view and edit data.
For the manual and download information, see the RDS Analyst manual. More information on installing RDS Analyst can be found here.
A video illustrating a very simple use of the population size estimation routines is here. Using the routines to estimate population size in practice requires specifying additional information, as will be described at the meeting.
R package for network scale-up method
- The networkreporting package for population size estimation using the network scale-up method is on CRAN. It is written by Dennis Feehan and Matt Salganik.
- To install it from within R, use
install.packages("networkreporting")
- This is an alpha release intended for people who are already comfortable with R; it will be improved over time.
- The most recent version of the source code is on github: https://github.com/dfeehan/networkreporting.
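To show the arithmetic behind scale-up estimation, the sketch below implements, in plain R, the basic network scale-up estimator commonly attributed to Killworth and colleagues. It is not the networkreporting API and not the generalized estimator of Feehan and Salganik listed under Background Papers; the function names and all numbers are hypothetical. Here y_ik is how many people respondent i knows in a population of known size N_k, y_iH is how many they know in the hidden population, and N is the total population size.

```r
# Hypothetical helpers for the basic scale-up estimator; made-up inputs.

# Step 1 (known-population method): estimate each respondent's personal
# network size as d_i = N * sum_k y_ik / sum_k N_k.
estimate_degrees <- function(known_counts, known_sizes, N) {
  N * rowSums(known_counts) / sum(known_sizes)
}

# Step 2: scale up the hidden-population reports,
# N_H = N * sum_i y_iH / sum_i d_i.
scale_up_estimate <- function(hidden_counts, degrees, N) {
  N * sum(hidden_counts) / sum(degrees)
}

N <- 1e6                                  # total population size
known_sizes <- c(20000, 25000, 5000)      # sizes N_k of known populations
known_counts <- rbind(c(4, 5, 1),         # y_ik: one row per respondent
                      c(6, 7, 2),
                      c(2, 2, 1))
hidden_counts <- c(2, 3, 1)               # y_iH for each respondent

degrees <- estimate_degrees(known_counts, known_sizes, N)  # 200 300 100
scale_up_estimate(hidden_counts, degrees, N)               # returns 10000
```

The basic estimator assumes, among other things, that respondents' networks are representative and that respondents know and accurately report the hidden-population status of their contacts; the generalized estimator of Feehan and Salganik is designed to relax some of these assumptions.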
Background Papers
Some statistical background on RDS
- Respondent-Driven Sampling: An Assessment of Current Methodology by Krista J. Gile and Mark S. Handcock. Pre-print of Sociological Methodology, 40, pp. 285-327, 2010.
- Network Model-Assisted Inference from Respondent-Driven Sampling Data by Krista J. Gile and Mark S. Handcock. arXiv.org, 2011.
SS-PSE
- Estimating the Size of Populations at High Risk for HIV using Respondent-Driven Sampling Data by Mark S. Handcock, Krista J. Gile and Corinne M. Mar. Accepted to Biometrics, 2014.
- Estimating Hidden Population Size using Respondent-Driven Sampling Data by Mark S. Handcock, Krista J. Gile and Corinne M. Mar. arXiv.org, 2012.
Network Scale-Up-type Methods
- Estimating the Size of Hidden Populations Using the Generalized Network Scale-Up Estimator by Dennis M. Feehan and Matthew J. Salganik. arXiv.org, 2014.
- Estimating Population Size Using the Network Scale Up Method by Rachael Maltiel, Adrian E. Raftery, Tyler H. McCormick. arXiv.org, 2013.
Time-to-Recruitment Methods
- A recruitment model and population size estimation for respondent-driven sampling by Forrest W. Crawford. arXiv.org, 2014.
- Modeling and Analysing Respondent Driven Sampling as a Counting Process by Yakir Berchenko, Jonathan Rosenblatt, Simon D.W. Frost. arXiv.org, 2013.
Announcements and Discussion Forum
Support mailing list for RDS Analyst