SOCR Educational Materials - SOCR Data
The links below point to datasets that may be used for demonstrations in probability and statistics education. There are two types of data: simulated (computer-generated by random sampling) and observed (acquired through research, either observationally or experimentally).
SOCR provides several mechanisms for simulating data with computer random-number generators. The most commonly used SOCR generators of simulated data are:
- SOCR Experiments - each experiment reports random outcomes, sample and population distributions and summary statistics.
- SOCR random-number generator - enables sampling of any size from any of the SOCR Distributions.
- SOCR Analyses - all of the SOCR analyses allow random sampling from various populations appropriate for the user-specified analysis.
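The SOCR generators above are interactive tools. As a rough, non-authoritative analogue, the base-R sketch below draws a random sample of a chosen size from one of a few standard distributions and compares sample statistics with population values. It uses R's own generators (`rnorm`, `rbinom`, `rexp`), not the SOCR ones. The helper name `simulate_sample` and the distribution parameters are hypothetical choices made for illustration.

```r
# Hypothetical helper (not part of SOCR): draw n values from a named
# distribution using base R random-number generators.
simulate_sample <- function(dist = c("normal", "binomial", "exponential"),
                            n = 100, seed = NULL) {
  dist <- match.arg(dist)
  if (!is.null(seed)) set.seed(seed)  # fix the seed for reproducible samples
  switch(dist,
    normal      = rnorm(n, mean = 0, sd = 1),
    binomial    = rbinom(n, size = 10, prob = 0.5),
    exponential = rexp(n, rate = 1)
  )
}

# Draw a Binomial(10, 0.5) sample and compare it with the population
x <- simulate_sample("binomial", n = 1000, seed = 42)
summary(x)                                   # sample summary statistics
c(sample_mean = mean(x), population_mean = 10 * 0.5)
table(x) / length(x)                         # empirical (sample) distribution
```

Increasing `n` makes the sample mean and the empirical distribution approach their population counterparts. This is the same behavior the SOCR Experiments display interactively.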
The following collections include real, observed datasets from many disciplines, acquired with various techniques and suited to various situations.
- Antarctic Ice Thickness at Mawson, Davis and Casey (01/Apr/1954 to 15/Jan/2002); 1,636 data points.
- Energy Resources, Production and Consumption Dataset
- California Ozone Data (1980-2006)
- California and US Ozone Data Snapshot
- Human Height and Weight data
- Population Data by Country 2000-2006
- Los Angeles City Neighborhoods Data (from US Census)
- Ranking of the top 100 Countries in the World based on political, economic, health, and quality-of-life factors
Economic, Business and Stock Market Data
Consumer Price Index (CPI)
- Consumer Price Index (1981-2006) - Fuel and Food Data
- Consumer Price Index (1981-2007) - One-, Two- or Three-Way ANOVA Data by items, months and years
- Housing Price Index (2000-2006) (motion charts)
- S&P Home Price Index (1991-2009) (motion charts)
- Sun Microsystems (Java) Stock price (2007-2008)
- S&P 500 (2007-2008)
- US Economy by Sectors (1997-2007) and 2007-2009 Recession Data
- Ranking, Profits and Income of Fortune 500 Companies (1955-2008) Dataset
- US Federal Reserve monetary-base data (1959-2009)
- Monthly US economic data, including monetary-base data, interest rates, CPI, HPI, S&P, unemployment, inflation, etc. (1959-2009)
- Monthly Monetary Inflation for Several Countries (2002-2012)
Budgets and Deficits Data
Sector Data, Population Perception Trends data
- Neuroimaging study of 27 Alzheimer's disease (AD) subjects, 35 normal controls (NC), and 42 subjects with mild cognitive impairment (MCI)
- Alzheimer's Disease neuroimaging Data
- Neuroimaging study of super-resolution image enhancement
- Neuroimaging study of Prefrontal Cortex Volume across Species and Tissue Types
- Neuroimaging study of normal children and children with schizophrenia
- A large neuroimaging study of pain, including visceral pain, irritable bowel syndrome, ulcerative colitis, and Crohn's disease
- Human Health: Predictive Big Data Analytics, Modeling and Visualization of Clinical, Genetic and Imaging Data for Parkinson’s Disease
- 1993 New York State Heart Attack Patients: Acute Myocardial Infarction (AMI), N=12,844
- Human Health: Modeling and Analysis of Clinical, Genetic and Imaging Data of Alzheimer’s Disease
- Allometric relationship between population density, body mass and metabolic activity in plants
- Fisher's multivariate dataset on iris sepal and petal length and width
- Body Density & Body Mass Index (BMI) Data
- Knee Pain Centroid Locations Data
- Neonatal Infant Pain Scale (NIPS) Data (Vitamin K shots)
Healthcare and Health Science Data
- A number of case-studies including Big and Heterogeneous clinical, nursing, and healthcare datasets.
- Los Angeles County Neighborhoods Data (from US Census)
- 2011 US Jobs Ranking (200 Best to Worst Jobs in the USA for 2011)
- US Electoral College vs. Popular Vote Presidential Elections Mandate Data (1828-2008)
- US Elections and Counties data for 2004
- Brain to Body Weight Dataset
- California Earthquakes Data (1969-2007)
- California Lottery (1992-2011)
- Faculty Publications
- 2007 Advanced Placement (AP) Exam Scores by Discipline
- Online Math Center: A large archive of data from different scientific observations
- Largemouth Bass Mercury Contamination Dataset
- Latin Letters Frequency Data
- NISER Datasets
- Texas Wolfcamp aquifer data
- Hot Dog Calorie and Sodium Dataset
- DataCite DOI Archive for diverse types of datasets
SOCR Course Data and Case-Studies
- UMich HS 853, Fall 2015
- UMich HS 853, Fall 2016
- UMich Data Science and Predictive Analytics (HS 650)
Machine Interfaces for Downloading SOCR Data
In addition to human interaction with SOCR Data, we provide several machine interfaces for consuming and processing these data.
- SOCR Data can be copied and pasted directly from the Wiki HTML pages into any of the SOCR Java applets.
- SOCR Data can also be loaded automatically into an R computational environment using the protocol below, illustrated here with a Parkinson's disease dataset:
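library(rvest)  # rvest builds on the xml2 package, which provides read_html()
# Read the SOCR wiki page that hosts the Parkinson's disease dataset
wiki_url <- read_html("http://wiki.socr.umich.edu/index.php/SOCR_Data_PD_BiomedBigMetadata")
# Optional: inspect the main content node of the wiki page
html_nodes(wiki_url, "#content")
# Extract the first HTML table on the page as a data frame
pd_data <- html_table(html_nodes(wiki_url, "table")[[1]])
head(pd_data); summary(pd_data)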
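To try the same rvest calls without network access, they can be applied to a small HTML table held in a string, because `read_html()` accepts literal HTML as well as a URL. The page below is a made-up placeholder, not SOCR data. The column names and values are hypothetical, and the sketch assumes the rvest package is installed.

```r
library(rvest)

# Hypothetical stand-in for a wiki page; values are illustrative placeholders only
page_html <- '
<html><body><div id="content">
  <table>
    <tr><th>id</th><th>value</th><th>group</th></tr>
    <tr><td>1</td><td>4</td><td>A</td></tr>
    <tr><td>2</td><td>7</td><td>B</td></tr>
    <tr><td>3</td><td>9</td><td>A</td></tr>
  </table>
</div></body></html>'

page <- read_html(page_html)           # parse literal HTML instead of a URL
tables <- html_nodes(page, "table")    # all <table> nodes on the page
demo_data <- html_table(tables[[1]])   # first table as a data frame
str(demo_data)                         # numeric columns are type-converted
table(demo_data$group)                 # simple per-group count
```

Indexing with `[[1]]` selects a single table node, so `html_table()` returns one data frame rather than a list of them.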
- SOCR Home page: http://www.socr.ucla.edu