SOCR ResamplingSimulation Docs

Revision as of 18:38, 17 February 2014 by IvoDinov

SOCR Educational Materials - Activities - SOCR Resampling, Randomization and Simulation Framework: Technical Documentation

This technical documentation provides instructions and support for using and extending the SOCR Resampling and Simulation Inference Framework, including the webapp, activity, source code and examples.

Version

Documentation version 1.0.0, February 09, 2014.

1. Introduction

In Summer 2012, as part of Google Summer of Code (GSoC) 2012, we created an HTML5-based randomization, simulation and resampling webapp for SOCR. We have reached some important milestones and released the first version of the app. There is a wealth of online reading material on statistical inference and data analytics, including randomization tests. The SOCR resampling webapp was designed and implemented with a focus on end users (learners, instructors and researchers). The user interface (UI) is intuitive and responsive, and its primary purpose is to help students learn how to draw statistical inferences using randomized resampling methods. The tool can also be used in research studies involving fairly large datasets.

If you are interested only in learning how to use the application, please read sections 1, 2, 3 and 4 and the FAQ.

If you are a developer interested in contributing, please read sections 5, 6 and 7.

2. How to Use/Install

The current beta version of the application is hosted on the SOCR server at UCLA. You can freely use and test any of these online webapps.

We do not currently support an offline version because of browser cross-domain restrictions, but we plan to release an offline version soon that you can download and use on your local machine.

This webapp documentation is available online. The code is well documented, so the best starting point for developers is to browse the source code on GitHub.

The webapp was developed purely in JavaScript and HTML5. We have tried to make it work in all popular browsers, but the best results can be expected in Google Chrome. We have tested the application with Firefox 9+, Internet Explorer 9 and Google Chrome.

3. User Interface (UI)

We have paid a lot of attention to the UI, iterating through many designs to reach a final view that packs a lot of functionality into a single web view. While keeping it simple, we surface the information most relevant to the user's current action. For simplicity, the application is designed as a single-page webapp: because queries to the World Bank API are made with AJAX, no page reloads are needed.

The simulation controls are tucked unobtrusively into a tile on the left side of the screen, leaving room for the results. This tile holds all the major controls for selecting or generating datasets, drawing random samples, and choosing inference variables and plots. More specific controls are placed next to the part of the view they affect.

The web view is organized primarily around a central accordion of tabs, each denoting a major checkpoint (access point) in the analysis. The right sidebar holds all the random samples generated during the experiment, each in its own box. Users can navigate to any random sample and visualize it with the action buttons in that sample's box. The user interface works best in desktop browsers. Mobile browsers also work, with performance depending on the computing power of the device. A help menu guides the user through the simulation.

4. Datasets

As in any statistical analysis, data sit at the center of every analysis the user performs. The app was built to handle not just simple illustrative datasets but fairly large, real ones. It currently supports simulated random samples, World Bank datasets and arbitrary tabular datasets of roughly 1,000 rows. The app currently offers over 20 World Bank datasets, and we intend to expand this number soon. SOCR datasets will also soon be integrated into the app, increasing the options available to users.

Simulation Data

There is a set of preloaded virtual experiments that can be used to generate data. For example, the user can choose the binomial experiment and simulate coin tosses to obtain an initial dataset such as H,T,T,H,T,H,H,H,T (H = head, T = tail).
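The coin-toss experiment can be pictured with a short sketch. This is not the webapp's experiment code: simulateCoinTosses, countHeads and the injectable rand parameter are hypothetical names chosen for illustration. Each toss is a Bernoulli trial, so the number of heads in n tosses is a Binomial(n, p) draw.

```javascript
// Hypothetical sketch (not the SOCR experiment file): simulate n coin tosses
// with P(head) = pHeads and return them as an "H"/"T" dataset.
function simulateCoinTosses(n, pHeads = 0.5, rand = Math.random) {
  const tosses = [];
  for (let i = 0; i < n; i++) {
    // rand() is uniform on [0, 1), so this yields "H" with probability pHeads
    tosses.push(rand() < pHeads ? "H" : "T");
  }
  return tosses;
}

// Number of heads in a toss sequence, i.e. one Binomial(n, pHeads) outcome.
function countHeads(tosses) {
  return tosses.filter((t) => t === "H").length;
}

const tosses = simulateCoinTosses(9);
console.log(tosses.join(","), "heads:", countHeads(tosses));
```

Passing rand as a parameter keeps the default Math.random behavior while letting a test substitute a deterministic generator.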

User-provided Data

A built-in spreadsheet lets users import data from external sources by copying and pasting with the mouse or keyboard. Alternatively, users can import data from the SOCR data archive by providing the dataset's unique SOCR URL. Data can also be specified and imported through the World Bank API.

WorldBank API

The World Bank API module allows datasets to be imported from the World Bank. Of the API's three components, the application uses the RESTful interface for fetching indicator datasets. The parameters supplied are the development indicator and the range of years to compare. The current version of the application supports 20 development indicators, which can be selected from the drop-down menu.
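A RESTful indicator query of this kind can be sketched as follows. This is an assumption-laden illustration, not the module's code: it targets the current public v2 endpoint (https://api.worldbank.org/v2/country/{country}/indicator/{indicator}), which may differ from the endpoint the webapp used in 2014, and the helper names are hypothetical. The v2 JSON reply is a two-element array of paging metadata and records.

```javascript
// Hypothetical helpers. Assumption: World Bank API v2 URL layout and JSON shape.
function worldBankIndicatorUrl(indicator, startYear, endYear, country = "all") {
  const base = "https://api.worldbank.org/v2/country/" +
    encodeURIComponent(country) + "/indicator/" + encodeURIComponent(indicator);
  // date=START:END selects the year range; per_page raises the page size.
  return `${base}?date=${startYear}:${endYear}&format=json&per_page=1000`;
}

// Reply shape: [pagingMetadata, records]; error replies have no records array.
// Each record has country.value, date (year as a string) and value (may be null).
function flattenIndicatorReply(reply) {
  const records = Array.isArray(reply) && Array.isArray(reply[1]) ? reply[1] : [];
  return records
    .filter((r) => r.value !== null && r.value !== undefined)
    .map((r) => ({ country: r.country.value, year: Number(r.date), value: r.value }));
}
```

In a browser, the two helpers would be combined as fetch(worldBankIndicatorUrl("SP.POP.TOTL", 2000, 2010)).then((r) => r.json()).then(flattenIndicatorReply), yielding flat rows ready for the spreadsheet.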

SOCR Datasets

The SOCR wiki has a rich collection of curated datasets, accompanied by example analyses that help students better understand statistics. The datasets are currently stored in the wiki's MediaWiki MySQL database. Efforts are under way to move them to a more useful API-based service.

5. Framework MVC design

MVC Core

The design of the SOCR Randomization webapp is based on an established, reliable and widely used software design pattern: Model-View-Controller (MVC). MVC divides the code into logical sections. The business logic, in our case the calculations, resides in the model, where numerical manipulation and data and result generation occur. The view prepares the look and feel of the user interface components and handles all user controls and graphics. Finally, the controller acts as the glue between the model and the view, enabling smooth, dynamic integration of data, processing information and mediating the user experience.

The whole application is built around these three components: the Model (appModel.js), the Controller (appController.js) and the View (appView.js). Although we have tried to keep the components decoupled, there are certain bindings between them that will eventually be phased out.

All calculations and application logic reside primarily in appModel.js. Computation specific to each experiment lives in its own .js file in the exp/ folder, and these files are loaded into the app as needed. All DOM events are bound to functions in the controller. Imported and computed data are stored in a dataStore object under the SOCR namespace. This is the only place where data are stored, which centralizes data for the whole application; reads and writes go only through appModel.
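The division of responsibilities can be illustrated with a minimal sketch. This is not the contents of appModel.js, appController.js or appView.js; the method names (setData, getData, mean, renderMean, onInferClick) are hypothetical. The point it shows is the one stated above: only the model reads or writes SOCR.dataStore, the controller reacts to events, and the view only renders.

```javascript
// Hypothetical sketch of the MVC layering; all method names are illustrative.
const SOCR = { dataStore: {} };

SOCR.model = {
  // The only code path that mutates the store.
  setData(key, value) { SOCR.dataStore[key] = value; },
  // The only code path that reads the store.
  getData(key) { return SOCR.dataStore[key]; },
  // Example computation owned by the model.
  mean(key) {
    const xs = this.getData(key) || [];
    return xs.length ? xs.reduce((a, b) => a + b, 0) / xs.length : NaN;
  },
};

SOCR.view = {
  // Stand-in for DOM rendering: returns a string instead of touching the page.
  renderMean(value) { return `Mean: ${value.toFixed(2)}`; },
};

SOCR.controller = {
  // In the browser this handler would be bound to a DOM event (e.g. a click).
  onInferClick(key) { return SOCR.view.renderMean(SOCR.model.mean(key)); },
};

SOCR.model.setData("dataset1", [1, 2, 3, 4]);
console.log(SOCR.controller.onInferClick("dataset1")); // prints "Mean: 2.50"
```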

Flow

When the webapp starts, the user chooses the mode of data input. Once the user has cleaned and selected the data, he or she loads it into the app by clicking “generate random samples”. Clicking this button loads the initial datasets into the dataStore and merges them into the sample space, which acts as the pool from which random samples are drawn.

The controller then refreshes the view to show a new set of action buttons. Each click of the “run” or “step” button generates random samples from the sample space, which are then stored in the dataStore. Choosing the analysis does not interfere with random sample generation, so you are free to change the analysis at any time. Once the user is satisfied with the number of random samples, inference plots can be generated. The variables that can be plotted naturally depend on the kind of analysis performed.
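The flow above can be sketched in a few lines. Several details here are assumptions, not taken from the webapp: merging is plain concatenation, each random sample has the same size as the sample space, "step" draws one sample and "run" draws many, and createSimulation/infer are hypothetical names. Because samples are stored independently of any statistic, the analysis can be chosen or changed after sampling.

```javascript
// Hypothetical sketch of the data flow: merge, sample, then analyze.
function createSimulation(initialDatasets, rand = Math.random) {
  // Assumption: the sample space is the concatenation of all initial datasets.
  const sampleSpace = [].concat(...initialDatasets);
  const samples = []; // stands in for the samples kept in the dataStore

  // Assumption: "step" draws one sample, with replacement, of the pool's size.
  function step() {
    const sample = [];
    for (let i = 0; i < sampleSpace.length; i++) {
      sample.push(sampleSpace[Math.floor(rand() * sampleSpace.length)]);
    }
    samples.push(sample);
    return sample;
  }

  // Assumption: "run" repeats step() count times.
  function run(count) {
    for (let i = 0; i < count; i++) step();
  }

  // The statistic is supplied only at inference time, so it can change freely.
  function infer(statistic) {
    return samples.map((s) => statistic(s));
  }

  return { sampleSpace, samples, step, run, infer };
}

const sim = createSimulation([[1, 2, 3], [4, 5]]);
sim.run(100);
const means = sim.infer((s) => s.reduce((a, b) => a + b, 0) / s.length);
```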

Currently supported analyses include:

Getting data

The data input grid is an Excel-like spreadsheet, written purely in JavaScript and HTML, for entering custom data to be processed in an experiment. It is a highly flexible plugin that can be used to copy and paste data, retaining the formatting, from MS Excel, Google Drive or simply HTML tables on a web page. The columns and rows shrink or expand according to the size and organization of the dataset. The grid also allows simple data import from other web-based data resources: the user enters the URL of the desired dataset, the application makes a simple XMLHttpRequest (XHR) [29], and the largest table on the given web page is parsed into the spreadsheet for further editing. Finally, the SOCR Randomization webapp allows data import from the World Bank via API calls.
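Extracting "the largest table" from a fetched page might look like the sketch below. It is not the webapp's parser. It assumes "largest" means "most cells", and extractTables relies on the browser's DOMParser, so it only runs in a browser; pickLargestTable is plain JavaScript. loadIntoGrid in the usage comment is a hypothetical placeholder for handing rows to the spreadsheet.

```javascript
// Browser-only: turn every <table> in an HTML string into an array of rows.
function extractTables(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  return Array.from(doc.querySelectorAll("table"), (table) =>
    Array.from(table.rows, (row) =>
      Array.from(row.cells, (cell) => cell.textContent.trim())));
}

// Total number of cells in a table given as an array of rows.
function cellCount(table) {
  return table.reduce((sum, row) => sum + row.length, 0);
}

// Assumption: "largest" = most cells; returns null when there are no tables.
function pickLargestTable(tables) {
  let best = null;
  for (const t of tables) {
    if (best === null || cellCount(t) > cellCount(best)) best = t;
  }
  return best;
}

// Browser usage with XMLHttpRequest (loadIntoGrid is hypothetical):
// const xhr = new XMLHttpRequest();
// xhr.open("GET", url);
// xhr.onload = () => loadIntoGrid(pickLargestTable(extractTables(xhr.responseText)));
// xhr.send();
```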

Performance with large datasets was improved by using an asynchronous, request-based data import mechanism, which works around JavaScript running in a single thread, a limitation that can otherwise cause frequent browser hang-ups. Once imported, the data can be edited easily, and the subset relevant to the experiment can be selected. The data can be chosen either column-wise, using the column header, or by highlighting the required cells with the mouse. The selected data are then staged, so more data can be added later by repeating the same protocol, e.g., for a multi-group (K-group) experiment.
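Column-wise selection and staging can be sketched as follows. The grid layout (first row holds the headers), numeric conversion, and the names selectColumn, stagedGroups and stageGroup are assumptions made for illustration; each call to stageGroup adds one group, so after K calls a K-group experiment has its data staged.

```javascript
// Hypothetical sketch: select a column by header and stage it as one group.
// Assumption: grid[0] holds the headers and later rows hold the values.
function selectColumn(grid, header) {
  const col = grid[0].indexOf(header);
  if (col === -1) throw new Error(`No column named "${header}"`);
  return grid.slice(1)
    .map((row) => row[col])
    .filter((cell) => cell !== undefined && cell !== "") // skip empty cells
    .map(Number);
}

// Staged groups accumulate, so more data can be added later.
const stagedGroups = [];
function stageGroup(values) {
  stagedGroups.push(values);
  return stagedGroups.length; // number of groups (K) staged so far
}

const grid = [["A", "B"], ["1", "2"], ["3", ""], ["5", "6"]];
stageGroup(selectColumn(grid, "A"));
stageGroup(selectColumn(grid, "B"));
```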

Storing

We designed a JavaScript-based, in-memory data store with no persistence. It is a global object acting as the single source of truth for data in the application. The dataStore creates a new namespace for any data added to it, and a helper class attaches a utility toolkit (getData, removeObject, etc.) to each namespace the dataStore creates.
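The namespace-plus-toolkit idea can be sketched as below. getData and removeObject are the helper names mentioned above, but their signatures here are guesses, and createNamespace, setData and attachToolkit are hypothetical additions for illustration. Nothing is persisted: the store lives only as long as the page.

```javascript
// Hypothetical sketch of a non-persistent, namespaced in-memory store.
const SOCR = { dataStore: {} };

// Attach a small utility toolkit to a namespace object (signatures assumed).
function attachToolkit(ns) {
  ns.getData = function (key) { return key === undefined ? this.data : this.data[key]; };
  ns.setData = function (key, value) { this.data[key] = value; return this; };
  ns.removeObject = function (key) { delete this.data[key]; return this; };
  return ns;
}

// Create (or return the existing) namespace for a group of data.
SOCR.dataStore.createNamespace = function (name) {
  if (!this[name]) this[name] = attachToolkit({ data: {} });
  return this[name];
};

const initial = SOCR.dataStore.createNamespace("initial");
initial.setData("coin", ["H", "T", "T", "H"]);
```

Returning the existing namespace on repeated calls keeps a single source of truth even if several parts of the app ask for the same namespace.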

Randomization

Randomization is at the core of the application. All inferences are derived from random samples drawn from the initial datasets, i.e., by bootstrapping. Currently the JavaScript Math.random() function is used for core random value generation; it generates the index of the next data point in a random sample.
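Index generation with Math.random() can be sketched as below; randomIndex and bootstrapSample are hypothetical names, not functions from appModel.js. Math.random() returns a value in [0, 1), so Math.floor(Math.random() * n) gives each index 0..n-1 (approximately) equal probability, and drawing n such indices with replacement produces one bootstrap sample.

```javascript
// Uniform random index in 0..n-1 (rand is injectable for testing).
function randomIndex(n, rand = Math.random) {
  return Math.floor(rand() * n);
}

// One bootstrap sample: data.length draws with replacement.
function bootstrapSample(data, rand = Math.random) {
  return Array.from({ length: data.length }, () => data[randomIndex(data.length, rand)]);
}

// Why floor rather than round: Math.round(rand() * (n - 1)) gives the first
// and last index only half the probability of the others, biasing samples.
```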

Charting

The webapp visualizes its results in different forms, e.g., sample-based tokens, cards, coins, dynamically annotated histogram plots, or text. For histogram plots, the number and width of bins are computed primarily from Sturges’ formula, modified to make the chart more readable. Along with the graph of the sampling distribution, a vertical bar in a distinctive color marks the value of the statistic computed from the original dataset. Hovering the mouse over this bar displays the sample statistic along with its original value. In the p-value distribution graph, the percentages of the distribution to the left and to the right of the original sample statistic are computed and displayed.

All numerical calculations are performed in the model component of the application. Like application plug-ins, all experiments included in the randomization webapp are coded independently and stored as separate JavaScript files. We have defined a common structure that current and future randomization webapp plugins should follow, which allows the entire community to extend the webapp, improve its functionality and enhance its graphical interfaces.
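The unmodified Sturges’ rule and the left/right percentages can be sketched as follows. Sturges’ rule sets the number of bins to k = ⌈log2 n⌉ + 1; the webapp's readability modification is not documented here and is not reproduced. The function names, and the choice to count ties with the observed value in neither tail, are assumptions.

```javascript
// Sturges' rule: number of histogram bins for n values.
function sturgesBins(n) {
  return Math.ceil(Math.log2(n)) + 1;
}

// Equal-width bins spanning the data range (reduce avoids spreading huge arrays).
function binWidth(values) {
  const min = values.reduce((a, b) => Math.min(a, b), Infinity);
  const max = values.reduce((a, b) => Math.max(a, b), -Infinity);
  return (max - min) / sturgesBins(values.length);
}

// Percent of resampled statistics strictly left and strictly right of the
// observed statistic (assumption: ties are counted in neither tail).
function tailPercentages(sampleStats, observed) {
  const n = sampleStats.length;
  const left = sampleStats.filter((s) => s < observed).length;
  const right = sampleStats.filter((s) => s > observed).length;
  return { left: (100 * left) / n, right: (100 * right) / n };
}
```

For example, 1,000 resampled statistics give sturgesBins(1000) = 11 bins.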

6. Technologies Used

We used many open-source tools, packages and services to create the webapp, and we are grateful to the developer community behind them. Our choice of technologies for the randomization webapp was guided by tools and software licensed under MIT, BSD, LGPL or other open-source licenses. Git, a distributed version control and source code management (SCM) system, is used for its speed, efficiency and decentralized architecture. All the source code for the randomization webapp is developed, hosted, documented and deployed on GitHub.

  • jQuery - a versatile and extensible JavaScript library [31] that provides an application programming interface (API) with browser-agnostic support for HTML document traversal and manipulation, event handling, animation, and data exchange via the AJAX (Asynchronous JavaScript and XML) protocol.
  • D3 (Data-Driven Documents) - a JavaScript library for manipulating documents with data. It is used to create the webapp's data visualizations using SVG, HTML and CSS.
  • Head.js - used to reduce the initial load time for computationally intensive scripts by loading all the required scripts in parallel, rather than in the usual sequential order.
  • Twitter Bootstrap - a front-end UI toolkit built with HTML5, CSS3 and jQuery.
  • Handsontable - a minimalistic Excel-like data grid editor for HTML, JavaScript and jQuery.
  • World Bank API - the source of the development-indicator datasets.

7. Add a new Experiment

Adding a new experiment is a simple three-step process:

  1. Create a JavaScript file named {experiment_name}.js in the /js/exp/ folder.
  2. Edit the default functions in the file according to your needs.
  3. Register the new experiment in the init.js file.
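The three steps might look like the sketch below. The actual default functions in /js/exp/ files and the registration code in init.js are not documented on this page, so the plugin shape (name, generate, describe), the diceRoll experiment and registerExperiment are all hypothetical; they only illustrate the idea of a self-contained experiment file plugged into a central list.

```javascript
// --- js/exp/diceRoll.js (hypothetical experiment file) ---
const diceRoll = {
  name: "diceRoll",
  // Hypothetical default function: produce an initial dataset from parameters.
  generate({ rolls = 10, sides = 6 } = {}, rand = Math.random) {
    return Array.from({ length: rolls }, () => 1 + Math.floor(rand() * sides));
  },
  describe() { return "Roll a fair die a given number of times."; },
};

// --- init.js (hypothetical registry) ---
const experiments = {};
function registerExperiment(exp) {
  // Reject plugins that do not follow the expected structure.
  if (typeof exp.generate !== "function") throw new Error(`${exp.name}: missing generate()`);
  experiments[exp.name] = exp;
}
registerExperiment(diceRoll);
```

Validating the plugin at registration time is one way to enforce a common structure for community-contributed experiments.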


Contributions

We are currently looking for JavaScript developers to add features to the webapp. For more details, please contact the development team at:

FAQ

What is SOCR?

The Statistics Online Computational Resource (SOCR) aims to design, validate and freely disseminate knowledge. Specifically, SOCR provides portable online aids for probability and statistics education, technology based instruction and statistical computing. SOCR tools and resources include a repository of interactive Java applets, portable webapps, computational and graphing tools, instructional and course materials.

What does this webapp do?

This is a simulation app for drawing inferences from either user-provided data or simulation-generated data. It uses the bootstrap algorithm to draw random samples from a dataset and calculate the relevant statistics.

How to generate random samples?

  1. First, choose a preloaded experiment from the list in the input tile (which opens when you click the vertical blue input button).
  2. Set the parameters in the controller tile.
  3. Generate a dataset to draw random samples from.
  4. Generate many random samples by pressing the green run button.
  5. Set the range of samples you want to see in the SampleList (on the right side of the page) and click the show button.
  6. Choose the variable from the drop-down menu in the controller tile and click the red infer button to see the visualization.

How can I use the SOCR resampling and randomization app?

How can I become part of the developer community?

Where and how can I ask questions or share experiences?

What is the license that this app carries?

LGPL/CC-BY

What is the share instance button?

If you want to share the app with your loaded data (but not the generated random samples) and your current settings, click the share instance button to generate a URL, then share it with anyone you like!
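One way such a shareable link could work is sketched below; it is not the webapp's implementation. The query parameter name "state", the state shape and both function names are assumptions. The sketch serializes only the loaded data and settings as JSON, leaving the random samples out, and uses the standard URL and URLSearchParams classes.

```javascript
// Hypothetical sketch: encode data + settings (not samples) into a URL.
function buildShareUrl(baseUrl, state) {
  const url = new URL(baseUrl);
  // Only data and settings are serialized; random samples are left out.
  url.searchParams.set("state", JSON.stringify({ data: state.data, settings: state.settings }));
  return url.toString();
}

// Read the shared state back; returns null when the link carries none.
function readSharedState(shareUrl) {
  const raw = new URL(shareUrl).searchParams.get("state");
  return raw === null ? null : JSON.parse(raw);
}
```

Because browsers and servers limit URL length, a link like this suits modest datasets; larger ones would need another mechanism.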

What inference can I deduce from this application?

Currently four inference variables are supported: mean, count, standard deviation and percentile.
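The four variables can be computed for a single sample as sketched below. The exact definitions the webapp uses are not stated, so these are assumptions: "count" is taken to mean occurrences of a target value (e.g. heads), standard deviation uses the n - 1 sample denominator, and percentile uses the nearest-rank method. Applying one of these functions to every random sample yields the distribution that is plotted.

```javascript
// Hypothetical sketch of the four inference variables for one sample.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Assumption: "count" = number of elements equal to target (e.g. "H").
function count(xs, target) {
  return xs.filter((x) => x === target).length;
}

// Sample standard deviation (n - 1 denominator).
function standardDeviation(xs) {
  const m = mean(xs);
  const ss = xs.reduce((acc, x) => acc + (x - m) ** 2, 0);
  return Math.sqrt(ss / (xs.length - 1));
}

// Nearest-rank percentile, p in [0, 100].
function percentile(xs, p) {
  const sorted = [...xs].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}
```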

Where can I find the documentation for the SOCR webapp?

This page.

How to install my own experiment into SOCR webapp?

We will shortly upload the API along with its documentation.

See also

References
