SOCR EduMaterials AnalysesCommandLineVolumeMultipleRegression


Revision as of 13:19, 12 December 2013 by IvoDinov (Talk | contribs)


Analyses Command-Line - Volume-based Multiple Linear Regression Analysis

This page describes how to access the SOCR Multiple Linear Regression (MLR) library to compute volume/image-based MLR analyses. Access is provided through a shell-based command-line interface on local machines. More information about other SOCR Analyses command-line interfaces is available here.

Introduction

In addition to the browser-based graphical user interfaces, all SOCR Analyses can be executed from a command-line shell on local systems.

General Usage

Get the latest SOCR JAR files from the SOCR page (http://socr.ucla.edu/htmls/jars/).
The command-line interface to SOCR Analyses generally uses EXAMPLE 1 from the list of example data files for the corresponding analysis.
All input files are ASCII (see the examples within each of the specific analyses).
A -h flag at the end of the command line indicates that the first row in all ASCII input data files is a HEADER row (so it is not interpreted as data).
The number of variables can be indicated at the end (after the -h flag). If no number of variables is specified, 3 is used as the default.

Try-It-Online

You can test the Multivariate Regression functionality using the Pipeline PWS Web-start server.

Volume Multiple Linear Regression Usage

Generic Setting:

java -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm DesignMatrix.txt -h -regressors [name1,name2,...,name_k] -dim Zmax Ymax XMax [-intercept interceptConstant] [-p PValue_Filename] [-r RValue_Filename] [-t TStat_Filename] -data_type [0,1,2,3,4] -mask /ifs/tmp/myMaskVolume.img [-byteorder string]

Options:
- -help: print usage
- -dm [DesignMatrix.txt]: specify a tab-separated text file containing the design matrix. Note: construct the design matrix file carefully. It may need to be imported into an Excel spreadsheet first and then exported back to plain text on a PC/Windows machine, because Mac and other platforms may introduce hidden characters (e.g., tab/return characters). If you get an error like Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!, review your design matrix file. Missing values in the DM are indicated by ".". Subjects with a missing value in a predictor variable (see -regressors) are excluded from the analysis if that predictor is selected as a covariate.
- -mask [Mask-volume.img]: (optional) specify a mask volume (0 or 1 intensities) restricting the voxels where the regression models are computed. The mask is a 1-byte (Unsigned Byte) Analyze-format volume of the same dimensions as the data, with intensities in [0:255]; all voxels with intensity > 0 are considered part of the mask and are processed.
- -h: DesignMatrix contains a header (first row)
- -regressors [name1,name2,...name_k]: specify which columns/variables from the Design-Matrix should be used as regressors/covariates
  - Interactions: Interactions are specified within the -regressors flag. For interactions, each interaction variable must be composed of individual regressor variables that must also be previously included in the regressor list. For instance, if we need a triple interaction name1*name2*name3, we must (minimally) specify:
    -regressors name1,name2,name3,name1*name2,name2*name3,name1*name3,name1*name2*name3
- -dim Zmax Ymax XMax: specify the dimension sizes (for 2D images use Zmax=1; for 1D data use Zmax=Ymax=1)
- -intercept [Base_Filename]: Base Filename for the 4 Intercept Estimates. This base filename will be appended with:
  - _Intercept_Pvalue.img
  - _Intercept_Beta.img
  - _Intercept_RPartCorr.img
  - _Intercept_TStat.img
- -i [Base_Filename]: Identical to -intercept [Base_Filename]
- -p [PValue_Filename]: output the p-value volume (enter only the base of the filename)
- -b [Beta_Filename]: output the Beta effect-size coefficient volume (enter only the base of the filename)
- -r [RValue_Filename]: output the partial-correlation volume (enter only the base of the filename)
- -t [Tstat_Filename]: output the T-statistics for the effect-size Beta (enter only the base of the filename). Note that a single factor (like CDR scores) may have several levels (e.g., CDR score in {0.0, 0.5, 1.0, 1.5, etc.}). When the number of levels is > 2, the files specified by the -t flag contain the F-statistics (F-maps) for the effect of this factor (CDR). In the special case where the factor has only 2 levels, the F-map is identical to the T-statistics map (T-map).
- -data_type [0,1,2,3,4]: type of the input volumes: 0 = Unsigned Byte, 1 = Signed Byte, 2 = Unsigned Short Integer, 3 = Signed Short Integer, 4 = 4-byte Float.
- -byteorder string: string is one of {big, little, other}.
  - big = BIG_ENDIAN processor
  - little = LITTLE_ENDIAN processor
  - other = default processor (java.nio.ByteOrder.nativeOrder())
- -byteswap: (deprecated) Only enter this flag if you want the input data to be read in and byteswapped! Note that -byteswap affects the input data, the mask volume and the output results!
  - A better alternative is to always swap bytes (if necessary) outside this program (e.g., %> dd conv=swab if=20088_jacobian.img of=20088_jacobian_BS.img). Note that conv=swab swaps each pair of adjacent bytes, so it is only appropriate for 2-byte data types.
- Memory Use: For large inputs you may need to request more memory from the JVM. If your data is larger than 200³ voxels, add these parameters right after the initial java call (-ms1000m -mx2000m); see the example below. This requests 1-2GB of RAM for the process. You may need more or less memory depending on the number of volumes and the dimension sizes.
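
The -dim and -data_type options together fix how many bytes each input volume should occupy, which gives a cheap sanity check before a long run and a rough feel for the memory remark above. The following is a minimal sketch for a POSIX shell (sh/bash, not csh). The function names are hypothetical helpers, not part of SOCR, and the bytes-per-voxel mapping is inferred from the type names listed for -data_type. This page does not say whether all volumes are held in memory at once, so the multi-volume total should be read as a rough upper bound only.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SOCR): map a -data_type code to bytes per
# voxel and compute the expected raw size of one volume.

# bytes_per_voxel DATA_TYPE
bytes_per_voxel() {
    case "$1" in
        0|1) echo 1 ;;   # unsigned / signed byte
        2|3) echo 2 ;;   # unsigned / signed short integer
        4)   echo 4 ;;   # 4-byte float
        *)   echo "unknown data_type: $1" >&2; return 1 ;;
    esac
}

# expected_volume_bytes ZMAX YMAX XMAX DATA_TYPE
expected_volume_bytes() {
    bpv=$(bytes_per_voxel "$4") || return 1
    echo $(( $1 * $2 * $3 * bpv ))
}

# Values taken from the example on this page: -dim 220 220 220 -data_type 2.
per_volume=$(expected_volume_bytes 220 220 220 2)
echo "bytes per volume: $per_volume"
echo "bytes for 50 volumes: $(( per_volume * 50 ))"
```

An Analyze .img file stores raw voxel data with the header kept in a separate .hdr file, so the per-volume value (21296000 bytes here) can be compared with the output of wc -c on each input volume; a mismatch suggests that -dim or -data_type does not describe the data.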

Example: Create a new file (VolumeMultipleRegression.csh) using any editor, paste in the script below, and make sure the file has executable permissions. Some operating systems/platforms may require variants of this (C-shell) script.

#!/bin/csh

date

java -ms200m -mx500m -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/DM.txt -h -regressors CDR,MMSE -dim 220 220 220 -intercept interceptConstant -p /ifs/tmp/P_Value -r /ifs/tmp/R_Value -t /ifs/tmp/TStat_Value -data_type 2

# Or

java -ms200m -mx500m -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/DM.txt -h -regressors AGE,CDR,AGE*CDR -dim 220 220 220 -p /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/P_Value_mask_New -r /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/R_Value_mask_New -mask /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/UC_mask_final8bit.img -t /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/T_Value_mask_New -data_type 2 -byteorder little &

# Note the specification of the AGE*CDR interaction term in the command above.

date

exit
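
The AGE*CDR term in the second command follows the interaction rule given under -regressors: every lower-order term that makes up an interaction has to appear in the list before it. A minimal sketch of building such a list mechanically is shown here for a POSIX shell with awk (not csh). The function name expand_interactions is a hypothetical helper, not part of SOCR, and it always emits the full set of interactions among the given variables, which may be more than a particular model needs.

```shell
#!/bin/sh
# Hypothetical helper (not part of SOCR).
# expand_interactions "A,B,C" prints the main effects followed by every
# interaction among them, lower-order terms first, as one comma-separated list.
expand_interactions() {
    echo "$1" | awk -F, '{
        k = NF
        out = ""
        # size = number of variables in a term (1 = main effect, 2 = pairwise, ...)
        for (size = 1; size <= k; size++) {
            # each bitmask selects one subset of the k variables
            for (mask = 1; mask < 2^k; mask++) {
                term = ""; count = 0; m = mask
                for (i = 1; i <= k; i++) {
                    if (m % 2 == 1) {
                        term = (term == "" ? $i : (term "*" $i))
                        count++
                    }
                    m = int(m / 2)
                }
                if (count == size) out = (out == "" ? term : (out "," term))
            }
        }
        print out
    }'
}

expand_interactions "AGE,CDR"
```

The result can be passed as -regressors "$(expand_interactions AGE,CDR)". Keeping the double quotes prevents the shell from treating * as a filename wildcard.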

Example Input data files

The design-matrix data file must be provided as a tab-separated ASCII/text file (DM.txt). The ASCII content of each such file should follow the syntax below. Note that the first line in the file holds the column headers. This requires the "-h" flag on the command line at execution, so that the first line is interpreted as column headers. The first two columns are the Subject Identifier and the filename of the corresponding imaging volume, respectively. Columns 3 and on store the corresponding predictor variable (covariate) values. Typically there will be between 1 and 10 covariates. Note that as of March 2009, all covariates have to be numerical values, so string variables must be encoded as numbers. For example, SEX can become 0(Male), 1(Female); and GROUP_ID can be 0(Normal), 1(MCI), 2(AD), as shown below.

SUBJECT_ID	FILENAME	SEX	GROUP_ID	AGE	CDR	MMSE
1	/dir_1/1.img	1	0	76.38	0	29
2	/dir_2/2.img	0	0	79.37	0	30
3	/dir_3/3.img	1	1	65.22	0.5	27
4	/dir_4/4.img	1	1	69.42	0.5	25
5	/dir_5/5.img	1	2	70.75	0.5	26
6	/dir_6/6.img	0	1	73.73	0.5	25
7	/dir_7/7.img	1	0	71.2	0	30
8	/dir_8/8.img	0	1	82.78	0.5	28
9	/dir_9/9.img	1	0	70.8	0	30
10	/dir_10/10.img	1	1	75.35	0.5	24
11	/dir_11/11.img	0	1	85.65	0.5	26
12	/dir_12/12.img	0	0	84.76	0	30
13	/dir_13/13.img	0	1	78.87	0.5	26
14	/dir_14/14.img	0	0	70.87	0	29
15	/dir_15/15.img	1	2	71.12	0.5	21
16	/dir_16/16.img	0	0	72.98	0	29
17	/dir_17/17.img	1	0	73.44	0	30
18	/dir_18/18.img	1	2	74.76	1	25
19	/dir_19/19.img	0	2	63.65	0.5	22
20	/dir_20/20.img	0	0	75.91	0	28
21	/dir_21/21.img	1	0	75.4	0	27
22	/dir_22/22.img	0	1	76.31	0.5	26
23	/dir_23/23.img	0	1	83.24	0.5	28
24	/dir_24/24.img	1	2	90.99	0.5	26
25	/dir_25/25.img	0	0	87.33	0	30
26	/dir_26/26.img	1	0	71.94	0	30
27	/dir_27/27.img	1	1	68.98	0.5	26
28	/dir_28/28.img	1	1	56.25	0.5	24
29	/dir_29/29.img	0	1	67.49	0.5	27
30	/dir_30/30.img	0	1	84.44	0.5	29
31	/dir_31/31.img	0	1	66.37	0.5	28
32	/dir_32/32.img	1	2	83.66	1	20
33	/dir_33/33.img	0	0	76.11	0	30
34	/dir_34/34.img	0	1	77.87	0.5	29
35	/dir_35/35.img	0	1	68.54	0.5	30
36	/dir_36/36.img	0	2	83.37	1	20
37	/dir_37/37.img	0	0	59.98	0	30
38	/dir_38/38.img	0	1	70.03	0.5	25
39	/dir_39/39.img	0	2	65.56	1	24
40	/dir_40/40.img	1	1	80.04	0.5	29
41	/dir_41/41.img	1	0	77.81	0	30
42	/dir_42/42.img	0	0	70.2	0	30
43	/dir_43/43.img	1	0	77.14	0	29
44	/dir_44/44.img	1	1	76.07	0.5	24
45	/dir_45/45.img	1	2	65.94	0.5	25
46	/dir_46/46.img	1	1	64.66	0.5	27
47	/dir_47/47.img	0	1	70.55	0.5	28
48	/dir_48/48.img	0	1	71.3	0.5	25
49	/dir_49/49.img	0	1	86.59	0.5	25
50	/dir_50/50.img	1	1	84.09	0.5	24
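
Before launching an analysis, a file laid out like the one above can be checked for the problems this page warns about: regressor names that do not match the header, rows with the wrong number of tab-separated fields, and "." missing values in the selected columns. The sketch below is for a POSIX shell with awk (not csh); check_dm is a hypothetical helper, not part of SOCR, and it expects plain column names only (no name1*name2 interaction terms).

```shell
#!/bin/sh
# Hypothetical helper (not part of SOCR).
# check_dm FILE NAME1,NAME2,...  returns non-zero if a problem is found.
check_dm() {
    awk -F'\t' -v wanted="$2" '
        NR == 1 {
            ncol = NF
            for (i = 1; i <= NF; i++) col[$i] = i      # header name -> column index
            n = split(wanted, names, ",")
            for (j = 1; j <= n; j++) {
                if (!(names[j] in col)) {
                    printf "regressor %s not found in header\n", names[j]
                    bad = 1
                }
            }
            next
        }
        NF != ncol {
            printf "line %d has %d fields, expected %d\n", NR, NF, ncol
            bad = 1
        }
        {
            # count "." entries in the requested regressor columns
            for (j = 1; j <= n; j++)
                if ((names[j] in col) && $(col[names[j]]) == ".") missing++
        }
        END {
            printf "%d data rows, %d missing regressor values\n", NR - 1, missing
            exit bad
        }
    ' "$1"
}

# Demo on two rows from the table above; the CDR value of subject 2 is replaced
# by "." here purely to illustrate a missing value.
dm=$(mktemp)
printf 'SUBJECT_ID\tFILENAME\tSEX\tGROUP_ID\tAGE\tCDR\tMMSE\n' > "$dm"
printf '1\t/dir_1/1.img\t1\t0\t76.38\t0\t29\n' >> "$dm"
printf '2\t/dir_2/2.img\t0\t0\t79.37\t.\t30\n' >> "$dm"
check_dm "$dm" CDR,MMSE
rm -f "$dm"
```

A header name that carries a trailing carriage return will also be reported as not found, which makes this a quick way to notice the hidden-character problem.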

Errors

There are several types of errors (e.g., Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!) that can occur for various reasons related to system-memory, syntax, data format etc. Below are some examples to watch out for:

Memory Errors: Depending on the size of each subject dataset and the total number of subjects, the system may require additional memory to be specified at the start of the Java JVM (java -ms200m -mx500m ...).
Incorrect Regressors: If your DM file includes a regressor listed as DXCUR but the invocation lists the regressor as DX, the call will error out (because the specified regressor is not included in the DM meta-data input file).
Hidden/Special Characters: The DM matrix file may need to be first imported into an Excel spreadsheet and then exported back to (pure ASCII) text using a Linux/PC/Windows machine. Mac and other platforms may introduce hidden characters (e.g., tab/return characters). So, if you get an error like Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!, review your Design Matrix file.
Missing values: Missing data points in the DM are indicated by ".". In the current implementation, subjects with a missing value in a predictor variable (see regressors) are excluded from the analysis if that predictor is selected as a covariate.
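
Hidden characters of the kind described above can often be detected and removed directly in the shell, without a round trip through a spreadsheet. The following is a minimal sketch for a POSIX shell (not csh). Both function names are hypothetical helpers, not part of SOCR, and the cleanup handles carriage-return line endings only; other stray bytes are counted but not removed.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SOCR).

# count_odd_bytes FILE: number of bytes that are not a tab, a newline,
# or a printable ASCII character (octal 40-176).
count_odd_bytes() {
    LC_ALL=C tr -d '\11\12\40-\176' < "$1" | wc -c | tr -d ' '
}

# clean_dm IN OUT: turn every carriage return into a newline, then drop the
# empty lines this leaves behind (handles CRLF and bare-CR line endings).
clean_dm() {
    tr '\r' '\n' < "$1" | awk 'length($0) > 0' > "$2"
}

# Demo: a two-line file with Windows (CRLF) line endings.
tmp=$(mktemp); fixed=$(mktemp)
printf 'SUBJECT_ID\tFILENAME\tAGE\r\n1\t/dir_1/1.img\t76.38\r\n' > "$tmp"
echo "odd bytes before: $(count_odd_bytes "$tmp")"
clean_dm "$tmp" "$fixed"
echo "odd bytes after: $(count_odd_bytes "$fixed")"
rm -f "$tmp" "$fixed"
```

A non-zero count after cleaning indicates bytes other than carriage returns (for example, non-ASCII characters), which still need to be located and fixed by hand.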

Assumptions

The SOCR MLR analysis implements the General Linear Model (GLM) and does *not* require normality. Only the following 2 assumptions are made:

The design matrix X must have full column rank; otherwise the parameter vector β will not be identified, and at most we will be able to narrow down its value to some linear subspace of $\mathbb{R}^p$. For this property to hold, we must have n > p, where n is the sample size and p is the column rank.
The regressors $X_i$ are assumed to be error-free, that is, they are not contaminated with measurement errors. Although not realistic in many settings, dropping this assumption leads to significantly more difficult errors-in-variables models.
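
For reference, the two assumptions can be read against the usual matrix form of the model, fitted independently at each voxel. This is the textbook ordinary-least-squares formulation, offered as a sketch: the symbols $Y_v$, $\beta_v$ and $\varepsilon_v$ (the response, coefficient and error vectors at voxel $v$) are notation introduced here, and this page does not state which estimator the SOCR code actually uses.

```latex
Y_v = X \beta_v + \varepsilon_v, \qquad
\hat{\beta}_v = (X^{\top} X)^{-1} X^{\top} Y_v, \qquad
X \in \mathbb{R}^{n \times p}, \; Y_v \in \mathbb{R}^{n}.
```

The inverse $(X^{\top} X)^{-1}$ exists exactly when X has full column rank, which is why a rank-deficient design matrix leaves β unidentified.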

Supplementary information

Auxiliary tools

Masking statistical volumes

This tool constructs masks of statistical analysis results (e.g., P, R or T-stat maps produced by VolumeMultipleRegression) given a user-specified threshold value. The output masks are 1-byte volumes with intensities 0 and 1, determined by the threshold value.

Example call:

java -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.SplitMaskPositiveNegativeAnalysisResults -dim Zmax Ymax XMax -input filename -mask mask_filename -threshold Value [-below filename] [-above filename] -data_type [0,1,2,3,4] -byteorder little

Options:
- -help: print usage
- -mask [Mask-volume.img]: specify a mask-volume (0 or 1 intensities) restricting the voxels where the regression models are computed (optional), 1Byte Analyze format volume of the same dimensions as the data
- -dim Zmax Ymax XMax: specify the dimension sizes (for 2D images use Zmax=1; for 1D data use Zmax=Ymax=1)
- -below [Filename]: output mask of the intensities, where input <= threshold (1-Byte)
- -above [Filename]: output mask of the intensities, where input > threshold (1-Byte)
- -threshold [threshold_value]: threshold value separating the below/above intensities of the input file
- -input [Filename]: input file-name
- -data_type [0,1,2,3,4]:
  - Type=0 is for Unsigned Byte input volumes;
  - Type=1 is for Signed Byte input volumes;
  - Type=2 is for Unsigned-short integers
  - Type=3 is for Signed-short integers
  - Type=4 is for 4Byte=Float Volumes
- -byteorder string: string is one of {big, little, other}.
  - big = BIG_ENDIAN processor
  - little = LITTLE_ENDIAN processor
  - other = default processor (java.nio.ByteOrder.nativeOrder())
- -byteswap: (deprecated) Only enter this flag if you want the input data to be read in and byteswapped! Note that -byteswap affects the input data, the mask volume and the output results!

