# SOCR EduMaterials AnalysesCommandLineVolumeMultipleRegression


## Analyses Command-Line - Volume-based Multiple Linear Regression Analysis

This page includes the information on how to access the Multiple Linear Regression library for the purpose of computing VOLUME/IMAGE MLR analyses. Access is provided via shell-based command-line interface on local machines. More information about other SOCR Analyses command-line interfaces is available here.

### Introduction

In addition to the browser-based graphical user interfaces, all SOCR Analyses can be run from a command-line shell on local systems.

### General Usage

• Get the latest SOCR JAR files from the SOCR page (http://socr.ucla.edu/htmls/jars/).
• The command-line interface to SOCR Analyses generally uses EXAMPLE 1 from the list of example data files for the corresponding analysis.
• All Input files are ASCII (see examples within each of the specific analyses).
• A -h flag at the end of the command line indicates that the first row in all ASCII input data files is a HEADER row (so it is not interpreted as data).
• The number of variables can be given at the end (after the -h flag). If no number of variables is specified, the default is 3.

### Try-It-Online

You can test the Multivariate Regression functionality using the Pipeline PWS Web-start server.

### Volume Multiple Linear Regression Usage

• Generic Setting:

 java -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm DesignMatrix.txt -h -regressors [name1,name2,...,name_k] -dim Zmax Ymax XMax [-intercept interceptConstant] [-p PValue_Filename] [-r RValue_Filename] [-t TStat_Filename] -data_type [0,1,2,3,4] -mask /ifs/tmp/myMaskVolume.img [-byteorder string]

• Options:
• -help: print usage
• -dm [DesignMatrix.txt]: specify a tab-separated text file containing the design matrix (DM). Note: Be careful with the construction of the design matrix ... The DM file may first need to be imported into an Excel spreadsheet and then exported back to plain text on a PC/Windows machine, because Mac and other platforms may introduce hidden characters (e.g., tab/return keys). If you get an error like "Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!", review your design matrix file. Missing values in the DM are indicated by ".". Subjects with a missing value in a predictor variable (see -regressors) are excluded from the analysis when that predictor is selected as a covariate.
• -mask [Mask-volume.img]: (optional) specify a mask volume (0 or 1 intensities) restricting the voxels where the regression models are computed. The mask is a 1-byte (unsigned byte) Analyze-format volume of the same dimensions as the data (intensity range [0:255]); all voxels with intensity > 0 are considered part of the mask and processed.
• -h: DesignMatrix contains a header (first row)
• -regressors [name1,name2,...name_k]: specify which columns/variables from the Design-Matrix should be used as regressors/covariates
• Interactions: Interactions are specified within the -regressors flag. For interactions, each interaction variable must be composed of individual regressor variables that must also be previously included in the regressor list. For instance, if we need a triple interaction name1*name2*name3, we must (minimally) specify:
-regressors name1,name2,name3,name1*name2,name2*name3,name1*name3,name1*name2*name3
• -dim Zmax Ymax XMax: specify the dimension sizes (for 2D images use Zmax=1; for 1D data use Zmax=Ymax=1)
• -intercept [Base_Filename]: Base Filename for the 4 Intercept Estimates. This base filename will be appended with:
• _Intercept_Pvalue.img
• _Intercept_Beta.img
• _Intercept_RPartCorr.img
• _Intercept_TStat.img
• -i [Base_Filename]: Identical to -intercept [Base_Filename]
• -p [PValue_Filename]: output the p-value volume (enter only the base of the filename)
• -b [Beta_Filename]: output the Beta effect-size coefficient volume (enter only the base of the filename)
• -r [RValue_Filename]: output the partial-correlation volume (enter only the base of the filename)
• -t [Tstat_Filename]: output the T-statistics for the effect-size Beta (enter only the base of the filename). Note that a single factor (like CDR score) may have several levels (e.g., CDR score in {0.0, 0.5, 1.0, 1.5, etc.}). When the number of levels is greater than 2, the files specified by the -t flag contain the F-statistics (F-maps) for the effect of this factor (CDR). In the special case where the CDR factor has only 2 levels, the F-map is identical to the T-statistics (T-map).
• -data_type [0,1,2,3,4]: Type=0 is for Unsigned Byte, Type=1 is for Signed Byte, Type=2 is for Unsigned Short Integer, Type=3 is for Signed Short Integer and Type=4 is for 4Byte=Float Volume Input;
• -byteorder string: string is one of {big, little, other}.
• big = BIG_ENDIAN processor
• little = LITTLE_ENDIAN processor
• other = default processor (java.nio.ByteOrder.nativeOrder())
• -byteswap: (deprecated) Only enter this flag if you want the input data to be read in and byteswapped! Note that -byteswap affects the input data, the mask volume and the output results!
• A better alternative is to always swap bytes (if necessary) outside this program (e.g., %> dd conv=swab if=20088_jacobian.img of=20088_jacobian_BS.img). Note that conv=swab swaps each pair of bytes, so it is appropriate for 2-byte (short integer) data.
• Memory Use: For large inputs you may need to request more memory from the JVM. If your data is larger than 200^3 voxels, pass these parameters right after the initial java call (-ms1000m -mx2000m); see the example below. This requests 1-2 GB of RAM for the process. You may need more or less memory depending on the number of volumes and the dimension sizes.
• Example: Edit a new file (VolumeMultipleRegression.csh) using any editor and paste this inside (make sure the file has executable permissions). Some operating systems/platforms may require variants of this (C-shell) script.

#!/bin/csh

date

java -ms200m -mx500m -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/DM.txt -h -regressors CDR,MMSE -dim 220 220 220 -intercept interceptConstant -p /ifs/tmp/P_Value -r /ifs/tmp/R_Value -t /ifs/tmp/TStat_Value -data_type 2 

# Or

java -ms200m -mx500m -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.volume.VolumeMultipleRegression -dm /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/DM.txt -h -regressors AGE,CDR,AGE*CDR -dim 220 220 220 -p /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/P_Value_mask_New -r /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/R_Value_mask_New -mask /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/UC_mask_final8bit.img -t /ifs/ccb/CCB_SW_Tools/Statistics/SOCR_Statistics/SOCR_CSV_test_Scripts_Data/VolumeMultipleRegressionTest/T_Value_mask_New -data_type 2 -byteorder little &

# Note the specification of the AGE*CDR interaction term in the second call.

date

exit
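
The interaction rule described under -regressors (every interaction term must be accompanied by all of the lower-order terms it is built from) can be sketched in a few lines of Python. This is only an illustrative helper, not part of the SOCR distribution: the function names `expand_interaction` and `regressors_argument` are hypothetical, and the sketch assumes the order of terms within the -regressors list does not matter beyond listing lower-order terms first.

```python
from itertools import combinations


def expand_interaction(term):
    """Return the term plus every lower-order term it is built from.

    Hypothetical helper: "A*B*C" yields A, B, C, A*B, A*C, B*C, A*B*C.
    """
    factors = term.split("*")
    expanded = []
    # Orders 1..k: main effects first, then pairwise products, and so on.
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            expanded.append("*".join(combo))
    return expanded


def regressors_argument(terms):
    """Build a comma-separated value for -regressors without duplicates."""
    seen = []
    for term in terms:
        for piece in expand_interaction(term):
            if piece not in seen:
                seen.append(piece)
    return ",".join(seen)


print(regressors_argument(["AGE*CDR"]))  # prints AGE,CDR,AGE*CDR
```

The printed value matches the `-regressors AGE,CDR,AGE*CDR` argument used in the second call of the script above.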

### Example Input data files

The design-matrix data file must be provided as a tab-separated ASCII/text file (DM.txt). The ASCII content of each of these files should follow the syntax below. Note that the first line in these files contains the column headers. This requires the "-h" flag on the command line at execution so that the first line is interpreted as column headers. The first two columns are the subject identifier and the filename of the corresponding imaging volume, respectively. Columns 3 and on store the corresponding predictor variable (covariate) values. Typically there will be between 1 and 10 covariates. Note that as of March 2009, all covariates have to be numerical values, so string variables must be encoded as numbers. For example, SEX can become 0 (Male), 1 (Female); and GROUP_ID can be 0 (Normal), 1 (MCI), 2 (AD), as shown below.

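| SUBJECT_ID | FILENAME | SEX | GROUP_ID | AGE | CDR | MMSE |
|---|---|---|---|---|---|---|
| 1 | /dir_1/1.img | 1 | 0 | 76.38 | 0 | 29 |
| 2 | /dir_2/2.img | 0 | 0 | 79.37 | 0 | 30 |
| 3 | /dir_3/3.img | 1 | 1 | 65.22 | 0.5 | 27 |
| 4 | /dir_4/4.img | 1 | 1 | 69.42 | 0.5 | 25 |
| 5 | /dir_5/5.img | 1 | 2 | 70.75 | 0.5 | 26 |
| 6 | /dir_6/6.img | 0 | 1 | 73.73 | 0.5 | 25 |
| 7 | /dir_7/7.img | 1 | 0 | 71.2 | 0 | 30 |
| 8 | /dir_8/8.img | 0 | 1 | 82.78 | 0.5 | 28 |
| 9 | /dir_9/9.img | 1 | 0 | 70.8 | 0 | 30 |
| 10 | /dir_10/10.img | 1 | 1 | 75.35 | 0.5 | 24 |
| 11 | /dir_11/11.img | 0 | 1 | 85.65 | 0.5 | 26 |
| 12 | /dir_12/12.img | 0 | 0 | 84.76 | 0 | 30 |
| 13 | /dir_13/13.img | 0 | 1 | 78.87 | 0.5 | 26 |
| 14 | /dir_14/14.img | 0 | 0 | 70.87 | 0 | 29 |
| 15 | /dir_15/15.img | 1 | 2 | 71.12 | 0.5 | 21 |
| 16 | /dir_16/16.img | 0 | 0 | 72.98 | 0 | 29 |
| 17 | /dir_17/17.img | 1 | 0 | 73.44 | 0 | 30 |
| 18 | /dir_18/18.img | 1 | 2 | 74.76 | 1 | 25 |
| 19 | /dir_19/19.img | 0 | 2 | 63.65 | 0.5 | 22 |
| 20 | /dir_20/20.img | 0 | 0 | 75.91 | 0 | 28 |
| 21 | /dir_21/21.img | 1 | 0 | 75.4 | 0 | 27 |
| 22 | /dir_22/22.img | 0 | 1 | 76.31 | 0.5 | 26 |
| 23 | /dir_23/23.img | 0 | 1 | 83.24 | 0.5 | 28 |
| 24 | /dir_24/24.img | 1 | 2 | 90.99 | 0.5 | 26 |
| 25 | /dir_25/25.img | 0 | 0 | 87.33 | 0 | 30 |
| 26 | /dir_26/26.img | 1 | 0 | 71.94 | 0 | 30 |
| 27 | /dir_27/27.img | 1 | 1 | 68.98 | 0.5 | 26 |
| 28 | /dir_28/28.img | 1 | 1 | 56.25 | 0.5 | 24 |
| 29 | /dir_29/29.img | 0 | 1 | 67.49 | 0.5 | 27 |
| 30 | /dir_30/30.img | 0 | 1 | 84.44 | 0.5 | 29 |
| 31 | /dir_31/31.img | 0 | 1 | 66.37 | 0.5 | 28 |
| 32 | /dir_32/32.img | 1 | 2 | 83.66 | 1 | 20 |
| 33 | /dir_33/33.img | 0 | 0 | 76.11 | 0 | 30 |
| 34 | /dir_34/34.img | 0 | 1 | 77.87 | 0.5 | 29 |
| 35 | /dir_35/35.img | 0 | 1 | 68.54 | 0.5 | 30 |
| 36 | /dir_36/36.img | 0 | 2 | 83.37 | 1 | 20 |
| 37 | /dir_37/37.img | 0 | 0 | 59.98 | 0 | 30 |
| 38 | /dir_38/38.img | 0 | 1 | 70.03 | 0.5 | 25 |
| 39 | /dir_39/39.img | 0 | 2 | 65.56 | 1 | 24 |
| 40 | /dir_40/40.img | 1 | 1 | 80.04 | 0.5 | 29 |
| 41 | /dir_41/41.img | 1 | 0 | 77.81 | 0 | 30 |
| 42 | /dir_42/42.img | 0 | 0 | 70.2 | 0 | 30 |
| 43 | /dir_43/43.img | 1 | 0 | 77.14 | 0 | 29 |
| 44 | /dir_44/44.img | 1 | 1 | 76.07 | 0.5 | 24 |
| 45 | /dir_45/45.img | 1 | 2 | 65.94 | 0.5 | 25 |
| 46 | /dir_46/46.img | 1 | 1 | 64.66 | 0.5 | 27 |
| 47 | /dir_47/47.img | 0 | 1 | 70.55 | 0.5 | 28 |
| 48 | /dir_48/48.img | 0 | 1 | 71.3 | 0.5 | 25 |
| 49 | /dir_49/49.img | 0 | 1 | 86.59 | 0.5 | 25 |
| 50 | /dir_50/50.img | 1 | 1 | 84.09 | 0.5 | 24 |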
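
Before launching a long analysis, it can help to check the design matrix against the rules stated on this page: tab-separated columns, a header row, regressor names that exist in the header, and "." for missing values. The following Python sketch does this using only the standard library. The function name `load_design_matrix` is hypothetical, and the way it drops subjects only mirrors the behaviour described here; it is not taken from the SOCR source code. In the sample text, the missing CDR value for subject 2 is introduced purely for illustration.

```python
import csv
import io


def load_design_matrix(text, regressors):
    """Parse tab-separated DM text and drop subjects missing a selected regressor.

    Returns (kept_rows, dropped_subject_ids). Hypothetical helper for illustration.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)  # first row is the header (the -h flag)
    # Interaction terms such as AGE*CDR are built from their component columns.
    needed = []
    for term in regressors:
        for name in term.split("*"):
            if name not in header:
                raise ValueError(f"regressor {name!r} is not a column of the design matrix")
            if name not in needed:
                needed.append(name)
    kept, dropped = [], []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not have {len(header)} tab-separated fields")
        record = dict(zip(header, row))
        # "." marks a missing value; only the selected regressors matter.
        if any(record[name] == "." for name in needed):
            dropped.append(record["SUBJECT_ID"])
            continue
        values = {name: float(record[name]) for name in needed}
        kept.append({"SUBJECT_ID": record["SUBJECT_ID"], "FILENAME": record["FILENAME"], **values})
    return kept, dropped


# Three rows in the DM.txt layout; the "." for subject 2 is made up for this example.
SAMPLE = (
    "SUBJECT_ID\tFILENAME\tSEX\tGROUP_ID\tAGE\tCDR\tMMSE\n"
    "1\t/dir_1/1.img\t1\t0\t76.38\t0\t29\n"
    "2\t/dir_2/2.img\t0\t0\t79.37\t.\t30\n"
    "3\t/dir_3/3.img\t1\t1\t65.22\t0.5\t27\n"
)
kept, dropped = load_design_matrix(SAMPLE, ["CDR", "MMSE"])
print(len(kept), dropped)  # prints 2 ['2']
```

Subject 2 is dropped only because CDR is among the selected regressors; with `["MMSE"]` alone all three subjects would be kept.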

### Errors

There are several types of errors (e.g., Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!) that can occur for various reasons related to system-memory, syntax, data format etc. Below are some examples to watch out for:

• Memory Errors: Depending on the size of each subject dataset and the total number of subjects, the system may require additional memory to be specified when the Java JVM is started (java -ms200m -mx500m ...).
• Incorrect Regressors: If your DM file includes a regressor listed as DXCUR but the invocation lists the regressor as DX, the call will fail (because the specified regressor is not included in the DM meta-data input file).
• Hidden/Special Characters: The DM file may first need to be imported into an Excel spreadsheet and then exported back to plain (pure ASCII) text on a Linux/PC/Windows machine. Mac and other platforms may introduce hidden characters (e.g., tab/return keys). If you get an error like "Beginning the stat analyses ... VolumeMultipleRegression Error!!!!!!!!!!!!!!", review your design matrix file.
• Missing values: Missing data points in the DM are indicated by ".". In the current implementation, subjects with a missing value in a predictor variable (see -regressors) are excluded from the analysis when that predictor is selected as a covariate.
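
Hidden characters in a design matrix file are hard to spot in an editor. As a rough diagnostic, the sketch below scans the raw bytes of a DM file and reports anything that is neither printable ASCII nor a tab. The function name `find_hidden_characters` is hypothetical, and the choice to flag carriage returns is an assumption based on the warning above about return characters; the tool itself may tolerate some of these bytes.

```python
def find_hidden_characters(data):
    """Return (line_number, column, byte_value) for each suspicious byte.

    `data` is the raw content of the DM file as bytes. Lines are split on
    newline only, so carriage returns from other platforms are reported.
    """
    problems = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            printable = 0x20 <= byte <= 0x7E
            if not printable and byte != 0x09:  # tab is the expected separator
                problems.append((line_no, col, byte))
    return problems


# A clean two-line file, and one with carriage returns and a non-breaking space.
clean = b"SUBJECT_ID\tFILENAME\tAGE\n1\t/dir_1/1.img\t76.38\n"
dirty = b"SUBJECT_ID\tFILENAME\tAGE\r\n1\t/dir_1/1.img\t76.38\xc2\xa0\r\n"

print(find_hidden_characters(clean))  # prints []
for line_no, col, byte in find_hidden_characters(dirty):
    print(f"line {line_no}, column {col}: byte 0x{byte:02X}")
```

To check a real file, read it in binary mode, for example `find_hidden_characters(open("DM.txt", "rb").read())`.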

### Assumptions

The SOCR MLR analysis implements the General Linear Model (GLM) and does *not* require normality. It makes only the following two assumptions:

• The design matrix X must have full column rank; otherwise the parameter vector β is not identified, and at most we can narrow its value down to some linear subspace of R^p. For this property to hold, we must have n > p, where n is the sample size and p is the number of columns of X (this condition is necessary but not sufficient).
• The regressors Xi are assumed to be error-free, that is they are not contaminated with measurement errors. Although not realistic in many settings, dropping this assumption leads to significantly more difficult errors-in-variables models.
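
The full-column-rank assumption can be checked numerically before running the analysis. The sketch below computes the rank with exact Gaussian elimination over fractions, using only the Python standard library. The names `column_rank` and `has_full_column_rank` are hypothetical, and the leading column of ones in the example is an assumption about how an intercept would enter the design matrix, not a statement about how SOCR builds it internally.

```python
from fractions import Fraction


def column_rank(matrix):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(str(value)) for value in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the earlier ones
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def has_full_column_rank(matrix):
    """True when the rank equals the number of columns."""
    return bool(matrix) and column_rank(matrix) == len(matrix[0])


# Columns: intercept (assumed), AGE, CDR for subjects 1-4 of the example DM.
X = [[1, 76.38, 0], [1, 79.37, 0], [1, 65.22, 0.5], [1, 69.42, 0.5]]
print(has_full_column_rank(X))  # prints True

# A third column that is exactly twice the second makes the matrix rank deficient.
X_bad = [[1, 2, 4], [1, 3, 6], [1, 5, 10]]
print(has_full_column_rank(X_bad))  # prints False
```

Exact arithmetic avoids tolerance choices for small examples; for large design matrices a numerical linear algebra library would be the more practical route.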

### Supplementary information

#### Auxiliary tools

The SplitMaskPositiveNegativeAnalysisResults tool constructs masks of statistical analysis results (e.g., P, R or T-stat maps output by VolumeMultipleRegression) given a user-specified threshold value. The output masks are 1-byte volumes with intensities 0 and 1, determined by the threshold value.

• Example call:

 java -cp /ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_core.jar:/ifs/ccb/CCB_SW_Tools/others/Statistics/SOCR_Statistics/bin/SOCR_plugin.jar edu.ucla.stat.SOCR.analyses.command.SplitMaskPositiveNegativeAnalysisResults -dim Zmax Ymax XMax -input filename -mask mask_filename -threshold Value [-below filename] [-above filename] -data_type [0,1,2,3,4] -byteorder little

• Options:
• -help: print usage
• -mask [Mask-volume.img]: (optional) specify a mask volume (0 or 1 intensities) restricting the voxels that are processed; a 1-byte Analyze-format volume of the same dimensions as the data
• -dim Zmax Ymax XMax: specify the dimension sizes (for 2D images use Zmax=1; for 1D data use Zmax=Ymax=1)
• -below [Filename]: output mask of the intensities where input <= threshold (1-byte)
• -above [Filename]: output mask of the intensities where input > threshold (1-byte)
• -threshold [threshold_value]: threshold value separating the below/above intensities of the input file
• -input [Filename]: input file-name
• -data_type [0,1,2,3,4]:
• Type=0 is for Unsigned Byte input volumes;
• Type=1 is for Signed Byte input volumes;
• Type=2 is for Unsigned-short integers
• Type=3 is for Signed-short integers
• Type=4 is for 4Byte=Float Volumes
• -byteorder string: string is one of {big, little, other}.
• big = BIG_ENDIAN processor
• little = LITTLE_ENDIAN processor
• other = default processor (java.nio.ByteOrder.nativeOrder())
• -byteswap: (deprecated) Only enter this flag if you want the input data to be read in and byteswapped! Note that -byteswap affects the input data, the mask volume and the output results!
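
The thresholding performed by the auxiliary tool can be illustrated on a flat list of voxel values. This is a sketch of the idea only, not the tool's implementation: the function name `split_masks` is hypothetical, the below mask is taken as input <= threshold, and the above mask is assumed to be the strict complement (input > threshold). Voxels outside an optional mask (intensity 0) are set to 0 in both outputs, which is also an assumption.

```python
def split_masks(values, threshold, mask=None):
    """Return (below, above) 0/1 masks for a flat list of voxel values.

    below[i] = 1 where values[i] <= threshold, above[i] = 1 where
    values[i] > threshold (assumed), restricted to voxels with mask[i] > 0.
    """
    below, above = [], []
    for index, value in enumerate(values):
        inside = mask is None or mask[index] > 0
        below.append(1 if inside and value <= threshold else 0)
        above.append(1 if inside and value > threshold else 0)
    return below, above


# Four hypothetical p-values; the last voxel lies outside the mask.
p_values = [0.01, 0.05, 0.2, 0.5]
below, above = split_masks(p_values, 0.05, mask=[1, 1, 1, 0])
print(below)  # prints [1, 1, 0, 0]
print(above)  # prints [0, 0, 1, 0]
```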