3. SVM_predict.py

3.1. Description

Builds an SVM model from “train_file” and then predicts the labels of the cases in “data_file”.

3.2. Options

--version show program’s version number and exit
-h, --help show this help message and exit
-t TRAIN_FILE, --train_file=TRAIN_FILE
 Tab- or space-separated file (for training purposes, to build the SVM model). The first column contains sample IDs; the second column contains integer sample labels (must be 0 or 1); the third column contains sample label names (string, must be consistent with column 2). The remaining columns contain the features used to build the SVM model.
-d DATA_FILE, --data_file=DATA_FILE
 Tab- or space-separated file (new data whose labels are to be predicted). The first column contains sample IDs; the second column contains integer sample labels (must be 0 or 1); the third column contains sample label names (string, must be consistent with column 2). The remaining columns contain the same features used to build the SVM model.
-C C_VALUE, --cvalue=C_VALUE
 C value (the SVM regularization parameter). default=1.0
-k S_KERNEL, --kernel=S_KERNEL
 Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’ or ‘precomputed’. default=linear
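
The option list above can be sketched as a small parser. This is a minimal illustration, not the script's actual source: the help layout suggests optparse, but that is an assumption, and the "dest" names and the version string are hypothetical. Only the flags and the two defaults (C=1.0, kernel=linear) come from the list above.

```python
from optparse import OptionParser


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    # The version string is a placeholder, not the real program version.
    parser = OptionParser(version="%prog (placeholder version)")
    parser.add_option("-t", "--train_file", dest="train_file",
                      help="File used to build the SVM model.")
    parser.add_option("-d", "--data_file", dest="data_file",
                      help="File with new samples to predict.")
    parser.add_option("-C", "--cvalue", dest="c_value", type="float",
                      default=1.0, help="C value. default=%default")
    parser.add_option("-k", "--kernel", dest="s_kernel", default="linear",
                      help="Kernel type. default=%default")
    return parser


# Parse the same arguments used in the Command section; -k is omitted,
# so the kernel falls back to its documented default.
options, args = build_parser().parse_args(
    ["-t", "lung_CES_5features.tsv",
     "-d", "lung_CES_data_to_predict.tsv",
     "-C", "10"]
)
print(options.train_file, options.data_file)
print(options.c_value, options.s_kernel)
```

With these arguments the C value is read as a float and the kernel stays at “linear”.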

3.3. Input files format

TRAIN_FILE and DATA_FILE use the same format, shown below. The 2nd and 3rd columns in DATA_FILE can be considered the Original Label and Original Name.

ID Label Label_name feature_1 feature_2 feature_3 feature_n
sample_1 1 WT 1560 795 0.9716 feature_n
sample_2 1 WT 784 219 0.4087 feature_n
sample_3 1 WT 2661 2268 1.1691 feature_n
sample_4 0 Mut 643 198 0.5458 feature_n
sample_5 0 Mut 534 87 1.0545 feature_n
sample_6 0 Mut 332 75 0.5115 feature_n
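
A reader for this layout might look like the sketch below. It is an illustration under stated assumptions, not the script's own loader: the function name “read_samples” and the “check_labels” switch are hypothetical. With checking on, it enforces the two rules given for the label columns (label is 0 or 1, and each label always pairs with the same label name). Checking can be turned off for a DATA_FILE whose original labels are placeholders, such as the “unknown” values in the Output to screen section.

```python
import io


def read_samples(handle, check_labels=True):
    """Parse a tab- or space-separated sample table with one header line."""
    ids, labels, names, features = [], [], [], []
    label_to_name = {}
    header = handle.readline().split()
    for line_no, line in enumerate(handle, start=2):
        fields = line.split()  # splits on tabs and on runs of spaces
        if not fields:
            continue  # skip blank lines
        sample_id, label, name = fields[:3]
        if check_labels:
            if label not in ("0", "1"):
                raise ValueError(f"line {line_no}: label must be 0 or 1, got {label!r}")
            # Column 3 must always pair with the same value in column 2.
            if label_to_name.setdefault(label, name) != name:
                raise ValueError(f"line {line_no}: name {name!r} conflicts with label {label}")
            label = int(label)
        ids.append(sample_id)
        labels.append(label)
        names.append(name)
        features.append([float(value) for value in fields[3:]])
    return header[3:], ids, labels, names, features


# Two rows taken from the example table, restricted to three feature columns.
example = io.StringIO(
    "ID Label Label_name feature_1 feature_2 feature_3\n"
    "sample_1 1 WT 1560 795 0.9716\n"
    "sample_4 0 Mut 643 198 0.5458\n"
)
feature_names, ids, labels, names, features = read_samples(example)
print(feature_names)
print(ids, labels, names)
print(features)
```

Splitting on any whitespace handles both separators the options allow, at the cost of not supporting IDs or label names that contain spaces.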

3.4. Command

$ python3 SVM_predict.py -t lung_CES_5features.tsv -d lung_CES_data_to_predict.tsv -C 10

3.5. Output to screen

TCGA_ID        Ori_Label  Ori_name  Predict_Label  Predict_Name
TCGA-05-4244   unknown    TP53_WT   1              Truncating
TCGA-05-4249   unknown    TP53_WT   1              Truncating
TCGA-05-4250   unknown    TP53_WT   1              Truncating
TCGA-05-4389   unknown    TP53_WT   1              Truncating
TCGA-05-4390   unknown    TP53_WT   1              Truncating
TCGA-05-4403   unknown    TP53_WT   1              Truncating
TCGA-38-7271   unknown    TP53_WT   1              Truncating
TCGA-38-A44F   unknown    TP53_WT   0              Normal
TCGA-39-5030   unknown    TP53_WT   1              Truncating
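
Because the result is printed to the screen as a five-column table, it can be redirected to a file and summarized afterwards. The sketch below counts the rows per predicted name. The helper “count_predictions” is hypothetical and not part of SVM_predict.py, and the three data rows are copied from the output above.

```python
from collections import Counter


def count_predictions(lines):
    """Count rows per Predict_Name in the five-column screen output."""
    counts = Counter()
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for line in rows:
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or malformed lines
        counts[fields[4]] += 1  # fifth column is Predict_Name
    return counts


screen_output = """\
TCGA_ID Ori_Label Ori_name Predict_Label Predict_Name
TCGA-05-4244 unknown TP53_WT 1 Truncating
TCGA-38-A44F unknown TP53_WT 0 Normal
TCGA-39-5030 unknown TP53_WT 1 Truncating
"""
counts = count_predictions(screen_output.splitlines())
print(counts["Truncating"], counts["Normal"])
```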