104 Using Brainome’s Predictor CLI

The predictor that Brainome generates can be run directly from the command line interface (CLI). This notebook covers:

Predictor --help
Validate test csv dataset
Classify unlabeled csv dataset
Feature engineering predictions

Prerequisites

This notebook assumes Brainome is installed as described in the notebook brainome_101_Quick_Start.

The data sets are:

titanic_train.csv for training data
titanic_validate.csv for validation
titanic_predict.csv for predictions

!python3 -m pip install brainome  --quiet
!brainome -version

import urllib.request as request
response1 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_validate.csv', 'titanic_validate.csv')
response3 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_train.csv titanic_validate.csv titanic_predict.csv

WARNING: You are using pip version 22.0.3; however, version 22.0.4 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.9.10/x64/bin/python3 -m pip install --upgrade pip' command.

/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index

brainome v1.8-120-prod

-rw-r--r-- 1 runner docker  858 Mar 12 21:06 titanic_predict.csv
-rw-r--r-- 1 runner docker  57K Mar 12 21:06 titanic_train.csv
-rw-r--r-- 1 runner docker 5.8K Mar 12 21:06 titanic_validate.csv

Generate a predictor

!brainome titanic_train.csv -rank -y -o predictor_104.py -modelonly -q
print("The predictor filename is predictor_104.py")
%ls -lh predictor_104.py
# Preview predictor
%pycat predictor_104.py

The predictor filename is predictor_104.py

-rw-r--r-- 1 runner docker 35K Mar 12 21:06 predictor_104.py

1. Predictor help

Brainome predictors are compact and do two things: validate labeled data and classify unlabeled data.

While the predictor source code is portable, it does require numpy to run and optionally scipy to generate the confusion matrices.

!python3 predictor_104.py --help

usage: predictor_104.py [-h] [-validate] [-headerless] [-json] [-trim] csvfile

Predictor trained on ['titanic_train.csv']

positional arguments:
  csvfile      CSV file containing test set (unlabeled).

optional arguments:
  -h, --help   show this help message and exit
  -validate    Validation mode. csvfile must be labeled. Output is
               classification statistics rather than predictions.
  -headerless  Do not treat the first line of csvfile as a header.
  -json        report measurements as json
  -trim        If true, the prediction will not output ignored columns.
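
The flags listed in the help text can also be assembled from Python rather than typed by hand. The following is a minimal sketch, not part of Brainome itself: `build_predictor_command` and `run_predictor` are hypothetical helper names, and the sketch assumes `python3` is on the path and that the predictor file is in the working directory. Only the flags shown in the help output above are used.

```python
import subprocess


def build_predictor_command(predictor, csvfile, validate=False,
                            headerless=False, json_out=False, trim=False):
    """Assemble the argument list for a Brainome predictor script.

    Hypothetical helper: the flag names come from the --help output above.
    """
    cmd = ["python3", predictor]
    if validate:
        cmd.append("-validate")
    if headerless:
        cmd.append("-headerless")
    if json_out:
        cmd.append("-json")
    if trim:
        cmd.append("-trim")
    cmd.append(csvfile)
    return cmd


def run_predictor(predictor, csvfile, **flags):
    """Run the predictor and return whatever it writes to standard output."""
    cmd = build_predictor_command(predictor, csvfile, **flags)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Show the command that would be run for validation mode.
print(build_predictor_command("predictor_104.py", "titanic_validate.csv",
                              validate=True))
```

Calling `run_predictor("predictor_104.py", "titanic_validate.csv", validate=True)` would then return the validation report as a string, provided the predictor file exists.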

2. Validate test dataset

The validate function takes a labeled csv data set with the same columns as the training data set and, with the -validate parameter, compares the predicted outcomes against the actual labels.

!python3 predictor_104.py -validate titanic_validate.csv

Classifier Type:                    Random Forest
System Type:                        2-way classifier

Accuracy:
    Best-guess accuracy:            61.25%
    Model accuracy:                 80.00% (64/80 correct)
    Improvement over best guess:    18.75% (of possible 38.75%)

Model capacity (MEC):               17 bits
Generalization ratio:               3.62 bits/bit

Confusion Matrix:

      Actual | Predicted
    -------- | ---------
        died |  45   4 
    survived |  12  19 

Accuracy by Class:

      target | TP  FP  TN  FN      TPR      TNR      PPV      NPV       F1       TS
    -------- | --  --  --  --  -------  -------  -------  -------  -------  -------
        died | 45  12  19   4    91.84%    61.29%    78.95%    82.61%    84.91%    73.77%
    survived | 19   4  45  12    61.29%    91.84%    82.61%    78.95%    70.37%    54.29%
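
The per-class figures in the report follow directly from the four counts in the confusion matrix. As a rough sketch, the snippet below recomputes them using the standard definitions of TPR, TNR, PPV, NPV, F1 and threat score (TS). It assumes that "best-guess accuracy" means always predicting the most frequent class, which is consistent with the 61.25% reported above; `class_metrics` is a hypothetical helper name, not a Brainome function.

```python
# Counts copied from the confusion matrix above: rows are actual, columns predicted.
confusion = {
    "died":     {"died": 45, "survived": 4},
    "survived": {"died": 12, "survived": 19},
}


def class_metrics(confusion, target):
    """Return the standard one-vs-rest metrics for a single target class."""
    total = sum(sum(row.values()) for row in confusion.values())
    tp = confusion[target][target]
    fn = sum(n for pred, n in confusion[target].items() if pred != target)
    fp = sum(row[target] for actual, row in confusion.items() if actual != target)
    tn = total - tp - fn - fp
    return {
        "TPR": tp / (tp + fn),            # recall / sensitivity
        "TNR": tn / (tn + fp),            # specificity
        "PPV": tp / (tp + fp),            # precision
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "TS": tp / (tp + fp + fn),        # threat score
    }


total = sum(sum(row.values()) for row in confusion.values())
correct = sum(confusion[label][label] for label in confusion)
model_accuracy = correct / total
# Assumption: best guess = always predict the most frequent actual class.
best_guess = max(sum(row.values()) for row in confusion.values()) / total

print(f"Model accuracy: {model_accuracy:.2%}, best guess: {best_guess:.2%}")
for label in confusion:
    metrics = class_metrics(confusion, label)
    print(label, {name: f"{value:.2%}" for name, value in metrics.items()})
```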

3. Classify unlabeled dataset

The predictor can classify a data set that has the same columns as the training and validation data but lacks the target column.

It will generate a complete data set with the “Prediction” column appended.

!python3 predictor_104.py titanic_predict.csv > classifications_104.csv
print('Viewing classification predictions.')
%pip install pandas --quiet
import pandas as pd
pd.read_csv('classifications_104.csv')

Viewing classification predictions.

Note: you may need to restart the kernel to use updated packages.

	PassengerId	Cabin_Class	Name	Sex	Age	Sibling_Spouse	Parent_Children	Ticket_Number	Fare	Cabin_Number	Port_of_Embarkation	Prediction
0	881	2	Shelley, Mrs. William (Imanita Parrish Hall)	female	25.0	0	1	230433	26.0000	NaN	S	survived
1	882	3	Markun, Mr. Johann	male	33.0	0	0	349257	7.8958	NaN	S	died
2	883	3	Dahlberg, Miss. Gerda Ulrika	female	22.0	0	0	7552	10.5167	NaN	S	survived
3	884	2	Banfield, Mr. Frederick James	male	28.0	0	0	C.A./SOTON 34068	10.5000	NaN	S	died
4	885	3	Sutehall, Mr. Henry Jr	male	25.0	0	0	SOTON/OQ 392076	7.0500	NaN	S	died
5	886	3	Rice, Mrs. William (Margaret Norton)	female	39.0	0	5	382652	29.1250	NaN	Q	died
6	887	2	Montvila, Rev. Juozas	male	27.0	0	0	211536	13.0000	NaN	S	died
7	888	1	Graham, Miss. Margaret Edith	female	19.0	0	0	112053	30.0000	B42	S	survived
8	889	3	Johnston, Miss. Catherine Helen Carrie"	female	NaN	1	2	W./C. 6607	23.4500	NaN	S	survived
9	890	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.0000	C148	C	died
10	891	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.7500	NaN	Q	died
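
Once the predictions are in a csv file, a quick tally of the "Prediction" column gives a sense of the class balance. The sketch below uses only the standard library. So that it runs without the file, it inlines the PassengerId and Prediction values from the table above; `count_predictions` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# PassengerId and Prediction values copied from the output above.
sample = io.StringIO(
    "PassengerId,Prediction\n"
    "881,survived\n"
    "882,died\n"
    "883,survived\n"
    "884,died\n"
    "885,died\n"
    "886,died\n"
    "887,died\n"
    "888,survived\n"
    "889,survived\n"
    "890,died\n"
    "891,died\n"
)


def count_predictions(handle, column="Prediction"):
    """Count how often each predicted class appears in a csv file handle."""
    reader = csv.DictReader(handle)
    return Counter(row[column] for row in reader)


counts = count_predictions(sample)
print(counts)
```

For the real output, the same function could be called as `count_predictions(open('classifications_104.csv', newline=''))`.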

4. Feature engineering predictions

During feature engineering, it is often useful to view only the features that contributed to the prediction.

With the -trim parameter, the output will only show the features deemed important by the model.

!python3 predictor_104.py titanic_predict.csv -trim > trimmed_classifications_104.csv
print('Viewing important features classification predictions.')
# preview uses pandas to read and display csv data
import pandas as pd
pd.read_csv('trimmed_classifications_104.csv')

Viewing important features classification predictions.

	Cabin_Class	Sex	Sibling_Spouse	Parent_Children	Prediction
0	2	female	0	1	survived
1	3	male	0	0	died
2	3	female	0	0	survived
3	2	male	0	0	died
4	3	male	0	0	died
5	3	female	0	5	died
6	2	male	0	0	died
7	1	female	0	0	survived
8	3	female	1	2	survived
9	1	male	0	0	died
10	3	male	0	0	died
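
Comparing the two headers shows which columns the model ignored. The short sketch below copies the column names from the full and trimmed outputs above and lists the difference; the variable names are illustrative only.

```python
# Column names copied from the full classification output.
full_columns = [
    "PassengerId", "Cabin_Class", "Name", "Sex", "Age", "Sibling_Spouse",
    "Parent_Children", "Ticket_Number", "Fare", "Cabin_Number",
    "Port_of_Embarkation", "Prediction",
]
# Column names copied from the -trim output.
trimmed_columns = [
    "Cabin_Class", "Sex", "Sibling_Spouse", "Parent_Children", "Prediction",
]

# Columns present in the full output but dropped by -trim, in original order.
ignored_columns = [name for name in full_columns if name not in trimmed_columns]
# Features the model kept, excluding the appended Prediction column.
used_features = [name for name in trimmed_columns if name != "Prediction"]

print("Ignored:", ignored_columns)
print("Used:", used_features)
```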

Advanced Predictor Usage

To integrate the predictor into your own Python program, see notebook 300 Put your model to work.

Next Steps

Check out 106 Describe Your CSV
Check out Using Measurement to Create Better Models
