103 Model Type Selection

Brainome™ creates predictors using these three model types:

Random Forest
Neural Network
Decision Tree

Prerequisites

This notebook requires brainome, installed as described in the notebook brainome_101_Quick_Start.

The training data set used is titanic_train.csv.

!python3 -m pip install brainome  --quiet
!brainome --version

WARNING: You are using pip version 22.0.3; however, version 22.0.4 is available.
You should consider upgrading via the '/opt/hostedtoolcache/Python/3.9.10/x64/bin/python3 -m pip install --upgrade pip' command.

/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index

brainome v1.8-120-prod

import urllib.request as request
print('Downloading titanic_train.csv')
request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')
%ls -lh titanic_train.csv

Downloading titanic_train.csv

-rw-r--r-- 1 runner docker 57K Mar 12 21:05 titanic_train.csv

1. Automatic Model Selection

Brainome can automatically select the most appropriate model type based on measurements of your data. For the Titanic data set, brainome selects Random Forest.

!brainome titanic_train.csv -y -o predictor_103_automatic.py

Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to:                 Demo User  (Evaluation)
Expiration Date:             2022-12-12   275 days left
Maximum File Size:           100 MB
Maximum Instances:           20000
Maximum Attributes:          100
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc titanic_train.csv -y -o predictor_103_automatic.py

Start Time:                 03/12/2022, 21:05 UTC

Cleaning...-done. 

Splitting into training and validation...-

done.

Pre-training measurements...-

done.

Pre-training Measurements
Data:
    Input:                      titanic_train.csv
    Target Column:              Survived
    Number of instances:        800
    Number of attributes:        11 out of 11
    Number of classes:            2

Class Balance:                
                            died: 61.50%
                        survived: 38.50%

Learnability:
    Best guess accuracy:          61.50%
    Data Sufficiency:             Maybe enough data to generalize. [yellow]

Capacity Progression:             at [ 5%, 10%, 20%, 40%, 80%, 100% ]
    Ideal Machine Learner:              6,   7,   8,   8,   9,   9


Expected Accuracy:              Training            Validation
    Decision Tree:               100.00%                52.50%
    Neural Network:                 ----                  ----
    Random Forest:               100.00%                80.25%
Recommendations:
    Warning: Data has high information density. Using effort 5 and larger ( -e 5 ) can improve results.
    If predictor accuracy is insufficient, try using the option -rank to automatically select the important attributes.
    We recommend using Random Forest -f RF.
    If predictor accuracy is insufficient, try using the effort option -e with a value of 5 or more to increase training time.
    Defaulting to RF model. Model can be forced with -f parameter. 


Building classifier...done.

Compiling predictor...done.

Validating predictor...done.

Predictor:                        predictor_103_automatic.py
    Classifier Type:              Random Forest
    System Type:                  Binary classifier
    Training / Validation Split:  60% : 40%
    Accuracy:
      Best-guess accuracy:        61.50%
      Training accuracy:          86.84% (416/479 correct)
      Validation Accuracy:        80.99% (260/321 correct)
      Combined Model Accuracy:    84.50% (676/800 correct)


    Model Capacity (MEC):         41    bits
    Generalization Ratio:          9.74 bits/bit
    Percent of Data Memorized:    20.84%
    Resilience to Noise:          -1.01 dB







    Training Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  279   16 
            survived |   47  137 

    Validation Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  175   22 
            survived |   39   85 

    Training Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  279   47  137   16   94.58%   74.46%   85.58%   89.54%   89.86%   81.58%
            survived |  137   16  279   47   74.46%   94.58%   89.54%   85.58%   81.31%   68.50%

    Validation Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  175   39   85   22   88.83%   68.55%   81.78%   79.44%   85.16%   74.15%
            survived |   85   22  175   39   68.55%   88.83%   79.44%   81.78%   73.59%   58.22%


    Attribute Ranking:
                                      Feature | Relative Importance
                                          Sex :   0.4912
                                  Cabin_Class :   0.1242
                                 Cabin_Number :   0.0664
                              Parent_Children :   0.0599
                                          Age :   0.0599
                                Ticket_Number :   0.0414
                                         Fare :   0.0379
                                  PassengerId :   0.0332
                               Sibling_Spouse :   0.0298
                                         Name :   0.0288
                          Port_of_Embarkation :   0.0273
         



End Time:           03/12/2022, 21:05 UTC
Runtime Duration:   8s

The predictor filename is predictor_103_automatic.py. The source code is approximately 35K bytes.

%ls -lh predictor_103_automatic.py
%pycat predictor_103_automatic.py

-rw-r--r-- 1 runner docker 35K Mar 12 21:05 predictor_103_automatic.py
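
The per-class figures in the report above can be re-derived from the confusion matrices. The sketch below is a minimal illustration, assuming the standard definitions of TPR, TNR, PPV, NPV, F1 and threat score (TS). The function name `class_metrics` is hypothetical and is not part of brainome. It treats died as the positive class and uses the training confusion matrix from the automatic run.

```python
def class_metrics(tp, fp, tn, fn):
    """Return standard per-class rates as percentages (assumed definitions)."""
    return {
        "TPR": 100 * tp / (tp + fn),              # true positive rate (recall)
        "TNR": 100 * tn / (tn + fp),              # true negative rate
        "PPV": 100 * tp / (tp + fp),              # positive predictive value (precision)
        "NPV": 100 * tn / (tn + fn),              # negative predictive value
        "F1": 100 * 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and TPR
        "TS": 100 * tp / (tp + fp + fn),          # threat score
    }

# Training confusion matrix from the report (rows = actual, columns = predicted):
#   died     | 279   16
#   survived |  47  137
died = class_metrics(tp=279, fp=47, tn=137, fn=16)

# Overall accuracy is the diagonal divided by the total number of instances.
accuracy = 100 * (279 + 137) / (279 + 16 + 47 + 137)

for name, value in died.items():
    print(f"{name}: {value:.2f}%")
print(f"accuracy: {accuracy:.2f}%")
```

Swapping the roles of the two classes (tp=137, fp=16, tn=279, fn=47) gives the survived row of the same table.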

2. Random Forest

You can select the Random Forest model type by using the -f RF parameter.

Note: The -modelonly parameter bypasses the measurements phase, because the measurements do not change from the previous run.

!brainome titanic_train.csv -f RF -y -o predictor_103_RF.py -modelonly

Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to:                 Demo User  (Evaluation)
Expiration Date:             2022-12-12   275 days left
Maximum File Size:           100 MB
Maximum Instances:           20000
Maximum Attributes:          100
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc titanic_train.csv -f RF -y -o predictor_103_RF.py -modelonly

Start Time:                 03/12/2022, 21:05 UTC

Cleaning...-done. 

Splitting into training and validation...-

done.

Building classifier...-

done.

Compiling predictor...-done.

Validating predictor...-

done.

Predictor:                        predictor_103_RF.py
    Classifier Type:              Random Forest
    System Type:                  Binary classifier
    Training / Validation Split:  60% : 40%
    Accuracy:
      Best-guess accuracy:        61.50%
      Training accuracy:          86.84% (416/479 correct)
      Validation Accuracy:        80.99% (260/321 correct)
      Combined Model Accuracy:    84.50% (676/800 correct)


    Model Capacity (MEC):         41    bits
    Generalization Ratio:          9.74 bits/bit
    Percent of Data Memorized:    20.84%
    Resilience to Noise:          -1.01 dB







    Training Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  279   16 
            survived |   47  137 

    Validation Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  175   22 
            survived |   39   85 

    Training Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  279   47  137   16   94.58%   74.46%   85.58%   89.54%   89.86%   81.58%
            survived |  137   16  279   47   74.46%   94.58%   89.54%   85.58%   81.31%   68.50%

    Validation Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  175   39   85   22   88.83%   68.55%   81.78%   79.44%   85.16%   74.15%
            survived |   85   22  175   39   68.55%   88.83%   79.44%   81.78%   73.59%   58.22%


    Attribute Ranking:
                                      Feature | Relative Importance
                                          Sex :   0.4912
                                  Cabin_Class :   0.1242
                                 Cabin_Number :   0.0664
                              Parent_Children :   0.0599
                                          Age :   0.0599
                                Ticket_Number :   0.0414
                                         Fare :   0.0379
                                  PassengerId :   0.0332
                               Sibling_Spouse :   0.0298
                                         Name :   0.0288
                          Port_of_Embarkation :   0.0273
         



End Time:           03/12/2022, 21:05 UTC
Runtime Duration:   7s

Open predictor_103_RF.py to view the Random Forest predictor.

%ls -lh predictor_103_RF.py
%pycat predictor_103_RF.py

-rw-r--r-- 1 runner docker 35K Mar 12 21:05 predictor_103_RF.py
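
For readers who want to script these runs, the following is a minimal sketch of assembling the command line in Python. It uses only the flags that appear in this notebook (-f, -y, -o, -modelonly). The helper `build_command` and the `MODEL_TYPES` table are hypothetical names introduced for illustration, not part of brainome.

```python
import shlex

# Model type codes accepted by -f in this notebook, with their display names.
MODEL_TYPES = {"RF": "Random Forest", "NN": "Neural Network", "DT": "Decision Tree"}

def build_command(data_file, model_type=None, model_only=False):
    """Assemble a brainome command line as an argument list.

    With model_type=None no -f flag is added, so brainome chooses the model itself.
    """
    cmd = ["brainome", data_file]
    if model_type is None:
        output = "predictor_103_automatic.py"
    else:
        if model_type not in MODEL_TYPES:
            raise ValueError(f"unknown model type: {model_type}")
        cmd += ["-f", model_type]
        output = f"predictor_103_{model_type}.py"
    cmd += ["-y", "-o", output]
    if model_only:
        cmd.append("-modelonly")
    return cmd

# Print one command per forced model type.
for code in MODEL_TYPES:
    print(shlex.join(build_command("titanic_train.csv", code, model_only=True)))
```

Each argument list could be handed to `subprocess.run(cmd, check=True)`, assuming brainome is installed and on the PATH.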

3. Neural Network

You can select the Neural Network model type by using the -f NN parameter.

!brainome titanic_train.csv -f NN -y -o predictor_103_NN.py -modelonly

Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to:                 Demo User  (Evaluation)
Expiration Date:             2022-12-12   275 days left
Maximum File Size:           100 MB
Maximum Instances:           20000
Maximum Attributes:          100
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc titanic_train.csv -f NN -y -o predictor_103_NN.py -modelonly

Start Time:                 03/12/2022, 21:05 UTC

Cleaning...-done. 

Splitting into training and validation...-

done.

Architecting model...-WARNING: Could not detect a GPU. Neural Network generation will be slow.

done.

Priming model...-

/done.

Compiling predictor...-done.

Validating predictor...-

done.

Predictor:                        predictor_103_NN.py
    Classifier Type:              Neural Network
    System Type:                  Binary classifier
    Training / Validation Split:  60% : 40%
    Accuracy:
      Best-guess accuracy:        61.50%
      Training accuracy:          63.25% (303/479 correct)
      Validation Accuracy:        61.68% (198/321 correct)
      Combined Model Accuracy:    62.62% (501/800 correct)


    Model Capacity (MEC):         27    bits
    Generalization Ratio:         10.78 bits/bit
    Percent of Data Memorized:    18.83%
    Resilience to Noise:          -1.05 dB







    Training Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  295    0 
            survived |  176    8 

    Validation Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  195    2 
            survived |  121    3 

    Training Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  295  176    8    0  100.00%    4.35%   62.63%  100.00%   77.02%   62.63%
            survived |    8    0  295  176    4.35%  100.00%  100.00%   62.63%    8.33%    4.35%

    Validation Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  195  121    3    2   98.98%    2.42%   61.71%   60.00%   76.02%   61.32%
            survived |    3    2  195  121    2.42%   98.98%   60.00%   61.71%    4.65%    2.38%






End Time:           03/12/2022, 21:06 UTC
Runtime Duration:   42s

Open predictor_103_NN.py to view the Neural Network predictor. The source code is approximately 53K bytes.

%ls -lh predictor_103_NN.py
%pycat predictor_103_NN.py

-rw-r--r-- 1 runner docker 53K Mar 12 21:06 predictor_103_NN.py
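
The confusion matrices in the report above suggest why the Neural Network's accuracy sits so close to the best-guess accuracy: almost every instance is predicted as died. The sketch below sums the predicted columns to make that visible. The matrices are copied from the report, and the helper name `predicted_share` is hypothetical.

```python
# Confusion matrices from the Neural Network report.
# Each row is an actual class; the tuple holds (predicted died, predicted survived).
nn_training = {"died": (295, 0), "survived": (176, 8)}
nn_validation = {"died": (195, 2), "survived": (121, 3)}

def predicted_share(matrix, column=0):
    """Fraction of all instances assigned to the class in the given predicted column."""
    in_column = sum(row[column] for row in matrix.values())
    total = sum(sum(row) for row in matrix.values())
    return in_column / total

train_share = predicted_share(nn_training)
val_share = predicted_share(nn_validation)

print(f"training:   {train_share:.1%} of instances predicted as died")
print(f"validation: {val_share:.1%} of instances predicted as died")
```

A model that predicts the majority class for nearly every row cannot do much better than the 61.50% best-guess accuracy, which is consistent with the figures reported for this run.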

4. Decision Tree

You can select the Decision Tree model type by using the -f DT parameter.

!brainome titanic_train.csv -f DT -y -o predictor_103_DT.py -modelonly

Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to:                 Demo User  (Evaluation)
Expiration Date:             2022-12-12   275 days left
Maximum File Size:           100 MB
Maximum Instances:           20000
Maximum Attributes:          100
Maximum Classes:             unlimited
Connected to:                daimensions.brainome.ai  (local execution)

Command:
    btc titanic_train.csv -f DT -y -o predictor_103_DT.py -modelonly

Start Time:                 03/12/2022, 21:06 UTC

Cleaning...-done. 

Splitting into training and validation...-

done.

Building classifier...-

done.

Compiling predictor...-done.

Validating predictor...-

/done.

Predictor:                        predictor_103_DT.py
    Classifier Type:              Decision Tree
    System Type:                  Binary classifier
    Training / Validation Split:  60% : 40%
    Accuracy:
      Best-guess accuracy:        61.50%
      Training accuracy:         100.00% (479/479 correct)
      Validation Accuracy:        54.82% (176/321 correct)
      Combined Model Accuracy:    81.87% (655/800 correct)


    Model Capacity (MEC):        236    bits
    Generalization Ratio:          1.94 bits/bit
    Percent of Data Memorized:   104.60%
    Resilience to Noise:          -0.31 dB







    Training Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  295    0 
            survived |    0  184 

    Validation Confusion Matrix:
              Actual | Predicted
              ------ | ---------
                died |  124   73 
            survived |   72   52 

    Training Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  295    0  184    0  100.00%  100.00%  100.00%  100.00%  100.00%  100.00%
            survived |  184    0  295    0  100.00%  100.00%  100.00%  100.00%  100.00%  100.00%

    Validation Accuracy by Class:
            Survived |   TP   FP   TN   FN     TPR      TNR      PPV      NPV       F1       TS 
            -------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
                died |  124   72   52   73   62.94%   41.94%   63.27%   41.60%   63.10%   46.10%
            survived |   52   73  124   72   41.94%   62.94%   41.60%   63.27%   41.77%   26.40%






End Time:           03/12/2022, 21:06 UTC
Runtime Duration:   7s

Open predictor_103_DT.py to view the Decision Tree predictor. The source code is approximately 29K bytes.

%ls -lh predictor_103_DT.py
%pycat predictor_103_DT.py

-rw-r--r-- 1 runner docker 29K Mar 12 21:06 predictor_103_DT.py
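
To compare the three forced runs side by side, the accuracies from the reports can be tabulated in a few lines of Python. This is only an illustrative sketch: the numbers are copied by hand from the output above, and the `summarize` helper is a hypothetical name, not something brainome provides.

```python
BEST_GUESS = 61.50  # "Best-guess accuracy" from the reports (share of the majority class)

# (training accuracy %, validation accuracy %) copied from the three reports.
results = {
    "Random Forest": (86.84, 80.99),
    "Neural Network": (63.25, 61.68),
    "Decision Tree": (100.00, 54.82),
}

def summarize(results, best_guess):
    """Rank models by validation accuracy and report the training/validation gap."""
    rows = []
    for model, (train, val) in results.items():
        rows.append({
            "model": model,
            "validation": val,
            "gap": round(train - val, 2),  # a large gap hints at memorization
            "beats_best_guess": val > best_guess,
        })
    return sorted(rows, key=lambda r: r["validation"], reverse=True)

summary = summarize(results, BEST_GUESS)
for row in summary:
    print(f"{row['model']:<15} validation={row['validation']:6.2f}%  "
          f"gap={row['gap']:6.2f}  beats best guess: {row['beats_best_guess']}")
```

On these numbers the Decision Tree fits the training split perfectly but validates below the best-guess accuracy, while the Random Forest has the highest validation accuracy, in line with brainome's automatic choice in section 1.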

Next Steps

Check out 104_Using_Predictor
Check out Using Measurement to Create Better Models