103 Model Type Selection
Brainome™ creates predictors using these three model types:
Random Forest
Neural Network
Decision Tree
Prerequisites
This notebook requires brainome, installed as described in the notebook brainome_101_Quick_Start.
The training data set used is titanic_train.csv.
!python3 -m pip install brainome --quiet
!brainome --version
/opt/hostedtoolcache/Python/3.9.10/x64/lib/python3.9/site-packages/xgboost/compat.py:31: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
from pandas import MultiIndex, Int64Index
brainome v1.8-120-prod
import urllib.request as request
print('Downloading titanic_train.csv')
request.urlretrieve('https://download.brainome.ai/data/public/titanic_train.csv', 'titanic_train.csv')
%ls -lh titanic_train.csv
Downloading titanic_train.csv
-rw-r--r-- 1 runner docker 57K Mar 12 21:05 titanic_train.csv
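Before training, it can help to check how the target classes are distributed, because the share of the majority class is the accuracy a constant guess would achieve. The following is a minimal sketch using only the standard library. The helper name `class_balance` and the in-memory sample rows are illustrative assumptions, not part of Brainome; the target column name `Survived` is taken from the Brainome output in this notebook.

```python
import csv
import io
from collections import Counter


def class_balance(csv_file, target_column):
    """Return {label: fraction of rows} for one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[target_column] for row in reader)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical four-row stand-in for titanic_train.csv, so the sketch runs
# without the downloaded file.
sample = io.StringIO(
    "PassengerId,Sex,Survived\n"
    "1,male,died\n"
    "2,female,survived\n"
    "3,male,died\n"
    "4,female,died\n"
)

balance = class_balance(sample, "Survived")
best_guess = max(balance.values())  # accuracy of always guessing the majority class
print(balance, best_guess)
```

To run it on the real data, open the downloaded file with `open('titanic_train.csv', newline='')` and pass the file object in place of `sample`.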
1. Automatic Model Selection
Brainome can automatically select the most appropriate model type based on its measurements of your data. For the Titanic data set, Brainome selects Random Forest.
!brainome titanic_train.csv -y -o predictor_103_automatic.py
Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to: Demo User (Evaluation)
Expiration Date: 2022-12-12 275 days left
Maximum File Size: 100 MB
Maximum Instances: 20000
Maximum Attributes: 100
Maximum Classes: unlimited
Connected to: daimensions.brainome.ai (local execution)
Command:
btc titanic_train.csv -y -o predictor_103_automatic.py
Start Time: 03/12/2022, 21:05 UTC
Cleaning...done.
Splitting into training and validation...done.
Pre-training measurements...done.
Pre-training Measurements
Data:
Input: titanic_train.csv
Target Column: Survived
Number of instances: 800
Number of attributes: 11 out of 11
Number of classes: 2
Class Balance:
died: 61.50%
survived: 38.50%
Learnability:
Best guess accuracy: 61.50%
Data Sufficiency: Maybe enough data to generalize. [yellow]
Capacity Progression: at [ 5%, 10%, 20%, 40%, 80%, 100% ]
Ideal Machine Learner: 6, 7, 8, 8, 9, 9
Expected Accuracy: Training Validation
Decision Tree: 100.00% 52.50%
Neural Network: ---- ----
Random Forest: 100.00% 80.25%
Recommendations:
Warning: Data has high information density. Using effort 5 and larger ( -e 5 ) can improve results.
If predictor accuracy is insufficient, try using the option -rank to automatically select the important attributes.
We recommend using Random Forest -f RF.
If predictor accuracy is insufficient, try using the effort option -e with a value of 5 or more to increase training time.
Defaulting to RF model. Model can be forced with -f parameter.
Building classifier...done.
Compiling predictor...done.
Validating predictor...done.
Predictor: predictor_103_automatic.py
Classifier Type: Random Forest
System Type: Binary classifier
Training / Validation Split: 60% : 40%
Accuracy:
Best-guess accuracy: 61.50%
Training accuracy: 86.84% (416/479 correct)
Validation Accuracy: 80.99% (260/321 correct)
Combined Model Accuracy: 84.50% (676/800 correct)
Model Capacity (MEC): 41 bits
Generalization Ratio: 9.74 bits/bit
Percent of Data Memorized: 20.84%
Resilience to Noise: -1.01 dB
Training Confusion Matrix:
Actual | Predicted
------ | ---------
died | 279 16
survived | 47 137
Validation Confusion Matrix:
Actual | Predicted
------ | ---------
died | 175 22
survived | 39 85
Training Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 279 47 137 16 94.58% 74.46% 85.58% 89.54% 89.86% 81.58%
survived | 137 16 279 47 74.46% 94.58% 89.54% 85.58% 81.31% 68.50%
Validation Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 175 39 85 22 88.83% 68.55% 81.78% 79.44% 85.16% 74.15%
survived | 85 22 175 39 68.55% 88.83% 79.44% 81.78% 73.59% 58.22%
Attribute Ranking:
Feature | Relative Importance
Sex : 0.4912
Cabin_Class : 0.1242
Cabin_Number : 0.0664
Parent_Children : 0.0599
Age : 0.0599
Ticket_Number : 0.0414
Fare : 0.0379
PassengerId : 0.0332
Sibling_Spouse : 0.0298
Name : 0.0288
Port_of_Embarkation : 0.0273
End Time: 03/12/2022, 21:05 UTC
Runtime Duration: 8s
The predictor filename is predictor_103_automatic.py. The source code is approximately 35K bytes.
%ls -lh predictor_103_automatic.py
%pycat predictor_103_automatic.py
-rw-r--r-- 1 runner docker 35K Mar 12 21:05 predictor_103_automatic.py
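The per-class figures in the report can be reproduced from the confusion matrix. The sketch below recomputes the validation metrics for the died class from the four counts shown above. The function name `per_class_metrics` is an illustrative assumption, and the formulas are the standard definitions of TPR, TNR, PPV, NPV, F1 and threat score (TS), which may not be exactly how Brainome computes them internally.

```python
def per_class_metrics(tp, fp, tn, fn):
    """Standard binary metrics computed from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),             # recall / sensitivity
        "TNR": tn / (tn + fp),             # specificity
        "PPV": tp / (tp + fp),             # precision
        "NPV": tn / (tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "TS": tp / (tp + fp + fn),         # threat score
    }


# Validation confusion matrix from the report (rows: actual, columns: predicted).
#             died  survived
# died         175        22
# survived      39        85
died = per_class_metrics(tp=175, fp=39, tn=85, fn=22)
accuracy = (175 + 85) / (175 + 22 + 39 + 85)

for name, value in died.items():
    print(f"{name}: {value:.2%}")
print(f"accuracy: {accuracy:.4%}")
```

The computed accuracy is 260/321, which agrees with the reported validation accuracy of 80.99% to within a hundredth of a percentage point.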
2. Random Forest
You can select the Random Forest model type by using the -f RF parameter.
Note: The -modelonly parameter bypasses the measurements phase, whose results do not change from the previous run.
!brainome titanic_train.csv -f RF -y -o predictor_103_RF.py -modelonly
Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to: Demo User (Evaluation)
Expiration Date: 2022-12-12 275 days left
Maximum File Size: 100 MB
Maximum Instances: 20000
Maximum Attributes: 100
Maximum Classes: unlimited
Connected to: daimensions.brainome.ai (local execution)
Command:
btc titanic_train.csv -f RF -y -o predictor_103_RF.py -modelonly
Start Time: 03/12/2022, 21:05 UTC
Cleaning...done.
Splitting into training and validation...done.
Building classifier...done.
Compiling predictor...done.
Validating predictor...done.
Predictor: predictor_103_RF.py
Classifier Type: Random Forest
System Type: Binary classifier
Training / Validation Split: 60% : 40%
Accuracy:
Best-guess accuracy: 61.50%
Training accuracy: 86.84% (416/479 correct)
Validation Accuracy: 80.99% (260/321 correct)
Combined Model Accuracy: 84.50% (676/800 correct)
Model Capacity (MEC): 41 bits
Generalization Ratio: 9.74 bits/bit
Percent of Data Memorized: 20.84%
Resilience to Noise: -1.01 dB
Training Confusion Matrix:
Actual | Predicted
------ | ---------
died | 279 16
survived | 47 137
Validation Confusion Matrix:
Actual | Predicted
------ | ---------
died | 175 22
survived | 39 85
Training Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 279 47 137 16 94.58% 74.46% 85.58% 89.54% 89.86% 81.58%
survived | 137 16 279 47 74.46% 94.58% 89.54% 85.58% 81.31% 68.50%
Validation Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 175 39 85 22 88.83% 68.55% 81.78% 79.44% 85.16% 74.15%
survived | 85 22 175 39 68.55% 88.83% 79.44% 81.78% 73.59% 58.22%
Attribute Ranking:
Feature | Relative Importance
Sex : 0.4912
Cabin_Class : 0.1242
Cabin_Number : 0.0664
Parent_Children : 0.0599
Age : 0.0599
Ticket_Number : 0.0414
Fare : 0.0379
PassengerId : 0.0332
Sibling_Spouse : 0.0298
Name : 0.0288
Port_of_Embarkation : 0.0273
End Time: 03/12/2022, 21:05 UTC
Runtime Duration: 7s
Open predictor_103_RF.py to view the Random Forest Predictor.
%ls -lh predictor_103_RF.py
%pycat predictor_103_RF.py
-rw-r--r-- 1 runner docker 35K Mar 12 21:05 predictor_103_RF.py
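The command line above follows a pattern that the Neural Network and Decision Tree sections of this notebook reuse with -f NN and -f DT. As a rough sketch, the commands could be assembled in Python as shown below. The helper `build_command` and the `MODEL_FLAGS` mapping are illustrative assumptions; only the flags themselves (-f, -y, -o, -modelonly) and the output file names come from this notebook.

```python
import shlex

# Model-type codes accepted by the -f parameter, as used in this notebook.
MODEL_FLAGS = {
    "Random Forest": "RF",
    "Neural Network": "NN",
    "Decision Tree": "DT",
}


def build_command(data_file, model_type, model_only=True):
    """Assemble the brainome argument list for one forced model type."""
    flag = MODEL_FLAGS[model_type]
    command = [
        "brainome", data_file,
        "-f", flag,                          # force the model type
        "-y",
        "-o", f"predictor_103_{flag}.py",    # output predictor file
    ]
    if model_only:
        command.append("-modelonly")         # skip the measurements phase
    return command


for model_type in MODEL_FLAGS:
    print(shlex.join(build_command("titanic_train.csv", model_type)))
```

Each list can be passed to `subprocess.run(command, check=True)` to run the training outside a notebook cell, assuming brainome is installed and on the PATH.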
3. Neural Network
You can select the Neural Network model type by using the -f NN parameter.
!brainome titanic_train.csv -f NN -y -o predictor_103_NN.py -modelonly
Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to: Demo User (Evaluation)
Expiration Date: 2022-12-12 275 days left
Maximum File Size: 100 MB
Maximum Instances: 20000
Maximum Attributes: 100
Maximum Classes: unlimited
Connected to: daimensions.brainome.ai (local execution)
Command:
btc titanic_train.csv -f NN -y -o predictor_103_NN.py -modelonly
Start Time: 03/12/2022, 21:05 UTC
Cleaning...done.
Splitting into training and validation...done.
WARNING: Could not detect a GPU. Neural Network generation will be slow.
Architecting model...done.
Priming model...done.
Compiling predictor...done.
Validating predictor...done.
Predictor: predictor_103_NN.py
Classifier Type: Neural Network
System Type: Binary classifier
Training / Validation Split: 60% : 40%
Accuracy:
Best-guess accuracy: 61.50%
Training accuracy: 63.25% (303/479 correct)
Validation Accuracy: 61.68% (198/321 correct)
Combined Model Accuracy: 62.62% (501/800 correct)
Model Capacity (MEC): 27 bits
Generalization Ratio: 10.78 bits/bit
Percent of Data Memorized: 18.83%
Resilience to Noise: -1.05 dB
Training Confusion Matrix:
Actual | Predicted
------ | ---------
died | 295 0
survived | 176 8
Validation Confusion Matrix:
Actual | Predicted
------ | ---------
died | 195 2
survived | 121 3
Training Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 295 176 8 0 100.00% 4.35% 62.63% 100.00% 77.02% 62.63%
survived | 8 0 295 176 4.35% 100.00% 100.00% 62.63% 8.33% 4.35%
Validation Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 195 121 3 2 98.98% 2.42% 61.71% 60.00% 76.02% 61.32%
survived | 3 2 195 121 2.42% 98.98% 60.00% 61.71% 4.65% 2.38%
End Time: 03/12/2022, 21:06 UTC
Runtime Duration: 42s
Open predictor_103_NN.py to view the Neural Network Predictor. The source code is approximately 53K bytes.
%ls -lh predictor_103_NN.py
%pycat predictor_103_NN.py
-rw-r--r-- 1 runner docker 53K Mar 12 21:06 predictor_103_NN.py
4. Decision Tree
You can select the Decision Tree model type by using the -f DT parameter.
!brainome titanic_train.csv -f DT -y -o predictor_103_DT.py -modelonly
Brainome Table Compiler v1.8-120-prod
Copyright (c) 2019-2022 Brainome, Inc. All Rights Reserved.
Licensed to: Demo User (Evaluation)
Expiration Date: 2022-12-12 275 days left
Maximum File Size: 100 MB
Maximum Instances: 20000
Maximum Attributes: 100
Maximum Classes: unlimited
Connected to: daimensions.brainome.ai (local execution)
Command:
btc titanic_train.csv -f DT -y -o predictor_103_DT.py -modelonly
Start Time: 03/12/2022, 21:06 UTC
Cleaning...done.
Splitting into training and validation...done.
Building classifier...done.
Compiling predictor...done.
Validating predictor...done.
Predictor: predictor_103_DT.py
Classifier Type: Decision Tree
System Type: Binary classifier
Training / Validation Split: 60% : 40%
Accuracy:
Best-guess accuracy: 61.50%
Training accuracy: 100.00% (479/479 correct)
Validation Accuracy: 54.82% (176/321 correct)
Combined Model Accuracy: 81.87% (655/800 correct)
Model Capacity (MEC): 236 bits
Generalization Ratio: 1.94 bits/bit
Percent of Data Memorized: 104.60%
Resilience to Noise: -0.31 dB
Training Confusion Matrix:
Actual | Predicted
------ | ---------
died | 295 0
survived | 0 184
Validation Confusion Matrix:
Actual | Predicted
------ | ---------
died | 124 73
survived | 72 52
Training Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 295 0 184 0 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
survived | 184 0 295 0 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
Validation Accuracy by Class:
Survived | TP FP TN FN TPR TNR PPV NPV F1 TS
-------- | ---- ---- ---- ---- -------- -------- -------- -------- -------- --------
died | 124 72 52 73 62.94% 41.94% 63.27% 41.60% 63.10% 46.10%
survived | 52 73 124 72 41.94% 62.94% 41.60% 63.27% 41.77% 26.40%
End Time: 03/12/2022, 21:06 UTC
Runtime Duration: 7s
Open predictor_103_DT.py to view the Decision Tree Predictor. The source code is approximately 29K bytes.
%ls -lh predictor_103_DT.py
%pycat predictor_103_DT.py
-rw-r--r-- 1 runner docker 29K Mar 12 21:06 predictor_103_DT.py
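To compare the three forced runs side by side, the sketch below collects the training and validation accuracies reported in this notebook and derives two numbers per model: the training-to-validation gap and the validation lift over the 61.50% best-guess accuracy. The names `reported`, `BEST_GUESS` and `summarize` are illustrative assumptions; the accuracy values are copied from the reports above.

```python
# Accuracies (percent) copied from the three reports in this notebook.
reported = {
    "Random Forest": {"training": 86.84, "validation": 80.99},
    "Neural Network": {"training": 63.25, "validation": 61.68},
    "Decision Tree": {"training": 100.00, "validation": 54.82},
}
BEST_GUESS = 61.50  # best-guess accuracy reported by Brainome


def summarize(results, baseline):
    """Return one row per model, sorted by validation lift over the baseline."""
    rows = []
    for model, acc in results.items():
        rows.append({
            "model": model,
            "gap": round(acc["training"] - acc["validation"], 2),
            "lift": round(acc["validation"] - baseline, 2),
        })
    return sorted(rows, key=lambda row: row["lift"], reverse=True)


summary = summarize(reported, BEST_GUESS)
for row in summary:
    print(f"{row['model']:<15} gap={row['gap']:>6.2f}  lift={row['lift']:>6.2f}")
```

On these numbers, Random Forest has the largest lift over the best guess, which is consistent with Brainome's automatic recommendation. The Decision Tree has a perfect training score but falls below the best guess on validation, and the Neural Network barely improves on the best guess.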
Next Steps
Check out 104_Using_Predictor.