302 Generating Probabilities

In addition to class predictions, Brainome’s Random Forest and Neural Network predictors can output a probability for each class.

Prerequisites

This notebook assumes brainome is installed as described in the notebook brainome_101_Quick_Start.

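The data sets are titanic_train.csv, used to build the predictor, and titanic_predict.csv, used to generate probabilities.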

Predictors require numpy to run and optionally scipy to generate a confusion matrix.
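
As a quick check of those dependencies, the following sketch reports whether numpy and scipy can be imported in the current environment. The helper name is_installed is hypothetical and is not part of brainome; it relies only on the standard library.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return find_spec(module_name) is not None


# numpy is required by the predictor; scipy is only needed for the
# optional confusion matrix.
for name, needed_for in [('numpy', 'running a predictor'),
                         ('scipy', 'the optional confusion matrix')]:
    status = 'found' if is_installed(name) else 'missing'
    print(f'{name}: {status} (needed for {needed_for})')
```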

!python3 -m pip install brainome --quiet
!brainome --version

import urllib.request as request
response2 = request.urlretrieve('https://download.brainome.ai/data/public/titanic_predict.csv', 'titanic_predict.csv')
%ls -lh titanic_predict.csv
brainome v1.8-120-prod
-rw-r--r-- 1 runner docker 858 Mar 12 21:09 titanic_predict.csv

Generate a predictor

The predictor filename is predictor_302.py

!brainome https://download.brainome.ai/data/public/titanic_train.csv -y -o predictor_302.py -modelonly -q
print('\nCreated predictor_302.py')
!ls -lh predictor_302.py

Created predictor_302.py
-rw-r--r-- 1 runner docker 35K Mar 12 21:09 predictor_302.py

Generating classification probabilities for a data set

Passing return_probabilities=True to predict() returns the probability of each class for every row, rather than a single predicted class. The first row of the output holds the class labels.

# using pandas to read csv data
%pip install pandas
import pandas as pd
import predictor_302 as predictor
# reading csv file
predict_data = pd.read_csv('titanic_predict.csv', na_values=[], na_filter=False)
# REQUIRED: pass only the row values to the predictor; .values drops the column headers
predict_values = predict_data.values
probabilities_output = predictor.predict(predict_values, return_probabilities=True)
print(' Prediction Probabilities '.center(80, '-'))
print(probabilities_output)
--------------------------- Prediction Probabilities ---------------------------
[['died' 'survived']
 ['0.10347254522374348' '0.8965274547762565']
 ['0.8357493590851747' '0.16425064091482533']
 ['0.5723091317715159' '0.4276908682284841']
 ['0.8357493590851747' '0.16425064091482533']
 ['0.8833638515749265' '0.11663614842507353']
 ['0.8204901964380503' '0.17950980356194968']
 ['0.8215620538923634' '0.17843794610763664']
 ['0.10347254522374348' '0.8965274547762565']
 ['0.42322660063987716' '0.5767733993601228']
 ['0.7495210975955352' '0.2504789024044648']
 ['0.8833638515749265' '0.11663614842507353']]
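
The output above stores the probabilities as strings under a header row of class labels. The sketch below converts them to floats and picks the most likely class for each row. It assumes the layout shown in this notebook (labels in row 0, one row of probabilities per input row), which may differ for other models; the sample array is copied from the first three result rows above.

```python
import numpy as np

# Sample copied from the output above: row 0 holds the class labels,
# the remaining rows hold probabilities stored as strings.
probabilities_output = np.array([
    ['died', 'survived'],
    ['0.10347254522374348', '0.8965274547762565'],
    ['0.8357493590851747', '0.16425064091482533'],
    ['0.5723091317715159', '0.4276908682284841'],
])

# Separate the label row from the numeric rows and convert to floats.
class_labels = probabilities_output[0]
probabilities = probabilities_output[1:].astype(float)

# Most likely class for each row, and the probability assigned to it.
predicted = class_labels[probabilities.argmax(axis=1)]
confidence = probabilities.max(axis=1)

for label, p in zip(predicted, confidence):
    print(f'{label}: {p:.3f}')
```

With float values in hand, a custom decision threshold can be applied in place of the argmax, for example flagging rows such as the third one where neither class is strongly favored.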

Combining probabilities into the source data set

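import numpy as np
# column names of the source data, as a 1-D array
predict_header = predict_data.columns.values
# stack the header row on top of the data rows, then append the probability
# columns (whose first row holds the class labels) on the right
full_output = np.concatenate((
    np.concatenate((predict_header.reshape(1, -1), predict_data)), probabilities_output), axis=1)
pd.DataFrame(full_output)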
|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|----|---|---|---|---|---|---|---|---|---|---|----|----|----|
| 0 | PassengerId | Cabin_Class | Name | Sex | Age | Sibling_Spouse | Parent_Children | Ticket_Number | Fare | Cabin_Number | Port_of_Embarkation | died | survived |
| 1 | 881 | 2 | Shelley, Mrs. William (Imanita Parrish Hall) | female | 25 | 0 | 1 | 230433 | 26.0 |  | S | 0.10347254522374348 | 0.8965274547762565 |
| 2 | 882 | 3 | Markun, Mr. Johann | male | 33 | 0 | 0 | 349257 | 7.8958 |  | S | 0.8357493590851747 | 0.16425064091482533 |
| 3 | 883 | 3 | Dahlberg, Miss. Gerda Ulrika | female | 22 | 0 | 0 | 7552 | 10.5167 |  | S | 0.5723091317715159 | 0.4276908682284841 |
| 4 | 884 | 2 | Banfield, Mr. Frederick James | male | 28 | 0 | 0 | C.A./SOTON 34068 | 10.5 |  | S | 0.8357493590851747 | 0.16425064091482533 |
| 5 | 885 | 3 | Sutehall, Mr. Henry Jr | male | 25 | 0 | 0 | SOTON/OQ 392076 | 7.05 |  | S | 0.8833638515749265 | 0.11663614842507353 |
| 6 | 886 | 3 | Rice, Mrs. William (Margaret Norton) | female | 39 | 0 | 5 | 382652 | 29.125 |  | Q | 0.8204901964380503 | 0.17950980356194968 |
| 7 | 887 | 2 | Montvila, Rev. Juozas | male | 27 | 0 | 0 | 211536 | 13.0 |  | S | 0.8215620538923634 | 0.17843794610763664 |
| 8 | 888 | 1 | Graham, Miss. Margaret Edith | female | 19 | 0 | 0 | 112053 | 30.0 | B42 | S | 0.10347254522374348 | 0.8965274547762565 |
| 9 | 889 | 3 | Johnston, Miss. Catherine Helen Carrie"" | female |  | 1 | 2 | W./C. 6607 | 23.45 |  | S | 0.42322660063987716 | 0.5767733993601228 |
| 10 | 890 | 1 | Behr, Mr. Karl Howell | male | 26 | 0 | 0 | 111369 | 30.0 | C148 | C | 0.7495210975955352 | 0.2504789024044648 |
| 11 | 891 | 3 | Dooley, Mr. Patrick | male | 32 | 0 | 0 | 370376 | 7.75 |  | Q | 0.8833638515749265 | 0.11663614842507353 |
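
The concatenation above keeps the column names as data row 0 and leaves the probabilities as strings. As an alternative sketch, the probabilities can be attached as named float columns with pandas. The two-row predict_data frame here is a hypothetical stand-in for the full data set, and the probabilities_output layout (labels in row 0) is assumed to match the output shown earlier.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the notebook's predict_data and
# probabilities_output, trimmed to two rows and two source columns.
predict_data = pd.DataFrame({
    'PassengerId': [881, 882],
    'Sex': ['female', 'male'],
})
probabilities_output = np.array([
    ['died', 'survived'],
    ['0.10347254522374348', '0.8965274547762565'],
    ['0.8357493590851747', '0.16425064091482533'],
])

# Use the label row as column names and convert the rest to floats.
probability_columns = pd.DataFrame(
    probabilities_output[1:].astype(float),
    columns=probabilities_output[0],
    index=predict_data.index,
)

# Append the probability columns to the right of the source columns.
combined = pd.concat([predict_data, probability_columns], axis=1)
print(combined)
```

Keeping the header as real column names allows the result to be sorted or filtered directly, for example by the survived column.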