
106 Describing Your Data Set

Brainome assumes your CSV file has certain characteristics:

  • the first row is the column headers

  • the target is the last column

  • all other columns are used for training

Use the following parameters to override these assumptions.

  1. -headerless CSV file

  2. Selecting the -target column

  3. -ignorecolumns to omit unique identifiers

Prerequisites

This notebook assumes brainome is installed as described in the notebook brainome_101_Quick_Start.

!python3 -m pip install brainome  --quiet
!brainome -version
brainome v1.006-19-prod

1. -headerless CSV file

Brainome assumes your CSV file has a header row.

Use -headerless when your CSV file omits the header row.

In this example, we use bank.csv, which has no header row.

import urllib.request as request
response1 = request.urlretrieve('https://download.brainome.ai/data/public/bank.csv', 'bank.csv')
print(" Headerless data set bank.csv ".center(80,"-"))
!head -4 bank.csv
print("\n"," Ranking a headerless data file ".center(80,"-"))
!brainome bank.csv -headerless -y -o predictor_106_headerless.py | grep -A 6 "Attribute Ranking:"
------------------------- Headerless data set bank.csv -------------------------
3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
 ------------------------ Ranking a headerless data file ------------------------
    Attribute Ranking:
                                      Feature | Relative Importance
                                            0 :   0.5880
                                            1 :   0.2494
                                            2 :   0.1482
                                            3 :   0.0144
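
Before choosing -headerless, it can help to check whether the first row of a file looks like data rather than column names. The sketch below is a rough heuristic in plain Python, not how Brainome itself inspects a file: the helper name first_row_looks_like_data is hypothetical, and it simply treats a first row made up entirely of numbers as data.

```python
import csv
import io


def first_row_looks_like_data(csv_text):
    """Return True when every cell in the first row parses as a number.

    Heuristic only: a header row made of numeric names would fool it.
    """
    first_row = next(csv.reader(io.StringIO(csv_text)), [])
    if not first_row:
        return False
    for cell in first_row:
        try:
            float(cell)
        except ValueError:
            # A non-numeric cell suggests this row holds column names.
            return False
    return True


# First rows of bank.csv, as shown in the output above.
bank_sample = "3.6216,8.6661,-2.8073,-0.44699,0\n4.5459,8.1674,-2.4586,-1.4621,0\n"
# Made-up sample with a header row, for contrast.
with_header = "col_a,col_b,label\n3.6216,8.6661,0\n"

print(first_row_looks_like_data(bank_sample))  # all-numeric first row
print(first_row_looks_like_data(with_header))  # first row holds names
```

A file whose first row passes this check is a likely candidate for -headerless, but it is worth confirming by eye, for example with head as in the cell above.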
         

2. Selecting the -target column

Brainome assumes the last column is the target.

Use -target to specify a different column.

In this example, we use titanic_train.csv, but instead of predicting Survived, we predict Cabin_Class.

!brainome https://download.brainome.ai/data/public/titanic_train.csv -target Cabin_Class -y -o predictor_106_target.py | grep "Target Column:"
    Target Column:              Cabin_Class
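
To make the default and the -target override concrete in terms of columns, here is a small sketch in plain Python. It is illustrative only and does not call Brainome: the helper split_features_and_target is hypothetical, and the header row is abbreviated with an assumed column order rather than copied from titanic_train.csv.

```python
def split_features_and_target(header, target=None):
    """Split column names into (features, target).

    With target=None the last column is the target, mirroring the default
    described above; otherwise the named column is used.
    """
    if not header:
        raise ValueError("header must contain at least one column")
    if target is None:
        target = header[-1]
    if target not in header:
        raise ValueError(f"unknown target column: {target}")
    features = [name for name in header if name != target]
    return features, target


# Abbreviated, hypothetical header row for illustration.
header = ["PassengerId", "Cabin_Class", "Sex", "Age", "Survived"]

default_features, default_target = split_features_and_target(header)
custom_features, custom_target = split_features_and_target(header, "Cabin_Class")
print(default_target, custom_target)
```

In this sketch, Survived becomes an ordinary feature once Cabin_Class is chosen as the target.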

3. -ignorecolumns to omit unique identifiers

By default, Brainome uses every column in your data set. Many data sets include unique identifiers that tie predictions back to an external source; these should usually be left out of training.

Use -ignorecolumns to omit features from your model.

In this example, we ignore the PassengerId and Ticket_Number columns of titanic_train.csv.

!brainome https://download.brainome.ai/data/public/titanic_train.csv -ignorecolumns "PassengerId,Ticket_Number" -y -o predictor_106_ignorecolumns.py | grep -A 10 "Attribute Ranking:"
    Attribute Ranking:
                                      Feature | Relative Importance
                                          Sex :   0.5270
                                  Cabin_Class :   0.1876
                                 Cabin_Number :   0.0661
                                          Age :   0.0522
                               Sibling_Spouse :   0.0502
                                         Fare :   0.0331
                                         Name :   0.0304
                          Port_of_Embarkation :   0.0289
                              Parent_Children :   0.0246
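
One way to find candidates for -ignorecolumns is to look for columns whose value is different in every row. The following sketch does that with the standard csv module. The helper name all_unique_columns and the tiny inline sample are made up for illustration, and the result is only a starting point, since free-text or continuous columns can also be unique in every row.

```python
import csv
import io


def all_unique_columns(csv_text):
    """Return names of columns whose values never repeat across rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    return [
        name
        for name in rows[0].keys()
        if len({row[name] for row in rows}) == len(rows)
    ]


# Made-up rows, not taken from titanic_train.csv.
sample = (
    "PassengerId,Sex,Survived\n"
    "1,male,0\n"
    "2,female,1\n"
    "3,female,1\n"
)

candidates = all_unique_columns(sample)
# Comma-separated list in the form used with -ignorecolumns above.
print(",".join(candidates))
```

Review the candidates before passing them on, because a column can be unique in a small sample and still carry useful signal.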