TRAINING & TESTING OF THE MODELS
We need to train and test the models and find out which model has the best accuracy.
This post will be updated soon.
Let’s get started with part 5 of the series Machine Learning in Bioinformatics With Python. In the previous videos we downloaded the data from the UCI Repository and preprocessed it.
In this video we will be doing the training and testing of the models. We will split the data into a training dataset and a testing dataset. Then we will train our models using two classifiers, SVC and logistic regression.
First of all, let’s import a few things we will be needing, including SVC.
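The import step might look like the sketch below. Only SVC is named in the text at this point; train_test_split and LogisticRegression are assumptions inferred from the parameters and classifiers mentioned later in this part.

```python
# Imports assumed for this part of the series (scikit-learn).
# Only SVC is named in the text here; the other two are inferred
# from the split parameters and the second classifier used later.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
```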
Let’s move on to splitting our dataset into training and testing sets. So we need training and testing variables for both the features and the labels, ending with y_test (by convention, the y variable for the labels is kept lowercase).
Usually we show some of the data to the model and hold back the rest to test the accuracy of the model after it is trained. Here we will use 90% of the data for training and 10% for testing. In the call that splits the data, test_size = 0.1 means that the testing data will be 10% of the dataset (leaving 90% for training), and random_state = 0 is a random seed, which could be any number, that ensures the data is shuffled the same way on every run.
The split creates 4 variables: two hold the training and testing labels (the testing labels go in y_test) and two hold the training and testing features.
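A minimal sketch of the split described above, assuming it is done with scikit-learn's train_test_split. The names X, y, X_train, X_test and y_train are assumptions (only y_test appears in the text), and the random arrays are hypothetical stand-ins for the preprocessed data from the earlier parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 samples, 4 features, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# test_size=0.1 holds back 10% of the rows for testing;
# random_state=0 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

print(X_train.shape, X_test.shape)  # (90, 4) (10, 4)
print(y_train.shape, y_test.shape)  # (90,) (10,)

# Repeating the call with the same seed gives the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.1, random_state=0)
print(np.array_equal(X_test, X_test_again))  # True
```

Changing random_state to another number gives a different, but equally repeatable, split.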
Now it is time to define and train our very first classifier, the SVC classifier. Defining a classifier is very easy: we just have to set up a variable (the name could be anything) to hold the
classifier for this example, and then we just have to call a function to create it.
Note: You can just ignore the kernel for the time being, or you can study more about it here.
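Defining the classifier could look something like this. The variable name classifier and the choice kernel="linear" are assumptions for illustration; the text only says that a kernel is involved, not which one.

```python
from sklearn.svm import SVC

# kernel="linear" is an assumed choice; leaving the argument out
# falls back to scikit-learn's default kernel ("rbf").
classifier = SVC(kernel="linear")

print(classifier.get_params()["kernel"])  # linear
```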
Let’s train our model by simply using the .fit function, and we will be using the training data here. I am going with all the defaults in this beginner tutorial. However, you can learn more about it in sklearn.svm.SVC, the official documentation.
Once our model is trained, we need to test its accuracy. We will test this model on our testing data and print the result.
The accuracy for SVC turned out to be 0.9857142857142858 (about 98.6%), which is pretty darn good.
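The training and scoring steps can be sketched end to end as follows. This uses a synthetic dataset from make_classification as a hypothetical stand-in for the UCI data, so the number it prints will not match the accuracy quoted above. The variable names and kernel="linear" are assumptions as well.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for the preprocessed data: 200 samples, 9 features.
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Train on the training data only.
classifier = SVC(kernel="linear")  # kernel choice is an assumption
classifier.fit(X_train, y_train)

# .score returns the fraction of test samples predicted correctly.
accuracy = classifier.score(X_test, y_test)
print(accuracy)

# The same number computed by hand from the predictions.
manual_accuracy = (classifier.predict(X_test) == y_test).mean()
```

Because .score is just the share of correct predictions, a small test set means accuracy moves in visible steps: with 20 test samples, each mistake costs 5 percentage points.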
Let’s try another classifier, logistic regression. You might find the name confusing, since regression usually means predicting continuous values, while here we are doing binary classification. The naming is just a bit misleading: despite the name, logistic regression is a classification method, and that is how it is used in scikit-learn. We will be using similar code with slight modifications.
This time the accuracy is 0.9714285714285714, which is approximately 97%, not too shabby.
Note: Don’t worry about solver = 'liblinear'. You can read about its details here.
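A sketch of the logistic regression version, again on a hypothetical synthetic dataset rather than the UCI data, with assumed variable names. It also prints a few predictions to show that the model outputs class labels rather than continuous values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the preprocessed data.
X, y = make_classification(n_samples=200, n_features=9, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0
)

# Same workflow as SVC; only the classifier changes.
classifier = LogisticRegression(solver="liblinear")
classifier.fit(X_train, y_train)

accuracy = classifier.score(X_test, y_test)
print(accuracy)

# Predictions are discrete class labels (0 or 1), not continuous values.
predictions = classifier.predict(X_test)
print(predictions[:5])
```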
That’s all for now. We are done with training our models, and in the next part we will be making some predictions with the help of our trained models.