# Classification

In this tutorial, we will be exploring several classification techniques.

The code in sections 1-6 was provided by Professor Kucheryavyy; I have broken the code down into a few smaller pieces and added some comments and explanations that should help your understanding. Sections 1-5 provide in-depth examples of several new classification techniques applied to a binary classification problem (one with just two classes). Section 6 provides a few more examples on a second dataset.

Section 7 is a continuation of my previous tutorial on k-nearest neighbors classification; you can refer to this section for simple examples of the new techniques we learn here, applied to a classification problem with more than two classes (in this case, three).

You can view the code for this tutorial here.

## Getting Started

### Importing Libraries

```python
import itertools
import pandas as pd
import numpy as np
import copy

import statsmodels.api as sm
import statsmodels.formula.api as smf  # formula interface used for logit below
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split, GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
# Classifiers and helpers used in this tutorial
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn import neighbors, preprocessing

import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Markdown, display
```


### Plot and Output Settings

We’ll also introduce a few extra settings just to make the output of each of our cells a bit nicer:

```python
# Reset all styles to the default:
plt.rcParams.update(plt.rcParamsDefault)
# Then make graphs inline:
%matplotlib inline

# Useful function for Jupyter to display text in bold:
def displaybd(text):
    display(Markdown("**" + text + "**"))
```


If you would like your plots to be a bit larger, please use the following code:

```python
plt.rcParams['figure.figsize'] = (7, 6)
plt.rcParams['font.size'] = 24
plt.rcParams['legend.fontsize'] = 'large'
plt.rcParams['figure.titlesize'] = 'large'
plt.rcParams['lines.markersize'] = 10
```


### Our Dataset

In this tutorial, we will be using a dataset on the stock market, which can be downloaded here. This dataset is from *An Introduction to Statistical Learning, with Applications in R* (Springer, 2013).

As usual, we can use read_csv to create a pandas dataframe:

```python
smarket = pd.read_csv('Smarket.csv', parse_dates=False)
```


Note that this dataset contains a column Direction, which takes on two different values, either Up or Down. To make this column easier to work with in our regressions, we want to represent these values numerically. Let’s have Up be 1 and Down be 0. To do this, we can use np.where:

```python
smarket["DirectionCode"] = np.where(smarket["Direction"].str.contains("Up"), 1, 0)
```


Now, let’s get a bit more familiar with our data:

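```python
display(smarket[1:10])
```

| | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | Direction | DirectionCode |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2001 | 0.959 | 0.381 | -0.192 | -2.624 | -1.055 | 1.2965 | 1.032 | Up | 1 |
| 2 | 2001 | 1.032 | 0.959 | 0.381 | -0.192 | -2.624 | 1.4112 | -0.623 | Down | 0 |
| 3 | 2001 | -0.623 | 1.032 | 0.959 | 0.381 | -0.192 | 1.2760 | 0.614 | Up | 1 |
| 4 | 2001 | 0.614 | -0.623 | 1.032 | 0.959 | 0.381 | 1.2057 | 0.213 | Up | 1 |
| 5 | 2001 | 0.213 | 0.614 | -0.623 | 1.032 | 0.959 | 1.3491 | 1.392 | Up | 1 |
| 6 | 2001 | 1.392 | 0.213 | 0.614 | -0.623 | 1.032 | 1.4450 | -0.403 | Down | 0 |
| 7 | 2001 | -0.403 | 1.392 | 0.213 | 0.614 | -0.623 | 1.4078 | 0.027 | Up | 1 |
| 8 | 2001 | 0.027 | -0.403 | 1.392 | 0.213 | 0.614 | 1.1640 | 1.303 | Up | 1 |
| 9 | 2001 | 1.303 | 0.027 | -0.403 | 1.392 | 0.213 | 1.2326 | 0.287 | Up | 1 |

```python
display(smarket.describe())
```

| | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | DirectionCode |
|---|---|---|---|---|---|---|---|---|---|
| count | 1250.000000 | 1250.000000 | 1250.000000 | 1250.000000 | 1250.000000 | 1250.00000 | 1250.000000 | 1250.000000 | 1250.000000 |
| mean | 2003.016000 | 0.003834 | 0.003919 | 0.001716 | 0.001636 | 0.00561 | 1.478305 | 0.003138 | 0.518400 |
| std | 1.409018 | 1.136299 | 1.136280 | 1.138703 | 1.138774 | 1.14755 | 0.360357 | 1.136334 | 0.499861 |
| min | 2001.000000 | -4.922000 | -4.922000 | -4.922000 | -4.922000 | -4.92200 | 0.356070 | -4.922000 | 0.000000 |
| 25% | 2002.000000 | -0.639500 | -0.639500 | -0.640000 | -0.640000 | -0.64000 | 1.257400 | -0.639500 | 0.000000 |
| 50% | 2003.000000 | 0.039000 | 0.039000 | 0.038500 | 0.038500 | 0.03850 | 1.422950 | 0.038500 | 1.000000 |
| 75% | 2004.000000 | 0.596750 | 0.596750 | 0.596750 | 0.596750 | 0.59700 | 1.641675 | 0.596750 | 1.000000 |
| max | 2005.000000 | 5.733000 | 5.733000 | 5.733000 | 5.733000 | 5.73300 | 3.152470 | 5.733000 | 1.000000 |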
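```python
displaybd("Correlations matrix:")
# numeric_only=True skips the text column Direction (required in pandas >= 2.0)
display(smarket.corr(numeric_only=True))
```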


Correlations matrix:

| | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | DirectionCode |
|---|---|---|---|---|---|---|---|---|---|
| Year | 1.000000 | 0.029700 | 0.030596 | 0.033195 | 0.035689 | 0.029788 | 0.539006 | 0.030095 | 0.074608 |
| Lag1 | 0.029700 | 1.000000 | -0.026294 | -0.010803 | -0.002986 | -0.005675 | 0.040910 | -0.026155 | -0.039757 |
| Lag2 | 0.030596 | -0.026294 | 1.000000 | -0.025897 | -0.010854 | -0.003558 | -0.043383 | -0.010250 | -0.024081 |
| Lag3 | 0.033195 | -0.010803 | -0.025897 | 1.000000 | -0.024051 | -0.018808 | -0.041824 | -0.002448 | 0.006132 |
| Lag4 | 0.035689 | -0.002986 | -0.010854 | -0.024051 | 1.000000 | -0.027084 | -0.048414 | -0.006900 | 0.004215 |
| Lag5 | 0.029788 | -0.005675 | -0.003558 | -0.018808 | -0.027084 | 1.000000 | -0.022002 | -0.034860 | 0.005423 |
| Volume | 0.539006 | 0.040910 | -0.043383 | -0.041824 | -0.048414 | -0.022002 | 1.000000 | 0.014592 | 0.022951 |
| Today | 0.030095 | -0.026155 | -0.010250 | -0.002448 | -0.006900 | -0.034860 | 0.014592 | 1.000000 | 0.730563 |
| DirectionCode | 0.074608 | -0.039757 | -0.024081 | 0.006132 | 0.004215 | 0.005423 | 0.022951 | 0.730563 | 1.000000 |

```python
smarket["Volume"].plot()
plt.xlabel("Day");
plt.ylabel("Volume");
```


## Logit

### Running Logit via GLM

A generalized linear model (GLM) assumes that the dependent variable $$y$$ follows a distribution from the exponential family (which need not be normal) and that a link function $$g$$ of its mean $$\mu$$ is linear in the independent variables $$x$$: $$g(\mu) = x^\top\beta$$. Logit is the GLM with a binomial distribution and the logit link $$g(\mu) = \log\frac{\mu}{1-\mu}$$. Note that generalized linear models are different from general linear models. We will use the generalized linear models from the statsmodels package to run logit:

```python
import statsmodels.formula.api as smf

model = smf.glm("DirectionCode~Lag1+Lag2+Lag3+Lag4+Lag5+Volume", data=smarket,
                family=sm.families.Binomial())
res = model.fit()
display(res.summary())
```

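| Dep. Variable: | DirectionCode | No. Observations: | 1250 |
|---|---|---|---|
| Model: | GLM | Df Residuals: | 1243 |
| Model Family: | Binomial | Df Model: | 6 |
| Link Function: | logit | Scale: | 1.0000 |
| Method: | IRLS | Log-Likelihood: | -863.79 |
| Date: | Sat, 27 Jun 2020 | Deviance: | 1727.6 |
| Time: | 20:46:14 | Pearson chi2: | 1.25e+03 |
| No. Iterations: | 4 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -0.1260 | 0.241 | -0.523 | 0.601 | -0.598 | 0.346 |
| Lag1 | -0.0731 | 0.050 | -1.457 | 0.145 | -0.171 | 0.025 |
| Lag2 | -0.0423 | 0.050 | -0.845 | 0.398 | -0.140 | 0.056 |
| Lag3 | 0.0111 | 0.050 | 0.222 | 0.824 | -0.087 | 0.109 |
| Lag4 | 0.0094 | 0.050 | 0.187 | 0.851 | -0.089 | 0.107 |
| Lag5 | 0.0103 | 0.050 | 0.208 | 0.835 | -0.087 | 0.107 |
| Volume | 0.1354 | 0.158 | 0.855 | 0.392 | -0.175 | 0.446 |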

### Predicted Probabilities and Confusion Matrix

```python
displaybd("Predicted probabilities for the first observations:")
DirectionProbs = res.predict()
print(DirectionProbs[0:10])

# Classify a day as "Up" when the predicted probability of Up exceeds 0.5
DirectionHat = np.where(DirectionProbs > 0.5, "Up", "Down")
confusionDF = pd.crosstab(DirectionHat, smarket["Direction"],
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
display(Markdown("***"))
displaybd("Confusion matrix:")
display(confusionDF)

displaybd("Share of correctly predicted market movements:")
print(np.mean(smarket['Direction'] == DirectionHat))
```


Predicted probabilities for the first observations:

[0.50708413 0.48146788 0.48113883 0.51522236 0.51078116 0.50695646
0.49265087 0.50922916 0.51761353 0.48883778]


Confusion matrix:

| Predicted \ Actual | Down | Up | All |
|---|---|---|---|
| Down | 145 | 141 | 286 |
| Up | 457 | 507 | 964 |
| All | 602 | 648 | 1250 |

Share of correctly predicted market movements:

0.5216
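The share of correct predictions is simply the sum of the confusion matrix diagonal divided by the number of observations. A quick check, with the counts from the table above hard-coded for illustration:

```python
import numpy as np

# Rows: predicted (Down, Up); columns: actual (Down, Up); counts from the table above
confusion = np.array([[145, 141],
                      [457, 507]])

accuracy = np.trace(confusion) / confusion.sum()  # (145 + 507) / 1250
training_error = 1 - accuracy
print(accuracy)  # 0.5216
print(training_error)
```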


### Estimation of Test Error

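The 52.16% share above is computed on the same data the model was fitted to, so it is a training accuracy and likely overstates how well the model predicts new days. To estimate the test error, we'll first train the model on the data from before 2005 and then test it on the held-out 2005 data.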

```python
train = (smarket['Year'] < 2005)  # boolean mask: True for 2001-2004
smarket2005 = smarket[~train]
displaybd("Dimensions of the validation set:")
print(smarket2005.shape)

# Fit only on the training years
model = smf.glm("DirectionCode~Lag1+Lag2+Lag3+Lag4+Lag5+Volume", data=smarket,
                family=sm.families.Binomial(), subset=train)
res = model.fit()

# Predict on 2005, which the model has not seen
DirectionProbsTest = res.predict(smarket2005)
DirectionTestHat = np.where(DirectionProbsTest > 0.5, "Up", "Down")
displaybd("Share of correctly predicted market movements in 2005:")
print(np.mean(smarket2005['Direction'] == DirectionTestHat))
```


Dimensions of the validation set:

(252, 10)


Share of correctly predicted market movements in 2005:

0.4801587301587302


## Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a classification method that relies on the following assumptions:

• the class-conditional distributions of the features are Gaussian
• these Gaussians all have the same covariance matrix (homoskedasticity)

Even without these assumptions, LDA can be interpreted as a form of supervised dimensionality reduction: it projects the data onto at most $$K-1$$ directions (where $$K$$ is the number of classes) that best separate the classes. Thus, we might use LDA when we want to reduce the number of features (reduce the dimensionality) while preserving the distinction between our classes.
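Under the two Gaussian assumptions, LDA assigns an observation $$x$$ to the class $$k$$ with the largest discriminant score $$\delta_k(x) = x^\top\Sigma^{-1}\mu_k - \tfrac{1}{2}\mu_k^\top\Sigma^{-1}\mu_k + \log\pi_k$$, where $$\mu_k$$ is the mean of class $$k$$, $$\Sigma$$ is the covariance matrix shared by all classes, and $$\pi_k$$ is the prior probability of class $$k$$. Below is a minimal from-scratch sketch on simulated data (the means, covariance, and sample sizes are arbitrary assumptions; `lda_fit` and `lda_predict` are hypothetical helper names). It estimates $$\Sigma$$ by pooling the within-class scatter and compares its predictions with scikit-learn's `LinearDiscriminantAnalysis`.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_fit(X, y):
    """Estimate class priors, class means, and the pooled (shared) covariance."""
    classes = np.unique(y)
    n = X.shape[0]
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance: one matrix shared by all classes
    scatter = sum((X[y == k] - means[i]).T @ (X[y == k] - means[i])
                  for i, k in enumerate(classes))
    cov = scatter / (n - len(classes))
    return classes, priors, means, cov

def lda_predict(X, classes, priors, means, cov):
    """Assign each row to the class with the largest linear discriminant score."""
    cov_inv = np.linalg.inv(cov)
    # delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(pi_k), for all rows and classes
    scores = (X @ cov_inv @ means.T
              - 0.5 * np.sum(means @ cov_inv * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

# Simulated data: two Gaussian classes that share one covariance matrix
rng = np.random.default_rng(0)
shared_cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], shared_cov, size=200)
X1 = rng.multivariate_normal([1.5, 1.0], shared_cov, size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

params = lda_fit(X, y)
y_hand = lda_predict(X, *params)
y_sklearn = LinearDiscriminantAnalysis().fit(X, y).predict(X)
print("Agreement with scikit-learn:", np.mean(y_hand == y_sklearn))
```

Because every class uses the same $$\Sigma$$, the quadratic term $$x^\top\Sigma^{-1}x$$ cancels when two scores are compared, which is why the decision boundary is linear.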

### Custom Output Functions

Before getting started with linear discriminant analysis, we'll write a couple of our own functions that'll help display some of our calculations nicely:

```python
def printPriorProbabilities(ldaClasses, ldaPriors):
    # One column per class, holding that class's prior probability
    priorsDF = pd.DataFrame()
    for cIdx, cName in enumerate(ldaClasses):
        priorsDF[cName] = [ldaPriors[cIdx]]
    displaybd('Prior probabilities of groups:')
    display(Markdown(priorsDF.to_html(index=False)))

def printGroupMeans(ldaClasses, featuresNames, ldaGroupMeans):
    # Rows are classes, columns are features
    displaybd("Group means:")
    groupMeansDF = pd.DataFrame(index=ldaClasses)
    for fIdx, fName in enumerate(featuresNames):
        groupMeansDF[fName] = ldaGroupMeans[:, fIdx]
    display(groupMeansDF)

def printLDACoeffs(featuresNames, ldaCoeffs):
    # One column (LDA1, LDA2, ...) per row of the coefficient matrix
    coeffDF = pd.DataFrame(index=featuresNames)
    for cIdx in range(ldaCoeffs.shape[0]):
        colName = "LDA" + str(cIdx + 1)
        coeffDF[colName] = ldaCoeffs[cIdx]
    displaybd("Coefficients of linear discriminants:")
    display(coeffDF)
```


### Fitting an LDA Model

Here, we'll be using scikit-learn's `LinearDiscriminantAnalysis` class to fit our model:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

outcomeName = 'Direction'
featuresNames = ['Lag1', 'Lag2']

X_train = smarket.loc[train, featuresNames]
y_train = smarket.loc[train, outcomeName]

lda = LinearDiscriminantAnalysis()
ldaFit = lda.fit(X_train, y_train)  # fit() returns the fitted estimator itself

printPriorProbabilities(ldaFit.classes_, ldaFit.priors_)
printGroupMeans(ldaFit.classes_, featuresNames, ldaFit.means_)
printLDACoeffs(featuresNames, ldaFit.coef_)
# Coefficients calculated by Python's LDA are different from R's LDA,
# but they are proportional; this factor rescales them to R's lda() output:
printLDACoeffs(featuresNames, 11.580267503964166 * ldaFit.coef_)
# See this: https://stats.stackexchange.com/questions/87479/what-are-coefficients-of-linear-discriminants-in-lda
```


Prior probabilities of groups:

| Down | Up |
|---|---|
| 0.491984 | 0.508016 |

Group means:

| | Lag1 | Lag2 |
|---|---|---|
| Down | 0.042790 | 0.033894 |
| Up | -0.039546 | -0.031325 |

Coefficients of linear discriminants:

| | LDA1 |
|---|---|
| Lag1 | -0.055441 |
| Lag2 | -0.044345 |

Coefficients of linear discriminants:

| | LDA1 |
|---|---|
| Lag1 | -0.642019 |
| Lag2 | -0.513529 |

### LDA Predictions

```python
# Test set: the 2005 observations
X_test = smarket.loc[~train, featuresNames]
y_test = smarket.loc[~train, outcomeName]
y_hat = ldaFit.predict(X_test)

confusionDF = pd.crosstab(y_hat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)

displaybd("Share of correctly predicted market movements:")
print(np.mean(y_test == y_hat))
```


Confusion matrix:

| Predicted \ Actual | Down | Up | All |
|---|---|---|---|
| Down | 35 | 35 | 70 |
| Up | 76 | 106 | 182 |
| All | 111 | 141 | 252 |

Share of correctly predicted market movements:

0.5595238095238095


### Posterior Probabilities

Here, we'll estimate posterior probabilities using scikit-learn's `predict_proba` method:

```python
pred_p = lda.predict_proba(X_test)
# pred_p has shape (number of observations) x (number of classes);
# columns follow lda.classes_ = ['Down', 'Up'], so column 1 is P(Up | x)

upNmb = np.sum(pred_p[:, 1] > 0.5)
displaybd("Number of upward movements with threshold 0.5: " + str(upNmb))

upNmb = np.sum(pred_p[:, 1] > 0.9)
displaybd("Number of upward movements with threshold 0.9: " + str(upNmb))
```


Number of upward movements with threshold 0.5: 182

Number of upward movements with threshold 0.9: 0
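Thresholding the posterior probability of "Up" at 0.5 should give exactly the labels returned by `predict`, while a stricter threshold labels fewer days as "Up". The sketch below checks this on simulated two-feature data with string labels (the data-generating rule and seed are assumptions for illustration). It also looks up the "Up" column by name in `classes_` rather than hard-coding its position.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Simulated two-feature data with string labels, mimicking Down/Up
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0, "Up", "Down")

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.classes_)  # ['Down' 'Up']

# Column of "Up" in predict_proba, looked up rather than hard-coded
up_col = list(lda.classes_).index("Up")
p_up = lda.predict_proba(X)[:, up_col]

# Thresholding the posterior at 0.5 should reproduce predict()
manual_labels = np.where(p_up > 0.5, "Up", "Down")
print(np.mean(manual_labels == lda.predict(X)))

# A stricter threshold classifies fewer observations as Up
print(np.sum(p_up > 0.5), np.sum(p_up > 0.9))
```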

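## Quadratic Discriminant Analysis

Quadratic discriminant analysis (QDA) is a generalization of linear discriminant analysis as a classifier: it keeps the assumption that each class is Gaussian but drops the assumption of a common covariance matrix, estimating a separate covariance matrix for each class. As a result, its decision boundaries are quadratic rather than linear in the features.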
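Concretely, QDA gives each class $$k$$ its own mean $$\mu_k$$ and covariance matrix $$\Sigma_k$$ and assigns $$x$$ to the class with the largest score $$\delta_k(x) = -\tfrac{1}{2}(x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k) - \tfrac{1}{2}\log|\Sigma_k| + \log\pi_k$$, where $$\pi_k$$ is the prior probability of class $$k$$. Below is a minimal sketch on simulated data in which the two classes have deliberately different covariance matrices. All numbers are arbitrary assumptions, and `qda_fit`/`qda_predict` are hypothetical helper names. It compares a hand-written QDA with scikit-learn's `QuadraticDiscriminantAnalysis`.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def qda_fit(X, y):
    """Estimate priors, means, and a separate covariance matrix for every class."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    covs = [np.cov(X[y == k], rowvar=False) for k in classes]  # divides by n_k - 1
    return classes, priors, means, covs

def qda_predict(X, classes, priors, means, covs):
    """Pick the class with the largest quadratic discriminant score."""
    scores = np.empty((X.shape[0], len(classes)))
    for i, (mu, cov, prior) in enumerate(zip(means, covs, priors)):
        diff = X - mu
        # Squared Mahalanobis distance of each row to this class mean
        maha = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        _, logdet = np.linalg.slogdet(cov)
        scores[:, i] = -0.5 * maha - 0.5 * logdet + np.log(prior)
    return classes[np.argmax(scores, axis=1)]

# Two Gaussian classes with different covariance matrices
rng = np.random.default_rng(0)
X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], size=200)
X1 = rng.multivariate_normal([1.0, 1.0], [[2.0, 0.8], [0.8, 0.5]], size=200)
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

y_hand = qda_predict(X, *qda_fit(X, y))
y_sklearn = QuadraticDiscriminantAnalysis().fit(X, y).predict(X)
print("Agreement with scikit-learn:", np.mean(y_hand == y_sklearn))
```

Because $$\Sigma_k$$ differs across classes, the $$x^\top\Sigma_k^{-1}x$$ terms no longer cancel between classes, which is what makes the boundary quadratic. The price is more parameters to estimate: one full covariance matrix per class.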

### Fitting a QDA Model

Here, we'll be using scikit-learn's `QuadraticDiscriminantAnalysis` class to fit our model:

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

qda = QuadraticDiscriminantAnalysis()
qdaFit = qda.fit(X_train, y_train)
printPriorProbabilities(qdaFit.classes_, qdaFit.priors_)
printGroupMeans(qdaFit.classes_, featuresNames, qdaFit.means_)
```


Prior probabilities of groups:

| Down | Up |
|---|---|
| 0.491984 | 0.508016 |

Group means:

| | Lag1 | Lag2 |
|---|---|---|
| Down | 0.042790 | 0.033894 |
| Up | -0.039546 | -0.031325 |

### QDA Predictions

```python
y_hat = qdaFit.predict(X_test)
confusionDF = pd.crosstab(y_hat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)
displaybd("Share of correctly predicted market movements:")
print(np.mean(y_test == y_hat))
```


Confusion matrix:

| Predicted \ Actual | Down | Up | All |
|---|---|---|---|
| Down | 30 | 20 | 50 |
| Up | 81 | 121 | 202 |
| All | 111 | 141 | 252 |

Share of correctly predicted market movements:

0.5992063492063492


## k-Nearest Neighbors

Here, we'll be looking at k-nearest neighbors, which we talked about in lecture 02 of this course. Tutorial 02 was also on k-nearest neighbors classification, so please refer to that tutorial for additional examples and explanations.
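As a reminder of the mechanics, k-nearest neighbors classifies a point by a majority vote among the $$k$$ training points closest to it in Euclidean distance. Below is a minimal NumPy sketch on simulated data, checked against scikit-learn's `KNeighborsClassifier`; the data and the helper name `knn_predict` are assumptions for illustration.

```python
import numpy as np
from sklearn import neighbors

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k closest training points (Euclidean distance)."""
    # Pairwise distances: one row per test point, one column per training point
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    labels = y_train[nearest]  # shape (n_test, k)
    # Count the votes for each class and keep the most common label
    classes = np.unique(y_train)
    votes = np.stack([(labels == c).sum(axis=1) for c in classes], axis=1)
    return classes[np.argmax(votes, axis=1)]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, "Up", "Down")
X_test = rng.normal(size=(50, 2))

for k in (1, 3):
    mine = knn_predict(X_train, y_train, X_test, k)
    sk = neighbors.KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).predict(X_test)
    print(k, np.mean(mine == sk))
```

Since the vote depends only on distances, features measured on larger scales dominate. This is why the Caravan example later in this tutorial standardizes its features before running KNN.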

### One Neighbor

```python
from sklearn import neighbors

knn = neighbors.KNeighborsClassifier(n_neighbors=1)
y_hat = knn.fit(X_train, y_train).predict(X_test)
confusionDF = pd.crosstab(y_hat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)
displaybd("Share of correctly predicted market movements:")
print(np.mean(y_test == y_hat))
```


Confusion matrix:

| Predicted \ Actual | Down | Up | All |
|---|---|---|---|
| Down | 43 | 58 | 101 |
| Up | 68 | 83 | 151 |
| All | 111 | 141 | 252 |

Share of correctly predicted market movements:

0.5


### Three Neighbors

```python
knn = neighbors.KNeighborsClassifier(n_neighbors=3)
y_hat = knn.fit(X_train, y_train).predict(X_test)
confusionDF = pd.crosstab(y_hat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)
displaybd("Share of correctly predicted market movements:")
print(np.mean(y_test == y_hat))
```


Confusion matrix:

| Predicted \ Actual | Down | Up | All |
|---|---|---|---|
| Down | 48 | 55 | 103 |
| Up | 63 | 86 | 149 |
| All | 111 | 141 | 252 |

Share of correctly predicted market movements:

0.5317460317460317


## An Application to Caravan Insurance Data

This section will demonstrate the use of two techniques we learned above, KNN and logit.

### A New Dataset

We’ll be using a new dataset that contains information on customers of an insurance company. You can see a detailed description of this dataset here.

```python
caravan = pd.read_csv('Caravan.csv', index_col=0)

display(caravan.describe())
# np.object was removed in NumPy 1.24; the built-in object works in all versions
display(caravan.describe(include=[object]))
```

| | MOSTYPE | MAANTHUI | MGEMOMV | MGEMLEEF | MOSHOOFD | MGODRK | MGODPR | MGODOV | MGODGE | MRELGE | ... | ALEVEN | APERSONG | AGEZONG | AWAOREG | ABRAND | AZEILPL | APLEZIER | AFIETS | AINBOED | ABYSTAND |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | ... | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 | 5822.000000 |
| mean | 24.253349 | 1.110615 | 2.678805 | 2.991240 | 5.773617 | 0.696496 | 4.626932 | 1.069907 | 3.258502 | 6.183442 | ... | 0.076606 | 0.005325 | 0.006527 | 0.004638 | 0.570079 | 0.000515 | 0.006012 | 0.031776 | 0.007901 | 0.014256 |
| std | 12.846706 | 0.405842 | 0.789835 | 0.814589 | 2.856760 | 1.003234 | 1.715843 | 1.017503 | 1.597647 | 1.909482 | ... | 0.377569 | 0.072782 | 0.080532 | 0.077403 | 0.562058 | 0.022696 | 0.081632 | 0.210986 | 0.090463 | 0.119996 |
| min | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 10.000000 | 1.000000 | 2.000000 | 2.000000 | 3.000000 | 0.000000 | 4.000000 | 0.000000 | 2.000000 | 5.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 30.000000 | 1.000000 | 3.000000 | 3.000000 | 7.000000 | 0.000000 | 5.000000 | 1.000000 | 3.000000 | 6.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 75% | 35.000000 | 1.000000 | 3.000000 | 3.000000 | 8.000000 | 1.000000 | 6.000000 | 2.000000 | 4.000000 | 7.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| max | 41.000000 | 10.000000 | 5.000000 | 6.000000 | 10.000000 | 9.000000 | 9.000000 | 5.000000 | 9.000000 | 9.000000 | ... | 8.000000 | 1.000000 | 1.000000 | 2.000000 | 7.000000 | 1.000000 | 2.000000 | 3.000000 | 2.000000 | 2.000000 |

8 rows × 85 columns

| | Purchase |
|---|---|
| count | 5822 |
| unique | 2 |
| top | No |
| freq | 5474 |

#### Standardizing Our Data

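```python
from sklearn import preprocessing

y = caravan.Purchase
X = caravan.drop('Purchase', axis=1).astype('float64')
# Center each column at mean 0 and scale it to standard deviation 1
X_scaled = preprocessing.scale(X)
```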
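KNN relies on distances, so variables measured on large scales would otherwise dominate the vote. `preprocessing.scale` centers each column at mean 0 and rescales it to standard deviation 1. The sketch below uses a small made-up matrix (an assumption for illustration) to check that it matches the manual formula, which uses the population standard deviation (`ddof=0`):

```python
import numpy as np
from sklearn import preprocessing

# A small made-up feature matrix with columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 900.0],
              [4.0, 100.0]])

X_scaled = preprocessing.scale(X)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)  # np.std uses ddof=0 by default

print(np.allclose(X_scaled, X_manual))  # True
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Note that here the means and standard deviations are computed on all 5,822 rows before the train/test split. The iris example later instead fits `StandardScaler` on the training data only and then applies it to the test data.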


#### Splitting Data into Train and Test Data

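```python
# The first 1,000 customers form the test set; the rest form the training set
X_train = X_scaled[1000:, :]
y_train = y.iloc[1000:]
X_test = X_scaled[:1000, :]
y_test = y.iloc[:1000]
```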


### Using KNN for Prediction

```python
knn = neighbors.KNeighborsClassifier(n_neighbors=1)
y_hat = knn.fit(X_train, y_train).predict(X_test)
confusionDF = pd.crosstab(y_hat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)
displaybd("Share of correctly predicted purchases:")
print(np.mean(y_test == y_hat))
```


Confusion matrix:

| Predicted \ Actual | No | Yes | All |
|---|---|---|---|
| No | 873 | 50 | 923 |
| Yes | 68 | 9 | 77 |
| All | 941 | 59 | 1000 |

Share of correctly predicted purchases:

0.882


### Logit

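We now fit a logit on the same standardized training data. Because purchases are rare, we classify a customer as a likely buyer ("Yes") whenever the predicted purchase probability exceeds 0.25, rather than the usual 0.5:

```python
# Add an intercept column; has_constant='add' forces it even if some
# standardized column happens to be constant within a subset
X_train_w_constant = sm.add_constant(X_train, has_constant='add')
X_test_w_constant = sm.add_constant(X_test, has_constant='add')

# Code the outcome as 0 ("No") / 1 ("Yes") for the binomial GLM
y_train_code = np.where(y_train == "No", 0, 1)

res = sm.GLM(y_train_code, X_train_w_constant, family=sm.families.Binomial()).fit()
y_hat_code = res.predict(X_test_w_constant)  # predicted purchase probabilities
PurchaseHat = np.where(y_hat_code > 0.25, "Yes", "No")

confusionDF = pd.crosstab(PurchaseHat, y_test,
                          rownames=['Predicted'], colnames=['Actual'],
                          margins=True)
displaybd("Confusion matrix:")
display(confusionDF)
```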


Confusion matrix:

| Predicted \ Actual | No | Yes | All |
|---|---|---|---|
| No | 919 | 48 | 967 |
| Yes | 22 | 11 | 33 |
| All | 941 | 59 | 1000 |
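Overall accuracy is misleading here because only 59 of the 1,000 test customers bought insurance: always predicting "No" would already be right 941 times out of 1,000. A more useful number for an insurer is the share of customers predicted to buy who actually do. The short sketch below computes both from the counts in the two confusion matrices above, hard-coded for illustration:

```python
# Counts copied from the confusion matrices above:
# predicted-"Yes" customers who actually bought, and all predicted-"Yes" customers
knn_true_yes, knn_pred_yes = 9, 77
logit_true_yes, logit_pred_yes = 11, 33

baseline_accuracy = 941 / 1000  # always predict "No"
knn_precision = knn_true_yes / knn_pred_yes
logit_precision = logit_true_yes / logit_pred_yes

print(f"Always 'No' accuracy: {baseline_accuracy:.3f}")
print(f"KNN (k=1) share of predicted buyers who bought: {knn_precision:.3f}")
print(f"Logit (cutoff 0.25) share of predicted buyers who bought: {logit_precision:.3f}")
```

By this measure the logit with a 0.25 cutoff does much better than 1-nearest-neighbor (about 33% versus about 12%), even though its overall accuracy (930 of 1,000) is only somewhat higher than KNN's 0.882.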

## More Iris Classification

Here, we will apply some of the new techniques we learned above to the iris classification problem we explored using k-nearest neighbors in Tutorial 02.

### Our Dataset

I’ve included some of the important descriptions from Tutorial 02 in this tutorial as well, but please review tutorial 02 for more details on how we initially set up and process our dataset.
As a reminder, we are using the iris data set from the University of California, Irvine and are attempting to classify types of irises using the following four attributes:

1. Sepal length
2. Sepal width
3. Petal length
4. Petal width

There are three types of irises:

1. Iris Setosa
2. Iris Versicolor
3. Iris Virginica

#### Importing Our Dataset

Let’s import the data set as a pandas dataframe:

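```python
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'type']
# The file has no header row, so we supply the column names ourselves
iris_df = pd.read_csv(url, names=names)
```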


#### Splitting Data into Train and Test Data

Let’s first define our $$X$$ and $$y$$ variables:

```python
X = iris_df.iloc[:, :-1]  # attributes: every column except the last
y = iris_df['type']       # labels
```


Now, let's split our data into 80% training data and 20% testing data. We can do this using `train_test_split` and its `train_size` parameter:

```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80)
```


#### Feature Scaling

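Now, we want to standardize our features so that each has mean 0 and standard deviation 1. The scaler is fitted on the training data only and then applied to both the training and test data:

```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)  # learn means and standard deviations from the training data

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```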


### Logit

Let's first take a look at how we might apply logit to our iris classification problem. The binomial GLM we used above handles only two classes, so here I'll show another method: scikit-learn's `LogisticRegression` class, which also handles multiple classes (logit and logistic regression are the same model). Note that by default `LogisticRegression` adds an L2 penalty on the coefficients (`C=1.0`), so its estimates are not identical to an unpenalized GLM fit.

#### Fitting Our Model

Let’s import the Logistic Regression class and fit our model as follows:

```python
from sklearn.linear_model import LogisticRegression
logit_model = LogisticRegression()
logit_model.fit(X_train, y_train)
```
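With three iris types, `LogisticRegression` fits a multinomial logit: it estimates one linear score per class and turns the scores into probabilities with the softmax function. The sketch below assumes a recent scikit-learn version in which the default `lbfgs` solver fits the multinomial model. It uses scikit-learn's bundled copy of the iris data (`load_iris`, an assumption so the example runs offline) and checks that `predict_proba` equals the softmax of `decision_function`.

```python
import numpy as np
from scipy.special import softmax
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000).fit(X, y)

scores = model.decision_function(X)  # one linear score per class, shape (150, 3)
proba = model.predict_proba(X)       # class probabilities, shape (150, 3)

# Multinomial logit: probabilities are the softmax of the linear scores
print(np.allclose(proba, softmax(scores, axis=1)))
print(model.coef_.shape)  # (3, 4): one coefficient vector per class
```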


#### Making Predictions

Then, we’ll make some predictions and store them in a variable called y_pred:

```python
y_pred = logit_model.predict(X_test)
```


#### Evaluating Our Predictions

Like we did in Tutorial 02, let’s make a classification report and confusion matrix.

```python
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
```

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Iris-setosa | 1.00 | 1.00 | 1.00 | 9 |
| Iris-versicolor | 1.00 | 0.70 | 0.82 | 10 |
| Iris-virginica | 0.79 | 1.00 | 0.88 | 11 |

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
cm_df = pd.DataFrame(cm,
                     index=['setosa', 'versicolor', 'virginica'],
                     columns=['setosa', 'versicolor', 'virginica'])

sns.heatmap(cm_df, annot=True)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
```


Using this heat map, we can make the following observations:

1. All setosa flowers were correctly classified by our model.
2. Seven versicolor flowers were correctly classified, and three versicolor flowers were incorrectly classified as virginica flowers.
3. All virginica flowers were correctly classified by our model.

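Again, you may not get exactly the same classification report or confusion matrix. This is normal: `train_test_split` shuffles the data randomly, so the train/test split (and hence the results) changes each time you run the code unless you fix its `random_state` argument.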

### Linear Discriminant Analysis

Let’s now try using linear discriminant analysis for our classification.

#### Fitting Our Model

Again, let’s use the Linear Discriminant Analysis class to fit our model:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda_model = LinearDiscriminantAnalysis()
lda_model.fit(X_train, y_train)
```


#### Making Predictions

Then, we’ll make some predictions and store them in a variable called y_pred:

```python
y_pred = lda_model.predict(X_test)
```


#### Evaluating Our Predictions

Like we did in Tutorial 02, let’s make a classification report and confusion matrix. If you want, you can also use the functions printPriorProbabilities(), printGroupMeans(), and printLDACoeffs() that we wrote earlier, but here I’ll keep it simple and just look at our classification report and heatmap like we did just earlier.

```python
print(classification_report(y_test, y_pred))
```

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Iris-setosa | 1.00 | 1.00 | 1.00 | 9 |
| Iris-versicolor | 1.00 | 1.00 | 1.00 | 10 |
| Iris-virginica | 1.00 | 1.00 | 1.00 | 11 |

In this case, we can see our model did very well. Let's also take a look at the heatmap to see that a little more easily:

```python
cm = confusion_matrix(y_test, y_pred)
cm_df = pd.DataFrame(cm,
                     index=['setosa', 'versicolor', 'virginica'],
                     columns=['setosa', 'versicolor', 'virginica'])

sns.heatmap(cm_df, annot=True)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
```


Using this heat map, we can make the following observations:

1. All setosa flowers were correctly classified by our model.
2. All versicolor flowers were correctly classified by our model.
3. All virginica flowers were correctly classified by our model.

Again, you may not get the same exact classification report or confusion matrix, but this is normal, as your results will vary each time you run your model.
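LDA can also reduce the dimensionality of the iris data: with three classes it finds at most two discriminant directions. The sketch below uses scikit-learn's bundled iris data via `load_iris` (an assumption so the example runs offline) and calls `fit_transform` to project the four measurements onto those two directions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Project the four measurements onto the two discriminant directions
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X.shape, "->", X_lda.shape)     # (150, 4) -> (150, 2)
print(lda.explained_variance_ratio_)  # share of between-class variance per direction
```

Plotting the two columns of `X_lda` colored by `y` is a common way to see how well the three types separate in this reduced space.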

### Quadratic Discriminant Analysis

Let's now try using quadratic discriminant analysis for our classification.

#### Fitting Our Model

Again, let's use the Quadratic Discriminant Analysis class to fit our model:

```python
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda_model = QuadraticDiscriminantAnalysis()
qda_model.fit(X_train, y_train)
```


#### Making Predictions

Then, we’ll make some predictions and store them in a variable called y_pred:

```python
y_pred = qda_model.predict(X_test)
```


#### Evaluating Our Predictions

Like we did in Tutorial 02, let’s make a classification report and confusion matrix. If you want, you can also use the functions printPriorProbabilities() and printGroupMeans() that we wrote earlier, but here I’ll keep it simple and just look at our classification report and heatmap like we did just earlier.

```python
print(classification_report(y_test, y_pred))
```

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Iris-setosa | 1.00 | 1.00 | 1.00 | 9 |
| Iris-versicolor | 1.00 | 0.80 | 0.89 | 10 |
| Iris-virginica | 0.85 | 1.00 | 0.92 | 11 |

```python
cm = confusion_matrix(y_test, y_pred)
cm_df = pd.DataFrame(cm,
                     index=['setosa', 'versicolor', 'virginica'],
                     columns=['setosa', 'versicolor', 'virginica'])

sns.heatmap(cm_df, annot=True)
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
```


Using this heat map, we can make the following observations:

1. All setosa flowers were correctly classified by our model.
2. Eight versicolor flowers were correctly classified, and two versicolor flowers were incorrectly classified as virginica flowers.
3. All virginica flowers were correctly classified by our model.

Again, you may not get the same exact classification report or confusion matrix, but this is normal, as your results will vary each time you run your model.