18. Model Optimization

Today we will learn how to find the best hyperparameter values for a model.

We have seen a couple of hyperparameters so far:

  • depth of the decision tree

  • number of clusters in KMeans

but most models have some hyperparameters: settings that we choose before training to control how the fit method works.
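As a small illustration (not part of the original notes), hyperparameters in scikit-learn are passed to the estimator's constructor before `fit` is called, and `get_params` lists them. The values 3 and 4 below are arbitrary choices for the example.

```python
from sklearn import cluster, tree

# Hyperparameters are set when the estimator is created, before calling fit.
dt = tree.DecisionTreeClassifier(max_depth=3)
km = cluster.KMeans(n_clusters=4)

# get_params returns every hyperparameter and its current value as a dict.
dt_params = dt.get_params()
km_params = km.get_params()
print(dt_params['max_depth'])
print(km_params['n_clusters'])
```

Any key that appears in `get_params()` is something a search over hyperparameters can vary.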

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn import cluster, datasets, svm, tree
from sklearn.model_selection import GridSearchCV, train_test_split

We will go back to our familiar iris data.

Load it in from sklearn.
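The cell that loads the data does not appear in these notes, so the later cells have no `iris_X` or `iris_y` to work with. A minimal sketch, assuming those two names are meant to come from `load_iris` with `return_X_y=True`:

```python
from sklearn import datasets

# return_X_y=True returns the feature matrix and the label vector directly,
# instead of a Bunch object that bundles them with metadata.
iris_X, iris_y = datasets.load_iris(return_X_y=True)
print(iris_X.shape, iris_y.shape)
```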

Now we will make training and test data.

iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(iris_X,iris_y)
svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(iris_X_train,iris_y_train)
svm_clf.score(iris_X_test, iris_y_test)
param_grid = {'kernel':['linear','rbf'], 'C':[.5, 1, 10]}
svm_opt = GridSearchCV(svm_clf, param_grid)
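To make the search space concrete, here is a sketch (an addition to the notes) that uses `ParameterGrid` to expand the same `param_grid` into the individual combinations the grid search will try. With two kernels and three values of `C` there are six candidates, and each one is cross-validated separately.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}

# ParameterGrid expands the dictionary into every combination of values.
candidates = list(ParameterGrid(param_grid))
for params in candidates:
    print(params)

n_candidates = len(candidates)
```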
svm_opt.fit(iris_X, iris_y)
svm_opt.cv_results_
pd.DataFrame(svm_opt.cv_results_)
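The full `cv_results_` table has many columns (timings, per-fold scores, and so on). One possible way to read it, sketched here as a self-contained example, is to keep only the parameter and score columns and sort by rank. The choice of columns in `summary_cols` is ours, not something the notes prescribe.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris_X, iris_y = datasets.load_iris(return_X_y=True)
param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid)
svm_opt.fit(iris_X, iris_y)

# cv_results_ is a dict of arrays; as a DataFrame it has one row per
# parameter combination.
results_df = pd.DataFrame(svm_opt.cv_results_)

# Keep the searched parameters and the cross-validation score summary.
summary_cols = ['param_kernel', 'param_C', 'mean_test_score',
                'std_test_score', 'rank_test_score']
summary = results_df[summary_cols].sort_values('rank_test_score')
print(summary)
```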
svm_opt.best_estimator_.predict(iris_X_test)
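Note that the cells above fit the search on all of `iris_X`, so the rows in `iris_X_test` were already seen during the search. If the goal is an honest estimate of test performance, one alternative (a sketch, not the approach taken in these notes) is to run the search on the training split only and then score on the held-out test split. The `random_state=0` below is an arbitrary choice to make the split repeatable.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris_X, iris_y = datasets.load_iris(return_X_y=True)
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_X, iris_y, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid)

# Cross-validation happens inside the training split only.
svm_opt.fit(iris_X_train, iris_y_train)

# With the default refit=True, best_estimator_ has been retrained on the
# whole training split using the best parameter combination.
y_pred = svm_opt.best_estimator_.predict(iris_X_test)
test_acc = svm_opt.score(iris_X_test, iris_y_test)
print(svm_opt.best_params_, test_acc)
```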

18.1. What does an SVM learn?

Find the optimal criterion, max_depth, and min_samples_leaf for a decision tree on the iris data using GridSearchCV.

dt = tree.DecisionTreeClassifier()
params_dt = {'criterion':['gini','entropy'],'max_depth':[2,3,4],
             'min_samples_leaf':list(range(2,20,2))}
dt_opt = GridSearchCV(dt,params_dt)
dt_opt.fit(iris_X,iris_y)
dt_opt.best_params_
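For reference, a self-contained version of this exercise might look like the sketch below. It assumes the iris data is loaded with `load_iris(return_X_y=True)` and adds `random_state=0` to the tree (an arbitrary choice, not in the original) so that repeated runs build the same trees. Several combinations may score equally well, so the reported best parameters are one good choice rather than the only one.

```python
from sklearn import datasets, tree
from sklearn.model_selection import GridSearchCV

iris_X, iris_y = datasets.load_iris(return_X_y=True)

dt = tree.DecisionTreeClassifier(random_state=0)
params_dt = {'criterion': ['gini', 'entropy'], 'max_depth': [2, 3, 4],
             'min_samples_leaf': list(range(2, 20, 2))}
dt_opt = GridSearchCV(dt, params_dt)
dt_opt.fit(iris_X, iris_y)

# 2 criteria x 3 depths x 9 leaf sizes: one entry per combination tried.
n_combinations = len(dt_opt.cv_results_['params'])
print(n_combinations)
print(dt_opt.best_params_, dt_opt.best_score_)
```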