18. Model Optimization
Today we will learn how to find the best hyperparameter values for a model.
We have seen a couple of hyperparameters so far:
depth of the decision tree
number of clusters in KMeans
but most models have some hyperparameters: settings that we can control to change how the fit method works.
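As a quick illustration, scikit-learn estimators expose their hyperparameters through `get_params()`. The sketch below only inspects two models we have already used; the specific values passed (`max_depth=3`, `n_clusters=4`) are arbitrary choices for the example.

```python
from sklearn import cluster, tree

# Hyperparameters are set when the estimator is created, before fit is called.
dt_params = tree.DecisionTreeClassifier(max_depth=3).get_params()
km_params = cluster.KMeans(n_clusters=4).get_params()

# get_params() returns a dictionary mapping hyperparameter names to values.
print(dt_params['max_depth'])
print(km_params['n_clusters'])

# Every name listed here is something we could search over.
print(sorted(dt_params))
```

Any key in these dictionaries can be used as a key in a parameter grid.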
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn import cluster, datasets, svm, tree
from sklearn.model_selection import GridSearchCV, train_test_split
We will go back to our familiar iris data.
Load it in from sklearn:
iris_X, iris_y = datasets.load_iris(return_X_y=True)
Now we will make training and test data:
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(iris_X,iris_y)
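For reference, here is a self-contained sketch of the load and split steps. The `random_state=0` argument is an assumption added only to make the split repeatable; the notes above use the default random split.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# return_X_y=True gives the feature matrix and the target vector directly.
iris_X, iris_y = datasets.load_iris(return_X_y=True)

# By default 25% of the samples are held out for testing.
# random_state is fixed here (an assumption) so the split is reproducible.
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_X, iris_y, random_state=0)

print(iris_X.shape)
print(iris_X_train.shape, iris_X_test.shape)
```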
svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(iris_X_train,iris_y_train)
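# accuracy of the linear-kernel baseline on the held-out test data
svm_clf.score(iris_X_test, iris_y_test)
# candidate values to try for each hyperparameter
param_grid = {'kernel':['linear','rbf'], 'C':[.5, 1, 10]}
# the search will try every kernel/C combination using cross validation
svm_opt = GridSearchCV(svm_clf, param_grid)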
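To make the idea behind the grid search concrete, the following is a rough hand-written equivalent: loop over every combination in the grid, cross validate each one, and keep the best mean score. It is a sketch of the concept, not the internals of `GridSearchCV`; `cv=5` is written out because 5-fold is the scikit-learn default, and the variable names `results`, `best_score` and `best_params` are made up for this example.

```python
from sklearn import datasets, svm
from sklearn.model_selection import ParameterGrid, cross_val_score

iris_X, iris_y = datasets.load_iris(return_X_y=True)
param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}

results = []
# ParameterGrid expands the dictionary into every combination of values.
for params in ParameterGrid(param_grid):
    # Score one candidate setting with 5-fold cross validation.
    scores = cross_val_score(svm.SVC(**params), iris_X, iris_y, cv=5)
    results.append((scores.mean(), params))

# Keep the setting with the highest mean cross-validation accuracy.
best_score, best_params = max(results, key=lambda r: r[0])

print(len(results))  # 2 kernels x 3 values of C
print(best_params, best_score)
```

`GridSearchCV` does this bookkeeping for us and, by default, also refits the best setting on all of the data it was given.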
# fit on the training data only, so iris_X_test stays unseen for the predictions below
svm_opt.fit(iris_X_train, iris_y_train)
svm_opt.cv_results_
pd.DataFrame(svm_opt.cv_results_)
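The `cv_results_` dictionary has many columns, so it can help to keep only a few and sort by rank. The sketch below rebuilds the search so it runs on its own; which columns to keep (`cols`) is a choice made for this example.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

iris_X, iris_y = datasets.load_iris(return_X_y=True)
param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid).fit(iris_X, iris_y)

# One row per hyperparameter combination that was tried.
cv_df = pd.DataFrame(svm_opt.cv_results_)

# Keep the parameters, the score summary, and the fit time; best rank first.
cols = ['param_kernel', 'param_C', 'mean_test_score',
        'std_test_score', 'rank_test_score', 'mean_fit_time']
summary = cv_df[cols].sort_values('rank_test_score')
print(summary)
```

Looking at `std_test_score` and `mean_fit_time` next to the mean score shows whether the top-ranked setting is clearly better or only marginally so.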
svm_opt.best_estimator_.predict(iris_X_test)
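Once the search is fit, the best setting and its scores are available as attributes. This sketch assumes the search is fit on the training split only, so that the test data stays held out; `random_state=0` is again an assumption for repeatability.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris_X, iris_y = datasets.load_iris(return_X_y=True)
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(
    iris_X, iris_y, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid)
svm_opt.fit(iris_X_train, iris_y_train)

# The winning combination and its mean cross-validation accuracy.
print(svm_opt.best_params_)
print(svm_opt.best_score_)

# best_estimator_ is an SVC refit on the training data with those values.
y_pred = svm_opt.best_estimator_.predict(iris_X_test)

# Scoring the search object uses the best estimator on the held-out data.
test_acc = svm_opt.score(iris_X_test, iris_y_test)
print(test_acc)
```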
18.1. What does an SVM learn?
Find the optimal criterion, max_depth and min_samples_leaf for a decision tree on the iris data using GridSearchCV.
dt = tree.DecisionTreeClassifier()
# 2 criteria x 3 depths x 9 leaf sizes = 54 candidate settings
params_dt = {'criterion':['gini','entropy'],'max_depth':[2,3,4],
    'min_samples_leaf':list(range(2,20,2))}
dt_opt = GridSearchCV(dt,params_dt)
dt_opt.fit(iris_X,iris_y)
dt_opt.best_params_
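It is worth checking how much work a grid implies before running it. The sketch below counts the candidates in the decision tree grid and the number of model fits the search performs; `random_state=0` on the tree is an assumption added so repeated runs give the same result.

```python
from sklearn import datasets, tree
from sklearn.model_selection import GridSearchCV, ParameterGrid

iris_X, iris_y = datasets.load_iris(return_X_y=True)
params_dt = {'criterion': ['gini', 'entropy'], 'max_depth': [2, 3, 4],
             'min_samples_leaf': list(range(2, 20, 2))}

# Number of distinct hyperparameter combinations in the grid.
n_candidates = len(ParameterGrid(params_dt))

dt_opt = GridSearchCV(tree.DecisionTreeClassifier(random_state=0), params_dt)
dt_opt.fit(iris_X, iris_y)

# Each candidate is fit once per cross-validation fold.
n_folds = dt_opt.n_splits_
print(n_candidates, n_folds, n_candidates * n_folds)

print(dt_opt.best_params_, dt_opt.best_score_)
```

The count grows multiplicatively with every hyperparameter added, which is the main cost of an exhaustive grid.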