# SVM and Parameter Optimizing

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
from sklearn import cluster, datasets, svm
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn import tree

## Support Vectors

### Basic Idea
Imagine we have data that looks like this:
![very separable data](../img/svm01.svg)

We might want to choose a decision boundary to separate it. We could choose any one of these three gray lines and get 100% training accuracy.

![very separable data with 3 decision boundaries](../img/svm03.svg)

We could say that the best one is the solid one because it best separates the data.

![very separable data with a decision boundary](../img/svm02.svg)

SVM does this: it finds the 'support vectors', which are the points of each class closest to the other class, and then finds the decision boundary that has the maximum margin, where the margin is the space between the boundary and each class.  
![very separable data with a decision boundary and margin highlighted](../img/svm05.svg)
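
As a rough illustration of the idea above, the sketch below fits a linear SVM to synthetic, well-separated data and inspects the support vectors and the margin. The data from `make_blobs`, the cluster centers, and the large `C` value (used to approximate a hard margin) are assumptions chosen for illustration; they are not the data shown in the figures.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Synthetic, clearly separable two-class data (an assumption for this sketch)
X, y = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                  cluster_std=0.8, random_state=0)

# A linear kernel looks only for straight-line boundaries;
# a large C leaves very little room for margin violations
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

# The points of each class that lie closest to the boundary
support_vectors = clf.support_vectors_

# For a linear SVM the boundary is w . x + b = 0, and the distance from
# the boundary to the closest points on either side is 1 / ||w||
w = clf.coef_[0]
margin = 1 / np.linalg.norm(w)

# Distance from every training point to the boundary
distances = np.abs(clf.decision_function(X)) / np.linalg.norm(w)

print(len(support_vectors), 'support vectors out of', len(X), 'points')
print('margin:', margin, 'closest point:', distances.min())
```

Only a few of the points end up as support vectors; the rest could move around (outside the margin) without changing the boundary.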

When SVM looks only for straight lines, it's called linear SVM, but SVM can look for different types of boundaries. We do this by changing the kernel function. A popular one is the radial basis function, or `rbf`, which allows smooth, curvy boundaries.  
````{margin}
```{note}
Additional parameters control how smooth or wavy that line can be.
```
````

With the `rbf` kernel, the SVM can work on data like this:
![very separable data with a decision boundary](../img/svm07.svg)

It can also handle data that is not perfectly separable, like the following, by minimizing the number of errors while maximizing the margin.
![very separable data with a decision boundary](../img/svm08.svg)
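
To see the effect of the kernel, here is a minimal sketch that compares a linear and an `rbf` SVM on data that no straight line can separate. The concentric-circle data from `make_circles` and its settings are assumptions for illustration, not the data in the figures above.

```python
from sklearn import svm
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Two classes arranged as an inner and an outer ring (illustrative data)
X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit one SVM per kernel and record the test accuracy of each
scores = {}
for kernel in ['linear', 'rbf']:
    clf = svm.SVC(kernel=kernel)
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

print(scores)
```

On data like this we would expect the `rbf` kernel to score much higher, since it can draw a closed curve around the inner ring.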



### SVM in Sklearn
First we'll load the data and separate the features and target ($X$ and $y$).

In [2]:
iris_df = sns.load_dataset('iris')
iris_X = iris_df.drop(columns='species')
iris_y = iris_df['species']

Next, we will split the data into training and test sets.

In [3]:
iris_X_train, iris_X_test, iris_y_train, iris_y_test = train_test_split(iris_X,iris_y)
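
The split above is random, so the scores in these notes will vary from run to run. A small sketch of a reproducible split follows; using `load_iris` from scikit-learn (instead of the seaborn copy, to avoid a download) and `random_state=0` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Same iris measurements, loaded from scikit-learn for this sketch
X, y = load_iris(return_X_y=True, as_frame=True)

# With no sizes given, train_test_split holds out 25% of the rows for testing;
# fixing random_state makes the shuffle repeatable
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

print(len(X_train), len(X_test))  # 112 38
```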

Fitting the model is just like other models we have seen:

1. instantiate the object
1. fit the model
1. score the model on the test data

In [4]:
svm_clf = svm.SVC()
svm_clf.fit(iris_X_train, iris_y_train)
svm_clf.score(iris_X_test, iris_y_test)

0.9473684210526315

We see that this fits pretty well with the default parameters.
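
To check what "default parameters" means here, we can ask the estimator directly. This is a small sketch using `get_params`, which every scikit-learn estimator provides; only two of the returned settings are printed.

```python
from sklearn import svm

# get_params returns a dictionary of every constructor parameter and its value
default_params = svm.SVC().get_params()

# The two settings these notes focus on
print(default_params['kernel'], default_params['C'])  # rbf 1.0
```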

## Grid Search Optimization

We can optimize, however, to determine the best parameter settings.

A simple way to do this is to fit the model for different parameters, score each one, and compare.

We'll focus on the kernel, which controls the type of boundary, and $C$, which controls the regularization (in scikit-learn, smaller values of $C$ mean stronger regularization).
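
Before using scikit-learn's built-in tool, the "fit, score, and compare" idea can be sketched by hand. The loop below is a simplified stand-in, not the library's implementation; loading the data with `load_iris`, fixing `random_state=0`, and the name `manual_results` are assumptions for illustration.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}

# ParameterGrid expands the dictionary into every combination of settings
manual_results = []
for params in ParameterGrid(param_grid):
    clf = svm.SVC(**params)
    # Cross-validate on the training data only, so the test set stays untouched
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    manual_results.append((scores.mean(), params))

# Keep the setting with the highest mean cross-validated accuracy
best_score, best_params = max(manual_results, key=lambda result: result[0])
print(best_score, best_params)
```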

In [5]:
param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm_clf, param_grid)

The `GridSearchCV` object is constructed first and requires an estimator object and a dictionary that describes the parameter grid to search over.
The dictionary's keys are the parameter names, and each value is the list of settings to test for that parameter.

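The `fit` method on the grid search object fits all of the separate models, using cross-validation on the training data (five folds by default, which is why the results include `split0` through `split4` scores).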

In [6]:
svm_opt.fit(iris_X_train,iris_y_train)

Then we can look at the output.

In [7]:
svm_opt.cv_results_

{'mean_fit_time': array([0.00163226, 0.00164022, 0.00154209, 0.00159893, 0.00152874,
        0.00155783]),
 'std_fit_time': array([1.39948448e-04, 2.34362431e-05, 8.18578171e-06, 1.62563012e-05,
        2.04490324e-05, 1.54284640e-05]),
 'mean_score_time': array([0.00117993, 0.00117402, 0.00112467, 0.0011734 , 0.00112076,
        0.00113878]),
 'std_score_time': array([8.47353181e-05, 7.42611683e-06, 1.67697602e-05, 3.05814373e-05,
        7.17223956e-06, 1.47908707e-05]),
 'param_C': masked_array(data=[0.5, 0.5, 1, 1, 10, 10],
              mask=[False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'param_kernel': masked_array(data=['linear', 'rbf', 'linear', 'rbf', 'linear', 'rbf'],
              mask=[False, False, False, False, False, False],
        fill_value='?',
             dtype=object),
 'params': [{'C': 0.5, 'kernel': 'linear'},
  {'C': 0.5, 'kernel': 'rbf'},
  {'C': 1, 'kernel': 'linear'},
  {'C': 1, 'kernel': 'rbf'},
  {'C': 10, 

We note that this is a dictionary, so to make it more readable, we can make it a DataFrame.

In [8]:
pd.DataFrame(svm_opt.cv_results_)

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_C | param_kernel | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001632 | 0.00014 | 0.00118 | 8.5e-05 | 0.5 | linear | {'C': 0.5, 'kernel': 'linear'} | 1.0 | 1.0 | 1.0 | 1.0 | 0.954545 | 0.990909 | 0.018182 | 1 |
| 1 | 0.00164 | 2.3e-05 | 0.001174 | 7e-06 | 0.5 | rbf | {'C': 0.5, 'kernel': 'rbf'} | 1.0 | 1.0 | 0.909091 | 0.954545 | 0.909091 | 0.954545 | 0.040656 | 6 |
| 2 | 0.001542 | 8e-06 | 0.001125 | 1.7e-05 | 1.0 | linear | {'C': 1, 'kernel': 'linear'} | 1.0 | 1.0 | 1.0 | 1.0 | 0.954545 | 0.990909 | 0.018182 | 1 |
| 3 | 0.001599 | 1.6e-05 | 0.001173 | 3.1e-05 | 1.0 | rbf | {'C': 1, 'kernel': 'rbf'} | 1.0 | 1.0 | 0.909091 | 1.0 | 0.954545 | 0.972727 | 0.036364 | 3 |
| 4 | 0.001529 | 2e-05 | 0.001121 | 7e-06 | 10.0 | linear | {'C': 10, 'kernel': 'linear'} | 0.956522 | 0.956522 | 0.954545 | 0.954545 | 1.0 | 0.964427 | 0.017809 | 4 |
| 5 | 0.001558 | 1.5e-05 | 0.001139 | 1.5e-05 | 10.0 | rbf | {'C': 10, 'kernel': 'rbf'} | 0.956522 | 0.956522 | 0.954545 | 1.0 | 0.954545 | 0.964427 | 0.017809 | 4 |
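
The full results table is wide, so it can help to keep only a few columns and sort by rank. The sketch below rebuilds a comparable search so that it runs on its own; loading the data with `load_iris`, fixing `random_state=0`, and the names `results_df` and `summary` are assumptions, and the numbers will not match the table above exactly.

```python
import pandas as pd
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid)
svm_opt.fit(X_train, y_train)

results_df = pd.DataFrame(svm_opt.cv_results_)

# Keep the parameter settings and the summary scores, best rank first
summary_cols = ['param_kernel', 'param_C', 'mean_test_score',
                'std_test_score', 'rank_test_score']
summary = results_df[summary_cols].sort_values('rank_test_score')
print(summary)
```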


The grid search object also has a `best_estimator_` attribute, which is an estimator object.

In [9]:
type(svm_opt.best_estimator_)

sklearn.svm._classes.SVC

This is the model that had the best cross-validated score among all of the parameter settings tested.

In [10]:
svm_opt.best_estimator_.score(iris_X_test,iris_y_test)

0.9736842105263158

On the test data, this model scores about 0.974, compared with 0.947 for the default parameters above.
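
By default (`refit=True`), `GridSearchCV` refits the best parameter setting on the whole training set, so the search object can also be scored directly. A minimal sketch, assuming the data is loaded with `load_iris` and split with `random_state=0` (so the score will differ from the one above):

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

param_grid = {'kernel': ['linear', 'rbf'], 'C': [.5, 1, 10]}
svm_opt = GridSearchCV(svm.SVC(), param_grid)
svm_opt.fit(X_train, y_train)

# Scoring the search object delegates to the refit best estimator
search_score = svm_opt.score(X_test, y_test)
best_score = svm_opt.best_estimator_.score(X_test, y_test)

print(svm_opt.best_params_, search_score, best_score)
```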

```{admonition} Try it Yourself
Find the best criterion, max depth, and minimum number of samples per leaf for a decision tree classifier on the iris data.
```

In [11]:
dt = tree.DecisionTreeClassifier()
params_dt = {'criterion': ['gini', 'entropy'],
             'max_depth': [2, 3, 4],
             'min_samples_leaf': list(range(2, 20, 2))}

To do this, we do just as we did above: instantiate the grid search object and fit it.

In [12]:
dt_opt = GridSearchCV(dt,params_dt)
dt_opt.fit(iris_X_train,iris_y_train)

Then we can use the `best_params_` attribute to see the best parameter settings.

In [13]:
dt_opt.best_params_

{'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 2}
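
The tree grid is much larger than the SVM one, because every added parameter multiplies the number of combinations. A quick count, sketched with `ParameterGrid` and assuming the default of five cross-validation folds:

```python
from sklearn.model_selection import ParameterGrid

params_dt = {'criterion': ['gini', 'entropy'],
             'max_depth': [2, 3, 4],
             'min_samples_leaf': list(range(2, 20, 2))}

# 2 criteria x 3 depths x 9 leaf sizes
n_candidates = len(ParameterGrid(params_dt))

# Each candidate is fit once per fold (assuming the default cv=5)
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # 54 270
```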

## Questions after class

### Can this be used on more types of machine learning models than just decision trees and SVM?
```{toggle}
Yes, this can be used on any estimator in scikit-learn. It can even be used on other models that adhere to the required [API](https://scikit-learn.org/stable/developers/develop.html).  

`GridSearchCV` repeatedly:
- sets the parameter values from `param_grid`
- cross-validates the estimator on the data (as `cross_val_score` does)
```
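
As one example with a different estimator, here is a sketch of the same search applied to a k-nearest neighbors classifier. The choice of `KNeighborsClassifier`, the grid values, and loading the data with `load_iris` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The keys must be parameter names of the estimator being searched
knn_grid = {'n_neighbors': [1, 3, 5, 7], 'weights': ['uniform', 'distance']}

# Same pattern as before: instantiate the search, then fit it
knn_opt = GridSearchCV(KNeighborsClassifier(), knn_grid)
knn_opt.fit(X_train, y_train)

knn_test_score = knn_opt.score(X_test, y_test)
print(knn_opt.best_params_, knn_test_score)
```

Only the estimator and the grid changed; the rest of the workflow is the same as for the SVM and the decision tree.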