{ "cells": [ { "cell_type": "markdown", "id": "28a775e9", "metadata": {}, "source": [ "# Model Comparison" ] }, { "cell_type": "markdown", "id": "a6d185a2", "metadata": {}, "source": [ "To compare models, we will first optimize the parameters of two different models and look at how the different parameter settings impact the model comparison. Later, we'll see how to compare across models of different classes." ] }, { "cell_type": "code", "execution_count": 1, "id": "28da5e4b", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import seaborn as sns\n", "import pandas as pd\n", "from sklearn import datasets\n", "from sklearn import cluster\n", "from sklearn import svm\n", "from sklearn import tree\n", "# import the whole model selection module\n", "from sklearn import model_selection\n", "sns.set_theme(palette='colorblind')" ] }, { "cell_type": "markdown", "id": "bf2b95e0", "metadata": {}, "source": [ "We'll use the iris data again." ] }, { "cell_type": "code", "execution_count": 2, "id": "e8e74039", "metadata": {}, "outputs": [], "source": [ "iris_X, iris_y = datasets.load_iris(return_X_y=True)" ] }, { "cell_type": "markdown", "id": "e80f934e", "metadata": {}, "source": [ "Remember, we need to split the data into training and test. The cross validation step will help us optimize the parameters, but we don't want *data leakage* where the model has seen the test data multiple times. So, we split the data here for train and test, and the cross validation splits the training data into train and \"test\" again, but this test is better termed validation." 
] }, { "cell_type": "code", "execution_count": 3, "id": "c042d51d", "metadata": {}, "outputs": [], "source": [ "iris_X_train, iris_X_test, iris_y_train, iris_y_test = model_selection.train_test_split(\n", " iris_X,iris_y, test_size =.2)" ] }, { "cell_type": "markdown", "id": "245eacdc", "metadata": {}, "source": [ "Then we can make the estimator object, the parameter grid dictionary, and the grid search object. We split these into separate cells so that we can use the built-in help to see more detail." ] }, { "cell_type": "code", "execution_count": 4, "id": "356a8948", "metadata": {}, "outputs": [], "source": [ "dt = tree.DecisionTreeClassifier()" ] }, { "cell_type": "code", "execution_count": 5, "id": "06e00905", "metadata": {}, "outputs": [], "source": [ "params_dt = {'criterion':['gini','entropy'],\n", " 'max_depth':[2,3,4],\n", " 'min_samples_leaf':list(range(2,20,2))}" ] }, { "cell_type": "code", "execution_count": 6, "id": "bc77f267", "metadata": {}, "outputs": [], "source": [ "dt_opt = model_selection.GridSearchCV(dt,params_dt)" ] }, { "cell_type": "markdown", "id": "a7b5fa39", "metadata": {}, "source": [ "Then we fit the grid search using the training data. Remember, this resets the estimator's parameters to each combination in the grid and cross validates each one." ] }, { "cell_type": "code", "execution_count": 7, "id": "8f4f0d24", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
GridSearchCV(estimator=DecisionTreeClassifier(),\n",
       "             param_grid={'criterion': ['gini', 'entropy'],\n",
       "                         'max_depth': [2, 3, 4],\n",
       "                         'min_samples_leaf': [2, 4, 6, 8, 10, 12, 14, 16, 18]})
" ], "text/plain": [ "GridSearchCV(estimator=DecisionTreeClassifier(),\n", " param_grid={'criterion': ['gini', 'entropy'],\n", " 'max_depth': [2, 3, 4],\n", " 'min_samples_leaf': [2, 4, 6, 8, 10, 12, 14, 16, 18]})" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_opt.fit(iris_X_train,iris_y_train)" ] }, { "cell_type": "markdown", "id": "76aa865c", "metadata": {}, "source": [ "and look at the results" ] }, { "cell_type": "code", "execution_count": 8, "id": "d820c781", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'mean_fit_time': array([0.00039306, 0.00034938, 0.00034437, 0.000349 , 0.00034513,\n", " 0.00034647, 0.00034838, 0.00034761, 0.00035162, 0.00038896,\n", " 0.00035572, 0.00034981, 0.00035625, 0.00034728, 0.00034947,\n", " 0.000349 , 0.00034842, 0.00034723, 0.00035934, 0.00035663,\n", " 0.00035768, 0.00035028, 0.00035329, 0.00034971, 0.00035257,\n", " 0.00035281, 0.0003489 , 0.00035853, 0.00035596, 0.00035648,\n", " 0.0003509 , 0.00035429, 0.00035157, 0.00035481, 0.000353 ,\n", " 0.000348 , 0.00036955, 0.00036836, 0.00036283, 0.00036693,\n", " 0.00035868, 0.00036645, 0.0003583 , 0.00035987, 0.00035768,\n", " 0.0003799 , 0.00037074, 0.00036554, 0.00036526, 0.00036535,\n", " 0.00036039, 0.00036349, 0.0003541 , 0.00035987]),\n", " 'std_fit_time': array([6.72941413e-05, 8.11268045e-06, 3.81648499e-06, 8.66270211e-06,\n", " 4.09525119e-06, 7.54458255e-06, 3.91587547e-06, 3.24809768e-06,\n", " 8.48636188e-06, 3.16120380e-05, 9.03854864e-06, 5.72204590e-07,\n", " 1.01033231e-05, 1.15430054e-06, 3.76187952e-06, 2.99839209e-06,\n", " 3.01050074e-06, 5.62304040e-06, 3.37646503e-06, 2.38609238e-06,\n", " 8.43530189e-06, 3.58548569e-06, 9.80059873e-06, 3.46618306e-06,\n", " 7.93385961e-06, 8.40289362e-06, 4.56769181e-06, 8.42451299e-06,\n", " 1.52289576e-06, 9.17140216e-06, 1.24709099e-06, 9.49133553e-06,\n", " 3.84911271e-06, 9.27249058e-06, 1.04203921e-05, 5.76164530e-07,\n", " 2.71839005e-06, 7.51680506e-06, 
4.13338336e-06, 1.24590533e-05,\n", " 2.32430603e-06, 1.30709058e-05, 4.18259854e-06, 8.42208358e-06,\n", " 3.00218129e-06, 8.78804544e-06, 8.06545774e-06, 5.73474746e-06,\n", " 6.13435761e-06, 6.88794535e-06, 3.65208705e-06, 1.15504882e-05,\n", " 3.65768603e-06, 1.35648229e-05]),\n", " 'mean_score_time': array([0.00024743, 0.00022221, 0.00022173, 0.00022287, 0.00021906,\n", " 0.0002223 , 0.00025768, 0.00025663, 0.00026088, 0.00023623,\n", " 0.00022044, 0.00022583, 0.0002223 , 0.00022435, 0.00022206,\n", " 0.00022521, 0.0002264 , 0.00022044, 0.00022712, 0.00022411,\n", " 0.0002233 , 0.00022063, 0.00022211, 0.00022278, 0.00022445,\n", " 0.00022144, 0.00022278, 0.00022326, 0.00022783, 0.00021963,\n", " 0.00022578, 0.00021935, 0.00022659, 0.00022411, 0.00022068,\n", " 0.00022483, 0.00022292, 0.00022302, 0.00022135, 0.0002223 ,\n", " 0.00022497, 0.00022588, 0.00022087, 0.00022306, 0.00022216,\n", " 0.00022306, 0.00022664, 0.00022197, 0.00022216, 0.00022125,\n", " 0.00022173, 0.00022187, 0.00022726, 0.00022092]),\n", " 'std_score_time': array([4.32319092e-05, 3.19160472e-06, 3.24809768e-06, 3.58548569e-06,\n", " 8.03580262e-07, 3.42062119e-06, 1.14242063e-06, 2.86102295e-07,\n", " 6.98432161e-06, 1.09738869e-05, 9.84180805e-07, 7.53855268e-06,\n", " 3.78657946e-06, 8.77561758e-06, 3.19017957e-06, 1.06726340e-05,\n", " 1.00810187e-05, 1.04033586e-06, 8.75174819e-06, 5.33759511e-06,\n", " 6.38961792e-06, 1.78161065e-06, 2.86340614e-06, 2.90675176e-06,\n", " 3.54466773e-06, 1.61280961e-06, 3.74431046e-06, 4.77218475e-06,\n", " 7.51014740e-06, 1.01601008e-06, 8.88633093e-06, 5.43678010e-07,\n", " 6.88794535e-06, 8.95641852e-06, 8.86968386e-07, 8.29751462e-06,\n", " 1.60291064e-06, 1.86271906e-06, 1.24891289e-06, 1.30238536e-06,\n", " 9.11569983e-06, 9.07445041e-06, 9.36836372e-07, 2.85545443e-06,\n", " 5.09122765e-07, 2.26986508e-06, 7.90112121e-06, 1.28834306e-06,\n", " 2.66943646e-06, 1.16800773e-06, 7.83523403e-07, 1.68991519e-06,\n", " 8.43260596e-06, 
1.16410786e-06]),\n", " 'param_criterion': masked_array(data=['gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy'],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'param_max_depth': masked_array(data=[2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3,\n", " 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", " 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'param_min_samples_leaf': masked_array(data=[2, 4, 6, 8, 10, 12, 14, 16, 18, 2, 4, 6, 8, 10, 12, 14,\n", " 16, 18, 2, 4, 6, 8, 10, 12, 14, 16, 18, 2, 4, 6, 8, 10,\n", " 12, 14, 16, 18, 2, 4, 6, 8, 10, 12, 
14, 16, 18, 2, 4,\n", " 6, 8, 10, 12, 14, 16, 18],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'params': [{'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 18},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 18},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 
'max_depth': 4, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 4},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 4},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 4},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 
'max_depth': 4, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 18}],\n", " 'split0_test_score': array([0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333]),\n", " 'split1_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1.]),\n", " 'split2_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 
0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667]),\n", " 'split3_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.95833333,\n", " 0.95833333, 0.95833333, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.95833333,\n", " 0.95833333, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.95833333, 0.95833333, 0.95833333, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.95833333, 0.95833333, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667]),\n", " 'split4_test_score': array([0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 1. 
, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333]),\n", " 'mean_test_score': array([0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95833333,\n", " 0.95833333, 0.95833333, 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95833333, 0.95833333,\n", " 0.95833333, 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95833333, 0.95833333, 0.95833333, 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95833333, 0.95833333, 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 ]),\n", " 'std_test_score': array([0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.02635231,\n", " 0.02635231, 0.02635231, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.0372678 , 0.02635231,\n", " 0.02635231, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.02635231, 0.02635231, 0.02635231, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.02635231, 0.02635231, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048]),\n", " 'rank_test_score': array([12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 12, 12, 12, 12, 12,\n", " 12, 11, 1, 1, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,\n", " 12, 12, 1, 1, 1, 12, 12, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12,\n", " 12, 12, 12], 
dtype=int32)}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_opt.cv_results_" ] }, { "cell_type": "markdown", "id": "5ed77f59", "metadata": {}, "source": [ "We can reformat it into a dataframe for further analysis." ] }, { "cell_type": "code", "execution_count": 9, "id": "15225c88", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table border=\"1\" class=\"dataframe\">\n",
 "<thead>\n",
 "<tr><th></th><th>mean_fit_time</th><th>std_fit_time</th><th>mean_score_time</th><th>std_score_time</th><th>param_criterion</th><th>param_max_depth</th><th>param_min_samples_leaf</th><th>params</th><th>split0_test_score</th><th>split1_test_score</th><th>split2_test_score</th><th>split3_test_score</th><th>split4_test_score</th><th>mean_test_score</th><th>std_test_score</th><th>rank_test_score</th></tr>\n",
 "</thead>\n",
 "<tbody>\n",
 "<tr><th>0</th><td>0.000393</td><td>0.000067</td><td>0.000247</td><td>0.000043</td><td>gini</td><td>2</td><td>2</td><td>{'criterion': 'gini', 'max_depth': 2, 'min_sam...</td><td>0.958333</td><td>1.0</td><td>0.916667</td><td>0.916667</td><td>0.958333</td><td>0.95</td><td>0.03118</td><td>12</td></tr>\n",
 "<tr><th>1</th><td>0.000349</td><td>0.000008</td><td>0.000222</td><td>0.000003</td><td>gini</td><td>2</td><td>4</td><td>{'criterion': 'gini', 'max_depth': 2, 'min_sam...</td><td>0.958333</td><td>1.0</td><td>0.916667</td><td>0.916667</td><td>0.958333</td><td>0.95</td><td>0.03118</td><td>12</td></tr>\n",
 "</tbody>\n",
 "</table>
\n", "
" ], "text/plain": [ " mean_fit_time std_fit_time mean_score_time std_score_time \\\n", "0 0.000393 0.000067 0.000247 0.000043 \n", "1 0.000349 0.000008 0.000222 0.000003 \n", "\n", " param_criterion param_max_depth param_min_samples_leaf \\\n", "0 gini 2 2 \n", "1 gini 2 4 \n", "\n", " params split0_test_score \\\n", "0 {'criterion': 'gini', 'max_depth': 2, 'min_sam... 0.958333 \n", "1 {'criterion': 'gini', 'max_depth': 2, 'min_sam... 0.958333 \n", "\n", " split1_test_score split2_test_score split3_test_score split4_test_score \\\n", "0 1.0 0.916667 0.916667 0.958333 \n", "1 1.0 0.916667 0.916667 0.958333 \n", "\n", " mean_test_score std_test_score rank_test_score \n", "0 0.95 0.03118 12 \n", "1 0.95 0.03118 12 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_df = pd.DataFrame(dt_opt.cv_results_)\n", "dt_df.head(2)" ] }, { "cell_type": "markdown", "id": "801a49f6", "metadata": {}, "source": [ "```{admonition} Correction\n", "The parameters in this function were in the wrong \n", "order in class\n", "```\n", "I changed the markers and the color of the error bars for readability." ] }, { "cell_type": "code", "execution_count": 10, "id": "da8d2d53", "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_18_0.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.errorbar(x=dt_df['mean_fit_time'],y=dt_df['mean_score_time'],\n", " xerr=dt_df['std_fit_time'],yerr=dt_df['std_score_time'],\n", " marker='s',ecolor='r')\n", "plt.xlabel('fit time')\n", "plt.ylabel('score time')\n", "# save the limits so we can reuse them\n", "xmin, xmax, ymin, ymax = plt.axis()" ] }, { "cell_type": "markdown", "id": "287d1a1f", "metadata": {}, "source": [ "The \"points\" are at the mean fit and score times. The lines are the \"standard deviation\", or how much we expect that number to vary, since means are estimates. \n", "Because the data shows an upward trend, this plot tells us that, mostly, the models that are slower to fit are also slower to apply. This makes sense for decision trees: deeper trees take longer to learn and longer to traverse when predicting. \n", "Because the error bars mostly overlap the other points, the variation in time is mostly not a reliable difference. If we re-ran the grid search, we could get them in a different order. \n", "\n", "To interpret the error bar plot, let's look at a line plot of just the means, with the same limits so that it's easier to compare to the plot above." ] }, { "cell_type": "code", "execution_count": 11, "id": "816c4900", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.0003190333141088638, 0.00046708042490480804)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_20_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(dt_df['mean_fit_time'],\n", " dt_df['mean_score_time'], marker='s')\n", "plt.xlabel('fit time')\n", "plt.ylabel('score time')\n", "# match the axis limits to above\n", "plt.ylim(ymin, ymax)\n", "plt.xlim(xmin,xmax)" ] }, { "cell_type": "markdown", "id": "eb506ead", "metadata": {}, "source": [ "This plot shows the mean times, without the error bars." ] }, { "cell_type": "code", "execution_count": 12, "id": "a9f0c649", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_22_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "dt_df['mean_test_score'].plot(kind='bar')" ] }, { "cell_type": "code", "execution_count": 13, "id": "33453cf4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0.950000\n", "1 0.950000\n", "2 0.950000\n", "3 0.950000\n", "4 0.950000\n", "5 0.950000\n", "6 0.950000\n", "7 0.950000\n", "8 0.950000\n", "9 0.958333\n", "10 0.958333\n", "11 0.958333\n", "12 0.950000\n", "13 0.950000\n", "14 0.950000\n", "15 0.950000\n", "16 0.950000\n", "17 0.950000\n", "18 0.958333\n", "19 0.958333\n", "20 0.958333\n", "21 0.950000\n", "22 0.950000\n", "23 0.950000\n", "24 0.950000\n", "25 0.950000\n", "26 0.950000\n", "27 0.950000\n", "28 0.950000\n", "29 0.950000\n", "30 0.950000\n", "31 0.950000\n", "32 0.950000\n", "33 0.950000\n", "34 0.950000\n", "35 0.950000\n", "36 0.958333\n", "37 0.958333\n", "38 0.958333\n", "39 0.950000\n", "40 0.950000\n", "41 0.950000\n", "42 0.950000\n", "43 0.950000\n", "44 0.950000\n", "45 0.950000\n", "46 0.958333\n", "47 0.958333\n", "48 0.950000\n", "49 0.950000\n", "50 0.950000\n", "51 0.950000\n", "52 0.950000\n", "53 0.950000\n", "Name: mean_test_score, dtype: float64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_df['mean_test_score']" ] }, { "cell_type": "markdown", "id": "5f76bcb3", "metadata": {}, "source": [ "Now let's compare with a different model. We'll use the parameter-optimized version of that model." 
] }, { "cell_type": "code", "execution_count": 14, "id": "d6707a20", "metadata": {}, "outputs": [ { "ename": "NameError", "evalue": "name 'GridSearchCV' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "Input \u001b[0;32mIn [14]\u001b[0m, in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m svm_clf \u001b[38;5;241m=\u001b[39m svm\u001b[38;5;241m.\u001b[39mSVC()\n\u001b[1;32m 2\u001b[0m param_grid \u001b[38;5;241m=\u001b[39m {\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mkernel\u001b[39m\u001b[38;5;124m'\u001b[39m:[\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mlinear\u001b[39m\u001b[38;5;124m'\u001b[39m,\u001b[38;5;124m'\u001b[39m\u001b[38;5;124mrbf\u001b[39m\u001b[38;5;124m'\u001b[39m], \u001b[38;5;124m'\u001b[39m\u001b[38;5;124mC\u001b[39m\u001b[38;5;124m'\u001b[39m:[\u001b[38;5;241m.5\u001b[39m, \u001b[38;5;241m1\u001b[39m, \u001b[38;5;241m10\u001b[39m]}\n\u001b[0;32m----> 3\u001b[0m svm_opt \u001b[38;5;241m=\u001b[39m \u001b[43mGridSearchCV\u001b[49m(svm_clf,param_grid,)\n", "\u001b[0;31mNameError\u001b[0m: name 'GridSearchCV' is not defined" ] } ], "source": [ "svm_clf = svm.SVC()\n", "param_grid = {'kernel':['linear','rbf'], 'C':[.5, 1, 10]}\n", "svm_opt = GridSearchCV(svm_clf,param_grid,)" ] }, { "cell_type": "markdown", "id": "652438d7", "metadata": {}, "source": [ "The error above is because we didn't import `GridSearchCV` directly today. We imported the whole `model_selection` module, so we have to access the class through it, as `model_selection.GridSearchCV`." 
] }, { "cell_type": "code", "execution_count": 15, "id": "c2d72dbb", "metadata": {}, "outputs": [], "source": [ "svm_clf = svm.SVC()\n", "param_grid = {'kernel':['linear','rbf'], 'C':[.5, .75,1,2,5,7, 10]}\n", "svm_opt = model_selection.GridSearchCV(svm_clf,param_grid,cv=10)" ] }, { "cell_type": "code", "execution_count": 16, "id": "db1fa584", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "module" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(model_selection)" ] }, { "cell_type": "code", "execution_count": 17, "id": "608d4625", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'scoring': None,\n", " 'estimator': DecisionTreeClassifier(),\n", " 'n_jobs': None,\n", " 'refit': True,\n", " 'cv': None,\n", " 'verbose': 0,\n", " 'pre_dispatch': '2*n_jobs',\n", " 'error_score': nan,\n", " 'return_train_score': False,\n", " 'param_grid': {'criterion': ['gini', 'entropy'],\n", " 'max_depth': [2, 3, 4],\n", " 'min_samples_leaf': [2, 4, 6, 8, 10, 12, 14, 16, 18]},\n", " 'multimetric_': False,\n", " 'best_index_': 9,\n", " 'best_score_': 0.9583333333333334,\n", " 'best_params_': {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 2},\n", " 'best_estimator_': DecisionTreeClassifier(max_depth=3, min_samples_leaf=2),\n", " 'refit_time_': 0.00036978721618652344,\n", " 'scorer_': ,\n", " 'cv_results_': {'mean_fit_time': array([0.00039306, 0.00034938, 0.00034437, 0.000349 , 0.00034513,\n", " 0.00034647, 0.00034838, 0.00034761, 0.00035162, 0.00038896,\n", " 0.00035572, 0.00034981, 0.00035625, 0.00034728, 0.00034947,\n", " 0.000349 , 0.00034842, 0.00034723, 0.00035934, 0.00035663,\n", " 0.00035768, 0.00035028, 0.00035329, 0.00034971, 0.00035257,\n", " 0.00035281, 0.0003489 , 0.00035853, 0.00035596, 0.00035648,\n", " 0.0003509 , 0.00035429, 0.00035157, 0.00035481, 0.000353 ,\n", " 0.000348 , 0.00036955, 0.00036836, 0.00036283, 0.00036693,\n", " 0.00035868, 0.00036645, 0.0003583 , 0.00035987, 0.00035768,\n", 
" 0.0003799 , 0.00037074, 0.00036554, 0.00036526, 0.00036535,\n", " 0.00036039, 0.00036349, 0.0003541 , 0.00035987]),\n", " 'std_fit_time': array([6.72941413e-05, 8.11268045e-06, 3.81648499e-06, 8.66270211e-06,\n", " 4.09525119e-06, 7.54458255e-06, 3.91587547e-06, 3.24809768e-06,\n", " 8.48636188e-06, 3.16120380e-05, 9.03854864e-06, 5.72204590e-07,\n", " 1.01033231e-05, 1.15430054e-06, 3.76187952e-06, 2.99839209e-06,\n", " 3.01050074e-06, 5.62304040e-06, 3.37646503e-06, 2.38609238e-06,\n", " 8.43530189e-06, 3.58548569e-06, 9.80059873e-06, 3.46618306e-06,\n", " 7.93385961e-06, 8.40289362e-06, 4.56769181e-06, 8.42451299e-06,\n", " 1.52289576e-06, 9.17140216e-06, 1.24709099e-06, 9.49133553e-06,\n", " 3.84911271e-06, 9.27249058e-06, 1.04203921e-05, 5.76164530e-07,\n", " 2.71839005e-06, 7.51680506e-06, 4.13338336e-06, 1.24590533e-05,\n", " 2.32430603e-06, 1.30709058e-05, 4.18259854e-06, 8.42208358e-06,\n", " 3.00218129e-06, 8.78804544e-06, 8.06545774e-06, 5.73474746e-06,\n", " 6.13435761e-06, 6.88794535e-06, 3.65208705e-06, 1.15504882e-05,\n", " 3.65768603e-06, 1.35648229e-05]),\n", " 'mean_score_time': array([0.00024743, 0.00022221, 0.00022173, 0.00022287, 0.00021906,\n", " 0.0002223 , 0.00025768, 0.00025663, 0.00026088, 0.00023623,\n", " 0.00022044, 0.00022583, 0.0002223 , 0.00022435, 0.00022206,\n", " 0.00022521, 0.0002264 , 0.00022044, 0.00022712, 0.00022411,\n", " 0.0002233 , 0.00022063, 0.00022211, 0.00022278, 0.00022445,\n", " 0.00022144, 0.00022278, 0.00022326, 0.00022783, 0.00021963,\n", " 0.00022578, 0.00021935, 0.00022659, 0.00022411, 0.00022068,\n", " 0.00022483, 0.00022292, 0.00022302, 0.00022135, 0.0002223 ,\n", " 0.00022497, 0.00022588, 0.00022087, 0.00022306, 0.00022216,\n", " 0.00022306, 0.00022664, 0.00022197, 0.00022216, 0.00022125,\n", " 0.00022173, 0.00022187, 0.00022726, 0.00022092]),\n", " 'std_score_time': array([4.32319092e-05, 3.19160472e-06, 3.24809768e-06, 3.58548569e-06,\n", " 8.03580262e-07, 3.42062119e-06, 1.14242063e-06, 
2.86102295e-07,\n", " 6.98432161e-06, 1.09738869e-05, 9.84180805e-07, 7.53855268e-06,\n", " 3.78657946e-06, 8.77561758e-06, 3.19017957e-06, 1.06726340e-05,\n", " 1.00810187e-05, 1.04033586e-06, 8.75174819e-06, 5.33759511e-06,\n", " 6.38961792e-06, 1.78161065e-06, 2.86340614e-06, 2.90675176e-06,\n", " 3.54466773e-06, 1.61280961e-06, 3.74431046e-06, 4.77218475e-06,\n", " 7.51014740e-06, 1.01601008e-06, 8.88633093e-06, 5.43678010e-07,\n", " 6.88794535e-06, 8.95641852e-06, 8.86968386e-07, 8.29751462e-06,\n", " 1.60291064e-06, 1.86271906e-06, 1.24891289e-06, 1.30238536e-06,\n", " 9.11569983e-06, 9.07445041e-06, 9.36836372e-07, 2.85545443e-06,\n", " 5.09122765e-07, 2.26986508e-06, 7.90112121e-06, 1.28834306e-06,\n", " 2.66943646e-06, 1.16800773e-06, 7.83523403e-07, 1.68991519e-06,\n", " 8.43260596e-06, 1.16410786e-06]),\n", " 'param_criterion': masked_array(data=['gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'gini', 'gini', 'gini', 'gini', 'gini', 'gini',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy', 'entropy', 'entropy', 'entropy',\n", " 'entropy', 'entropy'],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'param_max_depth': masked_array(data=[2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 
3, 3,\n", " 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2,\n", " 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'param_min_samples_leaf': masked_array(data=[2, 4, 6, 8, 10, 12, 14, 16, 18, 2, 4, 6, 8, 10, 12, 14,\n", " 16, 18, 2, 4, 6, 8, 10, 12, 14, 16, 18, 2, 4, 6, 8, 10,\n", " 12, 14, 16, 18, 2, 4, 6, 8, 10, 12, 14, 16, 18, 2, 4,\n", " 6, 8, 10, 12, 14, 16, 18],\n", " mask=[False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False, False, False,\n", " False, False, False, False, False, False],\n", " fill_value='?',\n", " dtype=object),\n", " 'params': [{'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 2, 'min_samples_leaf': 18},\n", " {'criterion': 'gini', 'max_depth': 
3, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 3, 'min_samples_leaf': 18},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 2},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 4},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 6},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 8},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 10},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 12},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 14},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 16},\n", " {'criterion': 'gini', 'max_depth': 4, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 4},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 2, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 4},\n", " {'criterion': 
'entropy', 'max_depth': 3, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 3, 'min_samples_leaf': 18},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 2},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 4},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 6},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 8},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 10},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 12},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 14},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 16},\n", " {'criterion': 'entropy', 'max_depth': 4, 'min_samples_leaf': 18}],\n", " 'split0_test_score': array([0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333]),\n", " 'split1_test_score': array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1., 1., 1., 
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,\n", " 1., 1., 1.]),\n", " 'split2_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667]),\n", " 'split3_test_score': array([0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.95833333,\n", " 0.95833333, 0.95833333, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.95833333,\n", " 0.95833333, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.95833333, 0.95833333, 0.95833333, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667, 0.91666667,\n", " 0.91666667, 0.95833333, 0.95833333, 0.91666667, 0.91666667,\n", " 0.91666667, 0.91666667, 0.91666667, 0.91666667]),\n", " 'split4_test_score': array([0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 1. 
, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333, 0.95833333,\n", " 0.95833333, 0.95833333, 0.95833333, 0.95833333]),\n", " 'mean_test_score': array([0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95833333,\n", " 0.95833333, 0.95833333, 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95833333, 0.95833333,\n", " 0.95833333, 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95833333, 0.95833333, 0.95833333, 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 , 0.95 ,\n", " 0.95 , 0.95833333, 0.95833333, 0.95 , 0.95 ,\n", " 0.95 , 0.95 , 0.95 , 0.95 ]),\n", " 'std_test_score': array([0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.02635231,\n", " 0.02635231, 0.02635231, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.0372678 , 0.02635231,\n", " 0.02635231, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.02635231, 0.02635231, 0.02635231, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048, 0.03118048,\n", " 0.03118048, 0.02635231, 0.02635231, 0.03118048, 0.03118048,\n", " 0.03118048, 0.03118048, 0.03118048, 0.03118048]),\n", " 'rank_test_score': array([12, 12, 12, 12, 12, 12, 12, 12, 12, 1, 1, 1, 12, 12, 12, 12, 12,\n", " 12, 11, 1, 1, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12,\n", " 12, 12, 1, 1, 1, 12, 12, 12, 12, 12, 12, 12, 1, 1, 12, 12, 12,\n", " 12, 12, 12], 
dtype=int32)},\n", " 'n_splits_': 5}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dt_opt.__dict__" ] }, { "cell_type": "markdown", "id": "1390ca8e", "metadata": {}, "source": [ "The SVM search object (`svm_opt`) doesn't have these fitted attributes yet, even though the two objects are the same type, because we have not fit it to data yet." ] }, { "cell_type": "code", "execution_count": 18, "id": "909d33b5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(sklearn.model_selection._search.GridSearchCV,\n", " sklearn.model_selection._search.GridSearchCV)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(svm_opt), type(dt_opt)" ] }, { "cell_type": "markdown", "id": "1102e164", "metadata": {}, "source": [ "Now we can fit this second model to the training data." ] }, { "cell_type": "code", "execution_count": 19, "id": "6901d229", "metadata": {}, "outputs": [], "source": [ "# fit the model and put the CV results in a dataframe\n", "svm_opt.fit(iris_X_train,iris_y_train)\n", "sv_df = pd.DataFrame(svm_opt.cv_results_)" ] }, { "cell_type": "code", "execution_count": 20, "id": "912d4c56", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mean_fit_timestd_fit_timemean_score_timestd_score_timeparam_Cparam_kernelparamssplit0_test_scoresplit1_test_scoresplit2_test_scoresplit3_test_scoresplit4_test_scoresplit5_test_scoresplit6_test_scoresplit7_test_scoresplit8_test_scoresplit9_test_scoremean_test_scorestd_test_scorerank_test_score
00.0005080.0000570.0002720.0000160.5linear{'C': 0.5, 'kernel': 'linear'}1.01.0000001.0000001.01.01.0000000.9166670.9166671.0000001.00.9833330.0333331
10.0006060.0000090.0002970.0000080.5rbf{'C': 0.5, 'kernel': 'rbf'}1.00.9166670.9166671.01.00.9166670.9166670.9166670.9166671.00.9500000.04082514
\n", "
" ], "text/plain": [ " mean_fit_time std_fit_time mean_score_time std_score_time param_C \\\n", "0 0.000508 0.000057 0.000272 0.000016 0.5 \n", "1 0.000606 0.000009 0.000297 0.000008 0.5 \n", "\n", " param_kernel params split0_test_score \\\n", "0 linear {'C': 0.5, 'kernel': 'linear'} 1.0 \n", "1 rbf {'C': 0.5, 'kernel': 'rbf'} 1.0 \n", "\n", " split1_test_score split2_test_score split3_test_score split4_test_score \\\n", "0 1.000000 1.000000 1.0 1.0 \n", "1 0.916667 0.916667 1.0 1.0 \n", "\n", " split5_test_score split6_test_score split7_test_score split8_test_score \\\n", "0 1.000000 0.916667 0.916667 1.000000 \n", "1 0.916667 0.916667 0.916667 0.916667 \n", "\n", " split9_test_score mean_test_score std_test_score rank_test_score \n", "0 1.0 0.983333 0.033333 1 \n", "1 1.0 0.950000 0.040825 14 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sv_df.head(2)" ] }, { "cell_type": "code", "execution_count": 21, "id": "88013de9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_35_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.errorbar(x=sv_df['mean_fit_time'],xerr=sv_df['std_fit_time'],\n", " y=sv_df['mean_score_time'],yerr=sv_df['std_score_time'])" ] }, { "cell_type": "code", "execution_count": 22, "id": "974c9fb8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time',\n", " 'param_C', 'param_kernel', 'params', 'split0_test_score',\n", " 'split1_test_score', 'split2_test_score', 'split3_test_score',\n", " 'split4_test_score', 'split5_test_score', 'split6_test_score',\n", " 'split7_test_score', 'split8_test_score', 'split9_test_score',\n", " 'mean_test_score', 'std_test_score', 'rank_test_score'],\n", " dtype='object')" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sv_df.columns" ] }, { "cell_type": "markdown", "id": "be40ab10", "metadata": {}, "source": [ "We can see if the models that take longer to fit or score perform better." ] }, { "cell_type": "code", "execution_count": 23, "id": "3f8723c4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_38_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "svm_time = sv_df.melt(id_vars=['param_C', 'param_kernel', 'params',],\n", " value_vars=['mean_fit_time', 'std_fit_time', 'mean_score_time', 'std_score_time'])\n", "sns.lmplot(data=sv_df, x='mean_fit_time',y='mean_test_score',\n", " hue='param_kernel',fit_reg=False)" ] }, { "cell_type": "markdown", "id": "46959e80", "metadata": {}, "source": [ "This looks like mostly no." ] }, { "cell_type": "code", "execution_count": 24, "id": "faa2a5f6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_40_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.lmplot(data=sv_df, x='mean_score_time',y='mean_test_score',\n", " hue='param_kernel',fit_reg=False)" ] }, { "cell_type": "markdown", "id": "ad6c3ffe", "metadata": {}, "source": [ "Again, for score time, the slower models don't appear to be better. Remember though the time differences weren't that different. \n", "\n", "```{admonition} Try it yourself\n", "Try this same analysis for the decision tree, does it matter there?\n", "```" ] }, { "cell_type": "code", "execution_count": 25, "id": "8bdf2e6a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
param_Cparam_kernelparamsvariablescore
00.5linear{'C': 0.5, 'kernel': 'linear'}split0_test_score1.0
10.5rbf{'C': 0.5, 'kernel': 'rbf'}split0_test_score1.0
20.75linear{'C': 0.75, 'kernel': 'linear'}split0_test_score1.0
30.75rbf{'C': 0.75, 'kernel': 'rbf'}split0_test_score1.0
41linear{'C': 1, 'kernel': 'linear'}split0_test_score1.0
\n", "
" ], "text/plain": [ " param_C param_kernel params variable \\\n", "0 0.5 linear {'C': 0.5, 'kernel': 'linear'} split0_test_score \n", "1 0.5 rbf {'C': 0.5, 'kernel': 'rbf'} split0_test_score \n", "2 0.75 linear {'C': 0.75, 'kernel': 'linear'} split0_test_score \n", "3 0.75 rbf {'C': 0.75, 'kernel': 'rbf'} split0_test_score \n", "4 1 linear {'C': 1, 'kernel': 'linear'} split0_test_score \n", "\n", " score \n", "0 1.0 \n", "1 1.0 \n", "2 1.0 \n", "3 1.0 \n", "4 1.0 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sv_df_scores = sv_df.melt(id_vars=['param_C', 'param_kernel', 'params',],\n", " value_vars=['split0_test_score',\n", " 'split1_test_score', 'split2_test_score', 'split3_test_score',\n", " 'split4_test_score'], value_name='score')\n", "sv_df_scores.head()" ] }, { "cell_type": "code", "execution_count": 26, "id": "7d10a20a", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_43_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sns.catplot(data=sv_df_scores,x='param_C',y='score',\n", " col='param_kernel')" ] }, { "cell_type": "markdown", "id": "293ca1b4", "metadata": {}, "source": [ "```{admonition} Try it yourself\n", "Try interpretting the plot above, what does it say? what can you conclude from it. \n", "```" ] }, { "cell_type": "code", "execution_count": 27, "id": "89ae55e8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_45_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "dt_df['mean_test_score'].plot(kind='bar')" ] }, { "cell_type": "code", "execution_count": 28, "id": "cb49a6bb", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXgAAAD/CAYAAAD7X81yAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAR5ElEQVR4nO3df5BdZX3H8ffuRiiQqLgulYQUWiTfChU1QpEqWgeiDh1H/ElTa6zM4ATbUO3YVmmhCKPVkY5TNDbxZ4nU6IioFVKxzDj1R8dRh1ChyjcpgokJyrqiJmhT2d3+cc7qzbI/zu49m819eL9mMtn73Ge/+83N7uc+57nnnu0bHx9HklSe/sVuQJK0MAx4SSqUAS9JhTLgJalQBrwkFWrJYjdQOxI4E7gPGF3kXiSpVwwAxwNfBw5MvvNwCfgzgS8tdhOS1KPOAb48efBwCfj7AB544EHGxpqdlz84uJSRkf0L1pD1rX841ra+9Tv19/dx7LHHQJ2hkx0uAT8KMDY23jjgJ+YvJOtb/3CsbX3rT2HKre1ZAz4irgFeCpwEPDkz75xizgBwLfACYBx4e2Z+YK4dSpLa0+Qsmk8Dzwa+O8OcVwJPBE4BzgaujIiTum1OkjR/swZ8Zn45M3fPMu1C4P2ZOZaZw1RPCi9voT9J0jy1dR78b3DwCn8XsLKl2pKkeThcXmQFqleP52JoaNkCdWJ96x++ta1v/abaCvhdwIlUJ9vDw1f0jYyM7G/86vHQ0DKGh/fN9Us0Zn3rH461rW/9Tv39fTMujNsK+E8AF0fEjcAgcAHVifeSpEXS5DTJa4GXAE8Abo2Ikcw8LSK2AVdk5jeAjwBnATvrT7sqM+9po8FjHn0URx85dZtTHcb87MBDPPjTn7fxpVvR6/2rXL3+vdnr/R8KswZ8Zl4KXDrF+PkdH48Cl7TbWuXoI5fQ/8bPNp4/ds0LeXAhGpmnXu9f5er1781e7/9QOKxeZNXhp9dXSXPpf669L/Rj0+uPvWZ2KP5/DXjNqNdXSXPpf669L/Rj0+uPvWZ2KP5/Dfge1+urvF7vX9Pr9f/bXu8fDPie1+urvF7vX9Pr9f/bXu8f/I1OklSsR/wKvoTDMEmayiM+4Es4DJOkqbhFI0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVBLmkyKiFXAdcAgMAKsy8ydk+YcB3wYWAk8CvgCcGlmPtRqx5KkRpqu4DcBGzNzFbAR2DzFnMuAb2fm6cDpwNOBl7TSpSRpzmYN+HplvhrYWg9tBVZHxNCkqePAsojoB44EjgD2tNirJGkOmqzgVwJ7MnMUoP57bz3e6WpgFXAf8H3glsz8Sou9SpLmoNEefEMvB74JnAssA/4tIl6
WmTc0LTA4uLSVRoaGlrVSx/qPvPq93Lv1rT9Zk4DfDayIiIHMHI2IAWB5Pd5pA3BRZo4BP4mIzwDPBRoH/MjIfsbGxg8am88DNjy8r/Fc61v/UNW2vvXbrt/f3zfjwnjWLZrMvB+4HVhbD60Ftmfm8KSp9wAvAIiII4DzgDsb9i1JalnTs2jWAxsiYgfVSn09QERsi4gz6jmvB86JiDuonhB2AO9vtVtJUmON9uAz8y7grCnGz+/4+G5gTXutSZK64TtZJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUEuaTIqIVcB1wCAwAqzLzJ1TzHsFcDnQB4wD52XmD9prV5LUVNMV/CZgY2auAjYCmydPiIgzgCuBNZn5O8CzgJ+01KckaY5mDfiIOA5YDWyth7YCqyNiaNLUNwDXZOb3ATLzJ5n5v202K0lqrskWzUpgT2aOAmTmaETsrceHO+adCtwTEV8ElgI3Am/NzPGWe5YkNdBoD76hAeB0YA1wBPA5YBewpWmBwcGlrTQyNLSslTrWf+TV7+XerW/9yZoE/G5gRUQM1Kv3AWB5Pd5pF3BDZh4ADkTEZ4DfZQ4BPzKyn7Gxgxf883nAhof3NZ5rfesfqtrWt37b9fv7+2ZcGM+6B5+Z9wO3A2vrobXA9swcnjT1o8DzIqIvIh4FnAv8V+POJUmtanoWzXpgQ0TsADbUt4mIbfXZMwAfA+4HvkX1hPDfwAdb7VaS1FijPfjMvAs4a4rx8zs+HgP+ov4jSVpkvpNVkgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKtaTJpIhYBVwHDAIjwLrM3DnN3AC2A+/NzDe21agkaW6aruA3ARszcxWwEdg81aSIGKjv+3Qr3UmS5m3WgI+I44DVwNZ6aCuwOiKGppj+JuAmYEdrHUqS5qXJCn4lsCczRwHqv/fW478UEU8Bng+8q+0mJUlz12gPfjYR8SjgfcBrMnO02oafu8HBpW20w9DQslbqWP+RV7+Xe7e+9SdrEvC7gRURMVCH9wCwvB6fcDxwMrCtDvfHAn0R8ejMfG3TZkZG9jM2Nn7Q2HwesOHhfY3nWt/6h6q29a3fdv3+/r4ZF8azBnxm3h8RtwNrgevrv7dn5nDHnF3A4yduR8SVwFLPopGkxdP0LJr1wIaI2AFsqG8TEdsi4oyFak6SNH+N9uAz8y7grCnGz59m/pXdtSVJ6pbvZJWkQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUqCVNJkXEKuA6YBAYAdZl5s5Jcy4H/hAYBX4BXJaZt7TbriSpqaYr+E3AxsxcBWwENk8x52vAmZl5OnAR8PGIOKqdNiVJczVrwEfEccBqYGs9tBVYHRFDnfMy85bM/Fl985tAH9WKX5K0CJps0awE9mTmKEBmjkbE3np8eJrPWQfcnZnfm0szg4NL5zJ9WkNDy1qpY/1HXv1e7t361p+s0R78XETEc4CrgTVz/dyRkf2MjY0fNDa
fB2x4eF/juda3/qGqbX3rt12/v79vxoVxkz343cCKiBgAqP9eXo8fJCLOBq4HLsjMbN62JKltswZ8Zt4P3A6srYfWAtsz86DtmYg4E/g48LLMvK3lPiVJc9R0i2Y9cF1EXAE8QLXHTkRsA67IzG8A7wWOAjZHxMTnvSoz72i3ZUlSE40CPjPvAs6aYvz8jo/PbLEvSVKXfCerJBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVyoCXpEIZ8JJUKANekgplwEtSoQx4SSqUAS9JhTLgJalQBrwkFcqAl6RCGfCSVCgDXpIKZcBLUqEMeEkqlAEvSYUy4CWpUAa8JBXKgJekQhnwklQoA16SCmXAS1KhDHhJKpQBL0mFMuAlqVAGvCQVakmTSRGxCrgOGARGgHWZuXPSnAHgWuAFwDjw9sz8QLvtSpKaarqC3wRszMxVwEZg8xRzXgk8ETgFOBu4MiJOaqNJSdLczbqCj4jjgNXAmnpoK/CeiBjKzOGOqRcC78/MMWA4Ij4NvBx4Z4M+BgD6+/umvPPEY49qUOJXpqszHetb/1DUtr71267fcXtgqvl94+PjMxaMiKcDWzLztI6xbwF/nJm3dYzdAVyUmV+vb/8VcEJmXtqg72cBX2owT5L0cOcAX5482GgP/hD4OlWD9wGji9yLJPWKAeB4qgx9mCYBvxtYEREDmTlav5i6vB7vtAs4seML/Qbw3YZNHmCKZx9J0qzunu6OWV9kzcz7gduBtfXQWmD7pP13gE8AF0dEf0QMARcAN8ynW0lS95qeRbMe2BARO4AN9W0iYltEnFHP+QjwHWAn8FXgqsy8p+V+JUkNzfoiqySpN/lOVkkqlAEvSYUy4CWpUAa8JBXKgJekQh0u72RtJCIGgZX1zd2ZObKY/RzOIuLYzHxggWqfl5m3LkTthRYRS4FVwP9k5k8Xu5+5ioijgScBd2fmj1us+ziqNyc+VNf+eVu1tXh64jTJiDgZeB/VRc/21sPLgduA9ZMvXXy4qZ+Y3kH1A/SZzNzYcd8nM/OlXdZ/CvAhqss8vBq4Bngu1aWdX5iZt3dR+9Qphm8Bngf0Zea35lu7rr8mM/+9/vgxwHuA36N6c93rMvMHXdbfBFyemcMR8UzgRuCHwBDV9ZQ+32X9HwIfBT7UzeM8Q/0XU12qey+wjuoNhQ8CxwGvyczPdln/RKqrxT6f6jLfPwaOAv4JeHNm/l839bW4emWLZgtVgA1m5mn1hc8GgQ/X9y2Y+iJq3doM/IjqB+mCiLgxIiaOnn6rhfrXAm+hCsfPAR/NzKOB11GFfTfuBG4Cbu748wRgWz3erXd0fPxWYB/wIuAuqn9Xt87ueNf11VRPeKdRXeDubS3U30f1xPr5iLgtIv4sIo5toe6EK4BnAq+leszXZuapVNduuqqF+v8MXE/18/R6qu+hk4DHAO9qoT5QLXIi4qn1n8G26paoze+fXtmiGczMf+kcqC9LfH1E/G23xadZpf7ya3dbHzglM19Wf61PUf0Q3RQRF7RQG2BZZv5rXf/qiccqMz8bEd2GwFuAs6iOlHbVX+OezPzNLutO6Lz+6bOAMzPzF8DftPTk2nk91mWZ+TWAzNwREUe0UP+BzHxDffXUFwGvAf4+Im4GPjhxdNKNzLwDICL2Z+Z/1mPfjohuSwM8ruNn690R8bXM/LuIeC2Q3Raf7ug7Ihb86Dsi7sjMJ3dZY1GOviOi66Nv6J2A/1FErAU+lpnjABHRB/wR1SFlt+4E7uXgsJnw+Bbq/zJI6v7/NCLeSbUa/rUW6nf2PXnLoaujtMx8S0Q8DfhYRGzJzE1Uh/JtOTIinkT1bxivw31CG1cWvTUi/gG4HPhCRFyYmR+PiDVUW1itqPu+AbghIpYDfwK8G/jtLkuP14/PY4FjIuIZmfnV+resTXkN8Dl6KCJOzsy760uDH4BqARURv5jlc5vYArwXWFMvyoiIfqqf3S1Uvxxo3g7
B4mwz1SVYtgGXRMS5wCsy8yHaPfp+LNXR92WZ+QcR8UKqsD+vm+K9EvCvpv6tUhGxpx5bQbVP++oW6t8LnJOZeybfERGTr5o5H9+JiGdn5hcnBjLzLyPibcBft1D/3ohYlpn7MvPiicGIOAH4WbfFM3N7RPw+cFVE3ErHE1YLjqZ6ousDiIgVmbknIh4NjLVQ/w1Uv3RmD1WgvzEitgBfAC5qof7DFgWZuZdq+6eNLaArgK9QPdldCFwdEccDJwCXtFT/qxHxfaqttwsBIuLX66/brQU9+mbhF2e9fPTdGwFfH8adW1+lsvMsmslXtJyvT1Jd6vhhAU/1oly3XsUUq97MvCwiru+2eGa+eJq7HqDaNuha/WLbmyLiGcBz2qhZ1z1pmrseAro6/K3rHwAujYg3AydTrXp3tXgG1gUt1ZlSZt4EPG7idkT8B/BU4HvdvgBd1785Ik6h+nWbOybOLKprXzzjJzez0Eff97Kwi7OePfqGHjmLRlJvqp88NgFP41cLqImj70sys6t9/jpsPzXx2sSk+/4xM/+8y/o3A+/oPPqux98GvCkzuwrh+qhgXWbumzR+AvCJzOxqC8uAl7TgFvDoe0HV7w8Yn+o9JRFxarenCc/wdY8Bju72cTLgJS2KNs5ysf7MemIPXlJvmuEslz5aOMtloc+i6fX6BrykhbTQZ7lYfwYGvKSFdC8Le5aL9WfQK5cqkNSbJk5BnkobpyBbfwa+yCpJhXIFL0mFMuAlqVAGvCQVyoCXpEIZ8JJUqP8HVPmeHgk+D/MAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": { "filenames": { "image/png": "/home/runner/work/BrownFall21/BrownFall21/_build/jupyter_execute/notes/2021-11-15_46_1.png" }, "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "sv_df['mean_test_score'].plot(kind='bar')" ] }, { "cell_type": "markdown", "id": "bcc1cc9a", "metadata": {}, "source": [ "From these last two plots we see that the SVM performance is more sensitive to its parameters, where for the parameters tested, the decision tree is not impacted. \n", "\n", "What can we say based on this? We'll pick up from here on Wednesday." ] } ], "metadata": { "jupytext": { "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.10.3" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "source_map": [ 12, 16, 20, 32, 36, 38, 42, 45, 49, 53, 59, 61, 65, 67, 71, 73, 77, 80, 88, 96, 104, 112, 116, 120, 122, 126, 130, 134, 140, 144, 146, 150, 152, 156, 162, 166, 171, 173, 177, 182, 186, 189, 197, 206, 209, 215, 219, 221 ] }, "nbformat": 4, "nbformat_minor": 5 }