32. Neural Networks
We started thinking about machine learning with the basic idea that our target variable (\(y_i\)) is related to the features \(\mathbf{x}_i\) by some function (for sample \(i\)):
\[y_i = f(\mathbf{x}_i)\]
But we don’t know that function exactly, so we assume a type of \(f\) (a decision tree, a boundary for SVM, a probability distribution) that has some parameters \(\theta\) and then use a machine learning algorithm \(\mathcal{A}\) to estimate the parameters for \(f\) from the training data. In the decision tree the parameters are the thresholds to compare to, in GaussianNB the parameters are the means and variances, and in SVM they are the support vectors that define the margin.
\[\hat{\theta} = \mathcal{A}(X_{train}, y_{train})\]
That gives us a fitted function that we can evaluate on our test data:
\[\hat{y}_i = f(\mathbf{x}_i; \hat{\theta})\]
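As a rough sketch of this fit-then-test pattern, the snippet below fits the three model types mentioned above on the digits data and pulls out what each one learned. The `random_state=0` values and the variable names are illustrative choices, not part of the original notes; `tree_.threshold`, `theta_`, and `support_vectors_` are the fitted attributes where scikit-learn stores these quantities.

```python
from sklearn import datasets, model_selection, svm
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# load the digits data and hold out a test set (random_state is an arbitrary choice)
X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, random_state=0
)

# each fit call plays the role of the algorithm A: it estimates theta from training data
dt = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
gnb = GaussianNB().fit(X_train, y_train)
svm_clf = svm.SVC(gamma=0.001).fit(X_train, y_train)

# the estimated parameters live in fitted attributes (trailing underscore)
dt_thresholds = dt.tree_.threshold          # one entry per tree node (leaves hold a placeholder)
gnb_means = gnb.theta_                      # per-class mean of every feature
support_vectors = svm_clf.support_vectors_  # training samples that define the margin

# the fitted f is then applied to the held-out test data
test_scores = {
    "tree": dt.score(X_test, y_test),
    "gnb": gnb.score(X_test, y_test),
    "svm": svm_clf.score(X_test, y_test),
}
print(dt_thresholds.shape, gnb_means.shape, support_vectors.shape)
print(test_scores)
```

In every case the model family fixes the form of \(f\), and fitting only fills in its parameters.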
A neural net lets us avoid assuming a specific form for \(f\) up front; it does universal function approximation. For one hidden layer and a binary classification problem:
\[f(\mathbf{x}) = W_2\, g(W_1^T \mathbf{x} + b_1) + b_2\]
where \(W_1, W_2\) are weights, \(b_1, b_2\) are bias terms, and the function \(g\) is called the activation function. So we approximate some unknown, complicated function \(f\) by taking a weighted sum of all of the inputs and passing it through another, known function.
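To make the weighted-sum-then-activation idea concrete, here is a minimal NumPy sketch of a forward pass. It assumes the one-hidden-layer form \(f(\mathbf{x}) = W_2\, g(W_1^T \mathbf{x} + b_1) + b_2\) and uses ReLU for \(g\); the function names, the layer sizes, and the random weights are all illustrative assumptions rather than anything fitted.

```python
import numpy as np

def relu(z):
    # one common choice for the activation function g (an assumption here)
    return np.maximum(0, z)

def forward(x, W1, b1, W2, b2, g=relu):
    # hidden layer: weighted sum of the inputs, passed through g
    hidden = g(W1.T @ x + b1)
    # output layer: weighted sum of the hidden units
    return W2 @ hidden + b2

rng = np.random.default_rng(0)
n_features, n_hidden = 64, 16  # illustrative sizes

# random weights stand in for parameters that training would normally estimate
W1 = rng.normal(size=(n_features, n_hidden))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=n_hidden)
b2 = rng.normal()

x = rng.normal(size=n_features)
score = forward(x, W1, b1, W2, b2)
print(score)
```

The output is a single real-valued score per sample. Training adjusts the weights and biases so that this score separates the two classes.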
from sklearn.neural_network import MLPClassifier
from sklearn import svm
import pandas as pd
import sklearn
from sklearn import datasets
import matplotlib.pyplot as plt
from sklearn import model_selection
import numpy as np
We’re going to use the digits dataset again.
digits = datasets.load_digits()
digits_X = digits.data
digits_y = digits.target
X_train, X_test, y_train, y_test = model_selection.train_test_split(digits_X,digits_y)
digits.images[0]
array([[ 0., 0., 5., 13., 9., 1., 0., 0.],
[ 0., 0., 13., 15., 10., 15., 5., 0.],
[ 0., 3., 15., 2., 0., 11., 8., 0.],
[ 0., 4., 12., 0., 0., 8., 8., 0.],
[ 0., 5., 8., 0., 0., 9., 8., 0.],
[ 0., 4., 11., 0., 1., 12., 7., 0.],
[ 0., 2., 14., 5., 10., 12., 0., 0.],
[ 0., 0., 6., 13., 10., 0., 0., 0.]])
Sklearn provides an estimator for the multilayer perceptron (MLP). We can try one with a single hidden layer to start.
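mlp = MLPClassifier(
    hidden_layer_sizes=(16,),  # one hidden layer with 16 units
    max_iter=500,
    alpha=1e-4,  # L2 regularization strength
    solver="lbfgs",
    verbose=10,  # print optimizer progress
    random_state=1,
    learning_rate_init=0.1,  # only used by the sgd and adam solvers
)
mlp.fit(X_train, y_train)
mlp.score(X_test, y_test)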
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 1210 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 9.27227D+00 |proj g|= 7.08994D+00
At iterate 1 f= 8.04002D+00 |proj g|= 7.04491D+00
At iterate 2 f= 3.12608D+00 |proj g|= 1.80942D+00
At iterate 3 f= 2.36715D+00 |proj g|= 3.52156D-01
At iterate 4 f= 2.26964D+00 |proj g|= 2.27675D-01
At iterate 5 f= 2.11984D+00 |proj g|= 2.55799D-01
At iterate 6 f= 1.99167D+00 |proj g|= 3.07291D-01
At iterate 7 f= 1.75901D+00 |proj g|= 6.61113D-01
At iterate 8 f= 1.64046D+00 |proj g|= 2.77600D-01
At iterate 9 f= 1.54232D+00 |proj g|= 5.85372D-01
At iterate 10 f= 1.46332D+00 |proj g|= 5.12800D-01
At iterate 11 f= 1.39561D+00 |proj g|= 2.99775D-01
At iterate 12 f= 1.30516D+00 |proj g|= 4.77882D-01
At iterate 13 f= 1.20060D+00 |proj g|= 9.86685D-01
At iterate 14 f= 1.10185D+00 |proj g|= 2.02083D-01
At iterate 15 f= 1.06560D+00 |proj g|= 1.90838D-01
At iterate 16 f= 1.02256D+00 |proj g|= 2.94326D-01
At iterate 17 f= 9.74097D-01 |proj g|= 3.44708D-01
At iterate 18 f= 9.31800D-01 |proj g|= 3.09162D-01
At iterate 19 f= 8.92911D-01 |proj g|= 1.98347D-01
At iterate 20 f= 8.56766D-01 |proj g|= 3.23628D-01
At iterate 21 f= 8.16915D-01 |proj g|= 3.39130D-01
At iterate 22 f= 7.74226D-01 |proj g|= 2.83939D-01
At iterate 23 f= 7.15964D-01 |proj g|= 2.69266D-01
At iterate 24 f= 6.93035D-01 |proj g|= 4.95794D-01
At iterate 25 f= 6.67817D-01 |proj g|= 6.30967D-01
At iterate 26 f= 6.47295D-01 |proj g|= 1.93662D-01
At iterate 27 f= 6.32798D-01 |proj g|= 2.14150D-01
At iterate 28 f= 6.05131D-01 |proj g|= 3.51393D-01
At iterate 29 f= 5.67418D-01 |proj g|= 1.88472D-01
At iterate 30 f= 5.48979D-01 |proj g|= 1.64330D-01
At iterate 31 f= 5.33075D-01 |proj g|= 1.75119D-01
At iterate 32 f= 5.10893D-01 |proj g|= 1.28782D-01
At iterate 33 f= 4.93290D-01 |proj g|= 1.27263D-01
At iterate 34 f= 4.71165D-01 |proj g|= 2.22851D-01
At iterate 35 f= 4.44793D-01 |proj g|= 1.44018D-01
At iterate 36 f= 4.32206D-01 |proj g|= 3.42970D-01
At iterate 37 f= 4.24355D-01 |proj g|= 2.62493D-01
At iterate 38 f= 4.16094D-01 |proj g|= 1.49284D-01
At iterate 39 f= 4.08195D-01 |proj g|= 7.78993D-02
At iterate 40 f= 3.99109D-01 |proj g|= 1.68352D-01
At iterate 41 f= 3.81397D-01 |proj g|= 3.01805D-01
At iterate 42 f= 3.72741D-01 |proj g|= 3.05747D-01
At iterate 43 f= 3.59836D-01 |proj g|= 1.94766D-01
At iterate 44 f= 3.49103D-01 |proj g|= 1.06654D-01
At iterate 45 f= 3.42337D-01 |proj g|= 1.63839D-01
At iterate 46 f= 3.35054D-01 |proj g|= 3.14030D-01
At iterate 47 f= 3.17285D-01 |proj g|= 2.06630D-01
At iterate 48 f= 3.03850D-01 |proj g|= 3.52371D-01
At iterate 49 f= 2.92263D-01 |proj g|= 2.73907D-01
At iterate 50 f= 2.86569D-01 |proj g|= 1.01353D-01
At iterate 51 f= 2.79707D-01 |proj g|= 6.83841D-02
At iterate 52 f= 2.71333D-01 |proj g|= 1.49279D-01
At iterate 53 f= 2.59688D-01 |proj g|= 1.46332D-01
At iterate 54 f= 2.50517D-01 |proj g|= 1.79559D-01
At iterate 55 f= 2.42272D-01 |proj g|= 1.48664D-01
At iterate 56 f= 2.34697D-01 |proj g|= 1.11852D-01
At iterate 57 f= 2.29063D-01 |proj g|= 1.11031D-01
At iterate 58 f= 2.21501D-01 |proj g|= 1.53909D-01
At iterate 59 f= 2.18002D-01 |proj g|= 2.33666D-01
At iterate 60 f= 2.08325D-01 |proj g|= 1.10925D-01
At iterate 61 f= 2.00121D-01 |proj g|= 1.26804D-01
At iterate 62 f= 1.96985D-01 |proj g|= 1.29513D-01
At iterate 63 f= 1.90568D-01 |proj g|= 1.16511D-01
At iterate 64 f= 1.86315D-01 |proj g|= 5.74096D-02
At iterate 65 f= 1.81652D-01 |proj g|= 8.12195D-02
At iterate 66 f= 1.74063D-01 |proj g|= 9.25730D-02
At iterate 67 f= 1.71796D-01 |proj g|= 2.72776D-01
At iterate 68 f= 1.63014D-01 |proj g|= 8.02688D-02
At iterate 69 f= 1.59663D-01 |proj g|= 4.76894D-02
At iterate 70 f= 1.55768D-01 |proj g|= 9.76366D-02
At iterate 71 f= 1.49661D-01 |proj g|= 1.13090D-01
At iterate 72 f= 1.42819D-01 |proj g|= 8.38093D-02
At iterate 73 f= 1.35735D-01 |proj g|= 7.86675D-02
At iterate 74 f= 1.32996D-01 |proj g|= 4.45142D-02
At iterate 75 f= 1.28996D-01 |proj g|= 6.60008D-02
At iterate 76 f= 1.26060D-01 |proj g|= 1.47951D-01
At iterate 77 f= 1.23559D-01 |proj g|= 5.20657D-02
At iterate 78 f= 1.21496D-01 |proj g|= 5.51397D-02
At iterate 79 f= 1.19065D-01 |proj g|= 6.31317D-02
At iterate 80 f= 1.17018D-01 |proj g|= 1.29986D-01
At iterate 81 f= 1.14221D-01 |proj g|= 3.39163D-02
At iterate 82 f= 1.13193D-01 |proj g|= 2.04129D-02
At iterate 83 f= 1.11959D-01 |proj g|= 3.56325D-02
At iterate 84 f= 1.09345D-01 |proj g|= 5.76385D-02
At iterate 85 f= 1.08258D-01 |proj g|= 5.67550D-02
At iterate 86 f= 1.06611D-01 |proj g|= 2.72224D-02
At iterate 87 f= 1.04447D-01 |proj g|= 2.34475D-02
At iterate 88 f= 1.02991D-01 |proj g|= 5.61546D-02
At iterate 89 f= 1.01380D-01 |proj g|= 3.38740D-02
At iterate 90 f= 9.98791D-02 |proj g|= 2.72291D-02
At iterate 91 f= 9.83953D-02 |proj g|= 2.70486D-02
At iterate 92 f= 9.60107D-02 |proj g|= 3.12134D-02
At iterate 93 f= 9.54945D-02 |proj g|= 1.62172D-01
At iterate 94 f= 9.23505D-02 |proj g|= 5.22738D-02
At iterate 95 f= 9.10973D-02 |proj g|= 1.34185D-02
At iterate 96 f= 9.01023D-02 |proj g|= 2.44081D-02
At iterate 97 f= 8.94528D-02 |proj g|= 4.28894D-02
At iterate 98 f= 8.91429D-02 |proj g|= 1.32633D-01
At iterate 99 f= 8.71876D-02 |proj g|= 4.28060D-02
At iterate 100 f= 8.62915D-02 |proj g|= 2.68617D-02
At iterate 101 f= 8.54859D-02 |proj g|= 7.98872D-02
At iterate 102 f= 8.47415D-02 |proj g|= 4.80975D-02
At iterate 103 f= 8.36305D-02 |proj g|= 3.64790D-02
At iterate 104 f= 8.19889D-02 |proj g|= 4.83787D-02
At iterate 105 f= 8.07567D-02 |proj g|= 5.13474D-02
At iterate 106 f= 7.96292D-02 |proj g|= 2.83173D-02
At iterate 107 f= 7.84190D-02 |proj g|= 1.91186D-02
At iterate 108 f= 7.76413D-02 |proj g|= 4.47166D-02
At iterate 109 f= 7.67706D-02 |proj g|= 2.18632D-02
At iterate 110 f= 7.58616D-02 |proj g|= 2.18128D-02
At iterate 111 f= 7.46519D-02 |proj g|= 4.85600D-02
At iterate 112 f= 7.36771D-02 |proj g|= 3.04623D-02
At iterate 113 f= 7.30455D-02 |proj g|= 2.24475D-02
At iterate 114 f= 7.17813D-02 |proj g|= 2.91206D-02
At iterate 115 f= 7.04006D-02 |proj g|= 5.84755D-02
At iterate 116 f= 6.93231D-02 |proj g|= 7.03734D-02
At iterate 117 f= 6.84442D-02 |proj g|= 2.08775D-02
At iterate 118 f= 6.76496D-02 |proj g|= 1.85692D-02
At iterate 119 f= 6.69563D-02 |proj g|= 4.69983D-02
At iterate 120 f= 6.62496D-02 |proj g|= 5.53999D-02
At iterate 121 f= 6.47823D-02 |proj g|= 4.18537D-02
At iterate 122 f= 6.37685D-02 |proj g|= 2.60466D-02
At iterate 123 f= 6.28934D-02 |proj g|= 2.52331D-02
At iterate 124 f= 6.13211D-02 |proj g|= 1.95883D-02
At iterate 125 f= 6.02689D-02 |proj g|= 4.06410D-02
At iterate 126 f= 5.91601D-02 |proj g|= 6.48052D-02
This problem is unconstrained.
At iterate 127 f= 5.76191D-02 |proj g|= 1.98001D-02
At iterate 128 f= 5.65497D-02 |proj g|= 5.81905D-02
At iterate 129 f= 5.57589D-02 |proj g|= 5.80664D-02
At iterate 130 f= 5.47377D-02 |proj g|= 3.60690D-02
At iterate 131 f= 5.38437D-02 |proj g|= 2.23770D-02
At iterate 132 f= 5.33311D-02 |proj g|= 2.95811D-02
At iterate 133 f= 5.25948D-02 |proj g|= 2.56814D-02
At iterate 134 f= 5.14684D-02 |proj g|= 4.74849D-02
At iterate 135 f= 5.08080D-02 |proj g|= 6.91872D-02
At iterate 136 f= 4.91555D-02 |proj g|= 2.71101D-02
At iterate 137 f= 4.82501D-02 |proj g|= 1.53895D-02
At iterate 138 f= 4.76426D-02 |proj g|= 1.19551D-02
At iterate 139 f= 4.68955D-02 |proj g|= 5.69765D-02
At iterate 140 f= 4.61671D-02 |proj g|= 2.33999D-02
At iterate 141 f= 4.57522D-02 |proj g|= 2.28369D-02
At iterate 142 f= 4.54256D-02 |proj g|= 1.47352D-02
At iterate 143 f= 4.50623D-02 |proj g|= 1.95392D-02
At iterate 144 f= 4.45685D-02 |proj g|= 2.55158D-02
At iterate 145 f= 4.42464D-02 |proj g|= 1.39652D-02
At iterate 146 f= 4.36814D-02 |proj g|= 1.65401D-02
At iterate 147 f= 4.33323D-02 |proj g|= 2.89405D-02
At iterate 148 f= 4.27415D-02 |proj g|= 2.32926D-02
At iterate 149 f= 4.22661D-02 |proj g|= 1.40558D-02
At iterate 150 f= 4.15041D-02 |proj g|= 2.78223D-02
At iterate 151 f= 4.10327D-02 |proj g|= 3.42134D-02
At iterate 152 f= 4.04825D-02 |proj g|= 1.46247D-02
At iterate 153 f= 3.99688D-02 |proj g|= 1.50799D-02
At iterate 154 f= 3.92599D-02 |proj g|= 3.03954D-02
At iterate 155 f= 3.87325D-02 |proj g|= 3.94113D-02
At iterate 156 f= 3.82362D-02 |proj g|= 1.92568D-02
At iterate 157 f= 3.78261D-02 |proj g|= 2.52130D-02
At iterate 158 f= 3.74526D-02 |proj g|= 1.04380D-02
At iterate 159 f= 3.71541D-02 |proj g|= 7.46592D-03
At iterate 160 f= 3.66250D-02 |proj g|= 3.09476D-02
At iterate 161 f= 3.63506D-02 |proj g|= 1.77655D-02
At iterate 162 f= 3.61100D-02 |proj g|= 9.33842D-03
At iterate 163 f= 3.58422D-02 |proj g|= 7.39890D-03
At iterate 164 f= 3.55588D-02 |proj g|= 2.92890D-02
At iterate 165 f= 3.52234D-02 |proj g|= 2.46712D-02
At iterate 166 f= 3.49777D-02 |proj g|= 1.39711D-02
At iterate 167 f= 3.45592D-02 |proj g|= 2.01480D-02
At iterate 168 f= 3.43682D-02 |proj g|= 7.03501D-02
At iterate 169 f= 3.39705D-02 |proj g|= 1.45501D-02
At iterate 170 f= 3.37866D-02 |proj g|= 1.00890D-02
At iterate 171 f= 3.35529D-02 |proj g|= 1.24861D-02
At iterate 172 f= 3.28207D-02 |proj g|= 3.24613D-02
At iterate 173 f= 3.25657D-02 |proj g|= 4.10267D-02
At iterate 174 f= 3.21485D-02 |proj g|= 1.85525D-02
At iterate 175 f= 3.15765D-02 |proj g|= 2.66676D-02
At iterate 176 f= 3.12651D-02 |proj g|= 5.69036D-02
At iterate 177 f= 3.07117D-02 |proj g|= 1.81783D-02
At iterate 178 f= 3.04437D-02 |proj g|= 1.15245D-02
At iterate 179 f= 2.98556D-02 |proj g|= 1.64261D-02
At iterate 180 f= 2.90769D-02 |proj g|= 2.22674D-02
At iterate 181 f= 2.89402D-02 |proj g|= 1.80972D-02
At iterate 182 f= 2.85405D-02 |proj g|= 1.21924D-02
At iterate 183 f= 2.81243D-02 |proj g|= 9.04513D-03
At iterate 184 f= 2.79002D-02 |proj g|= 4.38765D-02
At iterate 185 f= 2.75576D-02 |proj g|= 1.87747D-02
At iterate 186 f= 2.73610D-02 |proj g|= 1.85221D-02
At iterate 187 f= 2.67180D-02 |proj g|= 2.73868D-02
At iterate 188 f= 2.63216D-02 |proj g|= 2.76574D-02
At iterate 189 f= 2.61515D-02 |proj g|= 3.24705D-02
At iterate 190 f= 2.59423D-02 |proj g|= 1.59497D-02
At iterate 191 f= 2.54613D-02 |proj g|= 5.09686D-02
At iterate 192 f= 2.53987D-02 |proj g|= 6.70033D-02
At iterate 193 f= 2.50044D-02 |proj g|= 3.72615D-02
At iterate 194 f= 2.44712D-02 |proj g|= 1.94114D-02
At iterate 195 f= 2.37981D-02 |proj g|= 1.09713D-02
At iterate 196 f= 2.34430D-02 |proj g|= 7.57128D-03
At iterate 197 f= 2.31325D-02 |proj g|= 8.49478D-03
At iterate 198 f= 2.29015D-02 |proj g|= 1.37916D-02
At iterate 199 f= 2.26737D-02 |proj g|= 7.24115D-03
At iterate 200 f= 2.24585D-02 |proj g|= 6.13742D-03
At iterate 201 f= 2.21852D-02 |proj g|= 5.87103D-03
At iterate 202 f= 2.18573D-02 |proj g|= 9.87964D-03
At iterate 203 f= 2.15983D-02 |proj g|= 2.03374D-02
At iterate 204 f= 2.14041D-02 |proj g|= 5.84013D-03
At iterate 205 f= 2.12036D-02 |proj g|= 3.22615D-03
At iterate 206 f= 2.09255D-02 |proj g|= 4.57507D-03
At iterate 207 f= 2.08219D-02 |proj g|= 1.18319D-02
At iterate 208 f= 2.06510D-02 |proj g|= 5.55209D-03
At iterate 209 f= 2.05304D-02 |proj g|= 4.83881D-03
At iterate 210 f= 2.04403D-02 |proj g|= 5.98367D-03
At iterate 211 f= 2.03287D-02 |proj g|= 7.32337D-03
At iterate 212 f= 2.02253D-02 |proj g|= 5.00339D-03
At iterate 213 f= 2.00541D-02 |proj g|= 3.74653D-03
At iterate 214 f= 1.98635D-02 |proj g|= 2.71538D-03
At iterate 215 f= 1.97466D-02 |proj g|= 1.10610D-02
At iterate 216 f= 1.95787D-02 |proj g|= 5.29639D-03
At iterate 217 f= 1.94349D-02 |proj g|= 6.25969D-03
At iterate 218 f= 1.93098D-02 |proj g|= 4.20529D-03
At iterate 219 f= 1.92030D-02 |proj g|= 6.48101D-03
At iterate 220 f= 1.90631D-02 |proj g|= 6.11516D-03
At iterate 221 f= 1.88764D-02 |proj g|= 3.63345D-03
At iterate 222 f= 1.87004D-02 |proj g|= 2.82303D-03
At iterate 223 f= 1.85938D-02 |proj g|= 2.32498D-03
At iterate 224 f= 1.84318D-02 |proj g|= 1.00395D-02
At iterate 225 f= 1.83034D-02 |proj g|= 3.02143D-03
At iterate 226 f= 1.82256D-02 |proj g|= 2.83995D-03
At iterate 227 f= 1.80693D-02 |proj g|= 8.60365D-03
At iterate 228 f= 1.79449D-02 |proj g|= 9.86598D-03
At iterate 229 f= 1.78344D-02 |proj g|= 4.86177D-03
At iterate 230 f= 1.77450D-02 |proj g|= 2.64182D-03
At iterate 231 f= 1.75892D-02 |proj g|= 7.08056D-03
At iterate 232 f= 1.75129D-02 |proj g|= 6.44009D-03
At iterate 233 f= 1.74199D-02 |proj g|= 2.57001D-03
At iterate 234 f= 1.73812D-02 |proj g|= 1.66485D-03
At iterate 235 f= 1.73200D-02 |proj g|= 8.24379D-03
At iterate 236 f= 1.72320D-02 |proj g|= 2.72338D-03
At iterate 237 f= 1.71507D-02 |proj g|= 2.68199D-03
At iterate 238 f= 1.71202D-02 |proj g|= 3.38923D-03
At iterate 239 f= 1.70320D-02 |proj g|= 4.43150D-03
At iterate 240 f= 1.69531D-02 |proj g|= 5.73043D-03
At iterate 241 f= 1.68900D-02 |proj g|= 1.96023D-03
At iterate 242 f= 1.68107D-02 |proj g|= 3.89646D-03
At iterate 243 f= 1.67566D-02 |proj g|= 1.59855D-03
At iterate 244 f= 1.67101D-02 |proj g|= 2.44812D-03
At iterate 245 f= 1.66495D-02 |proj g|= 1.83545D-03
At iterate 246 f= 1.65450D-02 |proj g|= 1.02406D-03
At iterate 247 f= 1.64670D-02 |proj g|= 5.53368D-03
At iterate 248 f= 1.64051D-02 |proj g|= 4.97272D-03
At iterate 249 f= 1.63492D-02 |proj g|= 2.36781D-03
At iterate 250 f= 1.62687D-02 |proj g|= 1.35497D-03
At iterate 251 f= 1.62200D-02 |proj g|= 1.55589D-03
At iterate 252 f= 1.60781D-02 |proj g|= 4.10270D-03
At iterate 253 f= 1.60245D-02 |proj g|= 4.33171D-03
At iterate 254 f= 1.58950D-02 |proj g|= 4.80164D-03
At iterate 255 f= 1.57810D-02 |proj g|= 2.19162D-03
At iterate 256 f= 1.57246D-02 |proj g|= 2.04805D-03
At iterate 257 f= 1.57104D-02 |proj g|= 2.05273D-03
At iterate 258 f= 1.55950D-02 |proj g|= 1.43809D-02
At iterate 259 f= 1.55541D-02 |proj g|= 3.19328D-03
At iterate 260 f= 1.55157D-02 |proj g|= 6.22496D-03
At iterate 261 f= 1.54310D-02 |proj g|= 3.64952D-03
At iterate 262 f= 1.53471D-02 |proj g|= 3.61684D-03
At iterate 263 f= 1.52756D-02 |proj g|= 9.34025D-03
At iterate 264 f= 1.51270D-02 |proj g|= 2.87295D-03
At iterate 265 f= 1.49316D-02 |proj g|= 4.78373D-03
At iterate 266 f= 1.48503D-02 |proj g|= 9.52904D-03
At iterate 267 f= 1.47532D-02 |proj g|= 8.06706D-03
At iterate 268 f= 1.46582D-02 |proj g|= 6.11370D-03
At iterate 269 f= 1.44567D-02 |proj g|= 3.88189D-03
At iterate 270 f= 1.39819D-02 |proj g|= 6.25064D-03
At iterate 271 f= 1.39233D-02 |proj g|= 1.15165D-02
At iterate 272 f= 1.35994D-02 |proj g|= 6.00452D-03
At iterate 273 f= 1.34387D-02 |proj g|= 3.54468D-03
At iterate 274 f= 1.34139D-02 |proj g|= 3.17803D-03
At iterate 275 f= 1.33967D-02 |proj g|= 2.97319D-03
At iterate 276 f= 1.33960D-02 |proj g|= 2.96746D-03
At iterate 277 f= 1.33959D-02 |proj g|= 2.96701D-03
At iterate 278 f= 1.33959D-02 |proj g|= 3.07989D-02
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
1210 278 350 1 0 0 3.080D-02 1.340D-02
F = 1.3395897539264089E-002
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
Warning: more than 10 function and gradient
evaluations in the last line search. Termination
may possibly be caused by a bad search direction.
0.9088888888888889
We can also see what happens if we increase the size of the hidden layer.
mlp = MLPClassifier(
    hidden_layer_sizes=(64,),  # one hidden layer with 64 units
    max_iter=500,
    alpha=1e-4,
    solver="lbfgs",
    verbose=10,
    random_state=1,
    learning_rate_init=0.1,  # only used by the sgd and adam solvers
)
mlp.fit(X_train, y_train)
mlp.score(X_test, y_test)
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 4810 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 1.04698D+01 |proj g|= 8.19142D+00
At iterate 1 f= 9.40960D+00 |proj g|= 4.38798D+00
At iterate 2 f= 8.49749D+00 |proj g|= 4.75479D+00
At iterate 3 f= 6.75773D+00 |proj g|= 3.06551D+00
At iterate 4 f= 5.05641D+00 |proj g|= 2.28021D+00
At iterate 5 f= 3.56259D+00 |proj g|= 2.84974D+00
At iterate 6 f= 2.18216D+00 |proj g|= 1.12799D+00
At iterate 7 f= 1.60033D+00 |proj g|= 8.24687D-01
At iterate 8 f= 1.15630D+00 |proj g|= 4.70730D-01
At iterate 9 f= 8.63591D-01 |proj g|= 3.43946D-01
At iterate 10 f= 5.82189D-01 |proj g|= 2.28631D-01
At iterate 11 f= 4.26479D-01 |proj g|= 4.49107D-01
At iterate 12 f= 2.89106D-01 |proj g|= 1.28311D-01
At iterate 13 f= 2.43976D-01 |proj g|= 9.22916D-02
At iterate 14 f= 2.01287D-01 |proj g|= 9.67775D-02
At iterate 15 f= 1.65463D-01 |proj g|= 1.51950D-01
At iterate 16 f= 1.35398D-01 |proj g|= 6.21650D-02
At iterate 17 f= 1.20337D-01 |proj g|= 5.09786D-02
At iterate 18 f= 9.60194D-02 |proj g|= 3.68845D-02
At iterate 19 f= 7.27684D-02 |proj g|= 6.54578D-02
At iterate 20 f= 5.59220D-02 |proj g|= 2.69822D-02
At iterate 21 f= 4.69470D-02 |proj g|= 3.54462D-02
At iterate 22 f= 3.75954D-02 |proj g|= 3.11227D-02
At iterate 23 f= 3.15628D-02 |proj g|= 8.07837D-02
At iterate 24 f= 2.29887D-02 |proj g|= 2.13118D-02
At iterate 25 f= 2.06107D-02 |proj g|= 1.45824D-02
At iterate 26 f= 1.56146D-02 |proj g|= 1.42335D-02
At iterate 27 f= 1.31944D-02 |proj g|= 6.38546D-02
At iterate 28 f= 1.00681D-02 |proj g|= 8.65419D-03
At iterate 29 f= 9.24231D-03 |proj g|= 8.10791D-03
At iterate 30 f= 7.53521D-03 |proj g|= 1.18460D-02
At iterate 31 f= 5.09595D-03 |proj g|= 8.65832D-03
At iterate 32 f= 3.43894D-03 |proj g|= 2.43464D-02
At iterate 33 f= 2.22405D-03 |proj g|= 7.13082D-03
At iterate 34 f= 1.92190D-03 |proj g|= 4.08154D-03
At iterate 35 f= 1.59503D-03 |proj g|= 1.84045D-03
At iterate 36 f= 1.21083D-03 |proj g|= 2.31800D-03
At iterate 37 f= 9.24447D-04 |proj g|= 8.08666D-03
At iterate 38 f= 5.97799D-04 |proj g|= 1.36987D-03
At iterate 39 f= 4.88723D-04 |proj g|= 8.23700D-04
At iterate 40 f= 3.42157D-04 |proj g|= 1.05708D-03
At iterate 41 f= 2.46765D-04 |proj g|= 1.98554D-03
At iterate 42 f= 1.75509D-04 |proj g|= 5.32612D-04
At iterate 43 f= 1.51778D-04 |proj g|= 5.12587D-04
At iterate 44 f= 1.19209D-04 |proj g|= 6.00315D-04
At iterate 45 f= 8.46890D-05 |proj g|= 4.46872D-04
At iterate 46 f= 5.41279D-05 |proj g|= 1.41700D-04
At iterate 47 f= 4.24791D-05 |proj g|= 8.82478D-05
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
4810 47 49 1 0 0 8.825D-05 4.248D-05
F = 4.2479108727199123E-005
CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
This problem is unconstrained.
0.9666666666666667
We can compare it to SVM:
svm_clf = svm.SVC(gamma=0.001)
svm_clf.fit(X_train, y_train)
svm_clf.score(X_test,y_test)
0.9977777777777778
We can also have multiple hidden layers:
mlp = MLPClassifier(
    hidden_layer_sizes=(64, 64),
    max_iter=500,
    alpha=1e-4,
    solver="lbfgs",
    verbose=10,
    random_state=1,
    learning_rate_init=0.1,
)
mlp.fit(X_train, y_train)
mlp.score(X_test, y_test)
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 8970 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 7.71478D+00 |proj g|= 5.25609D+00
At iterate 1 f= 6.48500D+00 |proj g|= 5.26103D+00
At iterate 2 f= 4.42779D+00 |proj g|= 2.12085D+00
At iterate 3 f= 3.25797D+00 |proj g|= 1.33043D+00
At iterate 4 f= 2.52204D+00 |proj g|= 1.13814D+00
At iterate 5 f= 1.87228D+00 |proj g|= 7.27518D-01
At iterate 6 f= 1.43680D+00 |proj g|= 5.89927D-01
At iterate 7 f= 1.06527D+00 |proj g|= 3.23321D-01
At iterate 8 f= 7.68364D-01 |proj g|= 2.96892D-01
At iterate 9 f= 5.56608D-01 |proj g|= 3.29116D-01
At iterate 10 f= 4.20392D-01 |proj g|= 1.63444D-01
At iterate 11 f= 3.51388D-01 |proj g|= 2.15978D-01
At iterate 12 f= 2.86287D-01 |proj g|= 2.34987D-01
At iterate 13 f= 2.44876D-01 |proj g|= 1.21928D-01
At iterate 14 f= 2.11853D-01 |proj g|= 1.46749D-01
At iterate 15 f= 1.84060D-01 |proj g|= 1.11867D-01
At iterate 16 f= 1.57790D-01 |proj g|= 3.51106D-01
At iterate 17 f= 1.26887D-01 |proj g|= 6.86937D-02
At iterate 18 f= 1.12359D-01 |proj g|= 5.41308D-02
At iterate 19 f= 9.21238D-02 |proj g|= 6.71390D-02
At iterate 20 f= 7.64971D-02 |proj g|= 1.31449D-01
At iterate 21 f= 6.40522D-02 |proj g|= 4.30071D-02
At iterate 22 f= 5.59814D-02 |proj g|= 5.55424D-02
At iterate 23 f= 4.58934D-02 |proj g|= 6.20258D-02
At iterate 24 f= 4.10159D-02 |proj g|= 1.03215D-01
At iterate 25 f= 3.29265D-02 |proj g|= 2.51322D-02
At iterate 26 f= 3.07147D-02 |proj g|= 1.81659D-02
At iterate 27 f= 2.61044D-02 |proj g|= 2.45009D-02
At iterate 28 f= 2.06602D-02 |proj g|= 6.99235D-02
At iterate 29 f= 1.54648D-02 |proj g|= 2.28806D-02
At iterate 30 f= 1.28063D-02 |proj g|= 1.30095D-02
At iterate 31 f= 9.76136D-03 |proj g|= 2.80805D-02
At iterate 32 f= 8.12416D-03 |proj g|= 2.98110D-02
At iterate 33 f= 7.00231D-03 |proj g|= 1.50257D-02
At iterate 34 f= 6.07564D-03 |proj g|= 9.89339D-03
At iterate 35 f= 5.17405D-03 |proj g|= 1.33440D-02
At iterate 36 f= 4.42905D-03 |proj g|= 2.16746D-02
At iterate 37 f= 3.62284D-03 |proj g|= 7.65736D-03
At iterate 38 f= 3.27339D-03 |proj g|= 7.76222D-03
At iterate 39 f= 2.83135D-03 |proj g|= 4.44776D-03
At iterate 40 f= 2.29191D-03 |proj g|= 4.95359D-03
At iterate 41 f= 2.13002D-03 |proj g|= 1.61772D-02
At iterate 42 f= 1.60152D-03 |proj g|= 3.85109D-03
At iterate 43 f= 1.34752D-03 |proj g|= 3.42780D-03
At iterate 44 f= 1.00925D-03 |proj g|= 3.46245D-03
This problem is unconstrained.
At iterate 45 f= 9.90281D-04 |proj g|= 1.47225D-02
At iterate 46 f= 6.17332D-04 |proj g|= 2.72747D-03
At iterate 47 f= 5.36680D-04 |proj g|= 2.01053D-03
At iterate 48 f= 4.10609D-04 |proj g|= 1.22670D-03
At iterate 49 f= 3.21408D-04 |proj g|= 1.10501D-03
At iterate 50 f= 2.29149D-04 |proj g|= 1.13645D-03
At iterate 51 f= 1.85399D-04 |proj g|= 1.16249D-03
At iterate 52 f= 1.53734D-04 |proj g|= 5.55915D-04
At iterate 53 f= 1.20287D-04 |proj g|= 5.60460D-04
At iterate 54 f= 8.44474D-05 |proj g|= 5.84243D-04
At iterate 55 f= 5.46868D-05 |proj g|= 4.53775D-04
At iterate 56 f= 3.29017D-05 |proj g|= 2.68175D-04
At iterate 57 f= 2.41188D-05 |proj g|= 9.86959D-05
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
8970 57 59 1 0 0 9.870D-05 2.412D-05
F = 2.4118825152332589E-005
CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL
0.9666666666666667
We saw that the SVM performed a bit better, but this is a simple problem. We can also compare these based on how much they store; the number of parameters is related to the complexity.
svm_clf.support_vectors_.shape
(692, 64)
[c.shape for c in mlp.coefs_]
[(64, 64), (64, 64), (64, 10)]
np.prod(list(svm_clf.support_vectors_.shape))
44288
np.sum([np.prod(list(c.shape)) for c in mlp.coefs_])
8832
We see that the MLP stores far fewer values than the SVM.
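The count above covers only the weight matrices in `coefs_`; a fitted `MLPClassifier` also stores one bias per hidden and output unit in `intercepts_`. As a small sketch, the helper below (a hypothetical function, not part of scikit-learn) counts both from the layer sizes alone, assuming fully connected layers with 64 inputs and 10 outputs as in the digits data.

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count the weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # one weight for every connection between consecutive layers
    weights = sum(a * b for a, b in zip(sizes[:-1], sizes[1:]))
    # one bias for every unit after the input layer
    biases = sum(sizes[1:])
    return weights, biases

weights, biases = count_mlp_parameters(64, (64, 64), 10)
print(weights, biases, weights + biases)
```

For the two-hidden-layer network this gives 8832 weights plus 138 biases, 8970 in total, which agrees with the `N = 8970` variable count reported in the L-BFGS log above. The same helper gives 1210 and 4810 for the networks with 16 and 64 hidden units.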
32.1. Questions after class
32.1.1. How can we use this in our assignment?
You do not have to, but you could try an MLP in your assignment; all that is required is any classifier and a text representation.
32.1.2. How do we know how to change the parameters?
If it doesn’t work well, trying more layers or bigger layers is a good idea.
Neural nets work like a black box: they’re hard to interpret, so while there
are good heuristics, there isn’t as solid a theory for knowing what to do
with them.
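One hedged way to act on the more-or-bigger-layers heuristic is to let cross-validation compare a few layer configurations. The sketch below assumes the digits data and the `lbfgs` settings used earlier; the candidate sizes, `cv=3`, and `random_state=0` for the split are illustrative choices.

```python
from sklearn import datasets, model_selection
from sklearn.neural_network import MLPClassifier

X, y = datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, random_state=0
)

# candidate architectures: a small layer, a bigger layer, and two layers
param_grid = {"hidden_layer_sizes": [(16,), (64,), (64, 64)]}

search = model_selection.GridSearchCV(
    MLPClassifier(solver="lbfgs", alpha=1e-4, max_iter=500, random_state=1),
    param_grid,
    cv=3,  # 3-fold cross-validation on the training data only
)
search.fit(X_train, y_train)

# the configuration with the best mean cross-validated accuracy
print(search.best_params_)
# accuracy of the refit best model on the held-out test data
print(search.score(X_test, y_test))
```

Because the choice is made on the training folds, the test set still gives an honest estimate of how the selected network generalizes.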
They work well, but because they’re hard to understand, that’s a risk of using them.