{
"cells": [
{
"cell_type": "markdown",
"id": "7a31a104",
"metadata": {},
"source": [
"# Performance Metrics continued\n",
"\n",
"## Logistics\n",
"\n",
"- A6 will be posted ASAP! (I promise the code itself will be straightforward; you do need to interpret the results carefully, though.)\n",
"- A5 grading is behind; if it's not done by early tomorrow (e.g., 10am), I'll extend the portfolio deadline accordingly.\n",
"- [Mid Semester Feedback](https://forms.gle/BdgHf8zHcXoUcw9E8)\n",
"\n",
"## Completing the COMPAS Audit\n",
"\n",
"Let's start where we left off, plus some additional imports.\n",
"\n",
"````{margin}\n",
"```{note}\n",
"I used a [tag](https://jupyterbook.org/en/stable/reference/cheatsheet.html#tags) on this cell in the notes so that the output with warning is not shown here.\n",
"```\n",
"````"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e159ffe1",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:No module named 'tempeh': LawSchoolGPADataset will be unavailable. To install, run:\n",
"pip install 'aif360[LawSchoolGPA]'\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:No module named 'tensorflow': AdversarialDebiasing will be unavailable. To install, run:\n",
"pip install 'aif360[AdversarialDebiasing]'\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:No module named 'fairlearn': ExponentiatedGradientReduction will be unavailable. To install, run:\n",
"pip install 'aif360[Reductions]'\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"WARNING:root:No module named 'fairlearn': GridSearchReduction will be unavailable. To install, run:\n",
"pip install 'aif360[Reductions]'\n"
]
}
],
"source": [
"import pandas as pd\n",
"from sklearn import metrics as skmetrics\n",
"from aif360 import metrics as fairmetrics\n",
"from aif360.datasets import BinaryLabelDataset\n",
"import seaborn as sns\n",
"\n",
"# cleaned version of ProPublica's COMPAS data\n",
"compas_clean_url = 'https://raw.githubusercontent.com/ml4sts/outreach-compas/main/data/compas_c.csv'\n",
"compas_df = pd.read_csv(compas_clean_url, index_col='id')\n",
"\n",
"# one-hot encode the risk category into score_text_High/Low/Medium columns;\n",
"# dtype=int keeps them as 0/1 integers (pandas 2.x defaults to booleans)\n",
"compas_df = pd.get_dummies(compas_df, columns=['score_text'], dtype=int)"
]
},
{
"cell_type": "markdown",
"id": "adb99ea1",
"metadata": {},
"source": [
"```{warning}\n",
"We'll get a warning, which is **okay**; if you run the cell again, it will go away.\n",
"```\n",
"\n",
"To review, here are the first few rows:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a5e3457a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" age c_charge_degree race age_cat sex priors_count \\\n",
"id \n",
"3 34 F African-American 25 - 45 Male 0 \n",
"4 24 F African-American Less than 25 Male 4 \n",
"8 41 F Caucasian 25 - 45 Male 14 \n",
"10 39 M Caucasian 25 - 45 Female 0 \n",
"14 27 F Caucasian 25 - 45 Male 0 \n",
"\n",
" days_b_screening_arrest decile_score is_recid two_year_recid \\\n",
"id \n",
"3 -1.0 3 1 1 \n",
"4 -1.0 4 1 1 \n",
"8 -1.0 6 1 1 \n",
"10 -1.0 1 0 0 \n",
"14 -1.0 4 0 0 \n",
"\n",
" c_jail_in c_jail_out length_of_stay score_text_High \\\n",
"id \n",
"3 2013-01-26 03:45:27 2013-02-05 05:36:53 10 0 \n",
"4 2013-04-13 04:58:34 2013-04-14 07:02:04 1 0 \n",
"8 2014-02-18 05:08:24 2014-02-24 12:18:30 6 0 \n",
"10 2014-03-15 05:35:34 2014-03-18 04:28:46 2 0 \n",
"14 2013-11-25 06:31:06 2013-11-26 08:26:57 1 0 \n",
"\n",
" score_text_Low score_text_Medium \n",
"id \n",
"3 1 0 \n",
"4 1 0 \n",
"8 0 1 \n",
"10 1 0 \n",
"14 1 0 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_df.head()"
]
},
{
"cell_type": "markdown",
"id": "4ec7e991",
"metadata": {},
"source": [
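"Notice that today we imported the `sklearn.metrics` module with an alias, `skmetrics`. AIF360 also has a module named `metrics` (imported as `fairmetrics`), so the aliases keep the two apart."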
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "facd3c9b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6288366805608185"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"skmetrics.accuracy_score(compas_df['two_year_recid'],compas_df['score_text_High'])"
]
},
{
"cell_type": "markdown",
"id": "bf8d9ed0",
"metadata": {},
"source": [
"It is more common to count a Medium or High score (that is, *not Low*) as the positive prediction when checking accuracy. We can calculate this either by summing the Medium and High columns or by inverting the Low column. We'll use *not Low* for now, to review using `apply`.\n",
"\n",
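"A side note: `apply` with `int_not` is not the only way to flip a 0/1 column. Here is a minimal sketch on a small made-up column (illustrative values, not COMPAS data). It shows that plain arithmetic, `1 - low`, gives the same result without calling a Python function on every row:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# made-up 0/1 indicator column standing in for score_text_Low\n",
"low = pd.Series([1, 1, 0, 1, 0], name='score_text_Low')\n",
"\n",
"# element-wise logical not with apply, as in the notes\n",
"int_not = lambda a: int(not(a))\n",
"med_high_apply = low.apply(int_not)\n",
"\n",
"# vectorized alternative: for 0/1 integers, not x equals 1 - x\n",
"med_high_vec = 1 - low\n",
"\n",
"print(med_high_apply.tolist())  # [0, 0, 1, 0, 1]\n",
"print(med_high_apply.equals(med_high_vec))  # True\n",
"```\n",
"\n",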
"```{admonition} Try it Yourself\n",
"A good exercise to review data manipulation is to create the `score_text_MedHigh` column by adding the other two together. Medium OR High on booleans is the same as Medium + High on 0/1 integers, because no one can have both a Medium and a High score.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "879397ba",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6582038651004168"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_not = lambda a:int(not(a))\n",
"compas_df['score_text_MedHigh'] = compas_df['score_text_Low'].apply(int_not)\n",
"skmetrics.accuracy_score(compas_df['two_year_recid'],\n",
" compas_df['score_text_MedHigh'])"
]
},
{
"cell_type": "markdown",
"id": "9f7bcdca",
"metadata": {},
"source": [
"We can see this gives us a slightly higher score, but still not that great.\n",
"\n",
"\n",
"The `int_not` `lambda` is a function:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "046f1cda",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"function"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"type(int_not)"
]
},
{
"cell_type": "markdown",
"id": "464b6659",
"metadata": {},
"source": [
"It is equivalent to the following `def` version, just written more compactly:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a847348e",
"metadata": {},
"outputs": [],
"source": [
"def int_not_f(a):\n",
" return int(not(a))"
]
},
{
"cell_type": "markdown",
"id": "d784fa4a",
"metadata": {},
"source": [
"It flips a 0 to a 1:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "352f5c3b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_not(0)"
]
},
{
"cell_type": "markdown",
"id": "fe60a59e",
"metadata": {},
"source": [
"and a 1 to a 0:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f23076fd",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"int_not(1)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2ec3ce1e",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"score_text_Medium\n",
"0 1\n",
"1 5\n",
"Name: decile_score, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_df.groupby('score_text_Medium')['decile_score'].min()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "de0fb672",
"metadata": {},
"outputs": [],
"source": [
"compas_race = compas_df.groupby('race')"
]
},
{
"cell_type": "markdown",
"id": "7a3984bf",
"metadata": {},
"source": [
"## Per Group scores with groupby\n",
"\n",
"To group the data and then compute the score for each group, we can again use a lambda, this time with `apply` on the grouped object:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3c32f29d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"race\n",
"African-American 0.649134\n",
"Caucasian 0.671897\n",
"dtype: float64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"acc_fx = lambda d: skmetrics.accuracy_score(d['two_year_recid'],\n",
" d['score_text_MedHigh'])\n",
"\n",
"compas_race.apply(acc_fx,)"
]
},
{
"cell_type": "markdown",
"id": "ed4c422d",
"metadata": {},
"source": [
"This gives a Series. `reset_index` turns it into a DataFrame, and then we rename the unnamed column (`0`) to `accuracy`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "59624af9",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" race accuracy\n",
"0 African-American 0.649134\n",
"1 Caucasian 0.671897"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_race.apply(acc_fx,).reset_index().rename(columns={0:'accuracy'})"
]
},
{
"cell_type": "markdown",
"id": "1e43a103",
"metadata": {},
"source": [
"That lambda + apply is equivalent to:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "d3f4900a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" race accuracy\n",
"0 African-American 0.649134\n",
"1 Caucasian 0.671897"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"race_acc = []\n",
"for race, rdf in compas_race:\n",
" acc = skmetrics.accuracy_score(rdf['two_year_recid'],\n",
" rdf['score_text_MedHigh'])\n",
" race_acc.append([race,acc])\n",
"\n",
"pd.DataFrame(race_acc, columns =['race','accuracy'])"
]
},
{
"cell_type": "markdown",
"id": "0d31e0a4",
"metadata": {},
"source": [
"## Using AIF360\n",
"\n",
"The AIF360 package implements fairness metrics, some of which are derived from metrics we have already seen, plus some others. [The documentation](https://aif360.readthedocs.io/en/latest/modules/generated/aif360.metrics.ClassificationMetric.html#aif360.metrics.ClassificationMetric) has the full list in a summary table with plain-English explanations, and the details give equations for most of them.\n",
"\n",
"However, it has a few requirements:\n",
"- its constructor takes two `BinaryLabelDataset` objects\n",
"- these objects must be the same except for the label column\n",
"- the constructor for `BinaryLabelDataset` only accepts all numerical DataFrames\n",
"\n",
"\n",
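"Before preparing anything, it can help to find the columns that break the last requirement. Here is a minimal sketch on a small made-up DataFrame (same column names as ours, invented values). It uses pandas' `is_numeric_dtype` to list the non-numeric columns:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"from pandas.api.types import is_numeric_dtype\n",
"\n",
"# made-up rows with the columns we will pass to AIF360\n",
"df = pd.DataFrame({\n",
"    'race': ['African-American', 'Caucasian', 'Caucasian'],\n",
"    'two_year_recid': [1, 0, 1],\n",
"    'score_text_MedHigh': [1, 0, 0],\n",
"})\n",
"\n",
"# columns that would need converting before building a BinaryLabelDataset\n",
"non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]\n",
"print(non_numeric)  # ['race']\n",
"```\n",
"\n",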
"So, we have some preparation to do. \n",
"\n",
"\n",
"First, we'll make a numerical copy of the `compas_df` columns that we need. The only non-numerical column that we need is `race`, so we'll make a `dict` to replace its values with numbers."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7f62ed22",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'African-American': 0, 'Caucasian': 1}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# map each race to an integer, most common first (value_counts sorts by frequency)\n",
"race_num_map = {r: i for i, r in enumerate(compas_df['race'].value_counts().index)}\n",
"race_num_map"
]
},
{
"cell_type": "markdown",
"id": "7b055a1f",
"metadata": {},
"source": [
"Next, we select the columns we need and replace the race strings with these numbers:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "d657b4c5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" race two_year_recid score_text_MedHigh\n",
"id \n",
"3 0 1 0\n",
"4 0 1 0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"required_cols = ['race','two_year_recid','score_text_MedHigh']\n",
"num_compas = compas_df[required_cols].replace(race_num_map)\n",
"num_compas.head(2)"
]
},
{
"cell_type": "markdown",
"id": "a2116d75",
"metadata": {},
"source": [
"Next, we will make two versions: one with race and the ground truth, and the other with race and the predictions. It's easiest to drop the column we don't want from each."
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "ac690bf2",
"metadata": {},
"outputs": [],
"source": [
"num_compas_true = num_compas.drop(columns=['score_text_MedHigh'])\n",
"num_compas_pred = num_compas.drop(columns=['two_year_recid'])"
]
},
{
"cell_type": "markdown",
"id": "c92b0124",
"metadata": {},
"source": [
"Now we make the [`BinaryLabelDataset`](https://aif360.readthedocs.io/en/latest/modules/generated/aif360.datasets.BinaryLabelDataset.html#aif360.datasets.BinaryLabelDataset) objects; this type also comes from AIF360. Basically, it holds the same data as a DataFrame (split into arrays such as `features` and `labels`) plus extra attributes. Some of those attributes are specific to this class and some are inherited from [`StructuredDataset`](https://aif360.readthedocs.io/en/latest/modules/generated/aif360.datasets.StructuredDataset.html#aif360.datasets.StructuredDataset)."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "6e884cb7",
"metadata": {},
"outputs": [],
"source": [
"# favorable outcome is 0: not rearrested (truth) / Low score (prediction)\n",
"broward_true = BinaryLabelDataset(favorable_label=0, unfavorable_label=1,\n",
"                                  df=num_compas_true,\n",
"                                  label_names=['two_year_recid'],\n",
"                                  protected_attribute_names=['race'])\n",
"compas_predictions = BinaryLabelDataset(favorable_label=0, unfavorable_label=1,\n",
"                                        df=num_compas_pred,\n",
"                                        label_names=['score_text_MedHigh'],\n",
"                                        protected_attribute_names=['race'])"
]
},
{
"cell_type": "markdown",
"id": "a8672cfd",
"metadata": {},
"source": [
"```{admonition} Try it Yourself\n",
"Remember, you can inspect the attributes of most objects using the `__dict__` attribute.\n",
"```\n",
"\n",
"\n",
"This type also has an `ignore_fields` attribute: a set of fields to skip when two datasets are compared. The requirement is that only the *content* of the label column differs. In our case the label *names* differ too, so we add `'label_names'` to `ignore_fields` to tell AIF360 that's okay."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "edb63ff2",
"metadata": {},
"outputs": [],
"source": [
"compas_predictions.ignore_fields.add('label_names')\n",
"broward_true.ignore_fields.add('label_names')"
]
},
{
"cell_type": "markdown",
"id": "343664d2",
"metadata": {},
"source": [
"Now, we can instantiate our metric object:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4e083eea",
"metadata": {},
"outputs": [],
"source": [
"compas_fair_scorer = fairmetrics.ClassificationMetric(broward_true,\n",
" compas_predictions,\n",
" unprivileged_groups=[{'race':0}],\n",
" privileged_groups = [{'race':1}])"
]
},
{
"cell_type": "markdown",
"id": "858fa2ff",
"metadata": {},
"source": [
"And finally we can compute! First, we can verify that we get the same accuracy as before:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f2cd14e5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6582038651004168"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.accuracy()"
]
},
{
"cell_type": "markdown",
"id": "e97c1a10",
"metadata": {},
"source": [
"The AIF360 metrics take one parameter, `privileged`, with a default value of `None`. When it's `None`, the metric is computed over the whole dataset; when `True`, it is computed only for the privileged group."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "56e9b593",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6718972895863052"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.accuracy(True)"
]
},
{
"cell_type": "markdown",
"id": "e7030fc4",
"metadata": {},
"source": [
"Here, that is Caucasian people.\n",
"\n",
"\n",
"When `False`, it's the unprivileged group, here African-American people:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "23317260",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6491338582677165"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.accuracy(False)"
]
},
{
"cell_type": "markdown",
"id": "4818a797",
"metadata": {},
"source": [
"We can also compute other scores. Many fairness scores are ratios of the unprivileged group's score to the privileged group's score. \n",
"\n",
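"In code, such a ratio is just a metric computed on the unprivileged group divided by the same metric on the privileged group. Here is a minimal sketch using the two per-group accuracies printed above (the helper `group_ratio` is a hypothetical name, not part of AIF360). Since the error rate is 1 minus accuracy, this should reproduce the `error_rate_ratio()` value computed below.\n",
"\n",
"```python\n",
"# per-group accuracies copied from compas_fair_scorer.accuracy(False) and .accuracy(True)\n",
"acc_unpriv = 0.6491338582677165  # African-American (race == 0)\n",
"acc_priv = 0.6718972895863052    # Caucasian (race == 1)\n",
"\n",
"def group_ratio(unprivileged_value, privileged_value):\n",
"    # hypothetical helper: unprivileged score divided by privileged score\n",
"    return unprivileged_value / privileged_value\n",
"\n",
"# error rate = 1 - accuracy, so this mirrors error_rate_ratio()\n",
"err_ratio = group_ratio(1 - acc_unpriv, 1 - acc_priv)\n",
"print(round(err_ratio, 4))  # 1.0694\n",
"```\n",
"\n",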
"Disparate impact compares how often each group receives the favorable *prediction*, regardless of the true outcome. Since the favorable label is 0 (a Low score), this is the ratio of the % of Black people given a Low score to the % of white people given a Low score."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "f767f8a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.6336457196581771"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.disparate_impact()"
]
},
{
"cell_type": "markdown",
"id": "598138fc",
"metadata": {},
"source": [
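"In US employment discrimination cases, an \"80%\" (four-fifths) rule is used: if this ratio is above 0.8, it's considered close enough. At about 0.63, COMPAS falls well below that threshold. Next, the error rate ratio compares how often each group's predictions are wrong; a value near 1 means the overall error rates are similar."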
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "d7131f8a",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.0693789798014377"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.error_rate_ratio()"
]
},
{
"cell_type": "markdown",
"id": "837a8c3a",
"metadata": {},
"source": [
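"We can also take ratios of specific error rates; this is where the journalists [found bias](https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing). In AIF360, the *positive* class is the favorable label, which we set to 0 (not rearrested / Low score). A false positive here is therefore someone who was given a Low score but was rearrested."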
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "66ab8df0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.5737241916634204"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.false_positive_rate_ratio()"
]
},
{
"cell_type": "markdown",
"id": "4aea7d71",
"metadata": {},
"source": [
"Among people who were rearrested, Black people had been given a Low score only a little more than half as often as white people. (Equivalently, white people who were rearrested had been given a Low score about 1.7 times as often.)"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "d5aa9d5f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.9232342111919953"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.false_negative_rate_ratio()"
]
},
{
"cell_type": "markdown",
"id": "3ec69053",
"metadata": {},
"source": [
"Among people who were not rearrested, Black people had been given a Medium or High score almost twice as often as white people.\n",
"\n",
"So while the accuracy was similar for Black and white people (the error rate ratio is close to 1), the algorithm makes opposite types of errors for the two groups.\n",
"\n",
"After the journalists published the piece, the people who made COMPAS countered with a technical report, arguing that the journalists had measured fairness incorrectly.\n",
"\n",
"The journalists' two measures, false positive rate and false negative rate, use the true outcomes as the denominator.\n",
"\n",
"The [COMPAS creators argued](https://www.equivant.com/response-to-propublica-demonstrating-accuracy-equity-and-predictive-parity/) that the model should be evaluated in terms of whether a given score means the same thing across races, using the prediction as the denominator. The false omission rate and false discovery rate below take this view."
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "c6dbdda4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8649767923408909"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.false_omission_rate_ratio()"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e73cd787",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1.2118532033913119"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compas_fair_scorer.false_discovery_rate_ratio()"
]
},
{
"cell_type": "markdown",
"id": "bae818ad",
"metadata": {},
"source": [
"On these two metrics, the ratio is closer to 1 and much less disparate.\n",
"\n",
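"To see the difference in denominators concretely, here is a minimal sketch that computes all four rates for one group from made-up labels. It assumes the same convention as our AIF360 setup: the *positive* class is the favorable label, 0 (not rearrested / Low score).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# made-up outcomes for one group; 0 is the favorable (positive) label\n",
"y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])\n",
"y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 0])\n",
"favorable = 0\n",
"\n",
"pos_true = y_true == favorable  # actually not rearrested\n",
"pos_pred = y_pred == favorable  # given a Low score\n",
"\n",
"tp = np.sum(pos_true & pos_pred)\n",
"fp = np.sum(~pos_true & pos_pred)\n",
"fn = np.sum(pos_true & ~pos_pred)\n",
"tn = np.sum(~pos_true & ~pos_pred)\n",
"\n",
"# journalists' view: denominators are counts of TRUE outcomes\n",
"fpr = fp / (fp + tn)  # false positive rate\n",
"fnr = fn / (fn + tp)  # false negative rate\n",
"\n",
"# creators' view: denominators are counts of PREDICTIONS\n",
"fdr = fp / (fp + tp)  # false discovery rate\n",
"for_rate = fn / (fn + tn)  # false omission rate ('for' is a Python keyword)\n",
"\n",
"print(fpr, fnr, fdr, for_rate)\n",
"```\n",
"\n",
"Here there are 3 true positives, 2 false positives, 1 false negative and 2 true negatives, so FPR = 2/4, FNR = 1/4, FDR = 2/5 and FOR = 1/3. The same four confusion-matrix cells feed every rate; only the denominator changes.\n",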
"\n",
"The creators thought it was important for the score to mean the same thing for every person assigned a score. The journalists thought it was more important for the algorithm to have the same impact on different groups of people. \n",
"Ideally, we would like the score to both mean the same thing for different people and to have the same impact. \n",
"\n",
"Researchers have proved that these goals are mutually exclusive. When the groups have different base rates (here, different rates of rearrest) and the predictor is not perfect, a score cannot both mean the same thing across groups *and* have equal false positive and false negative rates. We cannot have both, so it is very important to think about what the performance metrics mean and how your algorithm will be used in order to choose how to prepare a model. We will train models starting next week, but knowing these goals in advance is essential.\n",
"\n",
"Importantly, this is not a statistical, computational choice that data can answer for us. This is about *human* values (and to some extent the law; certain domains have legal protections that require a specific condition).\n",
"\n",
"The Fair Machine Learning book's classification chapter has a [section on relationships between criteria](https://fairmlbook.org/classification.html#relationships-between-criteria) with the proofs.\n",
"\n",
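"As a small numeric illustration of one such relationship, here is a sketch of an identity from Chouldechova (2017). Within a single group, it links the false positive rate (FPR), the false negative rate (FNR), the positive predictive value (PPV, which is 1 - FDR) and the base rate p: FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR). The counts and rates below are made up.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def implied_fpr(base_rate, ppv, fnr):\n",
"    # FPR implied by the base rate, PPV and FNR of one group\n",
"    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * (1 - fnr)\n",
"\n",
"# check the identity against made-up confusion-matrix counts\n",
"tp, fp, fn, tn = 30, 10, 15, 45\n",
"n = tp + fp + fn + tn\n",
"fpr_direct = fp / (fp + tn)\n",
"fpr_identity = implied_fpr((tp + fn) / n, tp / (tp + fp), fn / (tp + fn))\n",
"print(math.isclose(fpr_direct, fpr_identity))  # True\n",
"\n",
"# two made-up groups with the SAME PPV and FNR but different base rates\n",
"fpr_a = implied_fpr(0.5, ppv=0.7, fnr=0.3)\n",
"fpr_b = implied_fpr(0.4, ppv=0.7, fnr=0.3)\n",
"print(round(fpr_a, 3), round(fpr_b, 3))  # 0.3 0.2\n",
"```\n",
"\n",
"Equal PPV (equivalently, equal FDR, one of the creators' criteria) and equal FNR force unequal FPR whenever the base rates differ, so at least one of the journalists' criteria must fail.\n",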
"\n",
"```{important}\n",
"\n",
"We used ProPublica's COMPAS dataset to replicate parts of their analysis, with different tools. That is, they collected the dataset in order to audit the COMPAS algorithm, and we used it for the same purpose (and to learn model evaluation). This dataset is not designed for *training* models, even though it has been used that way many times. This is [not the best way](https://openreview.net/pdf?id=qeM58whnpXM) to use this dataset, and I do not recommend using it for future assignments.\n",
"```\n",
"\n",
"````{margin}\n",
"```{note}\n",
"If you are interested in fairness in ML, that is what my research is. Aiden has been working on a project with me since summer and I'll be taking new students in the spring. Students who have completed this course are excellent candidates to join my lab. Let me know if you are interested!\n",
"```\n",
"````\n",
"\n",
"\n",
"## Portfolio Reminder\n",
"\n",
"If you do not need level 3s to be happy with your grade for the course (e.g., you want a B) and you have all the achievements so far, you can skip the portfolio submission. If you do not need level 3 and you are not on track, you should submit to get caught up. This can be (and is advised to be) reflective revisions of past assignment(s). \n",
"\n",
"If you need level 3 achievements for your desired grade, then you can pick a [subset of the eligible skills](https://rhodyprog4ds.github.io/BrownFall22/portfolio/index.html#upcoming-checks)(or all) and add *new* work that shows that you have learned those skills according to the [level 3 checklists](https://rhodyprog4ds.github.io/BrownFall22/syllabus/achievements.html#detailed-checklists). [the ideas page](https://rhodyprog4ds.github.io/BrownFall22/portfolio/check1ideas.html) has example formats for that new work.\n",
"\n",
"\n",
"## Questions After Class\n",
"\n",
"Today's questions were only clarifying, so hopefully re-reading the notes is enough. If not, post a question as an issue!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0fba8610",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.14.1"
}
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"source_map": [
12,
32,
44,
51,
53,
56,
58,
66,
71,
78,
80,
84,
87,
91,
93,
95,
99,
103,
105,
111,
116,
119,
121,
125,
133,
151,
154,
157,
162,
165,
168,
171,
179,
187,
190,
193,
198,
201,
203,
207,
209,
215,
217,
223,
225,
229,
231,
236,
238,
242,
244,
257,
261,
264,
303
]
},
"nbformat": 4,
"nbformat_minor": 5
}