This website is built using mystmd.
It allows me to post Jupyter notebooks to the website easily. I have also converted the notebooks to markdown files using jupytext, which makes them easier to manage with git. I put the files in a repository, and then the mystmd
tool runs all of the code in each notebook and renders the results together with the markdown into HTML for the website. It also generates all of the links automatically.
mystmd is designed to make publishing data-intensive results easier, with a better reading experience.
For example, I have a glossary file in the repository too; see the raw file on GitHub that creates the glossary page on the site. However, when you are reading and I use a term like GitHub or git, you can see the definition without having to change pages.
myst is a free, open tool that you can use too!
And it works in live notebooks too!
The main box above, which we made in class, is called a directive, and specifically an attention admonition.
To make one, in a markdown cell use colons (:) to make a “fence” around the block and label it with a valid admonition type in {}:
:::{attention}
This will be important
:::
This will be important
Above, I linked to the myst docs using a special cross-referencing syntax that lets it pull information from that other website (since it is also made with myst) into hover text.
[valid admonition type](xref:myst/admonitions#admonitions-list)
mystmd has some nice features
In markdown cells, we can even have syntax highlighting of non-executable code:
```Python
import pandas as pd
```
import pandas as pd
Preview of the COMPAS Data
Read the Machine Bias article if you have not already. It’s not a technical article, just a news article. There are more technical details linked if you are curious.
import pandas as pd
from sklearn import metrics
import seaborn as sns
If sklearn does not load, install it with:
pip install scikit-learn
Since I do not need that install line to run on the website, I put it in a markdown cell but used code formatting for it.
Now we will load a cleaned version of the data:
compas_clean_url = 'https://raw.githubusercontent.com/ml4sts/outreach-compas/main/data/compas_c.csv'
compas_df = pd.read_csv(compas_clean_url)
compas_df.head(3)
One Hot Encoding
One hot encoding is a way of transforming a categorical variable with multiple values into binary values. So if we have one column with n distinct values in it, after one hot encoding we will have n columns that are all binary (0/1 or True/False). Across those columns, exactly one will be True in each row.
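To make that definition concrete, here is a minimal by-hand sketch in plain Python. It uses a small made-up list of score labels (not the COMPAS data), and the helper name `one_hot` is hypothetical; it is only meant to show the idea, since pandas does this work for us in the rest of the notebook.

```python
def one_hot(values):
    """Return (categories, rows) with one True/False entry per distinct category."""
    # sort the distinct values so the column order is predictable
    categories = sorted(set(values))
    # each row gets one entry per category, True only where the value matches
    rows = [[value == category for category in categories] for value in values]
    return categories, rows


# made-up example labels, for illustration only
toy_scores = ['Low', 'High', 'Medium', 'Low']
categories, rows = one_hot(toy_scores)

print(categories)
for label, row in zip(toy_scores, rows):
    print(label, row)
```

Three distinct labels become three columns, and summing any row gives 1 because exactly one entry is True.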
To see what one hot encoding looks like, we will apply it first to just the one column.
compas_df[['score_text']].head()
pd.get_dummies(compas_df['score_text']).head()
compas_df_clean = pd.get_dummies(compas_df, columns=['score_text'])
compas_df_clean.head(2)
How many people received a ‘High’ score?
high_counts = compas_df_clean['score_text_High'].value_counts()
high_counts
score_text_High
False 4210
True 1068
Name: count, dtype: int64
1068 people got a high score
This uses the myst {eval} role.
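To report a single number like the one above, we can index the `value_counts` result by the value we care about. This is a small sketch on a made-up boolean column rather than the COMPAS data; the names `toy_df`, `toy_counts`, and `num_high` are only for illustration. Since True counts as 1 in a sum, summing the column should give the same number.

```python
import pandas as pd

# made-up one hot encoded column, for illustration only
toy_df = pd.DataFrame({'score_text_High': [True, False, False, True, False]})

# value_counts tallies how many rows hold each distinct value
toy_counts = toy_df['score_text_High'].value_counts()

# index by the value to get just the count of True rows
num_high = toy_counts[True]

# True is treated as 1 in a sum, so this is another way to count
num_high_by_sum = toy_df['score_text_High'].sum()

print(num_high, num_high_by_sum)
```

Either expression can be dropped into a sentence, so the text stays in sync with the data if the file changes.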
I wrote a note