
Sharing Results

This website is built using mystmd.

It allows me to post Jupyter notebooks (which I have also converted to markdown files using jupytext to make them easier to manage with git) to the website easily. I put the files in a repository, and then the mystmd tool runs all of the code in the notebooks and renders the results together with the markdown into HTML for the website. It also generates all of the links automatically.

mystmd is designed to make publishing data-intensive results easier, with a better reading experience.

For example, I have a glossary file in the repository too; see the raw file on GitHub that creates the glossary page on the site. However, when you are reading and I use a term like GitHub or git, you can see the definition without having to change pages.

MyST is a free, open tool that you can use too!

And it works in live notebooks too!

The main box above, which we made in class, is called a directive; specifically, it is an attention admonition.

To make one, in a markdown cell use colons (:) to make a “fence” around the block and label it with a valid admonition type in {}:

:::{attention}
This will be important
:::
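Because the fence is plain text, it can also be generated programmatically. The sketch below uses a hypothetical helper, `make_admonition` (it is not part of mystmd), to write the MyST source for an admonition. The set of type names is my assumption, based on the standard docutils admonitions; check the mystmd documentation for the definitive list.

```python
# Hypothetical helper, not part of mystmd: builds the MyST source text
# for a colon-fenced admonition.

# Assumed list of admonition types (the standard docutils ones).
ADMONITION_TYPES = {
    'attention', 'caution', 'danger', 'error', 'hint',
    'important', 'note', 'seealso', 'tip', 'warning',
}


def make_admonition(kind: str, body: str, colons: int = 3) -> str:
    """Wrap `body` in a colon fence labelled with an admonition type."""
    if kind not in ADMONITION_TYPES:
        raise ValueError(f'unknown admonition type: {kind!r}')
    if colons < 3:
        raise ValueError('a colon fence needs at least three colons')
    fence = ':' * colons
    # Opening and closing fences use the same number of colons;
    # the doubled braces produce literal { and } around the type name.
    return f'{fence}{{{kind}}}\n{body}\n{fence}'


print(make_admonition('attention', 'This will be important'))
```

Passing a larger `colons` value is how you would write an outer fence when nesting one directive inside another, since the outer fence needs more colons than the inner one.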

In markdown cells, we can even have syntax highlighting of non-executable code:


```Python
import pandas as pd
```

Preview of the COMPAS Data

```python
import pandas as pd
from sklearn import metrics
import seaborn as sns
```

If sklearn does not load, install it with:

```bash
pip install scikit-learn
```
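The package is installed as `scikit-learn` but imported as `sklearn`, which is an easy mix-up. Here is a minimal sketch, using only the standard library, for checking whether a module is available before importing it. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    # find_spec returns None when no module with that name can be located
    return importlib.util.find_spec(module_name) is not None


# The pip package is "scikit-learn", but the import name is "sklearn"
has_sklearn = is_installed('sklearn')
if not has_sklearn:
    print('sklearn not found; install it with: pip install scikit-learn')
```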

Now we will load a cleaned version of the data:

```python
compas_clean_url = 'https://raw.githubusercontent.com/ml4sts/outreach-compas/main/data/compas_c.csv'
compas_df = pd.read_csv(compas_clean_url)
compas_df.head(3)
```

One Hot Encoding

One hot encoding is a way of transforming a categorical variable with multiple values into binary values. So if we have one column with $x$ distinct values in it, one hot encoding gives us $x$ columns that are all binary (0/1 or True/False). In each row, exactly one of those $x$ columns will be True.
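Here is a small sketch of the idea on made-up toy values (not the COMPAS data). It builds one boolean column per distinct value by hand, then checks that `pd.get_dummies` produces the same table. This assumes pandas sorts the new columns by value and names them `<column>_<value>`. It also casts with `astype(bool)` in case an older pandas version returns 0/1 integers instead of True/False.

```python
import pandas as pd

# Toy data, made up for illustration (not the COMPAS data)
toy = pd.DataFrame({'score_text': ['Low', 'High', 'Medium', 'Low']})

# By hand: one boolean column per distinct value, True where the row matches
categories = sorted(toy['score_text'].unique())
manual = pd.DataFrame(
    {f'score_text_{c}': toy['score_text'] == c for c in categories}
)

# The pandas shortcut for the same transformation
dummies = pd.get_dummies(toy, columns=['score_text'])

print(manual)
print(manual.equals(dummies.astype(bool)))
```

Three distinct values become three columns. Each row sums to exactly 1, because exactly one column is True per row.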

To see what one hot encoding looks like, we will apply it first to just the one column.

```python
compas_df[['score_text']].head()
```
```python
compas_df_clean = pd.get_dummies(compas_df, columns=['score_text'])
compas_df_clean.head(2)
```
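`get_dummies` keeps the columns that are not listed in `columns` and appends one new column per value, named `<column>_<value>`. The sketch below uses toy rows to show the `dtype=int` option, which gives 0/1 columns instead of True/False. It then uses the one-True-per-row property to recover the original labels with `idxmax`. The toy data and the recovery step are illustrations only, not part of the original analysis.

```python
import pandas as pd

# Made-up toy rows standing in for the COMPAS columns
toy = pd.DataFrame({'id': [1, 2, 3], 'score_text': ['Medium', 'High', 'Low']})

# dtype=int asks for 0/1 columns instead of True/False
encoded = pd.get_dummies(toy, columns=['score_text'], dtype=int)
print(encoded)

# Exactly one dummy column is 1 in each row, so idxmax finds it;
# stripping the prefix gives back the original label
dummy_cols = [c for c in encoded.columns if c.startswith('score_text_')]
recovered = (
    encoded[dummy_cols]
    .idxmax(axis=1)
    .str.replace('score_text_', '', regex=False)
)
print(recovered.tolist())
```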

How many people received a ‘High’ score?

```python
high_counts = compas_df_clean['score_text_High'].value_counts()
high_counts
```

```
score_text_High
False    4210
True     1068
Name: count, dtype: int64
```

So 1068 people got a high score.
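Because True counts as 1, summing the boolean column gives the same count as `value_counts`, and taking its mean gives the share of people with that score. Here is a sketch on toy data (the five labels are made up, so the numbers are not the COMPAS results):

```python
import pandas as pd

# Made-up labels, used only to illustrate the counting
toy = pd.DataFrame({'score_text': ['High', 'Low', 'High', 'Medium', 'Low']})
encoded = pd.get_dummies(toy, columns=['score_text'])

# True counts as 1, so the sum of the boolean column is the number of 'High' rows
n_high = int(encoded['score_text_High'].sum())

# The same count, taken directly from the original categorical column
n_high_direct = int(toy['score_text'].value_counts()['High'])

# The mean of a boolean column is the fraction of rows that are True
share_high = float(encoded['score_text_High'].mean())

print(n_high, n_high_direct, share_high)
```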

I wrote a note

Programmatic tools for accessing data

UCI Repo

HuggingFace :hugs: