Sharing Results - Programming for Data Science (Fall 2025)

This website is built using mystmd.

It allows me to post jupyter notebooks (which I have also converted to markdown files using jupytext to make them easier to manage with git) to the website easily. I put the files in a repository and then the mystmd tool runs all of the code in the notebook and renders the markdown together into HTML for the website. It makes all of the links and everything automatically.

mystmd is designed to make publishing data-intensive results easier, with a better reading experience.

For example, I have a glossary file in the repository too; see the raw file on GitHub that creates the glossary page on the site. However, when you are reading and I use a term like GitHub or git, you can see the definition without having to change pages.

myst is a free, open tool that you can use too!

And it works in live notebooks too!

Attention

For this to work, install myst

!pip install mystmd

and the jupyterlab myst extension

!pip install jupyterlab_myst

then restart your kernel (or the whole jupyter lab).

If you have trouble, see the docs for:

The main box above, which we made in class, is called a directive; specifically, it is an attention admonition.

To make one, in a markdown cell use colons (:) to make a “fence” around the block and label it with a valid admonition type in {}:

:::{attention}
This will be important
:::

Note

Above, I linked to the myst docs using a special cross-referencing syntax that lets myst pull the information from that other website (since it is also made with myst) into hover text.

[valid admonition type](xref:myst/admonitions#admonitions-list)

mystmd has some nice features

In markdown cells, we can even have syntax highlighting of non-executable code:



```Python
import pandas as pd
```

Preview of the COMPAS Data

import pandas as pd
from sklearn import metrics
import seaborn as sns

If sklearn does not load, install it with:

pip install scikit-learn

Now we will load a cleaned version of the data:

compas_clean_url = 'https://raw.githubusercontent.com/ml4sts/outreach-compas/main/data/compas_c.csv'
compas_df = pd.read_csv(compas_clean_url)

compas_df.head(3)

One Hot Encoding

One hot encoding is a way of transforming a categorical variable with multiple values into binary variables. If one column has $x$ distinct values in it, then after one hot encoding we will have $x$ columns that are all binary (0/1 or True/False). In each row, exactly one of those $x$ columns will be True.
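As a small sketch of that property, the example below one hot encodes a made-up column and checks that each row ends up with exactly one True. The names `toy_df`, `toy_encoded`, and `ones_per_row` and the four toy values are assumptions for illustration only; they are not part of the COMPAS data.

```python
import pandas as pd

# Hypothetical toy data (not the COMPAS dataset): one categorical column
# with 3 distinct values
toy_df = pd.DataFrame({"score_text": ["Low", "High", "Medium", "Low"]})

# One hot encode: one new binary column per distinct value
toy_encoded = pd.get_dummies(toy_df, columns=["score_text"])
print(toy_encoded.columns.tolist())

# Convert to integers and add across the columns of each row;
# every row should sum to 1 because exactly one column is True
ones_per_row = toy_encoded.astype(int).sum(axis=1)
print(ones_per_row.tolist())
```

Here the 3 distinct values become 3 columns, matching the $x$ values to $x$ columns description above.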

To see what one hot encoding looks like, we will apply it first to just the one column.

Categorical Data

compas_df[['score_text']].head()

One Hot Encoded

compas_df_clean = pd.get_dummies(compas_df, columns=['score_text'])

compas_df_clean.head(2)

How many people received a ‘High’ score?

high_counts = compas_df_clean['score_text_High'].value_counts()
high_counts

score_text_High
False    4210
True     1068
Name: count, dtype: int64
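Because a one hot encoded column is boolean, the count can also be read off with `sum`, and `mean` gives the fraction of rows that are True. The sketch below uses a made-up Series, `toy_high`, as a stand-in for a column like `score_text_High`; its values are assumptions for illustration, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for a one hot encoded column (not the real data)
toy_high = pd.Series([True, False, False, True, False], name="score_text_High")

# True counts as 1 and False as 0, so sum() counts the True rows
n_high = int(toy_high.sum())

# mean() is the count of True rows divided by the total number of rows
frac_high = float(toy_high.mean())

# value_counts() reports the same count under the True label
counts = toy_high.value_counts().to_dict()

print(n_high, frac_high, counts[True])
```

`value_counts` is handy when you want both the True and False counts at once; `sum` and `mean` are shorter when you only need the True side.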

Programmatic tools for accessing data

UCI Repo

HuggingFace :hugs: