9. Assignment 9: Clustering

9.1. Quick Facts

9.3. Assessment

task                                                                 | skill
apply and interpret k-means clustering                               | clustering (2)
use multiple metrics to evaluate performance                         | evaluate (2)
interpret how decisions impact model performance                     | evaluate (2)
interpret the classifier performance in the context of the dataset   | process (2)
analyze the impact of model parameters on model performance          | process (2)
use EDA techniques to interpret the experimental results             | summarize (2), visualize (2)

9.4. Instructions

Use the same dataset you used for assignment 7. If there was a problem with that dataset, or if you did not complete assignment 7, pick one of the datasets recommended for that assignment.

  1. Describe what question you’d be asking in applying clustering to this dataset.

  2. Apply k-means using the known, correct number of clusters, \(K\).

  3. Evaluate how well clustering worked on the data:

    • using a true clustering metric (one that does not use the ground truth labels)

    • using visual inspection

    • using a clustering metric that uses the ground truth labels

  4. Include a discussion of your results that addresses the following:

    • describes what the clustering means

    • what the metrics show

    • whether this clustering works better or worse than expected based on the classification performance (if you didn’t complete assignment 7, also apply a classifier)

  5. Repeat your analysis using a different number of clusters:

    • can you interpret the new clusters?

    • how do they relate to the original clusters? Are they completely different? Did one split? Did some merge?

    • is there a reasonable explanation for more clusters than there are classes in this dataset?
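
The steps above could be sketched roughly as follows. This is a minimal illustration, not a required solution. It assumes scikit-learn and pandas are available, uses the iris dataset purely as a stand-in for your own data, and picks the silhouette score as the metric that ignores the labels and the adjusted Rand index as the metric that uses them; other choices are equally valid. The helper `cluster_and_score` is a hypothetical name introduced here for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Stand-in dataset (an assumption): swap in your assignment 7 data.
iris = load_iris(as_frame=True)
X = iris.data         # features only; the labels are never shown to k-means
y_true = iris.target  # ground truth, used only for evaluation


def cluster_and_score(X, y_true, k, seed=0):
    """Hypothetical helper: fit k-means with k clusters and score the result."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    return {
        "labels": labels,
        # uses only the features and the cluster assignments
        "silhouette": silhouette_score(X, labels),
        # compares cluster assignments to the ground truth labels
        "ari": adjusted_rand_score(y_true, labels),
    }


# Step 2 and 3: the known number of clusters is the number of classes.
k_known = y_true.nunique()
base = cluster_and_score(X, y_true, k_known)

# Step 5: repeat with a different number of clusters (here, one more).
more = cluster_and_score(X, y_true, k_known + 1)

# Rows are the original clusters, columns are the new ones. A row with
# counts spread over two columns suggests that cluster split.
relation = pd.crosstab(
    base["labels"], more["labels"],
    rownames=["clusters at K"], colnames=["clusters at K+1"],
)

print(f"K={k_known}: silhouette={base['silhouette']:.3f}, ARI={base['ari']:.3f}")
print(f"K={k_known + 1}: silhouette={more['silhouette']:.3f}, ARI={more['ari']:.3f}")
print(relation)
```

For the visual inspection, one option is to add the cluster labels as a column of the feature DataFrame and color a seaborn `pairplot` by that column with the `hue` parameter. The numbers printed here are only a starting point; the discussion of what the clusters mean in your dataset is the part being assessed.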

Think Ahead

How can clustering be used to ask many different questions? What can you do with clustering results?