9. Assignment 9

accept the assignment

Due: 2022-11-11

9.1.

Eligible skills: (links to checklists)

  • first chance clustering 1 and 2

  • evaluate 1 and 2

  • python 1 and 2

  • summarize 1 and 2

  • visualize 1 and 2

9.3. Instructions

Use the same dataset you used for assignment 7, unless there was a problem with it. If you did not complete assignment 7, pick one of the datasets recommended for that assignment.

  1. Describe what question you’d be asking in applying clustering to this dataset.

  2. Apply K-means using the known, correct number of clusters, \(K\).

  3. Evaluate how well clustering worked on the data:

    • using a true clustering metric (one that does not rely on the ground truth labels) and

    • using visualization and

    • using a clustering metric that uses the ground truth labels

  4. Include a discussion of your results that addresses the following:

    • describes what the clustering means

    • what the metrics show

    • whether this clustering works better or worse than expected based on the classification performance (if you didn’t complete assignment 7, also apply a classifier)

  5. Repeat your analysis using two different numbers of clusters (one higher and one lower than \(K\)):

    • can you interpret the new clusters?

    • how do they relate to the original clusters? Are they completely different, or did one split?

    • is there a reasonable explanation for more clusters than there are classes in this dataset?
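
As a starting point, steps 2, 3, and 5 might be sketched as follows. This is a minimal illustration under assumptions, not a required approach: it uses scikit-learn's bundled iris data as a stand-in for your own dataset, picks the silhouette score as the metric that ignores ground truth and the adjusted Rand index as the metric that uses it, and the helper name `cluster_and_score` is hypothetical. Other metrics may suit your data better.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.metrics.cluster import contingency_matrix

# Stand-in data (assumption): replace with the features and labels
# from the dataset you used in assignment 7.
X, y_true = load_iris(return_X_y=True)
k_known = len(set(y_true))  # number of classes in the ground truth


def cluster_and_score(X, y_true, k, seed=0):
    """Hypothetical helper: fit K-means with k clusters and score the result."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = km.fit_predict(X)
    # Silhouette uses only the features and the cluster assignments.
    sil = silhouette_score(X, labels)
    # Adjusted Rand index compares the assignments to the ground truth labels.
    ari = adjusted_rand_score(y_true, labels)
    return labels, sil, ari


# Fit with the known K, then with one fewer and one more cluster.
results = {}
for k in (k_known - 1, k_known, k_known + 1):
    labels, sil, ari = cluster_and_score(X, y_true, k)
    results[k] = {"labels": labels, "silhouette": sil, "ari": ari}
    print(f"K={k}: silhouette={sil:.3f}, adjusted Rand={ari:.3f}")

# Rows: clusters found with the known K; columns: clusters found with K + 1.
# A row whose counts are spread over two columns suggests that cluster split.
overlap = contingency_matrix(results[k_known]["labels"],
                             results[k_known + 1]["labels"])
print(overlap)
```

The printed scores and the overlap table only support the discussion; the visualization in step 3 (for example, a scatter plot colored by cluster assignment) and the interpretation in steps 4 and 5 still need to be written for your own data.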