Support vector machine classification of shatter artifacts

This project is a simple implementation of supervised machine learning to find a boundary between “good” data and “artifact” data. In particular, I am looking at ice crystal data that researchers collected by flying through cirrus clouds aboard the NASA WB-57 as part of the MACPEX campaign in 2011.

A shatter artifact in this dataset occurs when an incoming ice crystal shatters on the aircraft or sample inlet instead of being sampled as an intact crystal. These events are typically characterized by a large number of small ice crystals being detected in a short period of time.
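To make that description concrete, here is a minimal sketch of one way to flag candidate shatter events: look for bursts of small crystals inside a short time window. The function name `flag_bursts`, the thresholds (`max_size_um`, `window_s`, `min_count`), and the sample detections are all hypothetical values chosen for illustration. They are not taken from the MACPEX data or from the labeling actually used in this project.

```python
def flag_bursts(detections, max_size_um=50.0, window_s=1.0, min_count=5):
    """Return indices of detections that belong to a burst of small crystals.

    detections: list of (time_s, size_um) tuples, sorted by time.
    All thresholds are illustrative assumptions, not calibrated values.
    """
    # Indices of crystals small enough to be possible shatter fragments.
    small = [i for i, (_, size) in enumerate(detections) if size <= max_size_um]

    flagged = set()
    start = 0
    for end in range(len(small)):
        # Advance the window start until the window spans at most window_s.
        while detections[small[end]][0] - detections[small[start]][0] > window_s:
            start += 1
        # Enough small crystals in a short time: mark the whole window.
        if end - start + 1 >= min_count:
            flagged.update(small[start:end + 1])
    return sorted(flagged)


# Invented example: one isolated small crystal, then a tight cluster of five.
detections = [
    (0.00, 120.0), (2.00, 15.0),
    (5.00, 12.0), (5.05, 9.0), (5.10, 14.0), (5.15, 8.0), (5.20, 11.0),
    (9.00, 200.0),
]
burst_indices = flag_bursts(detections)
```

With these made-up numbers, only the five closely spaced small crystals are flagged. The isolated small crystal at 2 s is left alone because it has no neighbors inside the window.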

Using this information, I can train a linear support vector machine on what I think are “good” and “artifact” observations. I can then plot these data in some parameter space and try to find the boundary that maximizes the margin between the two classes.
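As a rough illustration of what training a linear SVM involves, the sketch below minimizes the regularized hinge loss with plain batch subgradient descent. It is a from-scratch toy, not the code behind this project: the training points, the labels, and the hyperparameters `lam`, `lr`, and `epochs` are assumptions made up for the example. In practice a library implementation such as scikit-learn's `SVC(kernel="linear")` would be the more usual choice.

```python
# Toy training set, invented for illustration. Each row is a pair of
# already-scaled features; labels are +1 for "good" and -1 for "artifact".
X = [(2.0, 2.5), (2.5, 3.0), (3.0, 2.0), (3.5, 3.5),
     (0.5, 0.5), (1.0, 0.2), (0.2, 1.0), (0.8, 0.9)]
y = [1, 1, 1, 1, -1, -1, -1, -1]


def train_linear_svm(X, y, lam=0.01, lr=0.05, epochs=2000):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        # Gradient of the regularization term.
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x1, x2), label in zip(X, y):
            margin = label * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1:
                # Point is misclassified or inside the margin, so the
                # hinge loss is active and contributes to the subgradient.
                grad_w[0] -= label * x1 / n
                grad_w[1] -= label * x2 / n
                grad_b -= label / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Return +1 ("good") or -1 ("artifact") from the sign of w.x + b."""
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else -1


w, b = train_linear_svm(X, y)
predictions = [predict(w, b, p) for p in X]
```

The learned boundary is the line where `w[0] * x1 + w[1] * x2 + b` equals zero. Because the toy classes are linearly separable, the fitted model should reproduce the training labels.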

In this example, I’m looking to see if such a boundary exists when plotting the size of the detected ice crystal vs. the relative humidity when it was detected. The figure below shows the training data, with green points corresponding to good data, red to shatter artifacts, and black circles showing the support vectors used to generate the boundary between the two classes. Using this type of model, I can then classify a new observation as either good or an artifact, within some quantifiable uncertainty.

[Figure: SVM training data, support vectors, and class boundary in size vs. relative humidity space]
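Once a linear boundary is available, classifying a new observation comes down to the sign of a weighted sum, and the magnitude of that sum gives a crude handle on confidence. The sketch below assumes a boundary in (size, relative humidity) space with hypothetical coefficients `W_SIZE`, `W_RH`, and `BIAS`. These numbers are invented for illustration and are not fitted to the MACPEX data. It also uses the standard SVM scaling, in which scores between -1 and +1 fall inside the margin.

```python
# Hypothetical boundary parameters, NOT fitted to any real data.
W_SIZE = 0.04   # weight on crystal size (per micron)
W_RH = 0.02     # weight on relative humidity (per percent)
BIAS = -4.0


def decision_score(size_um, rh_percent):
    """Signed score: positive on the "good" side, negative on the "artifact" side."""
    return W_SIZE * size_um + W_RH * rh_percent + BIAS


def classify(size_um, rh_percent):
    """Return (label, inside_margin) for one new observation.

    inside_margin is True when |score| < 1, meaning the point lies between
    the two margin lines and the label deserves less trust.
    """
    score = decision_score(size_um, rh_percent)
    label = "good" if score >= 0 else "artifact"
    return label, abs(score) < 1


large_crystal = classify(150.0, 110.0)   # far on the "good" side
borderline = classify(20.0, 120.0)       # "artifact" side, but inside the margin
small_crystal = classify(10.0, 100.0)    # clearly on the "artifact" side
```

The margin flag is only a coarse proxy for uncertainty. A calibrated probability would need an extra step, such as fitting a logistic curve to the scores on held-out labeled data.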