Improving plankton image classification using context metadata

Title: Improving plankton image classification using context metadata
Publication Type: Journal Article
Year of Publication: 2019
Authors: Ellen J.S., Graff C.A., Ohman M.D.
Date Published: 2019/08
Type of Article: Article
ISBN Number: 1541-5856
Accession Number: WOS:000480553600002
Keywords: identification; instrument; Marine & Freshwater Biology; neural-networks; oceanography; phytoplankton; recognition; resolution; system

Advances in both hardware and software are enabling rapid proliferation of in situ plankton imaging methods, requiring more effective machine learning approaches to image classification. Deep Learning methods, such as convolutional neural networks (CNNs), show marked improvement over traditional feature-based supervised machine learning algorithms, but require careful optimization of hyperparameters and adequate training sets. Here, we document some best practices in applying CNNs to zooplankton and marine snow images and note where our results differ from contemporary Deep Learning findings in other domains. We boost the performance of CNN classifiers by incorporating metadata of different types and illustrate how to assimilate metadata beyond simple concatenation. We utilize both geotemporal (e.g., sample depth, location, time of day) and hydrographic (e.g., temperature, salinity, chlorophyll a) metadata and show that either type by itself, or both combined, can substantially reduce error rates. Incorporation of context metadata also boosts performance of the feature-based classifiers we evaluated: Random Forest, Extremely Randomized Trees, Gradient Boosted Classifier, Support Vector Machines, and Multilayer Perceptron. For our assessments, we use an original data set of 350,000 in situ images (roughly 50% marine snow and 50% non-snow sorted into 26 categories) from a novel in situ Zooglider. We document asymptotically increasing performance with more computationally intensive techniques, such as substantially deeper networks and artificially augmented data sets. Our best model achieves 92.3% accuracy with our 27-class data set. We provide guidance for further refinements that are likely to provide additional gains in classifier accuracy.
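The abstract contrasts the authors' approach with "simple concatenation" of metadata. As a point of reference only, the sketch below illustrates what that simple baseline might look like: context metadata is scaled and appended to an image feature vector before classification. It is not the authors' method, which the abstract does not specify. The function names (`encode_hour`, `standardize`, `build_inputs`), the choice of metadata fields, and the cyclic time-of-day encoding are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def encode_hour(hour):
    """Map hour of day onto the unit circle so that 23:00 and 01:00 end up close.

    Hypothetical encoding; the abstract does not say how time of day is represented.
    """
    angle = 2.0 * math.pi * (hour % 24.0) / 24.0
    return [math.sin(angle), math.cos(angle)]


def standardize(column):
    """Z-score a list of values; a constant column maps to all zeros."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def build_inputs(image_features, depths_m, temps_c, hours):
    """Append scaled metadata to each image feature vector (plain concatenation).

    image_features: one list of floats per image (e.g., CNN or hand-crafted features).
    depths_m, temps_c, hours: one metadata value per image (assumed fields).
    """
    depth_z = standardize(depths_m)
    temp_z = standardize(temps_c)
    rows = []
    for feats, depth, temp, hour in zip(image_features, depth_z, temp_z, hours):
        rows.append(list(feats) + [depth, temp] + encode_hour(hour))
    return rows


if __name__ == "__main__":
    features = [[0.1, 0.2], [0.3, 0.4]]  # toy image features, not real data
    combined = build_inputs(features, [10.0, 30.0], [15.0, 11.0], [0.0, 12.0])
    for row in combined:
        print(row)
```

The combined rows could then be passed to any of the feature-based classifiers the abstract lists, such as a Random Forest. Standardizing the metadata keeps fields with large numeric ranges, such as depth, from dominating scale-sensitive classifiers.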
