Deep data analytics for genetic engineering of diatoms linking genotype to phenotype via machine learning

Title: Deep data analytics for genetic engineering of diatoms linking genotype to phenotype via machine learning
Publication Type: Journal Article
Year of Publication: 2019
Authors: Trofimov A.A., Pawlicki A.A., Borodinov N., Mandal S., Mathews T.J., Hildebrand M., Ziatdinov M.A., Hausladen K.A., Urbanowicz P.K., Steed C.A., Ievlev A.V., Belianinov A., Michener J.K., Vasudevan R., Ovchinnikova O.S.
Volume: 5
Date Published: 2019/06
Type of Article: Article
ISBN Number: 2057-3960
Accession Number: WOS:000471718200001
Keywords: chemistry; classification; Materials Science; microscopy; neural-network models; photoluminescence; visual analytics
Abstract

Genome engineering for materials synthesis is a promising avenue for manufacturing materials with unique properties under ambient conditions. Biomineralization in diatoms, unicellular algae that use silica to construct micron-scale cell walls with nanoscale features, is an attractive candidate for functional synthesis of materials for applications including photonics, sensing, filtration, and drug delivery. Therefore, controllably modifying diatom structure through targeted genetic modifications for these applications is a very promising field. In this work, we used gene knockdown in Thalassiosira pseudonana diatoms to create modified strains with changes to structural morphology and linked genotype to phenotype using supervised machine learning. An artificial neural network (NN) was developed to distinguish wild and modified diatoms based on the SEM images of frustules exhibiting phenotypic changes caused by a specific protein (Thaps3_21880), resulting in 94% detection accuracy. Class activation maps visualized the physical changes that allowed the NNs to separate diatom strains, subsequently establishing a specific gene that controls pores. A further NN was created to batch process image data, automatically recognize pores, and extract pore-related parameters. Class interrelationships of the extracted parameters were visualized using a multivariate data visualization tool called CrossVis, which allowed changes in the morphological phenotype of the diatoms (pore size and distribution) to be linked directly to changes in the genotype.

DOI: 10.1038/s41524-019-0202-3
Student Publication: No