|Title||Deep data analytics for genetic engineering of diatoms linking genotype to phenotype via machine learning|
|Publication Type||Journal Article|
|Year of Publication||2019|
|Authors||Trofimov A.A, Pawlicki A.A, Borodinov N., Mandal S., Mathews T.J, Hildebrand M, Ziatdinov M.A, Hausladen K.A, Urbanowicz P.K, Steed C.A, Ievlev A.V, Belianinov A., Michener J.K, Vasudevan R., Ovchinnikova O.S|
|Type of Article||Article|
|Keywords||chemistry; classification; Materials Science; microscopy; neural-network models; photoluminescence; visual analytics|
Genome engineering for materials synthesis is a promising avenue for manufacturing materials with unique properties under ambient conditions. Biomineralization in diatoms, unicellular algae that use silica to construct micron-scale cell walls with nanoscale features, is an attractive candidate for functional synthesis of materials for applications including photonics, sensing, filtration, and drug delivery. Controllably modifying diatom structure through targeted genetic modifications is therefore a promising route to these applications. In this work, we used gene knockdown in Thalassiosira pseudonana diatoms to create modified strains with altered structural morphology and linked genotype to phenotype using supervised machine learning. An artificial neural network (NN) was developed to distinguish wild-type and modified diatoms based on SEM images of frustules exhibiting phenotypic changes caused by a specific protein (Thaps3_21880), resulting in 94% detection accuracy. Class activation maps visualized the physical changes that allowed the NNs to separate diatom strains, thereby identifying a specific gene that controls pores. A further NN was created to batch process image data, automatically recognize pores, and extract pore-related parameters. Class interrelationships of the extracted parameters were visualized using a multivariate data visualization tool called CrossVis, which allowed changes in the morphological phenotype (pore size and distribution) to be linked directly to changes in the genotype.
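The abstract does not say how pore-related parameters are computed once pores have been recognized. As a minimal illustrative sketch only, and not the authors' method, the snippet below assumes the recognition step yields a binary mask (1 = pore pixel) and measures each 4-connected region. The function name `measure_pores`, the `nm_per_pixel` scale, and the toy mask are all hypothetical.

```python
import math
from collections import deque


def measure_pores(mask, nm_per_pixel=1.0):
    """Measure each 4-connected foreground region of a binary mask.

    `mask` is a list of rows of 0/1 values (assumed output of a
    pore-recognition step). Returns one dict per region, in raster-scan
    order, with area, equivalent circular diameter, and centroid.
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    pores = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects the pixels of one pore.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels) * nm_per_pixel ** 2
            pores.append({
                "area": area,
                # Diameter of a circle with the same area as the region.
                "equivalent_diameter": 2.0 * math.sqrt(area / math.pi),
                # Mean pixel position (row, column), scaled to physical units.
                "centroid": (
                    sum(y for y, _ in pixels) / len(pixels) * nm_per_pixel,
                    sum(x for _, x in pixels) / len(pixels) * nm_per_pixel,
                ),
            })
    return pores


# Toy mask with two pores; the 2.0 nm/pixel scale is made up.
example_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
example_pores = measure_pores(example_mask, nm_per_pixel=2.0)
```

Per-pore records of this kind, collected for each strain, are the sort of tabular input a multivariate visualization tool could compare across classes.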