On two-dimensional mappings of SNP marker data and CNNs: Overcoming the limitations of existing methods using Fermat distance.

Elenter, Juan - Etchebarne, Guillermo - Hounie, Ignacio - Fariello, María Inés - Lecumberry, Federico

Abstract:

In recent years, Convolutional Neural Networks (CNNs) have attracted great attention, establishing state-of-the-art results in many fields, most notably in Computer Vision. In an attempt to leverage their success and ubiquity, approaches have been proposed that map non-Euclidean data into two-dimensional, image-like feature maps, which are then used as inputs to CNN architectures. Such mappings include common dimensionality reduction techniques such as PCA and t-SNE. CNN models trained on these feature maps have been found to perform well on a variety of tasks, ranging from text analysis to tumor classification using gene expression data. We assess these techniques in the context of genome-enabled prediction of complex traits, finding that they do not outperform mapping SNP markers to pixels at random. We also tested random mappings on a synthetic dataset commonly used for benchmarking, with the same outcome. These results contradict the claim that such approaches are able to recover and exploit local structure. To account for both the underlying manifold and the density from which the data are sampled, we propose a method to construct these mappings based on Fermat distance. Our method outperforms other mappings, and thus presents a promising alternative that may promote the use of 2D-CNNs on SNP markers and other types of genetic data.
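The two ingredients named in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the usual sample definition of Fermat distance (the cheapest path through the sample points, where each hop costs its Euclidean length raised to a power `alpha`), and it assumes the random baseline simply assigns each feature to a distinct pixel of a square grid. The function names, the `alpha` parameter and the grid layout are all hypothetical choices made for illustration.

```python
import math
import random


def fermat_distance_matrix(points, alpha=2.0):
    """Sample Fermat distance between all pairs of points (illustrative).

    Edge cost between two points is their Euclidean distance raised to
    the power alpha; the Fermat distance is the cheapest path cost,
    computed here with the Floyd-Warshall algorithm (O(n^3)).
    """
    n = len(points)
    dist = [[math.dist(p, q) ** alpha for q in points] for p in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


def random_pixel_mapping(n_features, side, seed=0):
    """Assign each feature a distinct pixel index on a side x side grid."""
    if n_features > side * side:
        raise ValueError("grid is too small for the number of features")
    rng = random.Random(seed)
    return rng.sample(range(side * side), n_features)


def to_image(values, pixels, side):
    """Place one sample's feature values on the grid; unused pixels stay 0."""
    image = [[0.0] * side for _ in range(side)]
    for value, pixel in zip(values, pixels):
        image[pixel // side][pixel % side] = value
    return image


# Three collinear points: with alpha = 2 the path through the middle
# point (cost 1 + 1) is cheaper than the direct hop (cost 4).
d = fermat_distance_matrix([(0.0,), (1.0,), (2.0,)], alpha=2.0)
print(d[0][2])
```

With `alpha = 1` the Fermat distance reduces to the ordinary Euclidean distance; larger values of `alpha` penalize long hops, so cheap paths tend to pass through densely sampled regions. How such a distance matrix is then turned into pixel coordinates is not specified in the abstract and is left out of this sketch.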


Bibliographic Details
2021
This work was partially funded by the ANII project FSDA 1-2018-1-154364.
Genomic prediction
CNN
Dimensionality reduction
English
Universidad de la República
COLIBRI
https://meetings.cshl.edu/meetings.aspx?meet=PROBGEN&year=21
https://hdl.handle.net/20.500.12008/36813
Open access
Creative Commons Attribution - NonCommercial - NoDerivatives License (CC BY-NC-ND 4.0)