Quality control of genotypes using heritability estimates of gene content at the marker.

FORNERIS, N. S. - LEGARRA, A. - VITEZICA, Z. G. - TSURUTA, S. - AGUILAR, I. - MISZTAL, I. - CANTET, R. J. C.

Abstract:

Quality control filtering of single-nucleotide polymorphisms (SNPs) is a key step when analyzing genomic data. Here we present a practical method to identify low-quality SNPs, meaning markers whose genotypes are wrongly assigned for a large proportion of individuals, by estimating the heritability of gene content at each marker, where gene content is the number of copies of a particular reference allele in an animal's genotype (0, 1, or 2). If there is no mutation at the marker, gene content has an additive heritability of 1 by construction. The method uses restricted maximum likelihood (REML) to estimate the heritability of gene content at each SNP and builds a likelihood-ratio test statistic to test for zero error variance in genotyping. As a by-product, it provides estimates of marker allele frequencies in the base population. Using simulated data with 10% permutation error (4% actual error) in genotyping, the method had a specificity of 0.96 (4% of correct markers are rejected) and a sensitivity of 0.99 (1% of wrong markers are accepted) when markers with heritability lower than 0.975 were discarded. Checking for Mendelian errors gave a lower sensitivity (0.84) on the same simulation. The method is further illustrated with a real data set of 3534 animals genotyped for 50,433 markers from the Illumina PorcineSNP60 chip, with a pedigree of 6473 individuals; those markers had undergone very little quality control. A total of 4099 markers with P-values lower than 0.01 were discarded by our method, with associated heritability estimates as low as 0.12. Unlike other techniques, our method uses all information in the population simultaneously, can be used in any population with marker genotypes and pedigree records, and is simple to implement with standard REML software. Scripts for its use are provided. Copyright © 2015 by the Genetics Society of America
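
The method rests on the pedigree-based additive (numerator) relationship matrix A, which sets the expected covariance of gene content among relatives. A minimal sketch of the classical tabular method for building A follows. The function name `relationship_matrix` and the coding of animals as 0..n-1, with parents listed before offspring and -1 for an unknown parent, are illustrative assumptions and not part of the authors' scripts.

```python
import numpy as np

def relationship_matrix(sires, dams):
    """Numerator relationship matrix A by the tabular method.

    Animals are numbered 0..n-1 with parents listed before their offspring;
    an unknown parent is coded as -1 (conventions assumed for this sketch).
    """
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        if s >= i or d >= i:
            raise ValueError(f"parents of animal {i} must precede it")
        # Diagonal: 1 + F_i, where F_i is half the relationship between the parents
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: half the sum of j's relationships with i's known parents
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A

# Two unrelated founders (0, 1), two full sibs (2, 3), and an offspring of the sibs (4)
A = relationship_matrix([-1, -1, 0, 0, 2], [-1, -1, 1, 1, 3])
print(A)
```

Here the full sibs get a relationship of 0.5, and the offspring of their mating gets a diagonal of 1.25 (inbreeding 0.25). A dense A like this is only practical for small pedigrees; large pedigrees are normally handled through the sparse inverse of A.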

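The core of the method, estimating the heritability of gene content by REML and testing for zero error variance, can be sketched for a single SNP as below. This is a simplified stand-in under stated assumptions, not the authors' scripts. The model has only an overall mean, z = 1μ + u + e with var(u) = Aσ²u and var(e) = Iσ²e. REML is maximized by a grid search over h² after an eigendecomposition of A, which is practical only for small pedigrees. The P-value assumes the usual 50:50 mixture of χ²₀ and χ²₁ for a parameter tested on the boundary of its space. Because the mean gene content of the base population is twice its allele frequency, the sketch takes half the GLS estimate of μ as the base allele frequency. The names `gene_content_h2` and `grid_points` are hypothetical.

```python
import math
import numpy as np

def gene_content_h2(z, A, grid_points=1001):
    """REML heritability of gene content at one SNP, plus an LRT for h2 = 1.

    Model (sketch): z = 1*mu + u + e, u ~ N(0, A*s2u), e ~ N(0, I*s2e),
    h2 = s2u / (s2u + s2e). Genotyping error shows up as s2e > 0 (h2 < 1).
    Assumes A is positive definite and z is not constant (drop monomorphic SNPs).
    Returns (h2_hat, lrt, p_value, base_allele_freq).
    """
    z = np.asarray(z, dtype=float)
    n = z.size
    d, U = np.linalg.eigh(A)             # A = U diag(d) U'
    ys = U.T @ z                         # rotated data
    Xs = U.T @ np.ones((n, 1))           # rotated intercept column
    p = Xs.shape[1]

    def profile(h2):
        """REML log-likelihood at h2, profiled over the total variance."""
        w = h2 * d + (1.0 - h2)          # rotated V is diagonal: total variance * w
        Xw = Xs / w[:, None]
        XtWX = Xs.T @ Xw
        b = np.linalg.solve(XtWX, Xw.T @ ys)   # GLS estimate of mu
        r = ys - Xs @ b
        q = float(r @ (r / w))
        logdet_xtwx = np.linalg.slogdet(XtWX)[1]
        ll = -0.5 * ((n - p) * math.log(q / (n - p)) + np.sum(np.log(w))
                     + logdet_xtwx + (n - p))
        return ll, float(b[0])

    grid = np.linspace(0.0, 1.0, grid_points)
    lls = [profile(h)[0] for h in grid]
    k = int(np.argmax(lls))
    h2_hat = float(grid[k])
    ll_hat, mu_hat = profile(h2_hat)
    ll_null = lls[-1]                    # h2 = 1, i.e. zero error variance
    lrt = max(0.0, 2.0 * (ll_hat - ll_null))
    # Boundary null: assumed 50:50 mixture of chi2_0 and chi2_1
    p_value = 1.0 if lrt == 0.0 else 0.5 * math.erfc(math.sqrt(lrt / 2.0))
    return h2_hat, lrt, p_value, mu_hat / 2.0
```

In the abstract's simulation, markers with estimated heritability below 0.975 were discarded; in the pig data, markers with P < 0.01 were discarded. When many SNPs share one pedigree, the eigendecomposition of A only needs to be computed once; this sketch recomputes it on every call to stay short.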

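The gap between the nominal 10% permutation error and the 4% actual error reflects the fact that a shuffled genotype is often identical to the original one. The sketch below computes the expected actual error rate for one marker and checks it by Monte Carlo. It assumes Hardy-Weinberg genotype frequencies and that the selected fraction of individuals has its genotypes shuffled among itself; the paper's exact simulation scheme may differ, and the function names are hypothetical.

```python
import numpy as np

def expected_actual_error(perm_rate, p):
    """Expected fraction of genotypes that actually change when a fraction
    `perm_rate` of individuals have their genotypes shuffled among themselves.
    Assumes Hardy-Weinberg genotype frequencies at allele frequency p; a
    shuffled genotype is only wrong if it differs from the original one.
    """
    g = np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])
    return perm_rate * (1.0 - np.sum(g ** 2))

def simulate_actual_error(perm_rate, p, n=200_000, seed=1):
    """Monte Carlo check of the same quantity for one simulated marker."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(2, p, size=n)             # gene content 0/1/2 under HWE
    idx = rng.choice(n, size=int(perm_rate * n), replace=False)
    z_err = z.copy()
    z_err[idx] = z[rng.permutation(idx)]       # shuffle genotypes within the subset
    return np.mean(z_err != z)

for p in (0.5, 0.2, 0.05):
    print(f"p={p}: expected {expected_actual_error(0.10, p):.4f}, "
          f"simulated {simulate_actual_error(0.10, p):.4f}")
```

With 10% of genotypes permuted, the actual error is 6.25% at p = 0.5 and under 2% at p = 0.05, so the realized rate averaged over markers depends on the allele-frequency spectrum of the panel.
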
Bibliographic Details
Year: 2015
Keywords:
GENE CONTENT
QUALITY CONTROL
SNP
GENOMIC SELECTION
REML
SHARED DATA RESOURCE
GENPRED
ANIMAL BREEDING
Language: English
Institution: Instituto Nacional de Investigación Agropecuaria
Repository: AINFO
URL: http://www.ainfo.inia.uy/consulta/busca?b=pc&id=54004&biblioteca=vazio&busca=54004&qFacets=54004
Rights: Open access

Record Details
Type: Article (published version)
Handle: https://hdl.handle.net/20.500.12381/2468
OAI identifier: oai:redi.anii.org.uy:20.500.12381/2468
Repository: AINFO - Instituto Nacional de Investigación Agropecuaria
Date deposited: 2022-12-16