Mega2: validated data-reformatting for linkage and association analyses
© Baron et al.; licensee BioMed Central Ltd. 2014
Received: 18 July 2014
Accepted: 14 November 2014
Published: 5 December 2014
In a typical study of the genetics of a complex human disease, many different analysis programs are used to test for linkage and association. This requires extensive and careful data reformatting, as many of these analysis programs use differing input formats. Writing scripts to facilitate this can be tedious, time-consuming, and error-prone. To address these issues, the open source Mega2 data reformatting program provides validated and tested data conversions from several commonly used input formats to many output formats.
Mega2, the Manipulation Environment for Genetic Analysis, facilitates the creation of analysis-ready datasets from data gathered as part of a genetic study. It transparently allows users to process genetic data for family-based or case/control studies accurately and efficiently. In addition to data validation checks, Mega2 provides analysis setup capabilities for a broad choice of commonly-used genetic analysis programs. First released in 2000, Mega2 has recently been significantly improved in a number of ways. We have rewritten it in C++ and have reduced its memory requirements. Mega2 now can read input files in LINKAGE, PLINK, and VCF/BCF formats, as well as its own specialized annotated format. It supports conversion to many commonly-used formats including SOLAR, PLINK, Merlin, Mendel, SimWalk2, Cranefoot, IQLS, FBAT, MORGAN, BEAGLE, Eigenstrat, Structure, and PLINK/SEQ. When controlled by a batch file, Mega2 can be used non-interactively in data reformatting pipelines. Support for genetic data from several other species besides humans has been added.
By providing tested and validated data reformatting, Mega2 facilitates more accurate and extensive analyses of genetic data, avoiding the need to write, debug, and maintain one’s own custom data reformatting scripts.
Mega2 is freely available at https://watson.hgen.pitt.edu/register/.
Keywords: Software, Linkage, Association, Human genetics, Data management
The gene-discovery process is very well advanced at the data-generation end, with sophisticated database management systems, laboratory information management systems, and bioinformatics tools. There has also been enormous progress in analytical software. However, very little has been done to facilitate the efficient transfer of data from the generation stage to the analysis stage: analysis programs have diverse and stringent requirements (not always clearly documented) on how the input data should be formatted, and the required format is often very different from that of the generated data. Researchers face the need to collect and collate genetic data from diverse sources, and this need has increased significantly as rapidly improving technology generates orders of magnitude more data. As new analysis programs come into being, data setup and organization continue to be error-prone and very time-consuming if performed manually, but they are ideal tasks for well-tested computer automation.
In the course of a single study of the genetics of a complex disease, the optimal analysis might require use of several different programs. For example, one might want to use PEDSTATS to check for data validity, PREST to check for relationship errors, SOLAR to test for linkage, and Mendel to test for association in the presence of linkage. Each provides the best possible analysis but also has its own strict input format requirements, so there is great value in being able to quickly and easily convert one’s data format as required.
To meet these needs, we developed Mega2, the Manipulation Environment for Genetic Analysis, which automates common data reformatting tasks, thereby accelerating analyses, saving time, and reducing errors. We describe here recent major updates to Mega2, which include improvements in memory efficiency, improved support for commonly used input formats such as PLINK and VCF, and addition of several more target output formats.
Mega2 was originally released in January 2000 and has undergone continuous revision since. It was originally written in C but has since been rewritten in C++, allowing us to use modern object-oriented programming techniques. Mega2 was designed to be used in a Unix environment, and so for extended functionality, such as plotting results with R or running generated scripts, it uses a few other programs commonly available in that environment: Perl, Awk, Python, the tcsh and bash shells, and R. Perl is used for producing formatted output such as tables and HTML reports, and R is used to create graphical output using our R “nplplot” package. The currently released version (4.7.1) of Mega2 is available in Additional file 1; for updated versions, please visit the project home page as listed in the “Availability and requirements” section.
Mega2 was originally written without much attention to memory efficiency, as at that time a genome-wide scan consisted of only several hundred markers. Thus, Mega2’s memory usage was initially on the order of people × alleles × 8 bytes, as each person/allele combination was assigned a pointer to the allele label. For two-allele marker data, we have markedly reduced memory requirements by replacing each pair of 8-byte pointers with a 2-bit index specifying which alleles the individual has. We also allow the user to switch out of the 2-bit mode if they want to work with more highly polymorphic markers. Further memory efficiencies have been gained by not storing (unknown) genotypes for completely untyped individuals who are nonetheless needed to specify the pedigree structure. As a result of these improvements, Mega2 can now handle genome-wide scale data; for example, 895K two-allele markers on 3.1K people require only 1.12 Gb of memory for Mega2 processing.
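The 2-bit storage idea described above can be illustrated with a minimal Python sketch. It is not Mega2's C++ implementation: the code values (`MISSING`, `HOM_A1`, `HET`, `HOM_A2`), the function names, and the four-genotypes-per-byte layout are all assumptions made for illustration.

```python
# Illustrative 2-bit genotype packing for two-allele markers.
# The code assignments below are hypothetical, not Mega2's actual encoding.
MISSING, HOM_A1, HET, HOM_A2 = 0, 1, 2, 3


def pack_genotypes(codes):
    """Pack a sequence of 2-bit genotype codes, four genotypes per byte."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError(f"genotype code out of range: {code}")
        # Each genotype occupies bits 2*(i % 4) and 2*(i % 4) + 1 of its byte.
        packed[i // 4] |= code << (2 * (i % 4))
    return packed


def get_genotype(packed, i):
    """Return the 2-bit genotype code stored at position i."""
    return (packed[i // 4] >> (2 * (i % 4))) & 0b11


def genotype_bytes(n_people, n_markers, bits_per_genotype):
    """Bytes needed for the genotype matrix alone, rounded up."""
    total_bits = n_people * n_markers * bits_per_genotype
    return (total_bits + 7) // 8


codes = [HOM_A1, HET, MISSING, HOM_A2, HET]
packed = pack_genotypes(codes)
unpacked = [get_genotype(packed, i) for i in range(len(codes))]

# Two 8-byte pointers per genotype are 128 bits; the index needs 2 bits.
two_bit = genotype_bytes(3100, 895000, 2)
pointers = genotype_bytes(3100, 895000, 128)
```

For the 895K-marker, 3.1K-person example, this arithmetic gives roughly 0.69 GB for the genotype matrix at 2 bits per genotype, against roughly 44 GB with two 8-byte pointers per genotype. These figures are our own back-of-the-envelope calculation for the genotype matrix only, not measurements of Mega2.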
Mega2 can now read data from a wider variety of input formats. Many researchers now have their data in PLINK format, so we have extended Mega2 to support reading PLINK input files. Mega2 now directly processes PLINK ‘ped’ and binary input formats. Mega2 also supports PLINK phenotype files, as well as Mega2-format map files that specify a sex-specific genetic map. Furthermore, we recently added support for reading Variant Call Format (VCF) files and their binary compressed equivalent, BCF; most sequencing-based data are now in VCF/BCF format.
Mega2 currently supports 37 output targets; seven new ones have been added since 2011.
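As a rough illustration of what reading a PLINK ‘ped’ file involves, the following Python sketch parses one line of the text format as documented by PLINK: six leading columns (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two alleles per marker. The `PedRecord` type and the `parse_ped_line` function are hypothetical names and do not reflect how Mega2 itself reads these files.

```python
from typing import List, NamedTuple, Tuple


class PedRecord(NamedTuple):
    """One individual from a PLINK text 'ped' file (hypothetical container)."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    phenotype: str
    genotypes: List[Tuple[str, str]]


def parse_ped_line(line):
    """Split a whitespace-delimited ped line into IDs and allele pairs."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError("a ped line needs at least six leading columns")
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("alleles must come in pairs, two per marker")
    # Group consecutive alleles into one (allele1, allele2) pair per marker.
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedRecord(*fields[:6], genotypes)


record = parse_ped_line("FAM1 3 1 2 1 2 A A G T 0 0")
```

Here the third marker is coded `0 0`, which is the default missing-genotype code in PLINK text files.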
SimWalk2 format 
Mendel format 
Vintage MENDEL format 
Vitesse format 
Cranefoot format 
GeneHunter-Plus format 
Testing loci for HWE
Mega2 annotated format
Allegro format 
Conversion to nuclear families
MLBQTL format 
PLINK format  (binary added 1/13)
SAGE format 
FBAT format  (added 1/13)
SPLINK format 
Morgan format  (added 6/13)
SIMULATE format 
Merlin format 
Old SAGE format
Loki format 
PLINK/SEQ format  (added 10/13)
The Mega2 distribution package has been updated to provide greater ease of installation and compatibility with many Unix environments. It contains added support for migration of legacy input data to our updated formats.
In applied data analysis, a thorough analysis often requires the use of multiple different programs, many of which have their own precise input format requirements. Reformatting programs such as Mega2 can markedly accelerate analyses by providing accurate, quick, and error-free conversion routines. This need has been recognized in the area of population genetics, where several reformatting programs have been written, including one that converts to 52 different formats.
In the area of human genetics, limited reformatting options have been made available as part of larger database systems. For example, the GeneLink database system initially exported into LINKAGE, GAS, or RelCheck formats, while the Integrated Genotyping System exported into several formats, including Merlin, GeneHunter, QTDT, and Transmit. However, these database systems can be difficult to install and maintain. More stand-alone approaches to reformatting in this area include SIB-PAIR, by David Duffy, a command-line oriented program that can create locus and pedigree files in a variety of formats, such as FISHER, GAS, GeneHunter, LINKAGE, LOKI, MENDEL, MERLIN, PAP, and SAGE. SIB-PAIR appears to require very detailed line-by-line commands that would make it harder to use than Mega2 for most users.
Another program is fcGENE (available from SourceForge), which is focused on converting PLINK-format data for imputation (MaCH, IMPUTE, BEAGLE, BIMBAM), and then converting the resulting imputed data into the following formats: PLINK, SNPTEST, HAPLOVIEW, EIGENSOFT, GenABEL, and VCF. While fcGENE is fast and easy to use, it is currently limited: for example, it does not accept VCF or LINKAGE format as input, it supports only a single dichotomous phenotype, and it does not support selection by chromosome.
ALOHOMORA provides an elegant interface for carrying out linkage analyses of Affymetrix 10K single nucleotide polymorphism (SNP) genotype data. This program actually uses Mega2 as its internal reformatting engine for some of its options.
PLINK is an association analysis toolset that has a variety of data management and filtering options for handling large-scale SNP data. The main focus of PLINK is population-based unrelated samples, with some support for family-based association testing. PLINK exports data in only a few formats. We have used PLINK on our family data to carry out data cleaning, but then still needed Mega2 to reformat the data in order to carry out analyses using other external programs. In our experience, it is difficult to use PLINK on family data while maintaining the original pedigree structures upon output, as PLINK favors automatically filtering out individuals with low genotyping success rates (such as untyped founders).
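The pedigree-structure problem described above can be checked mechanically. The following Python sketch (an illustration, not part of Mega2 or PLINK) reports individuals whose listed parent no longer appears in the data after filtering. The tuple layout and the `missing_parents` name are assumptions; a parent ID of `0` is treated as "unknown parent", following the usual LINKAGE/PLINK pedigree convention.

```python
def missing_parents(pedigree):
    """Return (family, individual, parent) triples whose parent is absent.

    `pedigree` is a list of (family_id, individual_id, father_id, mother_id)
    tuples; a parent ID of "0" marks an unknown parent (a founder).
    """
    present = {(fam, ind) for fam, ind, _, _ in pedigree}
    problems = []
    for fam, ind, father, mother in pedigree:
        for parent in (father, mother):
            if parent != "0" and (fam, parent) not in present:
                problems.append((fam, ind, parent))
    return problems


# A trio in which the father (individual 1) is an untyped founder.
full = [
    ("F1", "1", "0", "0"),
    ("F1", "2", "0", "0"),
    ("F1", "3", "1", "2"),
]
# Simulate a filter that drops the untyped founder from the output.
filtered = [row for row in full if row[1] != "1"]

intact = missing_parents(full)
broken = missing_parents(filtered)
```

With the founder removed, the child still names individual 1 as father, so the pedigree file is no longer self-consistent. A check of this kind is one way to detect such a break before passing the data to a downstream linkage program.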
From this brief survey of currently available data reformatting software, two things are immediately apparent: many researchers have recognized the need for providing one’s data in many different formats; and Mega2, which is free, open source, and available on Unix, Windows, and Macintosh platforms, is well-positioned to continue to fill this need.
When carrying out quality control and statistical analyses for a genetic study of a human disease, one quickly discovers that data organization and analysis set-up are critical, time-consuming, and extremely tedious tasks. Furthermore, one often needs to use several different analysis programs, each with its own idiosyncratic input format requirements. To meet these needs, we developed Mega2, taking the time to carefully understand the precise (sometimes poorly documented) requirements of each target format and implementing our data reformatting pipeline in tested and well-documented code. Mega2’s tested and validated data conversion options expand the universe of possible analyses for the average researcher by removing the hurdle of having to tediously write, check, debug, and maintain their own conversion scripts.
Availability and requirements
Project name: Mega2
Project home page: https://watson.hgen.pitt.edu/register/
Operating systems: Linux, Macintosh OS X, Windows, Solaris
Programming language: C++
Other requirements: R, Perl, Python, awk, bash, and csh
License: GNU GPL v3
Any restrictions to use by non-academics: None.
This work was supported by the National Institutes of Health grant R01 GM076667 (P.I. Daniel E. Weeks) and the University of Pittsburgh. We thank Lee Almasy, Mark Schroeder, and William P. Mulvihill for early contributions as programmers to our Mega2 project.
- Wigginton JE, Abecasis GR: PEDSTATS: descriptive statistics, graphics and quality assessment for gene mapping data. Bioinformatics. 2005, 21 (16): 3445-3447. 10.1093/bioinformatics/bti529.
- Sun L, Wilder K, McPeek MS: Enhanced pedigree error detection. Hum Hered. 2002, 54 (2): 99-110. 10.1159/000067666.
- McPeek MS, Sun L: Statistical tests for detection of misspecified relationships by use of genome-screen data. Am J Hum Genet. 2000, 66 (3): 1076-1094. 10.1086/302800.
- Almasy L, Blangero J: Multipoint quantitative-trait linkage analysis in general pedigrees. Am J Hum Genet. 1998, 62 (5): 1198-1211. 10.1086/301844.
- Blangero J, Almasy L: Multipoint oligogenic linkage analysis of quantitative traits. Genet Epidemiol. 1997, 14 (6): 959-964. 10.1002/(SICI)1098-2272(1997)14:6<959::AID-GEPI66>3.0.CO;2-K.
- Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM: Mendel: the Swiss army knife of genetic analysis programs. Bioinformatics. 2013, 29 (12): 1568-1570. 10.1093/bioinformatics/btt187.
- Lange K, Cantor R, Horvath S, Perola M, Sabatti C, Sinsheimer J, Sobel E: MENDEL version 4.0: A complete package for the exact genetic analysis of discrete traits in pedigree and population data sets. Am J Hum Genet. 2001, 69 (Suppl): 504.
- Lange K, Weeks D, Boehnke M: Programs for pedigree analysis: MENDEL, FISHER, and dGENE. Genet Epidemiol. 1988, 5: 471-472. 10.1002/gepi.1370050611.
- Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE: Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics. 2005, 21 (10): 2556-2557. 10.1093/bioinformatics/bti364.
- Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE: Mega2, a data-handling program for facilitating genetic linkage and association analyses. Am J Hum Genet. 1999, 65: A436.
- Lathrop GM, Lalouel J-M: Easy calculations of lod scores and genetic risks on small computers. Am J Hum Genet. 1984, 36: 460-465.
- Lathrop GM, Lalouel JM, Julier C, Ott J: Strategies for multilocus linkage analysis in humans. Proc Natl Acad Sci U S A. 1984, 81: 3443-3446. 10.1073/pnas.81.11.3443.
- Lathrop GM, Lalouel JM: Efficient computations in multilocus linkage analysis. Am J Hum Genet. 1988, 42: 498-505.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC: PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007, 81 (3): 559-575. 10.1086/519795.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group: The variant call format and VCFtools. Bioinformatics. 2011, 27 (15): 2156-2158. 10.1093/bioinformatics/btr330.
- Makinen VP, Parkkonen M, Wessman M, Groop PH, Kanninen T, Kaski K: High-throughput pedigree drawing. Eur J Hum Genet. 2005, 13 (8): 987-989. 10.1038/sj.ejhg.5201430.
- Wang Z, McPeek MS: An incomplete-data quasi-likelihood approach to haplotype-based genetic association studies on related individuals. J Am Stat Assoc. 2009, 104 (487): 1251-1260. 10.1198/jasa.2009.tm08507.
- Abney MA, Ober C, McPeek MS: Homozygosity mapping of quantitative trait loci in complex inbred pedigrees. Am J Hum Genet. 2000, 67 (Suppl 2): 327.
- Wang Z, McPeek MS: ATRIUM: testing untyped SNPs in case-control association studies with related individuals. Am J Hum Genet. 2009, 85 (5): 667-678. 10.1016/j.ajhg.2009.10.006.
- Laird NM, Horvath S, Xu X: Implementing a unified approach to family-based tests of association. Genet Epidemiol. 2000, 19 (Suppl 1): S36-42. 10.1002/1098-2272(2000)19:1+<::AID-GEPI6>3.0.CO;2-M.
- Thompson EA: Statistical inference from genetic data on pedigrees, vol. 6. 2000, Institute of Mathematical Sciences and the American Statistical Association, Beechwood, OH.
- Browning BL, Browning SR: Efficient multilocus association testing for whole genome association studies using localized haplotype clustering. Genet Epidemiol. 2007, 31 (5): 365-375. 10.1002/gepi.20216.
- Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006, 38 (8): 904-909. 10.1038/ng1847.
- Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet. 2006, 2 (12): e190. 10.1371/journal.pgen.0020190.
- Pritchard JK, Stephens M, Donnelly P: Inference of population structure using multilocus genotype data. Genetics. 2000, 155 (2): 945-959.
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics. 2003, 164 (4): 1567-1587.
- PLINK/SEQ: A library for the analysis of genetic variation data; [http://atgu.mgh.harvard.edu/plinkseq/]
- Sobel E, Lange K: Descent graphs in pedigree analysis: Applications to haplotyping, location scores, and marker-sharing statistics. Am J Hum Genet. 1996, 58 (6): 1323-1337.
- O’Connell JR, Weeks DE: The VITESSE algorithm for rapid exact multilocus linkage analysis via genotype set-recoding and fuzzy inheritance. Nat Genet. 1995, 11: 402-408. 10.1038/ng1295-402.
- Lemire M: SUP: an extension to SLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values. BMC Genet. 2006, 7: 40. 10.1186/1471-2156-7-40.
- Schäffer AA, Lemire M, Ott J, Lathrop GM, Weeks DE: Coordinated conditional simulation with SLINK and SUP of many markers linked or associated to a trait in large pedigrees. Hum Hered. 2011, 71 (2): 126-134. 10.1159/000324177.
- Kong A, Cox NJ: Allele-sharing models: LOD scores and accurate linkage tests. Am J Hum Genet. 1997, 61 (5): 1179-1188. 10.1086/301592.
- Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.
- Kruglyak L, Lander ES: Faster multipoint linkage analysis using Fourier transforms. J Comput Biol. 1998, 5 (1): 1-7. 10.1089/cmb.1998.5.1.
- Gudbjartsson DF, Jonasson K, Frigge ML, Kong A: Allegro, a new computer program for multipoint linkage analysis. Nat Genet. 2000, 25 (1): 12-13. 10.1038/75514.
- Abney M, McPeek MS, Ober C: Estimation of variance components of quantitative traits in inbred populations. Am J Hum Genet. 2000, 66 (2): 629-650. 10.1086/302759.
- Alcais A, Abel L: Maximum-Likelihood-Binomial method for genetic model-free linkage analysis of quantitative traits in sibships. Genet Epidemiol. 1999, 17 (2): 102-117. 10.1002/(SICI)1098-2272(1999)17:2<102::AID-GEPI2>3.0.CO;2-6.
- Weeks DE, Ott J, Lathrop GM: SLINK: a general simulation program for linkage analysis. Am J Hum Genet. 1990, 47 (3): A204.
- S.A.G.E: Statistical Analysis for Genetic Epidemiology; [http://darwin.cwru.edu/sage/]
- Holmans P: Asymptotic properties of affected-sib-pair linkage analysis. Am J Hum Genet. 1993, 52 (2): 362-374.
- Browning BL, Browning SR: A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet. 2009, 84 (2): 210-223. 10.1016/j.ajhg.2009.01.005.
- Browning SR, Browning BL: Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 2007, 81 (5): 1084-1097. 10.1086/521987.
- Browning SR, Briley JD, Briley LP, Chandra G, Charnecki JH, Ehm MG, Johansson KA, Jones BJ, Karter AJ, Yarnall DP, Wagner MJ: Case-control single-marker and haplotypic association analysis of pedigree data. Genet Epidemiol. 2005, 28 (2): 110-122. 10.1002/gepi.20051.
- Terwilliger JD, Speer M, Ott J: Chromosome-based method for rapid computer simulation in human genetic linkage analysis. Genet Epidemiol. 1993, 10 (4): 217-224. 10.1002/gepi.1370100402.
- Hasstedt SJ: jPAP: Document-driven software for genetic analysis. Genet Epidemiol. 2005, 29: 255.
- PAP: Pedigree Analysis Software; [http://hasstedt.genetics.utah.edu/]
- Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin: rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30 (1): 97-101. 10.1038/ng786.
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007, 7 (4): 574-578. 10.1111/j.1471-8286.2007.01758.x.
- Heath SC: Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am J Hum Genet. 1997, 61 (3): 748-760. 10.1086/515506.
- Manoukis NC: FORMATOMATIC: a program for converting diploid allelic data between common formats for population genetic analysis. Mol Ecol Notes. 2007, 7 (4): 592-593. 10.1111/j.1471-8286.2007.01784.x.
- Coombs JA, Letcher BH, Nislow KH: CREATE: a software to create input files from diploid genotypic data for 52 genetic software programs. Mol Ecol Resour. 2008, 8 (3): 578-580. 10.1111/j.1471-8286.2007.02036.x.
- Glaubitz JC: CONVERT: A user-friendly program to reformat diploid genotypic data for commonly used population genetic software packages. Mol Ecol Notes. 2004, 4 (2): 309-310. 10.1111/j.1471-8286.2004.00597.x.
- Gillanders EM, Masiello A, Gildea D, Umayam L, Duggal P, Jones MP, Klein AP, Freas-Lutz D, Ibay G, Trout K, Wolfsberg TG, Trent JM, Bailey-Wilson JE, Baxevanis AD: GeneLink: a database to facilitate genetic studies of complex traits. BMC Genomics. 2004, 5 (1): 81. 10.1186/1471-2164-5-81.
- Lathrop GM, Lalouel JM, Julier C, Ott J: Multilocus linkage analysis in humans: detection of linkage and estimation of recombination. Am J Hum Genet. 1985, 37 (3): 482-498.
- GAS: Genetic Analysis System; [http://users.ox.ac.uk/~ayoung/gas.html]
- Epstein MP, Duren WL, Boehnke M: Improved inference of relationship for pairs of individuals. Am J Hum Genet. 2000, 67 (5): 1219-1231. 10.1086/321195.
- Boehnke M, Cox NJ: Accurate inference of relationships in sib-pair linkage studies. Am J Hum Genet. 1997, 61 (2): 423-429. 10.1086/514862.
- Fiddy S, Cattermole D, Xie D, Duan XY, Mott R: An integrated system for genetic analysis. BMC Bioinformatics. 2006, 7: 210. 10.1186/1471-2105-7-210.
- Abecasis GR, Cardon LR, Cookson WO: A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000, 66 (1): 279-292. 10.1086/302698.
- Clayton D: A generalization of the transmission/disequilibrium test for uncertain-haplotype transmission. Am J Hum Genet. 1999, 65 (4): 1170-1177. 10.1086/302577.
- SIB-PAIR; [http://genepi.qimr.edu.au/staff/davidD/]
- fcGENE: Genotype format converter; [http://sourceforge.net/projects/fcgene/]
- Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR: MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol. 2010, 34 (8): 816-834. 10.1002/gepi.20533.
- Marchini J, Howie B: Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010, 11 (7): 499-511. 10.1038/nrg2796.
- Servin B, Stephens M: Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 2007, 3 (7): e114. 10.1371/journal.pgen.0030114.
- Barrett JC, Fry B, Maller J, Daly MJ: Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005, 21 (2): 263-265. 10.1093/bioinformatics/bth457.
- Aulchenko YS, Ripke S, Isaacs A, van Duijn CM: GenABEL: an R library for genome-wide association analysis. Bioinformatics. 2007, 23 (10): 1294-1296. 10.1093/bioinformatics/btm108.
- Ruschendorf F, Nurnberg P: ALOHOMORA: a tool for linkage analysis using 10K SNP array data. Bioinformatics. 2005, 21 (9): 2123-2125. 10.1093/bioinformatics/bti264.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.