From The MarthLab
 Scotty: a web tool for designing RNA-Seq experiments to measure differential gene expression
Scotty is an interactive web-based application that helps biologists design experiments with a sample size and read depth appropriate to user-defined experimental objectives. Busby MA, Stewart C, Miller CA, Grzeda KR, Marth GT. Bioinformatics. 2013 Feb 3.
 Scribl: an HTML5 Canvas-based graphics library for visualizing genomic data over the web
Using recent advances in core web technologies (HTML5), we developed Scribl, a flexible genomic visualization library specifically targeting coordinate-based data such as genomic features, DNA sequence and genetic variants. Miller CA, Anthony J, Meyer MM, Marth G. Bioinformatics. 2013 Feb 1.
 Targeted proteomic dissection of Toxoplasma cytoskeleton sub-compartments using MORN1
This study significantly contributes to the annotation of the unique cytoskeleton of Apicomplexa. Lorestani A, Ivey FD, Thirugnanam S, Busby MA, Marth GT, Cheeseman IM, Gubbels MJ. Cytoskeleton (Hoboken). 2012 Dec.
 Copy Number Variation detection from 1000 Genomes project exon capture sequencing data
This study demonstrates that exonic sequencing datasets, collected in both population-based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Wu J, Grzeda KR, Stewart C, Grubert F, Urban AE, Snyder MP, Marth GT. BMC Bioinformatics. 2012 Nov 17.
 An integrated map of genetic variation from 1,092 human genomes
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. Nature. 2012 Nov 1.
 The 1000 Genomes Project: data management and community access
The 1000 Genomes Project was launched as one of the largest distributed data collection and analysis projects ever undertaken in biology, and members of the project data coordination center have developed and deployed several tools to enable widespread data access. Clarke L, Zheng-Bradley X, Smith R, Kulesha E, Xiao C, Toneva I, Vaughan B, Preuss D, Leinonen R, Shumway M, Sherry S, Flicek P; 1000 Genomes Project Consortium. Nature Methods. 2012 Apr 27.
 ART: a next-generation sequencing read simulator
ART is a set of simulation tools that generate synthetic next-generation sequencing reads. Huang W, Li L, Myers JR, Marth GT. Bioinformatics. 2012 Feb 15.
 A DOC2 protein identified by mutational profiling is essential for apicomplexan parasite exocytosis
The phenotype of a Toxoplasma gondii conditional mutant impaired in host cell invasion and egress was pinpointed to a defect in secretion of the micronemes, an apicomplexan-specific organelle that contains adhesion proteins. Farrell A, Thirugnanam S, Lorestani A, Dvorin JD, Eidell KP, Ferguson DJ, Anderson-White BR, Duraisingh MT, Marth GT, Gubbels MJ. Science. 2012 Jan 13.
 Expression divergence measured by transcriptome sequencing of four yeast species
We provide an improved methodology for measuring gene expression changes in evolutionarily diverged species using RNA-Seq, where experimental artifacts can mimic evolutionary effects. Busby MA, Gray JM, Costa AM, Stewart C, Stromberg MP, Barnett D, Chuang JH, Springer M, Marth GT. BMC Genomics. 2011 Dec 29.
 The functional spectrum of low-frequency coding variation
This study represents a large step toward detecting and interpreting low-frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation. Marth GT, Yu F, Indap AR, Garimella K, Gravel S, Leong WF, Tyler-Smith C, Bainbridge M, Blackwell T, Zheng-Bradley X, Chen Y, Challis D, Clarke L, Ball EV, Cibulskis K, Cooper DN, Fulton B, Hartl C, Koboldt D, Muzny D, Smith R, Sougnez C, Stewart C, Ward A, Yu J, Xue Y, Altshuler D, Bustamante CD, Clark AG, Daly M, DePristo M, Flicek P, Gabriel S, Mardis E, Palotie A, Gibbs R; 1000 Genomes Project. Genome Biology. 2011 Sep 14.
 A comprehensive map of mobile element insertion polymorphisms in humans
Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. Stewart C, Kural D, Strömberg MP, Walker JA, Konkel MK, Stütz AM, Urban AE, Grubert F, Lam HY, Lee WP, Busby M, Indap AR, Garrison E, Huff C, Xing J, Snyder MP, Jorde LB, Batzer MA, Korbel JO, Marth GT; 1000 Genomes Project. PLOS Genetics. 2011 Aug.
 The variant call format and VCFtools
VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R; 1000 Genomes Project Analysis Group. Bioinformatics. 2011 Aug 1.
 Demographic history and rare allele sharing among human populations
We examined the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA; 1000 Genomes Project, Bustamante CD. Proceedings of the National Academy of Sciences USA. 2011 Jul 19.
 Variation in genome-wide mutation rates within and between human families
We present the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, Zilversmit M, Cartwright R, Rouleau GA, Daly M, Stone EA, Hurles ME, Awadalla P; 1000 Genomes Project. Nature Genetics. 2011 Jun 12.
 Introduction of a software suite for research analysis and data management using BAM files
BamTools: a C++ API and toolkit for analyzing and managing BAM files. Barnett D, Garrison E, Quinlan A, Strömberg M, Marth G. Bioinformatics. 2011 Apr 14. [Epub ahead of print]
 DNA as supramolecular scaffold for functional molecules: progress in DNA nanotechnology
This tutorial review focuses on the recent progress in this highly active field of research with an emphasis on covalent modifications of DNA. Bandy TJ, Brewer A, Burns JR, Marth G, Nguyen T, Stulz E. Chemical Society Reviews. 2011 Jan.
 Map of unbalanced SVs based on whole genome DNA sequencing
Mapping copy number variation by population-scale genome sequencing. Mills RE, Walter K, Stewart C, Handsaker RE, Chen K, Alkan C, Abyzov A, Yoon SC, Ye K, Cheetham RK, Chinwalla A, Conrad DF, Fu Y, Grubert F, Hajirasouliha I, Hormozdiari F, Iakoucheva LM, Iqbal Z, Kang S, Kidd JM, Konkel MK, Korn J, Khurana E, Kural D, Lam HY, Leng J, Li R, Li Y, Lin CY, Luo R, Mu XJ, Nemesh J, Peckham HE, Rausch T, Scally A, Shi X, Stromberg MP, Stütz AM, Urban AE, Walker JA, Wu J, Zhang Y, Zhang ZD, Batzer MA, Ding L, Marth GT, McVean G, Sebat J, Snyder M, Wang J, Ye K, Eichler EE, Gerstein MB, Hurles ME, Lee C, McCarroll SA, Korbel JO; 1000 Genomes Project. Nature. 2011;470:59-65.
 Diversity of human copy number variation and multicopy genes
Our approach makes ~1000 genes accessible to genetic studies of disease association. Sudmant PH, Kitzman JO, Antonacci F, Alkan C, Malig M, Tsalenko A, Sampas N, Bruhn L, Shendure J; 1000 Genomes Project, Eichler EE. Science. 2010 Oct 29.
 A map of human genome variation from population-scale sequencing
This pilot phase of the 1000 Genomes Project is designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA. Nature. 2010 Oct 28.
 Genome Variation Format (GVF) and the 10Gen dataset
A standard variation file format for human genome sequences. Reese MG, Moore B, Batchelor C, Salas F, Cunningham F, Marth GT, Stein L, Flicek P, Yandell M, Eilbeck K. Genome Biology. 2010;11:R88. [Epub ahead of print]
 The Sequence Alignment/Map format and SAMtools
The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R; 1000 Genome Project Data Processing Subgroup. Bioinformatics. 2009 Aug 15.
 Application of the Roche/454 platform to survey natural variation in strains of Drosophila melanogaster
Population genomic inferences from sparse high-throughput sequencing of two populations of Drosophila melanogaster. Sackton TB, Kulathinal RJ, Bergman CM, Quinlan AR, Dopman EB, Carneiro M, Marth GT, Hartl DL, Clark AG. Genome Biology and Evolution. 2009;1:449-65.
 Mutational profiling with next-generation DNA sequencers
Rapid whole-genome mutational profiling using next-generation sequencing technologies. Douglas R. Smith, Aaron R. Quinlan, Heather E. Peckham, Kathryn Makowsky, Wei Tao, Betty Woolf, Lei Shen, William F. Donahue, Nadeem Tusneem, Michael P. Stromberg, Donald A. Stewart, Lu Zheng, Swati S. Ranade, Jason B. Warner, Clarence C. Lee, Brittney E. Coleman, Zheng Zhang, Stephen F. McLaughlin, Joel A. Malek, Jon M. Sorenson, Alan P. Blanchard, Jarrod Chapman, David Hillman, Feng Chen, Daniel S. Rokhsar, Kevin J. McKernan, Thomas W. Jeffries, Gabor T. Marth, and Paul M. Richardson. Genome Research. 2008 Sep 4. [Epub ahead of print]
 A next-generation sequence assembly viewer program
EagleView: A genome assembly viewer for next-generation sequencing technologies. Huang W, Marth G. Genome Research. 2008;18(9):1538-43. Epub 2008 Jun 11.
 Whole-genome SNP calling in Illumina reads
Whole-genome sequencing and variant discovery in C. elegans. Hillier LW, Marth GT, Quinlan AR, Dooling D, Fewell G, Barnett D, Fox P, Glasscock JI, Hickenbotham M, Huang W, Magrini VJ, Richt RJ, Sander SN, Stewart DA, Stromberg M, Tsung EF, Wylie T, Schedl T, Wilson RK, Mardis ER. Nature Methods. 2008;5:183-8.
 A base caller program for 454 reads
PYROBAYES: An improved base-caller for SNP discovery in pyrosequences. Quinlan AR, Stewart DA, Strömberg MP, Marth GT. Nature Methods. 2008;5:179-81.
 Missing SNPs because of heterozygosity in PCR primer binding sites
Primer-site SNPs mask mutations. Quinlan, AR, Marth, G.T. Nature Methods 4, 192 (Mar, 2007)
 Analysis of concordance of different haplotype block partitioning algorithms
We simulated 1000 haplotypes using the standard coalescent for three world populations and applied three classes of block partitioning algorithms, assessing algorithm differences in number, size, and coverage of blocks inferred under different conditions of SNP density, allele frequency, and sample size. Indap AR, Marth GT, Struble CA, Tonellato P, Olivier M. BMC Bioinformatics. 2005 Dec 15.
 Reconstruction of demographic history from the SNP allele frequency spectrum of three world populations
The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Marth, G.T., Czabarka, E., Murvai, J., Sherry, S.T. Genetics 166, 351-372 (2004)
 A high-density, high-quality microsatellite map of the human genome
STRP screening sets for the human genome at 5 cM density. Ghebranious N., Vaske D., Yu A., Zhao C., Marth G., Weber J.L. BMC Genomics 4, (2003)
 SNP discovery in overlapping sections of BAC clones sequenced by the Human Genome Project. Population genetic inference from polymorphism density distributions
Sequence variations in the public human genome data reflect a bottlenecked population history. Marth, G.T., Schuler, G., Yeh, R., Davenport, R., Agarwala, R., Church, D., Wheelan, S., Baker, J., Ward, M., Kholodov, M., Phan, L., Czabarka, E., Murvai, J., Cutler, D., Wooding, S., Rogers, A., Chakravarti, A., Harpending, H.C., Kwok, P-Y., Sherry, S.T. PNAS 100, 376-381 (2003)
 A review of SNP mining methods and data sources
Computational SNP discovery in DNA sequence data. Gabor T. Marth. In: Single Nucleotide Polymorphisms: Methods and Protocols (ed. Kwok, P.Y.), Humana Press 2002.
 Discovery and characterization of short diallelic insertions and deletions
Human diallelic insertion/deletion polymorphisms. Weber, J.L., David, D., Heil, J., Fan, Y., Zhao, C. & Marth, G.T. American Journal of Human Genetics 71, 854-62 (2002)
 Validation and population-specific allele frequency estimation for hundreds of SNPs found by The SNP Consortium
Single-nucleotide polymorphisms in the public domain: how useful are they? Marth, G, Yeh, R, Minton, M, Donaldson, R, Li, Q, Duan, S, Davenport R, Miller RD, Kwok PY. Nature Genetics 27, 371-2 (2001)
 The first high-density SNP map of the Human genome
A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Sachidanandam, R, Weissman, D, Schmidt, SC, Kakol, JM, Stein, LD, Marth, G, Sherry S, Mullikin JC, Mortimore BJ, Willey DL, Hunt SE, Cole CG, Coggill PC, Rice CM, Ning Z, Rogers J, Bentley DR, Kwok PY, Mardis ER, Yeh RT, Schultz B, Cook L, Davenport R, Dante M, Fulton L, Hillier L, Waterston RH, McPherson JD, Gilman B, Schaffner S, Van Etten WJ, Reich D, Higgins J, Daly MJ, Blumenstiel B, Baldwin J, Stange-Thomann N, Zody MC, Linton L, Lander ES, Altshuler D; The International SNP Map Working Group. Nature 409, 928-33 (2001)
 The PolyBayes SNP discovery algorithm
A general approach to single-nucleotide polymorphism discovery. Gabor T. Marth, Mark D. Yandell, Ian Korf, Zhijie Gu, Raymond T. Yeh, Hamideh Zakeri, Nathan O. Stitziel, LaDeana Hillier, Pui-Yan Kwok and Warren Gish. Nature Genetics 23, 452-456 (1999)
 The Common Assembly Format (CAF)
Sequence assembly with CAFTOOLS. Dear, S., Durbin, R., Hillier, L., Marth, G., Thierry-Mieg, J. & Mott, R. Genome Research 8, 260-7 (1998)