Tools Development


A graphical user interface (GUI) is being developed to serve as a front end for the TagZilla program, which can currently be run only from the command line. The tool is being developed for internal use but will eventually be made publicly available.

Genotype Library and Utilities (GLU)

Whole-genome association studies are generating unprecedented amounts of genotype data, frequently billions of genotypes per study, and require new, scalable computational approaches to the storage, management, quality control, and genetic analysis of these data. GLU is a framework and software package designed around a set of novel conceptual approaches. It addresses the need for general and powerful tools that can scale to handle trillions of genotypes effectively. More information can be found in the GLU 1.0 DCEG presentation.

Key innovations

  • Compressed binary genotype storage
  • Use of streaming and stackable data transformations that avoid fully materializing data sets in main memory
  • Integration with a high-level scripting language for easy customization and extension
  • Support for parallel processing and distributed computing (not yet released)
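The streaming, stackable transformations listed above can be illustrated with a minimal Python sketch. This is not GLU's API: the row layout and the function names (`drop_samples`, `keep_loci`, `rename_samples`) are hypothetical and chosen only for illustration. Each step is a generator that consumes rows one at a time, so stacking several steps never holds the full data set in memory.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical row layout (an assumption, not GLU's format):
# (sample_id, [genotype per locus]); None marks a missing call.
Row = Tuple[str, List[Optional[str]]]


def drop_samples(rows: Iterable[Row], excluded: Set[str]) -> Iterator[Row]:
    """Yield only rows whose sample ID is not in the exclusion set."""
    for sample, genos in rows:
        if sample not in excluded:
            yield sample, genos


def keep_loci(rows: Iterable[Row], keep_idx: List[int]) -> Iterator[Row]:
    """Yield rows restricted to the locus columns listed in keep_idx."""
    for sample, genos in rows:
        yield sample, [genos[i] for i in keep_idx]


def rename_samples(rows: Iterable[Row], mapping: Dict[str, str]) -> Iterator[Row]:
    """Yield rows with sample IDs renamed where a mapping entry exists."""
    for sample, genos in rows:
        yield mapping.get(sample, sample), genos


# Made-up example data; in practice the source would be a file reader
# that yields one row at a time.
source = iter([
    ("S1", ["AA", "AG", None]),
    ("S2", ["AG", "GG", "CC"]),
    ("S3", ["AA", "AA", "CT"]),
])

# Stack the transformations; nothing is computed until rows are pulled.
pipeline = rename_samples(
    keep_loci(drop_samples(source, {"S2"}), [0, 2]),
    {"S3": "S3b"},
)
result = list(pipeline)
```

Because each stage only ever sees the current row, memory use depends on the width of a row rather than on the number of samples.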

Data management features

  • Import, export, merge, and split genotype data among several common formats and standards
  • Filter data using powerful inclusion and exclusion criteria
  • Rename and adjust sample and locus metadata

Descriptive tools for genotype quality assurance

  • Estimation of assay completion
  • Reproducibility and concordance
  • Verification of known and detection of unknown duplicate samples
  • Empirical sex determination
  • Testing for deviations from Hardy-Weinberg proportions and Mendelian inheritance patterns, and for non-random patterns of missing data
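Two of the quality checks listed above, assay completion and deviation from Hardy-Weinberg proportions, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the Pearson chi-square statistic shown here is a common textbook choice, not necessarily the test GLU implements.

```python
from typing import Optional, Sequence


def completion_rate(genotypes: Sequence[Optional[str]]) -> float:
    """Fraction of non-missing calls; None marks a missing genotype."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic for a biallelic locus.

    Compares observed genotype counts with the counts expected under
    Hardy-Weinberg proportions (p^2, 2pq, q^2).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Made-up counts for illustration.
rate = completion_rate(["AA", None, "AG", "GG"])
stat_balanced = hwe_chi_square(25, 50, 25)
stat_no_hets = hwe_chi_square(50, 0, 50)
```

The statistic has one degree of freedom for a biallelic locus, so values above roughly 3.84 correspond to p < 0.05. For small samples or rare alleles, an exact test is generally preferable to the chi-square approximation.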

Analytic tools

  • LD estimation
  • Fitting generalized linear models to test for phenotype/genotype association
  • A high-performance tagger-like application that can augment SNPs from fixed panels with optimal tag SNPs, using flexible criteria that include design scores from major genotyping vendors
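The idea of augmenting a fixed panel with additional tag SNPs can be sketched with a simple greedy selection over pairwise r² values. This is an assumption-laden illustration, not the algorithm used by GLU or TagZilla: the function name `greedy_tags`, the input layout, and the SNP names are hypothetical, vendor design scores are omitted, and a greedy choice is not guaranteed to be optimal.

```python
from typing import Dict, FrozenSet, Iterable, List


def greedy_tags(
    r2: Dict[FrozenSet[str], float],
    snps: Iterable[str],
    threshold: float = 0.8,
    obligate: Iterable[str] = (),
) -> List[str]:
    """Pick tag SNPs so every SNP is tagged at r^2 >= threshold.

    r2 maps frozenset({a, b}) to the pairwise r^2; missing pairs count
    as 0. SNPs in `obligate` (e.g. an existing panel) are always
    included first, and the greedy step then covers whatever remains.
    """
    snps = sorted(snps)  # sorted for deterministic tie-breaking

    def covers(a: str, b: str) -> bool:
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    tags = list(obligate)
    untagged = {s for s in snps if not any(covers(t, s) for t in tags)}
    while untagged:
        # Choose the candidate that tags the most still-untagged SNPs.
        best = max(snps, key=lambda c: sum(covers(c, s) for s in untagged))
        tags.append(best)
        untagged = {s for s in untagged if not covers(best, s)}
    return tags


# Made-up LD values for four hypothetical SNPs.
ld = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.50,
    frozenset(("snp3", "snp4")): 0.30,
}
all_snps = ["snp1", "snp2", "snp3", "snp4"]

tags_free = greedy_tags(ld, all_snps)
tags_panel = greedy_tags(ld, all_snps, obligate=("snp1",))
```

With no obligate SNPs, two tags suffice for this toy data; forcing `snp1` into the panel yields three, which shows the cost of building on a fixed panel rather than selecting tags from scratch.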

Viewed as a library or framework, GLU is designed to be highly extensible, so that it can be easily augmented and customized and can serve as a foundation for the rapid development of new applications.