From The MarthLab
Overview

Mosaik is a suite of three modular programs: MosaikBuild, MosaikAligner, and MosaikAssembler.
- MosaikBuild converts various sequence formats into Mosaik's native read format.
- MosaikAligner pairwise aligns each read to a specified series of reference (anchor) sequences.
- MosaikAssembler parses the aligned sequence archive and produces a multiple sequence alignment, which is then saved in an assembly file format.
Mosaik is written in C++ with multiple platforms in mind. Compiled versions are currently available for both Microsoft Windows and Linux, and versions for other platforms can be made available upon request. Cluster-aware (MPI) versions of MosaikAligner exist and have been tested on up to 160 processors.
At the time of the beta release, the workflow consists of supplying sequences in a FASTA format consistent with the output of phd2fasta (i.e., separate FASTA files for bases, base qualities, and base positions). The output is a set of assembly files in the phrap ace format, which can be viewed with utilities such as consed, Sequencher, or EagleView.
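As a rough illustration of the input side of this workflow, the sketch below pairs a bases FASTA file with its matching base quality FASTA file. It assumes the common layout in which the quality file reuses the read names from the bases file and lists one whitespace-separated integer per base. The function names `read_fasta_records` and `load_reads` are hypothetical and are not part of Mosaik.

```python
import io


def read_fasta_records(handle):
    """Yield (name, body) pairs; body lines are joined with single spaces."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(chunks)
            parts = line[1:].split()
            # The read name is taken to be the first token of the header line.
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, " ".join(chunks)


def load_reads(bases_handle, quals_handle):
    """Return {read name: (sequence, [qualities])}, checking that lengths agree."""
    bases = {n: body.replace(" ", "") for n, body in read_fasta_records(bases_handle)}
    quals = {n: [int(q) for q in body.split()]
             for n, body in read_fasta_records(quals_handle)}
    reads = {}
    for name, seq in bases.items():
        q = quals.get(name)
        if q is None or len(q) != len(seq):
            raise ValueError(f"missing or mismatched qualities for read {name!r}")
        reads[name] = (seq, q)
    return reads


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a bases file and its quality file.
    bases = io.StringIO(">read1\nACGT\nAC\n")
    quals = io.StringIO(">read1\n20 30 40 40\n10 15\n")
    print(load_reads(bases, quals))
```

The length check is there because a bases file and a quality file that drift out of step are easier to catch before conversion than after alignment.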
Features
- aligns a large range of read lengths: from short Illumina reads to medium 454 reads to long legacy Sanger reads
- co-assembly: can create an assembly with multiple sequencing technologies (Illumina, 454, and Sanger)
- cluster-aware: can be run on any number of processors (tested up to 160 processors)
- anchored aligner: use an entire genome as a reference when aligning reads
- gapped alignment: especially useful for insertion / deletion (indel) detection
- fast: aligns 100 million Illumina reads in 90 minutes on one processor
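To show why gapped alignment helps with indel detection, here is a minimal sketch of a textbook global alignment (Needleman-Wunsch with a linear gap penalty). This is a generic illustration and not Mosaik's own algorithm; the function name `gapped_align` and the scoring values are assumptions chosen for the example.

```python
def gapped_align(ref, read, match=1, mismatch=-1, gap=-2):
    """Globally align read to ref; return (aligned_ref, aligned_read, score)."""
    rows, cols = len(ref) + 1, len(read) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for ref[i-1] against read[j-1].
        return match if ref[i - 1] == read[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_ref, out_read = [], []
    i, j = len(ref), len(read)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_ref.append(ref[i - 1])
            out_read.append(read[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_read.append("-")  # base present in ref, absent in read
            i -= 1
        else:
            out_ref.append("-")   # base present in read, absent in ref
            out_read.append(read[j - 1])
            j -= 1
    return "".join(reversed(out_ref)), "".join(reversed(out_read)), score[-1][-1]


if __name__ == "__main__":
    # The read is missing one T relative to the reference.
    for row in gapped_align("ACGTTGCA", "ACGTGCA")[:2]:
        print(row)
```

In the example, the missing base shows up as a single `-` column in the aligned read. An ungapped aligner would instead have to report a run of mismatches after the deletion point.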
Publication & Release Schedule
Summer 2007
Beta Release
Registered beta testers can download the documentation and Mosaik distribution here:
- Beta Release Documentation
- Windows Distribution
- Linux Distribution (32-bit)
- Linux Distribution (64-bit)
- Linux Distribution (Itanium2)
- Apple OS X Distribution (Intel)
- Apple iPhone Distribution :-)
Found a bug?
Found a bug in Mosaik? Please report it on our bug tracking website for quicker bug resolution.
Masked Human Genome 36.2
Derek Barnett has masked out all of the repeats in human genome build 36.2, using RepeatMasker for documented repeats and BLAT for micro-repeats.
- masked human genome (entire genome in one FASTA file)
- masked human genome (individual FASTA files for each chromosome)
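When working with a masked reference, it can be useful to check how much of the sequence is actually masked. The sketch below counts masked bases in a FASTA file. It assumes that masked positions are written as `N` (hard masking), with an option to also count lowercase bases (soft masking); which convention the files above use is not stated here, so treat both as assumptions. The function name `masked_fraction` is hypothetical.

```python
import io


def masked_fraction(handle, hard_mask_chars="Nn", count_lowercase=False):
    """Return the fraction of sequence characters that are masked."""
    total = 0
    masked = 0
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        total += len(line)
        for base in line:
            if base in hard_mask_chars or (count_lowercase and base.islower()):
                masked += 1
    return masked / total if total else 0.0


if __name__ == "__main__":
    example = ">chr_example\nACGTNNNN\nacgtACGT\n"
    print(masked_fraction(io.StringIO(example)))
    print(masked_fraction(io.StringIO(example), count_lowercase=True))
```

Reading line by line keeps memory use flat, which matters for a whole-genome FASTA file.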
Illumina Support
If you currently don't have a utility to convert Illumina data sets to FASTA, you can use the following Mosaik utilities to help convert the data sets to both FASTA and the Mosaik internal read format (bypassing MosaikBuild).
If GERALD base quality calibration was used, use the ConvertGerald utility. Otherwise use the ConvertBustard utility. For a list of options, just run each program with no parameters.
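For readers who want a feel for what such a conversion involves, here is a minimal sketch of turning tab-separated read lines into FASTA records. It is not the ConvertBustard or ConvertGerald implementation. It assumes each input line holds five tab-separated fields (lane, tile, x, y, sequence) and that `.` marks an uncalled base; check both against your own data. The function name `bustard_to_fasta` and the read-naming scheme are hypothetical.

```python
def bustard_to_fasta(lines):
    """Yield FASTA records for lines shaped like 'lane<TAB>tile<TAB>x<TAB>y<TAB>sequence'."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # skip lines that do not match the assumed layout
        lane, tile, x, y, seq = fields
        # Hypothetical naming scheme: build the read name from its coordinates.
        name = f"{lane}_{tile}_{x}_{y}"
        # Assumption: '.' is an uncalled base, written as N in FASTA.
        yield f">{name}\n{seq.replace('.', 'N')}\n"


if __name__ == "__main__":
    sample = ["1\t7\t100\t200\tAC.GT\n"]
    for record in bustard_to_fasta(sample):
        print(record, end="")
```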
