gnumap
Overview:
The Genomic Next-generation Universal MAPper (GNUMAP) is a program designed to accurately map sequence data obtained from next-generation sequencing machines (specifically Solexa/Illumina) back to a genome of any size. With the emergence of high-throughput next-generation sequencing machines, an incredible amount of data is being produced at a very high rate, and mapping this data back to the genome is a major challenge. One significant problem with many genomic mapping programs is the way they handle duplicate regions in genomic DNA. Since it is impossible to know exactly where a read from a duplicate region should be mapped, many programs simply throw out these sequences. Often, this results in a loss of nearly 40% of the data.
This project develops GNUMAP, a program capable of handling such repetitive regions. By using the posterior probability of mapping a given read to a specific genomic location, we are able to account for these repetitive reads by distributing them across several regions in the genome. In addition, the program's output is written so that it can be easily viewed in other free and readily available programs. Several benchmark data sets were created with spiked-in duplicate regions, and GNUMAP was able to account for these duplicate regions more accurately.
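The idea of distributing a repetitive read across several regions can be illustrated with a minimal sketch. This is not GNUMAP's actual code: the function names `posterior_weights` and `add_read_to_coverage` are hypothetical, the per-location likelihoods are made-up inputs, and a uniform prior over candidate locations is assumed.

```python
from typing import Dict


def posterior_weights(likelihoods: Dict[int, float]) -> Dict[int, float]:
    """Turn per-location alignment likelihoods into posterior weights.

    Assumption: a uniform prior over candidate locations, so the posterior
    for each location is its likelihood divided by the sum of all likelihoods.
    """
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one location must have a positive likelihood")
    return {pos: lk / total for pos, lk in likelihoods.items()}


def add_read_to_coverage(coverage: Dict[int, float],
                         likelihoods: Dict[int, float]) -> None:
    """Add one read to the coverage map as fractional counts.

    Instead of discarding a read that matches several locations, each
    location receives a share of the read equal to its posterior weight.
    """
    for pos, weight in posterior_weights(likelihoods).items():
        coverage[pos] = coverage.get(pos, 0.0) + weight


# Hypothetical example: one read matches two duplicate regions equally well
# and a third location poorly. Keys are genome positions.
coverage: Dict[int, float] = {}
add_read_to_coverage(coverage, {1000: 0.4, 52000: 0.4, 98000: 0.2})
```

In this sketch every read contributes a total weight of one, so reads from duplicate regions are kept and shared among their candidate locations instead of being thrown out.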
GNUMAP Users Google Group
- If you would like to join the Google group, send questions to the group as a whole, or browse previously asked questions, please visit http://groups.google.com/group/gnumap-users.
Applications
- SNP Calling
- Bisulfite Sequencing -- download the software from here
- RNA Editing
- ChIP-seq
- RNA-Seq and miRNA-seq
News
- The GNUMAP Bisulfite paper (GNUMAP-BS) has been submitted and is under review.
- GNUMAP can now be used with MPI (Version 2.*). If several machines are available, the work can be spread evenly across them, greatly decreasing the processing time. For an example submission, see the GNUMAP user's group.
- When determining whether a SNP is real or just background noise, GNUMAP now uses a pairwise Hidden Markov Model in addition to the posterior probability calculation to identify SNP locations. Preliminary results outperform any existing program.
- GNUMAP can now save and read in a binary genome file! The latest version is available here.
Presentations and Publications
- May 2012: HiCOMB (in conjunction with IPDPS) in Shanghai, China. The workshop paper can be found here, and the presentation is available in PowerPoint format.
- May 2011: HiCOMB (in conjunction with IPDPS) in Anchorage, Alaska. The workshop paper can be found here, and the presentation is available in Keynote or PDF formats.
- The GNUMAP algorithm was published in Bioinformatics as "The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing". Here is the link.
- June 2009: ISMB in Stockholm, Sweden. The presentation can be downloaded as a pptx or pdf.
Recent Updates:
- New GNUMAP features are added regularly. Recently, a pairwise Hidden Markov Model was added to increase the accuracy of SNP detection.
- For questions or usage directions, please email Nathan at nathanlclement (at) gmail.com or gnumap-users (at) googlegroups.com.