We present Kohdista, an index-based algorithm for finding pairwise alignments between single-molecule maps (Rmaps). The novelty of our approach lies in formulating the alignment problem as automaton path matching and in applying modern index-based data structures to it. In particular, Kohdista combines the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree. We validate the approach on E. coli data and show that Kohdista is at least 3x faster than existing methods at finding high-quality pairwise alignments in plum Rmap data. Lastly, we demonstrate that Kohdista is the only non-proprietary method capable of finding high-quality pairwise Rmap alignments for large eukaryotic organisms in reasonable time. Kohdista is available at https://github.com/mmuggli/KOHDISTA/.
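To make the role of the wavelet tree more concrete, the following is a minimal Python sketch of one query such a structure supports: listing the distinct symbols inside a positional interval of a sequence whose values fall within a tolerance window. It is not taken from the Kohdista code base. The class name `WaveletTree`, the method `range_distinct`, and the toy sequence of quantized fragment sizes are all hypothetical, and the assumption that this kind of query is what lets an index match fragment sizes only up to a sizing error is ours.

```python
class WaveletTree:
    """Illustrative wavelet tree over integers (hypothetical helper, not Kohdista's)."""

    def __init__(self, seq, lo=None, hi=None):
        if lo is None:
            # Top-level call: assumes a non-empty sequence.
            lo, hi = min(seq), max(seq)
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        if lo == hi or not seq:
            return  # leaf (single symbol) or empty node
        mid = (lo + hi) // 2
        # prefix[k] = how many of the first k symbols go to the left child
        self.prefix = [0]
        for s in seq:
            self.prefix.append(self.prefix[-1] + (s <= mid))
        self.left = WaveletTree([s for s in seq if s <= mid], lo, mid)
        self.right = WaveletTree([s for s in seq if s > mid], mid + 1, hi)

    def range_distinct(self, i, j, lo, hi):
        """Sorted distinct symbols s in seq[i:j] with lo <= s <= hi."""
        if i >= j or hi < self.lo or lo > self.hi:
            return []  # empty interval, or node's symbols lie outside the window
        if self.lo == self.hi:
            return [self.lo]  # leaf reached with a non-empty interval
        li, lj = self.prefix[i], self.prefix[j]
        # Map the interval into each child and recurse, left (smaller) first.
        return (self.left.range_distinct(li, lj, lo, hi)
                + self.right.range_distinct(i - li, j - lj, lo, hi))


# Toy sequence standing in for indexed fragment sizes (assumed units: kb).
sizes = [12, 5, 9, 12, 7, 30, 8, 5]
tree = WaveletTree(sizes)

# Which distinct sizes in positions 1..6 lie within 7..10, e.g. a query
# fragment of about 8.5 kb with an assumed tolerance of 1.5 kb?
candidates = tree.range_distinct(1, 7, 7, 10)
print(candidates)  # [7, 8, 9]
```

Only subtrees whose value range overlaps the window and whose mapped interval is non-empty are visited, which is why this style of query suits matching with a size tolerance better than testing every symbol of the alphabet.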
A Succinct Solution to Rmap Alignment. Martin D. Muggli, Simon J. Puglisi, and Christina Boucher. In Proceedings of the Workshop on Algorithms in Bioinformatics (WABI), Helsinki, Finland. Leibniz International Proceedings in Informatics (LIPIcs), Volume 113, Article 12, pp. 1-16, 2018.