RNA splicing is a cellular process driven by the interaction between numerous regulatory sequences and binding sites. However, such interactions have been explored primarily by laboratory methods, since computational tools largely ignore the relationship between different splicing elements. Current computational methods identify either splice sites or other regulatory sequences, such as enhancers and silencers. We present a novel approach for characterizing co-occurrence relationships between splice site motifs and splicing enhancers. Our approach relies on an efficient algorithm for approximately solving Consensus Sequence with Outliers, an NP-complete string clustering problem. In particular, we give an algorithm for this problem that outputs near-optimal solutions in polynomial time. To our knowledge, this is the first formulation of, and computational attempt at, detecting co-occurring sequence elements in RNA sequence data. Further, we demonstrate that our method, SeeSite, is capable of showing that certain exonic splicing enhancers (ESEs) are preferentially associated with weaker splice sites, and that there exists a co-occurrence relationship between ESEs and splice site motifs.
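
To give a feel for the clustering problem named in the abstract, the sketch below solves a toy instance by brute force. It assumes one common formulation that the abstract itself does not spell out: given n equal-length strings and a number n_keep, choose n_keep strings and a center string that minimize the total Hamming distance from the center to the chosen strings; the remaining strings are the outliers. This is not the SeeSite algorithm or its approximation scheme. The function names and the example sequences are hypothetical, and the search is exponential in n, so it is only usable on tiny inputs.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def column_majority(strings):
    """Center string built from the most common character in each column.

    For a fixed set of strings, this minimizes the total Hamming distance.
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*strings))


def consensus_with_outliers(strings, n_keep):
    """Brute force over all subsets of size n_keep (assumed formulation).

    Returns (total_distance, center, kept_indices) for the best subset.
    """
    best = None
    for idxs in combinations(range(len(strings)), n_keep):
        subset = [strings[i] for i in idxs]
        center = column_majority(subset)
        cost = sum(hamming(center, s) for s in subset)
        if best is None or cost < best[0]:
            best = (cost, center, idxs)
    return best


# Hypothetical toy input: three similar motifs and one unrelated string.
motifs = ["GTAAGT", "GTAAGT", "GTGAGT", "CCCCCC"]
cost, center, kept = consensus_with_outliers(motifs, n_keep=3)
outliers = [i for i in range(len(motifs)) if i not in kept]
print(cost, center, kept, outliers)
```

On this toy input the unrelated string is left out as the outlier and the consensus is the shared motif. The column-majority step is what makes the center optimal for a fixed subset; the hard part of the problem is choosing the subset, which is where an approximation algorithm would replace the exhaustive loop.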

Christine Lo, Boyko Kakaradov, Daniel Lokshtanov and Christina Boucher. "SeeSite: Characterizing Relationships between Splice Junctions and Splicing Enhancers," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 4, pp. 648-656, July-Aug. 2014. doi: 10.1109/TCBB.2014.2304294

author={Lo, C. and Kakaradov, B. and Lokshtanov, D. and Boucher, C.}, 
journal={Computational Biology and Bioinformatics, IEEE/ACM Transactions on}, 
title={SeeSite: Characterizing Relationships between Splice Junctions and Splicing Enhancers}, 
keywords={RNA;genetics;molecular biophysics;molecular configurations;Consensus Sequence-with-Outliers;NP-complete string clustering problem;RNA sequence data;RNA splicing;SeeSite;binding sites;cellular process;computational methods;genetics;laboratory methods;near-optimal solutions;polynomial time;regulatory sequences;splice junctions;splice site motifs;splicing elements;splicing enhancers;Approximation algorithms;Bioinformatics;Computational biology;RNA;Splicing;EPTAS;PTAS;RNA splicing;exon splicing enhancers;randomized algorithms},