Christina Boucher and Mohamed Omar

Given a set S of n strings, each of length l, and a non-negative value d, we define a center string as a string of length l that has Hamming distance at most d from each string in S. The #Closest String problem aims to determine the number of unique center strings for a given set of strings S and input parameters n, l, and d. We show #Closest String is impossible to solve exactly or even approximately in polynomial time, and that restricting #Closest String so that any one of the parameters n, l, or d is fixed leads to an FPRAS. We show equivalent results for the problem of efficiently sampling center strings uniformly at random.

Christina Boucher and Mohamed Omar. "On the Hardness of Counting and Sampling Center Strings," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol.9, no.6, pp.1843,1846, Nov.-Dec. 2012. Conference version in the proceedings of the 17th Annual Symposium on String Processing and Information Retrieval (SPIRE 2010), pages 128--135.

@ARTICLE{6205735, author={Boucher, C. and Omar, M.}, journal={Computational Biology and Bioinformatics, IEEE/ACM Transactions on}, title={On the Hardness of Counting and Sampling Center Strings}, year={2012}, month={Nov}, volume={9}, number={6}, pages={1843-1846}, keywords={DNA;RNA;biology computing;biomechanics;computational complexity;hardness;molecular biophysics;molecular configurations;polynomial approximation;proteins;sampling methods;#CLOSEST STRING problem;DNA sequences;Hamming distance;RNA sequences;center string counting;center string sampling;computational complexity;fully polynomial-time randomized approximation scheme;hardness;protein sequences;Approximation algorithms;Approximation methods;Bioinformatics;Computational biology;Hamming distance;Polynomials;Sequential analysis;Biological sequence analysis;computational complexity;fully polynomial almost uniform sampler (FPAUS);fully polynomial-time randomized approximation scheme (FPRAS);journal;motif recognition;Algorithms;Computational Biology;DNA;Protein Binding;Proteins;Sequence Analysis}, doi={10.1109/TCBB.2012.84}, ISSN={1545-5963},}