We initiate the study of the smoothed complexity of the Closest String problem by proposing a semi-random model of Hamming distance. We restrict interest to the optimization version of the Closest String problem, and give a 2-approximation algorithm, which we refer to as CSP-Greedy, that runs in O(nl + l^3)-time, where l is the string length and n is the number of strings. Using smoothed analysis, we prove CSP-Greedy achieves a (1 + 2*epsilon)-approximation guarantee, where epsilon > 0 is a small value. This approximation and runtime guarantee demonstrates that Closest String instances with a relatively large number of input strings are efficiently solved in practice. We also give experimental results demonstrating that CSP-Greedy runs efficiently on instances with a large number of strings. The counter-intuitive fact that large Closest String instances are relatively easy and efficient to solve gives new insight into this well-investigated problem.

Christina Boucher and Kathleen Wilkie. Smoothed Analysis of the Closest String Problem. In proceedings of the 17th Annual Symposium on String Processing and Information Retrieval (SPIRE 2010), pages 107--118.