Solving the String Statistics Problem in Time

The string statistics problem consists of preprocessing a string of length

such that given a query pattern of length

, the maximum number of non-overlapping occurrences of the query pattern in the string can be reported efficiently. Apostolico and Preparata introduced the minimal augmented suffix tree (MAST) as a data structure for the string statistics problem, and showed how to construct the MAST in time ${\cal O}(n\log^2 n)$ and how it supports queries in time ${\cal O}(m)$ for constant sized alphabets. A subsequent theorem by Fraenkel and Simpson stating that a string has at most a linear number of distinct squares implies that the MAST requires space ${\cal O}(n)$ . In this paper we improve the construction time for the MAST to ${\cal O}(n\log n)$ by extending the algorithm of Apostolico and Preparata to exploit properties of efficient joining and splitting of search trees together with a refined analysis

Abstract: