Comparative Methods for Gene Structure Prediction in Homologous
Sequences
Christian N. S. Pedersen
June 2002
Abstract:
The increasing number of sequenced genomes motivates the use of
evolutionary patterns to detect genes. We present a series of comparative
methods for gene finding in homologous prokaryotic or eukaryotic sequences.
Based on a model of legal genes and a similarity measure between genes, we
find the pair of legal genes of maximum similarity. We develop methods based
on gene models and alignment-based similarity measures of increasing
complexity, which take into account many details of real gene structures,
e.g. the similarity of the proteins encoded by the exons. When using a
similarity measure based on an existing alignment, the methods run in linear
time. When integrating the alignment and prediction process, which allows for
more fine-grained similarity measures, the methods run in quadratic time. We
evaluate the methods in a series of experiments on synthetic and real
sequence data, which show that all methods are competitive but that taking
the similarity of the encoded proteins into account really boosts
performance.
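
To illustrate why a similarity measure defined on an existing alignment admits a linear-time scan, here is a minimal sketch. It is not the method of the report: it replaces the gene model with the simplest possible one (any contiguous run of alignment columns counts as a candidate), ignores start and stop codons, reading frames, introns and protein-level similarity, and uses a hypothetical function name and made-up match, mismatch and gap scores. It only shows that, once the alignment is fixed, the best-scoring region can be found in a single pass over the columns.

```python
def best_aligned_segment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Return (score, start, end) of the highest-scoring run of columns.

    row_a and row_b are the two rows of an existing pairwise alignment,
    with '-' marking gaps. Columns are reported as a half-open interval
    [start, end). The scores are illustrative assumptions only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")

    best = (0, 0, 0)           # empty segment scores 0
    running, run_start = 0, 0  # best segment ending at the current column

    for i, (a, b) in enumerate(zip(row_a, row_b)):
        # Score this single alignment column.
        if a == "-" or b == "-":
            col = gap
        elif a == b:
            col = match
        else:
            col = mismatch

        # Either extend the current run or start a new one here.
        if running <= 0:
            running, run_start = col, i
        else:
            running += col

        if running > best[0]:
            best = (running, run_start, i + 1)

    return best


if __name__ == "__main__":
    print(best_aligned_segment("ACGTTTACGGA", "TCGTT-ACGGT"))
```

Each column is visited once, so the running time is linear in the alignment length. A realistic gene model would need more state per column (for example reading frame and exon or intron status), which this sketch deliberately leaves out.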