Hashing, Randomness and Dictionaries
Rasmus Pagh October 2002 |
Abstract:
This thesis is centered around one of the most basic
information retrieval problems, namely that of storing and accessing the
elements of a set. Each element in the set has some associated information
that is returned along with it. The problem is referred to as the dictionary problem, due to the similarity to a bookshelf dictionary, which
contains a set of words and has an explanation associated with each word. In
the static version of the problem the set is fixed, whereas in the
dynamic version, insertions and deletions of elements are
possible.
The approach taken is that of the theoretical algorithms community. We work (almost) exclusively with a model, a mathematical object that is meant to capture essential aspects of a real computer. The main model considered here (and in most of the literature on dictionaries) is a unit cost RAM with a word size that allows a set element to be stored in one word. We consider several variants of the dictionary problem, as well as some related problems. The problems are studied mainly from an upper bound perspective, i.e., we try to come up with algorithms that are as efficient as possible with respect to various computing resources, mainly computation time and memory space. To some extent we also consider lower bounds, i.e., we attempt to show limitations on how efficient algorithms are possible. A central theme in the thesis is randomness. Randomized algorithms play an important role, in particular through the key technique of hashing. Additionally, probabilistic methods are used in several proofs. Available as PostScript, PDF. |