To content
Department of Computer Science

Theses on Algorithmic Foundations

Requirements

Please respect our general guideline for theses in our group.

For theses in the area of algorithmic fondations, your should satisfy the following:

  • You enjoy working on algorithmic problems
  • You have knowledge of a memory and runtime-efficient programming language; unless stated otherwise, we expect implementations in C++
  • You have experience with programming on the bit level / you feel comfortable with AND, OR and shifts

These topics are suitable for students of computer science. If you are interested, please contact Prof. Dr. Johannes Fischer, Patrick Dinklage or Jonas Ellert.

Open Topics

The following topics are currently available for bachelor's (BA) or master's (MA) theses.

The suffix array is a powerful index data structure that contains the lexicographic order of the suffixes of the input text. By itself, it already serves as a data structure for pattern matching, but it is also a central ingredient for more sophisticated algorithms and data structures in the fields of pattern matching, information retrieval, text compression and many more.

However, if stored as a plain array, the suffix array requires O(n lg n) bits of space, where n is the length of the input text. In many applications, this can be prohibitive, and thus there has been plenty of research on means to compress the suffix array while still allowing efficient, preferrably close to constant-time access. One possibility is to not compress the suffix array directly, but instead compress the delta suffix array, which contains the differences between two neighbored suffix array entries. Because of the nature of the lexicographic order of suffixes, the delta suffix array - other than the suffix array - may contain repetitions that can be exploited for compression.

In this thesis, we want to investigate whether LZ-End is a suitable scheme for compressing the delta suffix array. Other than typical Lempel-Ziv-based schemes, LZ-End has means to efficiently decode a particular character or substring from the original input without first having to decode the entire prefix, which enables efficient access. In the thesis, this new approach should be implemented in C++ and compared to a similar approach that uses Relative Lempel-Ziv (RLZ), which has similar properties in regards of decoding particular substrings.