To content
Department of Computer Science

Project Groups

Our groups has organized the following project groups in the scope of algorithmic foundations.

For general information about project groups, please refer to the department's information pages.

Succint data structures are those that can be used to answer queries efficiently while their memory consumptions is as close as possibile to the information-theoretic lower bound. A simple example is the counting of 0 or 1 bits in bit strings up to a certain position: without any preprocessing, this can only be done in time linear in the length of the string. However, there is a succinct data structure that can answer these so-called rank queries in constant time using only a sublinear amount of space and form the foundation for many advanced data structures (e.g., storing a tree topology of n nodes in 2n+o(n) bits).

Goals

The goal of the project group is the design and implementation of a high performance, extensible C++ library of different succinct data structures as well as the benchmaring of this library in the scope of typical usecases.

The data structures to be implemented come from the following fields:

  • Document retrieval
  • Finding repetitive patterns in strings
  • Grammar-based text compression
  • Graphs
  • Perfect hash functions
  • Search trees
  • Predecessor data structures
Results

The project group was carried out in summer 2019 and winter 2019/2020 and was finished successfully.

Final Report (English)  C++ Source Code (GNU GPL)

Sorting an array of numbers of arguably one of the best known and researched problem in algorithmic. This problem can be considered for strings: sort all the suffixes of a string in lexicographic order, e.g.: aanabana and na for the string bana. This sorting is fundamental for many important text indices.

Goals

The goal of this project group is the practical analysis and improvement of existing suffix sorting algorithms.

Results

The project group was carried out in summer 2018 and winter 2018/2019 and has finished successfully. It resulted in the publication of an article at SPIRE 2019.

Final Report (English) C++ Source Code (BSD 3 License) SPIRE 2019 Publication

Storing genomes can require a lot of memory (e.g., storing the genomes of all human inhabitatants of Dortmund naively would require about 500 TB). However, the variation between different individuals of the same species is only about 0.5%. This fact, in combination with the very small alphabet (A, C, G, T, possible few more), suggests that genomes can be compressed very well. Furthermore, from a medical perspective, it is important to be able to access especially annotated parts (genome sequences that contain a textual description of what is encoded, e.g. what enzyme of a person). For example, for correctly applying certain medications, it is important to know what gene variation the patient has.

Goals

The goal of this project group is to construct a data structure based on dynamic text indices. It should be able to maintain multiple individual genome sequences in a way that adding or removing single sequences avoids reconstruction of the entire index. The index should be efficient in terms of memory use (both RAM and disk) and be able to answer relevant queries (e.g., pattern matching) as fast as possible.

Results

The project group was carried out in summer 2017 and winter 2017/2018 and was finished successfully. It resulted in a poster at the German Conference of Bioinformatics (GCB) 2017.

GCB 2017 Paper GCB 2017 Abstract

How can we tell whether parts of a document are copied? In this PG, we will deal with the automatic search for plagiarisms using techniques from string algorithmics.

Goals

The goal of this project group is the combination of concepts from stringology as well as information retrieval for the design and implementation of a procedure to quickly and exactly search for plagiarisms in text documents (particularly PDF and text files). We are given a collection of documents that serve as a basis for searching plagiarisms in input documents (e.g., a collection of theses).

Results

The project group was carried out in summer 2015 and winter 2015/2016 and was finished successfully.

Final Report (German)

Dates and other Details

This website only contains general information on regularly offered courses. Please use the LSF for information about ongoing or upcoming courses, including dates and other details. Participants of our courses will receive access to corresponding Moodle rooms.