Doctoral colloquium - Andrej Baláž (13.3.2023)
Monday 13.3.2023 at 13:10, Lecture room I/9
Andrej Baláž:
Compressed self-indexes for pangenomic datasets
Abstract:
Recent advances in sequencing technologies have brought a steep decrease in acquisition costs and rapid growth in the size of new genomic datasets. This growth, together with the shift towards jointly analysing all related sequences, known as pangenomics, demands new data structures and algorithms for efficient processing. We will present several data structures known as self-indexes, which form the basic building blocks of fundamental bioinformatics algorithms such as read alignment. Because pangenomic datasets are immense, these self-indexes must be compressed while remaining time-efficient in order to be practical. We will therefore show two compression techniques, tunnelling and r-indexing, and highlight our contributions to compressed self-indexes: a space-efficient construction algorithm and a pattern-matching algorithm.
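For readers unfamiliar with self-indexes, the following is a minimal illustrative sketch in Python, not the speaker's implementation. It assumes a self-index built on the Burrows-Wheeler transform (BWT), as in the FM-index family, and shows two ideas the abstract touches on: counting pattern occurrences by backward search, and counting the number of equal-letter runs in the BWT, the quantity r that r-indexing exploits on repetitive data. The function names (bwt, count_runs, count_occurrences) are hypothetical, the construction is deliberately naive, and tunnelling is not illustrated.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via naive suffix sorting (illustration only)."""
    s = text + "$"  # sentinel, assumed absent from text and smaller than any letter
    suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT character is the one preceding each sorted suffix (wrapping around).
    return "".join(s[i - 1] for i in suffix_array)


def count_runs(b):
    """Number of maximal runs of equal characters in the string b."""
    return sum(1 for i in range(len(b)) if i == 0 or b[i] != b[i - 1])


def count_occurrences(b, pattern):
    """Count occurrences of pattern in the original text using only its BWT b."""
    counts = Counter(b)
    # first[c] = number of characters in b that are strictly smaller than c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    lo, hi = 0, len(b)  # half-open range of suffixes matching the processed part
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Naive rank via slicing; a real index would answer rank queries quickly.
        lo = first[c] + b[:lo].count(c)
        hi = first[c] + b[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed, count_occurrences(transformed, "ana"))

    # A toy "pangenome": many copies of the same sequence give few BWT runs.
    repetitive = bwt("GATTACA" * 8)
    print(len(repetitive), count_runs(repetitive))
```

On highly repetitive inputs, such as a collection of closely related genomes, the number of runs stays far below the text length, which is the intuition behind storing the index in space proportional to r rather than to the size of the dataset.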