Faculty of Mathematics, Physics
and Informatics
Comenius University in Bratislava

Seminar in Theoretical Computer Science - Mário Lipovský (3.11.2017)

Friday, 3.11.2017 at 11:00 in room M/213

02. 11. 2017, 09:44
From: Rastislav Královič

Speaker: Mário Lipovský

Title: Approximate Abundance Histograms and Their Use for Genome Size Estimation

Date: 3.11.2017, 11:00, room M/213

DNA sequencing data is typically a large collection of short strings called reads. We can summarize such data by computing a histogram of the number of occurrences of substrings of a fixed length. Such histograms can be used, for example, to estimate the size of a genome. In this paper, we study a recent tool, Kmerlight, which computes approximate histograms. We discover an approximation bias, and we propose a new, unbiased version of Kmerlight. We also model the distribution of approximation errors and support our theoretical model with experimental data. Finally, we use another tool, CovEst, to compute genome size estimates from approximate histograms. Our results show that although CovEst was designed to work with exact histograms, it can also be used with their approximate versions, which can be computed using much less memory.
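
For readers unfamiliar with the term, the sketch below illustrates what an exact abundance histogram is: entry j counts how many distinct substrings of length k occur exactly j times across all reads. It is only an illustration, not the method of Kmerlight or CovEst. The function name `kmer_abundance_histogram` and the toy reads are hypothetical, and the sketch ignores reverse complements and sequencing errors, which real tools have to handle.

```python
from collections import Counter


def kmer_abundance_histogram(reads, k):
    """Return a mapping {abundance: number of distinct k-mers with that abundance}.

    Exact and memory-hungry: every distinct k-mer is stored explicitly.
    Assumption: k-mers are counted as plain strings (no reverse complements).
    """
    kmer_counts = Counter()
    for read in reads:
        # Slide a window of length k over the read; reads shorter than k add nothing.
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Count how many distinct k-mers share each occurrence count.
    return Counter(kmer_counts.values())


# Toy input (hypothetical): three overlapping reads.
reads = ["ACGTAC", "CGTACG", "GTACGT"]
histogram = kmer_abundance_histogram(reads, k=3)
for abundance in sorted(histogram):
    print(abundance, histogram[abundance])
```

Storing every distinct k-mer is what makes the exact computation expensive on real sequencing data, and it is this memory cost that approximate histograms aim to reduce.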

web:  http://kedrigern.dcs.fmph.uniba.sk/STI2 
rss:  http://kedrigern.dcs.fmph.uniba.sk/STI2/rss.php