Computer Science Talks
Text Analysis with Enhanced Annotated Suffix Trees
Annotated suffix trees is an extension of the classic suffix tree data structure. It has a range of interesting applications in natural language processing, such as language-independent information retrieval. We present an enhanced implementation of this data structure where its internal representation is based on suffix arrays instead of the traditional but heavyweight suffix trees. This makes it possible to achieve significant improvements in performance with respect to both memory usage and runtime. See the paper for more details.
Sublinear Geometric Algorithms
We start by discussing the main approaches to sublinear algorithm design as well as the basic types of randomized algorithms. We then go into more detail about three randomized Las Vegas algorithms with no extra preprocessing for three geometric problems: a 1D problem (successor search), a 2D problem (polygonal intersection), and a 3D problem (polyhedral intersection).
String Processing in NLP, Biology and Finance
An overview of classic string processing algorithms (full-text indexing, sequence alignment, word embeddings) with a few example applications in natural language processing, biology and finance.
Refal: Recursive Functions' Algorithmic Language
An introduction to Refal, a unique functional programming language based on Markov algorithms and term rewriting (in Russian). See the corresponding GitHub repository for a few code snippets.
Burrows-Wheeler Transform
The Burrows-Wheeler Transform is a transformation on words widely used in bioinformatics as a way of building practical full-text indexes for genomes. We give a description of the transform and go into some of its most importart properties.
Tree Automata
Automata on infinite objects, in particular trees, play an important role in computer science: they allow to model nonterminating systems and are more suitable than words when nondeterminism needs to be modelled. We give an introduction to tree automata and present one of the proofs of a theorem stating that the class of languages recognized by finite-state tree automata is closed under complementation.
Cylindrical Algebraic Decomposition
Cylindrical Algebraic Decomposition (CAD) is a general tool for dealing with semilagebraic sets that allows to solve a wide rage of problems: quantifier elimination over the reals, tests for emptiness, finiteness, connectedness of semialgebraic sets, computing connected components of a given semialgebraic set etc. We introduce the concept and then describe the first algorithm for computing the CADs, the Collins algorithm.