Next Generation Indexes For Big Data Engineering
Next Generation Indexes For Big Data Engineering


Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Some of our prior work includes (1) Roaring indexes which are part of multiple big-data systems such as Spark, Hive, Atlas, Pinot, Kylin, (2) EWAH indexes are part of Git (GitHub) and included in major Linux distributions.

We will present ongoing and future work on how we can process data faster while supporting the diverse systems found in the cloud (with upcoming ARM processors) and under multiple programming languages (e.g., Java, C++, Go, Python). We seek to minimize shared resources (e.g., RAM) while exploiting algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. Our end goal is to process billions of records per second per core.

The talk will be aimed at programmers who want to better understand the performance characteristics of current big-data systems as well as their evolution. The following specific topics will be addressed:

1. The various types of indexes and their performance characteristics and trade-offs: hashing, sorted arrays, bitsets and so forth.

2. Index and table compression techniques: binary packing, patched coding, dictionary coding, frame-of-reference.


Daniel Lemire is a computer science professor at the Université du Québec (TELUQ). He has also been a research officer at the National Research Council of Canada and an entrepreneur. He has written over 70 peer-reviewed publications, including more than 40 journal articles. He has held competitive research grants for the last 15 years. He serves on the program committees of leading computer science conferences (e.g., ACM CIKM, WWW, ACM WSDM, ACM SIGIR, ACM RecSys). He programs in C, C++, Java, JavaScript, Python, Swift and Go. He works primarily in an open-source setting.

Open Data Science




Open Data Science
One Broadway
Cambridge, MA 02142

Privacy Settings
We use cookies to enhance your experience while using our website. If you are using our Services via a browser you can restrict, block or remove cookies through your web browser settings. We also use content and scripts from third parties that may use tracking technologies. You can selectively provide your consent below to allow such third party embeds. For complete information about the cookies we use, data we collect and how we process them, please check our Privacy Policy
Consent to display content from - Youtube
Consent to display content from - Vimeo
Google Maps
Consent to display content from - Google