Open-source Data Curation and Governance for Large and Growing Data Lakes


Data Lake technology provides a powerful way to process, refine, and present huge volumes of diverse data, but this comes at a cost. As a Data Lake evolves, it grows in size and complexity. If not properly managed, a Data Lake can outgrow the abilities and resources of the team that manages it, reducing the usefulness of an organization’s data and slowing or halting the team’s implementation of new analytics and applications. In this talk, Roger Dev showcases Tombolo, an open-source data curation and governance system that the open-source HPCC Systems project has developed to complement the powerful storage and compute capabilities of the HPCC Systems Data Lake operating system.

Session Outline:

Roger will demonstrate how the system enables you to:

1. Curate data – automatically identify and classify data files
2. Govern sensitive data – automatically identify sensitive data files and apply any necessary usage restrictions to that data
3. Keep accurate records – of who (user or application) interacts with a sensitive data file, how, and when


Roger is a Senior Architect leading the Machine Learning and Analytics Library team at LexisNexis Risk Solutions. Roger has been involved in the implementation and utilization of machine learning and AI techniques for many years, and he has more than 20 patents in diverse areas of software technology.

Open Data Science
One Broadway
Cambridge, MA 02142
