The Healthy Approach – Organic Data Enrichment Through Entity Extraction


Assembling comprehensive datasets is often the biggest challenge in building effective analytics and machine learning models. Doing so usually means enriching existing datasets with external sources, such as product catalogs or company records, to fill data gaps. However, external data sources can be time-consuming and expensive to acquire. Instead, what if you could use machine learning to extract previously inaccessible information from the data you already have? A solution that reliably extracts important data from text can enrich datasets in a repeatable fashion. With a relatively small amount of manual labelling, a Named Entity Recognition (NER) model can be trained to identify and extract entities.
In this session, we discuss how Named Entity Recognition algorithms can expand the accessible information in a dataset by extracting known entities from unstructured attributes. As an example, we’ll use a Recurrent Neural Network (RNN) to identify and retrieve product properties from supply chain datasets. These extracted attributes can be used for downstream data curation and analytics, such as units of measure standardization or price comparison. We benchmark the performance of our algorithms against more traditional extraction methods, including regular expressions. Finally, we show how the RNN models can be exposed through a simple API for data enrichment at scale.
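To make the regular-expression baseline concrete, here is a minimal sketch of what such a traditional extractor might look like for quantities and units of measure in product descriptions. It is an illustration only, not the implementation used in the session: the unit list, the `extract_measures` helper, and the sample description are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately short unit vocabulary (an assumption for this
# sketch); a real supply chain catalog would need a much longer list.
UNIT_PATTERN = re.compile(
    r"(?P<quantity>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|lb|oz|ml|l|mm|cm|m|in)\b",
    re.IGNORECASE,
)


def extract_measures(description: str) -> List[Tuple[float, str]]:
    """Return (quantity, unit) pairs found in a free-text description."""
    return [
        (float(match.group("quantity")), match.group("unit").lower())
        for match in UNIT_PATTERN.finditer(description)
    ]


if __name__ == "__main__":
    # Sample description invented for illustration.
    print(extract_measures("Hex bolt 25 mm, 1.5kg box"))
```

A pattern like this only finds what it was written to find: a description that spells out "twenty-five millimetres" yields nothing, which is the kind of gap a trained NER model is meant to close.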


Julia is the Director of Analytics at Tamr, where she is expanding the company's analytics and data science solutions. Before joining Tamr, she led end-to-end modeling and development of data science products at Aon's Intellectual Property Solutions group. Her previous experience includes technology-focused litigation consulting, quantitative finance, and private equity. Julia has a PhD in Physics from Harvard.

Open Data Science




One Broadway
Cambridge, MA 02142
