Going from Text to Knowledge Graphs: Putting Natural Language Processing and Graph Databases to Work


To turn data into action, we must know the context of that data. Traditionally, humans had to provide that context, but more and more of it is now available through data science approaches. This is achieved by converting text into entities, such as nouns naming people and places, along with the verbs that describe their actions. The nouns become the nodes of a graph, and the verbs become the relationships, or edges, between those nodes. We can further augment these nodes and edges by identifying words such as adjectives that further describe the nouns, or word occurrences that add relationships between nodes. This approach, built on named entity recognition, can then be applied to a variety of problems, such as creating better search engines or recommender systems.
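As a rough illustration of the noun-to-node and verb-to-edge idea, here is a minimal sketch in plain Python. It assumes the part-of-speech tags are already available (the hand-tagged sentences below are made up and stand in for the output of an NLP tagger), and the function names `extract_triples` and `build_graph` are hypothetical. It only handles the simplest noun-verb-noun pattern, so it should not be read as the workshop's actual method.

```python
# Hand-tagged tokens stand in for the output of an NLP tagger (an assumption;
# a real pipeline would obtain these tags from an NLP library).
TAGGED_SENTENCES = [
    [("Marie", "NOUN"), ("discovered", "VERB"), ("radium", "NOUN")],
    [("Marie", "NOUN"), ("visited", "VERB"), ("Paris", "NOUN")],
]


def extract_triples(tagged_sentence):
    """Return (subject, verb, object) triples for each NOUN-VERB-NOUN pattern."""
    triples = []
    last_noun = None      # most recent noun seen, a candidate subject
    pending_verb = None   # verb seen after that noun, waiting for an object
    for word, tag in tagged_sentence:
        if tag == "NOUN":
            if last_noun is not None and pending_verb is not None:
                triples.append((last_noun, pending_verb, word))
            last_noun = word
            pending_verb = None
        elif tag == "VERB" and last_noun is not None:
            pending_verb = word
    return triples


def build_graph(tagged_sentences):
    """Collect nodes (nouns) and labeled edges (verbs) across all sentences."""
    nodes = set()
    edges = []
    for sentence in tagged_sentences:
        for subj, verb, obj in extract_triples(sentence):
            nodes.update([subj, obj])
            edges.append((subj, verb, obj))
    return nodes, edges


nodes, edges = build_graph(TAGGED_SENTENCES)
```

Because both sentences mention "Marie", the two edges share a single node, which is the property that lets a graph connect facts stated in different places in the text.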

In this workshop, we will start with an open source text data set and convert it into a knowledge graph. We will use standard natural language processing (NLP) packages and approaches in Python to clean that text and create a knowledge graph data model within a graph database, which can then be queried and turned into data insights. This data model will include the nodes, edges, and attributes identified through the NLP process, which can be used to create the necessary ontologies for the graph. We will encounter the problems associated with generating such knowledge graphs, such as entity disambiguation and the lack of sufficient training data (zero-shot learning). Attendees of this workshop will build and use a complete pipeline for knowledge graph generation and analysis.
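To give a sense of the step from extracted relationships to a graph database, the sketch below turns one (subject, verb, object) triple into a Cypher `MERGE` query with parameters. The `Entity` label, the `name` property, and the helper names are assumptions chosen for illustration, not the workshop's data model. The code only builds the query text; running it against a database would require a driver session, which is left out here.

```python
import re


def relationship_type(verb):
    """Turn a verb phrase into an upper-case, underscore-separated type name."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", verb).strip("_").upper()
    if not cleaned:
        raise ValueError(f"cannot derive a relationship type from {verb!r}")
    if cleaned[0].isdigit():
        # Unescaped Cypher identifiers cannot start with a digit.
        cleaned = "REL_" + cleaned
    return cleaned


def triple_to_cypher(subj, verb, obj):
    """Return (query, parameters) that upsert two nodes and the edge between them.

    Node names are passed as parameters. The relationship type cannot be a
    parameter in Cypher, so it is sanitized and written into the query text.
    """
    rel = relationship_type(verb)
    query = (
        "MERGE (a:Entity {name: $subj}) "
        "MERGE (b:Entity {name: $obj}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return query, {"subj": subj, "obj": obj}


query, params = triple_to_cypher("Marie", "worked at", "Sorbonne")
```

Using `MERGE` rather than `CREATE` means that loading the same triple twice does not duplicate nodes or edges, which matters when the same entity appears many times in the source text.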


Dr. Clair Sullivan is currently a graph data science advocate at Neo4j, working to expand the community of data scientists and machine learning engineers using graphs to solve challenging problems. She received her doctorate in nuclear engineering from the University of Michigan in 2002. After that, she began her career in nuclear emergency response at Los Alamos National Laboratory, where her research involved signal processing of spectroscopic data. She spent four years working in the federal government on related subjects and returned to academic research in 2012 as an assistant professor in the Department of Nuclear, Plasma, and Radiological Engineering at the University of Illinois at Urbana-Champaign. While there, her research focused on using machine learning to analyze the data from large sensor networks. Deciding to focus more on machine learning, she accepted a job at GitHub as a machine learning engineer while maintaining adjunct assistant professor status at the University of Illinois. Additionally, she founded a company, La Neige Analytics, whose purpose is to provide data science expertise to the ski industry. She has authored four book chapters, over 20 peer-reviewed papers, and more than 30 conference papers. Dr. Sullivan was the recipient of the DARPA Young Faculty Award in 2014 and the American Nuclear Society's Mary J. Oestmann Professional Women's Achievement Award in 2015.

Open Data Science




One Broadway
Cambridge, MA 02142
