Abstract: Text analytics, or text mining, is an important branch of analytics that allows machines to break down text data. As a data scientist, I often use text-specific techniques to interpret the data I work with. In this workshop, I will walk through an end-to-end project covering text pre-processing techniques, machine learning techniques, and Python libraries for text analysis.
Text pre-processing techniques include data cleaning and tokenization. Once the text is in a standard format, various machine learning techniques can be applied to better understand the data. These include popular modeling techniques for classifying emails as spam or not spam, or for scoring the sentiment of a tweet. In addition, unsupervised learning techniques such as topic modeling with Latent Dirichlet Allocation or matrix factorization can be applied to text data to pull out hidden themes. Other techniques, such as text generation, can be applied using Markov chains or deep learning.
We will work through an example in Jupyter Notebook that covers every step of a text analysis project, using several Python text analysis libraries, including NLTK, TextBlob and gensim, alongside standard data science and machine learning libraries such as pandas and scikit-learn.
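As a rough illustration of two of the steps named above (cleaning and tokenization, followed by text generation with a Markov chain), here is a minimal sketch. It is not the workshop's own notebook code: it uses only the Python standard library rather than NLTK, TextBlob or gensim, and the function names and the toy corpus are hypothetical, chosen purely for illustration.

```python
import random
import re
from collections import defaultdict


def clean_and_tokenize(text):
    """Lowercase the text, replace punctuation with spaces, and split into word tokens."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


def build_markov_chain(tokens):
    """Map each word to the list of words observed immediately after it."""
    chain = defaultdict(list)
    for current_word, next_word in zip(tokens, tokens[1:]):
        chain[current_word].append(next_word)
    return dict(chain)


def generate_text(chain, start_word, max_words=10, seed=0):
    """Walk the chain from start_word, sampling each next word at random."""
    rng = random.Random(seed)  # seeded so repeated runs give the same text
    words = [start_word]
    # Stop at max_words or when the last word has no observed successor.
    while len(words) < max_words and words[-1] in chain:
        words.append(rng.choice(chain[words[-1]]))
    return " ".join(words)


# Hypothetical toy corpus, not data from the workshop.
corpus = "The cat sat on the mat. The dog sat on the log!"
tokens = clean_and_tokenize(corpus)
chain = build_markov_chain(tokens)

print(tokens)
print(generate_text(chain, "the"))
```

Keeping repeated successors in each list (rather than de-duplicating them) means more frequent transitions are sampled more often, which is the behavior a first-order Markov chain text generator relies on. In a real project, a library tokenizer would likely replace the regular-expression step.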
Bio: Alice Zhao is currently a Senior Data Scientist at Metis, where she teaches 12-week data science bootcamps. Previously, she worked at Cars.com, where she started as the company's first data scientist, supporting multiple functions from Marketing to Technology. During that time, she also co-founded a data science education startup, Best Fit Analytics Workshop, teaching weekend courses to professionals at 1871 in Chicago. Prior to becoming a data scientist, she worked at Redfin as an analyst and at Accenture as a consultant. She holds an M.S. in Analytics and a B.S. in Electrical Engineering, both from Northwestern University. She blogs about analytics and pop culture on A Dash of Data. Her blog post, "How Text Messages Change From Dating to Marriage," made it onto the front page of Reddit, gaining over half a million views in the first week. She is passionate about teaching and mentoring, and loves using data to tell fun and compelling stories.