Using Word Embeddings to Interpret Reasons for Hospital Emergency Department Visits


For clinical prediction problems, short free-text fields often hold valuable information. However, engineering features from non-standardized fields can be difficult without manual curation. Word embedding approaches such as word2vec (Mikolov et al. 2013) or GloVe (Pennington et al. 2014) offer a mechanism for unsupervised, data-driven feature engineering from free text, but they lack the interpretability necessary for applications in the clinical domain. Previous feature engineering approaches for short clinical text have relied on bag-of-words techniques or on mapping concept unique identifiers from the Unified Medical Language System (UMLS) (Bodenreider 2004) to create features, while other studies have used raw word embeddings. Combining information from pre-existing clinical ontologies in the UMLS with data-driven word embeddings to create interpretable features from short free text could improve performance for clinical prediction problems. We combined word embeddings generated by the Global Vectors, or GloVe, method (Pennington et al. 2014) with clinical ontologies, using category word lists and the Bhattacharyya distance to map embedding dimensions to interpretable categories (Senel et al. 2017). We applied the approach to generate features from emergency department chief complaints, the principal reason for a visit, and used those features to predict clinical orders placed during the visit. We compared functions for combining multiple words in a single chief complaint, variations on word lists and categories generated from distinct UMLS vocabularies, and the use of interpretable features versus raw concept identifiers and raw word embeddings. We provide an automated and unsupervised framework for combining a priori knowledge and data-driven approaches for feature engineering from short free text. This approach can be generalized to other clinical free text and to prediction problems beyond clinical orders.
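The mapping from embedding dimensions to categories can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each embedding dimension is modelled as a univariate normal distribution for in-category and out-of-category words (the closed-form Bhattacharyya distance for two Gaussians), and the vocabulary, category word lists, random vectors standing in for GloVe embeddings, and the function names `dimension_category_distances` and `complaint_features` are all hypothetical. Projecting word vectors through the unsigned distance matrix is likewise a simplification chosen for illustration.

```python
import numpy as np


def bhattacharyya_gaussian(p, q):
    """Bhattacharyya distance between two 1-D samples, each modelled as a normal."""
    mu_p, mu_q = np.mean(p), np.mean(q)
    var_p, var_q = np.var(p), np.var(q)
    variance_term = 0.25 * np.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
    mean_term = 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q)
    return variance_term + mean_term


def dimension_category_distances(embeddings, vocab, categories):
    """Return category names and an (n_dims, n_categories) distance matrix."""
    index = {word: i for i, word in enumerate(vocab)}
    names = sorted(categories)
    dist = np.zeros((embeddings.shape[1], len(names)))
    for j, name in enumerate(names):
        rows = np.array([index[w] for w in categories[name] if w in index], dtype=int)
        in_cat = np.zeros(len(vocab), dtype=bool)
        in_cat[rows] = True
        for d in range(embeddings.shape[1]):
            # Compare how this dimension is distributed inside vs. outside the category.
            dist[d, j] = bhattacharyya_gaussian(embeddings[in_cat, d], embeddings[~in_cat, d])
    return names, dist


def complaint_features(text, embeddings, vocab, dist, combine=np.mean):
    """Project each known word onto the categories, then combine across words."""
    index = {word: i for i, word in enumerate(vocab)}
    rows = [index[t] for t in text.lower().split() if t in index]
    if not rows:
        return np.zeros(dist.shape[1])
    word_scores = embeddings[rows] @ dist  # shape: (n_words, n_categories)
    return combine(word_scores, axis=0)


# Hypothetical toy data: random vectors stand in for trained GloVe embeddings.
vocab = ["chest", "pain", "cough", "fever", "ankle", "fall", "wrist", "wheeze"]
categories = {
    "injury": ["ankle", "fall", "wrist"],
    "respiratory": ["cough", "wheeze", "chest"],
}
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 5))

names, dist = dimension_category_distances(embeddings, vocab, categories)
mean_features = complaint_features("chest pain cough", embeddings, vocab, dist, combine=np.mean)
max_features = complaint_features("chest pain cough", embeddings, vocab, dist, combine=np.max)
```

Swapping the `combine` argument (for example `np.mean`, `np.max`, or `np.sum`) is one simple way to compare functions for combining the words of a single chief complaint.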


Haley Hunter-Zinck is a health science specialist at the VA Boston Healthcare System. She has a Ph.D. in computational biology from Cornell University and transitioned to medical informatics during a postdoc in Porto Alegre, Brazil working with Brazilian public hospitals and a fellowship at VA Boston. She applies and develops machine learning techniques and visualization tools to improve hospital patient flow with a focus on the emergency department.

Open Data Science




