
Allison Hegel
I'm an AI Resident at Microsoft Research specializing in natural language processing and text generation, with prior experience as a Data Scientist at Apple and HackerRank. I've applied machine learning, deep learning, and natural language processing in both industry and research settings to better understand user behavior and preferences, working in Python, PyTorch, and SQL. I have presented results and led trainings at 14 international conferences and universities, taught 9 classes, and won 19 grants and awards.
Recent Work

Tone Classification and Rewrite Suggestion
As an AI Resident at Microsoft Research, I am collaborating with Office teams to build a neural system to detect impolite language and offer rewritten alternatives.

Text Generation and Content Transfer
As an AI Resident at Microsoft Research, I developed deep learning models to rewrite recipes based on dietary constraints using GPT-2 fine-tuned on over 1.2 million recipes. I also implemented state-of-the-art models for comparison including BERT, PPLM, and CTRL. Research submitted to EMNLP.

Comparison of Sentence Encoding Methods
This project compares several methods for computing sentence similarity, including Jaccard similarity, TF-IDF, GloVe, BERT, and RoBERTa. Jaccard similarity achieves the highest precision, but only with a strict threshold that aligns a small portion of the data, while neural methods offer a good balance of precision and alignment percentage.
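The precision versus coverage trade-off of the Jaccard baseline can be illustrated with a minimal sketch. It assumes whitespace tokenization and a hypothetical threshold of 0.8. The function names `jaccard_similarity` and `align` are illustrative and not taken from the project code.

```python
from typing import List, Tuple


def jaccard_similarity(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the token union."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # convention chosen here for two empty sentences
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def align(source: List[str], target: List[str],
          threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pair each source sentence with its best target match, if it clears the threshold.

    A strict threshold keeps only near-identical pairs (high precision)
    and leaves most sentences unaligned (low alignment percentage).
    """
    pairs = []
    for s in source:
        if not target:
            break
        best = max(target, key=lambda t: jaccard_similarity(s, t))
        if jaccard_similarity(s, best) >= threshold:
            pairs.append((s, best))
    return pairs


if __name__ == "__main__":
    src = ["add two cups of flour", "stir gently"]
    tgt = ["add two cups of flour", "bake for ten minutes"]
    print(align(src, tgt))
```

Lowering `threshold` aligns more sentence pairs at the cost of admitting weaker matches, which is the trade-off described above.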

Apple Crash Prediction
As a data scientist at Apple, I developed machine learning models to predict crash rates of new software releases on billions of devices using Python (sklearn) and Hadoop.

HackerRank Candidate Feedback App
As a data scientist at HackerRank, I used Python (sklearn) and Django to create an internal web app that uses machine learning to automatically tag customer feedback comments, route them to the relevant department, and display trends updated hourly.

HackerRank Test Health Dashboard
As a data scientist at HackerRank, I developed metrics and deployed the data pipeline for a major product launch that helps customers understand how effectively they are using the product and make data-driven improvements. The data pipeline used Python, MySQL, Redshift, and Airflow.

Game Developer Dashboard
A web app that helps game developers design their marketing strategy, powered by a logistic regression model in Python (sklearn) that predicts with 90% accuracy whether a video game will be successful and identifies the language reviewers use to talk about successful video games. I deployed the model as an interactive web app using Flask, Bootstrap, and Bokeh.

Classifying Genre with Machine Learning
I used a support vector machine in Python (sklearn) to classify Goodreads book reviews by genre, revealing the vocabulary associated with each genre in user reviews and trends in word use over time. This work was part of my dissertation project.

Clustering Genres with Machine Learning
I applied the t-SNE algorithm in Python (sklearn) to cluster high-dimensional, sparse textual data derived from Goodreads users' shelving activity, identifying latent genres that better represent user activity than traditional product categories. This work was part of my dissertation project.

Extracting Events from Plot Summaries with NLP
I used natural language processing to identify characters and plot events in book reviews, revealing the themes reviewers care about most. I conducted the analysis in Python, using Stanford CoreNLP to parse sentence grammar. This work was part of my dissertation project.

Supervised Topic Classification
I used a support vector machine in Python (sklearn) to classify the topic of book review sentences, identifying what reviewers are most concerned about on different websites and within different product categories. This work was part of my dissertation project.

BookSplice Recommendation App
As part of a team of 5 developers, I helped create a website that recommends books to users based on sentiment analysis of book reviews. I scraped and cleaned the book review database and implemented the sentiment analysis code using Python (Scrapy and NLTK) and SQL.

Text Analysis with Collocations
I wrote a Python script to find the most common collocations in a text. I walked through implementing the analysis with a class of undergraduates as part of "Collocations Analysis: A Hands-On Workshop," which I ran with Matthew Lavin at the University of Pittsburgh on April 7, 2017.
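A minimal sketch of a frequency-based collocation count, using only the standard library. The workshop script itself is not reproduced here. The function name `top_collocations`, the regex tokenizer, and the small stopword set are all assumptions made for illustration.

```python
import re
from collections import Counter
from typing import FrozenSet, List, Tuple

# Hypothetical, deliberately tiny stopword list for the example.
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "is", "of", "the", "to"})


def top_collocations(text: str, n: int = 5,
                     stopwords: FrozenSet[str] = DEFAULT_STOPWORDS
                     ) -> List[Tuple[Tuple[str, str], int]]:
    """Return the n most frequent adjacent word pairs that contain no stopword."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Pair every token with its successor. Pairs can span sentence
    # boundaries, because punctuation is dropped by the tokenizer.
    bigrams = zip(tokens, tokens[1:])
    counts = Counter(
        (first, second)
        for first, second in bigrams
        if first not in stopwords and second not in stopwords
    )
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ("Machine learning is fun. Machine learning is hard. "
              "Deep learning is fun.")
    print(top_collocations(sample, n=1))
```

Raw counts favor frequent words, so a fuller analysis might rank pairs by an association measure such as pointwise mutual information instead.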

Amazon Product Categories Data Visualization
I designed an interactive sunburst chart of Amazon product categories using D3.js to display the magnitude of each category and sub-category of books sold on the site.
Resume
Experience
- AI Resident, Microsoft Research, Redmond, WA, September 2019 - Present
- Data Scientist, Apple, Cupertino, CA, April 2019 - August 2019
- Data Scientist, HackerRank, Mountain View, CA, April 2018 - April 2019
- PhD Researcher, UCLA Department of English, Los Angeles, CA, September 2013 - May 2018
- Digital Project Manager, Cruikshank Digital Archive, December 2015 - January 2018
- Graduate Fellow, UCLA Digital Gateway for English Majors, September 2015 - August 2017
- General Editor, The Programming Historian, October 2013 - November 2016
- Graduate Teaching Associate, UCLA, September 2014 - August 2016
- Systems Developer, McMaster-Carr, Elmhurst, IL, June 2012 - July 2013
Skills
Code
Python, SQL, Git, HTML, CSS
Machine Learning
PyTorch, scikit-learn, pandas, NumPy, spaCy, CoreNLP, NLTK, Scrapy, Jupyter Notebooks
Data Deployment
Docker, Airflow, Hadoop, Django, Flask, Tableau
Education
UCLA
PhD in English, with a focus on natural language processing and machine learning
Program ranked #6 in field
Dissertation: Using machine learning and natural language processing to better understand readers' online behavior and preferences by analyzing 400K+ book reviews
University of Chicago
BA, double major with honors, English and Fundamentals
Phi Beta Kappa
Varsity Soccer: Mary Jean Mulvaney Scholar Athlete Award (awarded annually to one female varsity athlete), UAA Champions, 3x NCAA Tournament Bid
Speaking
- Being Human, Seeming Human, Modern Language Association Annual Convention, Seattle, January 2020
- Panel of Working Data Scientists with Different Learning Backgrounds, Kaggle CareerCon, April 2019
- Goodreads and the Black Box of Online Reading, Modern Language Association Annual Convention, New York City, January 2018
- Context at Scale: Text Analysis of Amateur and Professional Book Reviews, Digital Humanities Research Network speaker series, University of Pittsburgh, April 2017
- Genre Play: Text Mining Book Reviews for New Genres, Post45 Graduate Symposium, University of California, Berkeley, February 2017
- Genres of Goodreads, Lab Day, Stanford Literary Lab, February 2017
- Reading Genre in Online Book Reviews, How to Do Things With Millions of Words, University of British Columbia, November 2016
- Adapting the Book on Social Media, UCLA Southland Conference, June 2016
- Classification on the Web, Archives Unleashed: Web Archive Hackathon 2.0, Library of Congress, June 2016
- Tracking Discourse on Social Media, Archives Unleashed: Web Archive Hackathon, University of Toronto, March 2016
- How Mainstream is Twitter? Mining the Baltimore Protests, UCLA Social Data Analysis Seminar, June 2015
- Data Visualization with Tableau, Getty Foundation Summer Institute: Beyond the Digitized Slide Library, June 2015
- Literary Californias, UCLA Digital Humanities Working Group special meeting on topic modeling, May 2015
- Reading Machines, Reading People, on panel “New Material: Reading Algorithmic Translations,” UCLA Southland Conference, May 2014
Grants & Awards
- AWS Deep Learning Fellowship, Amazon Web Services & Fast.AI, $1,500 tuition awarded to diverse international applicants, 2017
- Data Science Scholarship, Metis & Women Who Code, $1,900 tuition awarded to one female coder, 2017
- Udacity Scholarship, Women Techmakers & Google, $2,400 awarded to 100 women worldwide, 2017
- Dissertation Year Fellowship, UCLA, university-wide competitive stipend funding, 2017-2018
- Graduate Student Support Grant, UCHRI, 2017-2018
- Travel Grant, Humanists@Work Graduate Career Workshop, UCHRI, 2017
- Faculty Research Grant, UCLA, with Cruikshank’s Eye Digital Archive team, 2016 and 2017
- Mellon Professionalization Initiative Fellowship, UCLA, 2016
- Mellon Pedagogy Fellowship: Digital Gateway for English Majors, UCLA, 2015 and 2016
- Graduate Summer Research Mentorship, UCLA, 2014
- Stephen P. Milner Award, UCLA, 2013
- Dean’s Mellon Fellowship, UCLA, 2013
- Mary Jean Mulvaney Scholar Athlete Award, University of Chicago, awarded annually to one female varsity athlete, 2012
- Phi Beta Kappa, University of Chicago, 2012
- National Merit Scholarship, Computer Sciences Corporation, 2008-2012
- James Fulton Maclear Scholarship, University of Chicago, 2008-2012
- Summer Research Grant, Fundamentals Department, University of Chicago, 2011
- Jeff Metcalf Fellow, University of Chicago, 2010 and 2011
- Segal Memorial Scholarship, National Multiple Sclerosis Society, 2008
Teaching & Workshops
- Collocations Analysis, workshop at the University of Pittsburgh, 2017
- Information Overload, sole instructor of two UCLA English courses, winter and spring 2016
- Introduction to Environmental Humanities, teaching assistant for UCLA English course, 2015
- Data Visualization with Tableau, workshop at Getty Foundation Summer Institute, 2015
- Digital Humanities Capstone, course assistant for UCLA Digital Humanities course, 2015
- Science Fiction and the Reinvention of Nature, teaching assistant for UCLA English course, 2015
- Major American Authors, teaching assistant for UCLA English course, 2015
- Law and Literature, teaching assistant for UCLA English course, 2014