Donald or Hillary? TweetCast predicts your vote
- Terms predicting voting preference include ‘lying,’ ‘illegal,’ ‘humanity,’ ‘rights’
- Tool predicts voter preference with 80 percent accuracy
- TweetCast algorithm focuses on words, hashtags, tagged usernames, websites
EVANSTON - What you tweet says a lot about your politics and who you are going to vote for in this highly volatile presidential election, according to TweetCast, an online tool developed by Northwestern University computer scientists.
The algorithm, trained on Twitter users, can predict whether citizens will vote for Donald Trump or Hillary Clinton. Perhaps more surprising, the tool also predicts which states will go blue or red (Democratic or Republican).
Tweeting the words “lying,” “liberal,” “illegal” and “money,” for example, indicates a vote for Trump. Using the words “single,” “humanity,” “rights” and “y’all,” on the other hand, predicts a vote for Clinton.
“These are not the most prevalent terms that voters use on Twitter,” said Larry Birnbaum, professor of computer science in Northwestern’s McCormick School of Engineering. “They are the most predictive terms.”
TweetCast uses a machine-learning algorithm to examine words, hashtags, tagged usernames and mentioned websites to uncover which terms are most predictive of voting preference. The tool predicts voter preference with 80 percent accuracy.
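The article does not name the algorithm or say how terms are scored, so the following is only a rough sketch of the general idea, not TweetCast's method. It assumes a smoothed log-odds score as a stand-in for "predictiveness," and the helper names `extract_terms` and `log_odds` are hypothetical. It illustrates Birnbaum's distinction: a term can be very common and still tell you almost nothing about a vote.

```python
import math
import re
from collections import Counter

# Matches full URLs first, then words with an optional leading # or @.
TOKEN_RE = re.compile(r"https?://\S+|[#@]?\w[\w']*")

def extract_terms(tweet):
    """Split a tweet into lowercase words, hashtags, @usernames and website domains."""
    terms = []
    for token in TOKEN_RE.findall(tweet.lower()):
        if token.startswith("http"):
            # Keep only the domain of a mentioned website.
            domain = re.sub(r"^https?://(www\.)?", "", token).split("/")[0]
            terms.append("site:" + domain)
        else:
            terms.append(token)
    return terms

def log_odds(tweets_a, tweets_b, smoothing=1.0):
    """Score each term by its smoothed log-odds of appearing in group A vs. group B.

    Positive scores lean toward group A, negative toward group B,
    and scores near zero carry little predictive signal.
    """
    counts_a = Counter(t for tw in tweets_a for t in extract_terms(tw))
    counts_b = Counter(t for tw in tweets_b for t in extract_terms(tw))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    return {
        term: math.log((counts_a[term] + smoothing) / total_a)
        - math.log((counts_b[term] + smoothing) / total_b)
        for term in vocab
    }
```

On a toy pair of tweet collections, a word such as "the" would appear often in both groups and score close to zero, while a rarer word used by only one group would score far from zero.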
Birnbaum’s team did not develop the algorithm used in TweetCast, but the researchers are the first to apply this approach to determining political preferences by analyzing tweets.
Birnbaum and his students first launched a version of TweetCast for the 2012 presidential election. The tool was included in PBS MediaShift Idea Lab’s “Our Picks for the Most Innovative Election Coverage.”
The algorithm was trained on Twitter users who have publicly declared support for one of the two candidates. During training, the algorithm found patterns in those users’ activity and applied those patterns to users across Twitter.
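As a minimal sketch of that train-then-apply pattern, the example below uses a small naive Bayes classifier. This is an assumption made for illustration: the article does not identify the algorithm TweetCast uses, and the `train` and `predict` functions and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labeled_users, smoothing=1.0):
    """Count terms per candidate from (text, label) pairs.

    Each pair stands for a user who publicly declared support for `label`.
    """
    term_counts = defaultdict(Counter)
    user_counts = Counter()
    for text, label in labeled_users:
        user_counts[label] += 1
        term_counts[label].update(text.lower().split())
    vocab = {term for counts in term_counts.values() for term in counts}
    return {"term_counts": term_counts, "user_counts": user_counts,
            "vocab": vocab, "smoothing": smoothing}

def predict(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    total_users = sum(model["user_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n_users in model["user_counts"].items():
        counts = model["term_counts"][label]
        denom = sum(counts.values()) + model["smoothing"] * len(model["vocab"])
        score = math.log(n_users / total_users)  # prior from declared supporters
        for term in text.lower().split():
            if term in model["vocab"]:  # ignore terms never seen in training
                score += math.log((counts[term] + model["smoothing"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data built from the example terms quoted in this story.
declared = [
    ("lying liberal illegal money", "Trump"),
    ("single humanity rights y'all", "Clinton"),
]
model = train(declared)
```

Calling `predict(model, "they are lying about money")` applies the learned patterns to a user who never declared a preference.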
For this presidential election, Birnbaum and Ph.D. student Jason Cohn expanded the tool to predict the states Trump will take and the states Clinton will take.
Using Twitter’s geo-location feature, the algorithm randomly sampled approximately 80,000 Twitter users from each state. Based on the predictive words those users tweeted, TweetCast predicts which states will most likely vote blue (New York, California and Illinois, for example) or red (Mississippi, Arkansas and Texas).
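The article does not describe how per-user predictions are combined into a state call. One simple possibility, assumed here purely for illustration, is to sample users from a state and take the majority of their predicted votes. The function name `call_state` and its parameters are hypothetical.

```python
import random
from collections import Counter

def call_state(user_predictions, sample_size, seed=0):
    """Randomly sample per-user predictions for one state and call it by majority.

    user_predictions: list of "Clinton" or "Trump" labels, one per user.
    Returns (color, clinton_share), where color is "blue", "red",
    "toss-up" or "unknown" (no users to sample).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    k = min(sample_size, len(user_predictions))
    if k == 0:
        return "unknown", 0.0
    tally = Counter(rng.sample(user_predictions, k))
    clinton_share = tally["Clinton"] / k
    if tally["Clinton"] > tally["Trump"]:
        return "blue", clinton_share
    if tally["Trump"] > tally["Clinton"]:
        return "red", clinton_share
    return "toss-up", clinton_share
```

With few users in a state, `k` is small and the majority can swing on a handful of accounts.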
TweetCast is still experimental and has encountered some issues. States with fewer Twitter users, such as Wyoming and Montana, are trickier to predict. Birnbaum also points out that Twitter users skew young and liberal. His team is currently working with machine-learning expert Douglas Downey, associate professor of computer science at McCormick, to explore ways to compensate for these biases.
One can imagine how TweetCast’s information can help campaigns target voters and use Twitter to push voter turnout, but Birnbaum said it also shows that many preferences can be gleaned from Twitter.
“TweetCast is a good example of what we can tell about you from Twitter,” Birnbaum said. “We can determine a lot from the language you use, including which restaurants you like, books you read, sports you enjoy, news you consume — and who you’ll vote for.”
Anyone can try TweetCast to see if it correctly predicts which presidential candidate a user supports.
-Megan Fellman, science and engineering editor in University Relations, and Amanda Morris, writer/editor at the McCormick School of Engineering, contributed to the story.