
February 2019 Recap – What Lies Beneath Sentiment Analysis

Despite the crappy weather, many in the group recognized this event would have been a terrible talk to miss out on!*

If you didn’t manage to make it to our February event with Dr. Marie-Catherine de Marneffe from the Ohio State linguistics department, you might wonder why my writing is even (slightly) more convoluted than usual. Those who did attend will certainly recognize this as an example of a sentence that would be judged positive in sentiment by a human, but perhaps negative by a computer.

Dr. de Marneffe gave the group fascinating insights into how sentiment analysis engines really work, and when they might fall down on the job. These systems can be incredibly powerful and useful, but before relying on their output for real-world decisions we should understand some basics, including:

  • What data the model was trained with. If the test data is similar to the training data (for example, determining the sentiment of a movie review with a system trained on movie reviews), then we might hope and expect to get some pretty accurate results! But take that same classifier, point it at all the tweets about your product, and maybe not! (See the sketch after this list.)
  • What kind of output the model creates. If you’re making real-world decisions based upon what the model tells you, you probably want more detail than happy face vs. sad face. It’s not an all-knowing magic black box, and dangerous things can happen when we treat it as such.
  • What biases are inherent in the training data. The decisions made by the system reflect the data, warts and all! (Anyone remember Tay, Microsoft’s ill-fated AI Twitter bot?)
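
For the tinkerers in the group, here’s a minimal sketch of the first two points. It isn’t anything Dr. de Marneffe showed; it’s just a toy scikit-learn classifier with an invented four-review training set, and the tweet is made up too. The point is the mechanics: a model trained on movie-review vocabulary, and probability scores instead of a bare happy-face/sad-face label.

    # Toy sketch (not from the talk): a sentiment classifier trained on a
    # tiny, invented set of "movie reviews," then asked about a tweet.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    reviews = [
        "a gripping, beautifully acted film",
        "the plot was dull and the pacing glacial",
        "an instant classic with a stellar cast",
        "two hours of my life I will never get back",
    ]
    labels = ["pos", "neg", "pos", "neg"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(reviews, labels)

    print(model.classes_)  # column order for the probabilities below

    # In-domain input: shares vocabulary with the training reviews.
    print(model.predict_proba(["a dull film with a terrible cast"]))

    # Out-of-domain input: a product tweet has almost no vocabulary overlap
    # with movie reviews, so the probabilities here tell us very little.
    print(model.predict_proba(["ugh my phone battery died AGAIN #fail"]))

Same model, same call; the difference is entirely in whether the input looks like the data the model was trained with.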

Real-world examples of additional language data encoded in non-word form (yes, I mean their hands).

There will be no March meetup, but we encourage everyone to join us at the Women in Analytics conference at the convention center!

In April, please come back to Rev1 to hear Sharon Santino and Kevin Boehm talk about autonomous vehicles.


*So how did our first sentence do when run through the Stanford NLP lab sentiment analysis demo? Well, if you believe the computers, this must have been a pretty lackluster event… wrong again! Maybe that’s why they call it “machine learning”, not “machine knowing”?
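
If you want to poke at this yourself, the Stanford NLP group also publishes the stanza Python package, whose sentiment processor scores each sentence 0 (negative), 1 (neutral), or 2 (positive). A rough sketch, with the caveat that stanza’s model may not be the exact one behind the web demo:

    # Requires: pip install stanza
    import stanza

    stanza.download("en")  # one-time model download
    nlp = stanza.Pipeline(lang="en", processors="tokenize,sentiment")

    doc = nlp("Despite the crappy weather, many in the group recognized this "
              "event would have been a terrible talk to miss out on!")

    # Each sentence is scored 0 = negative, 1 = neutral, 2 = positive.
    for sentence in doc.sentences:
        print(sentence.sentiment, sentence.text)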