Will AI ever ‘understand’ satire?


Tiernan Ray for ZDNet

A lot of nuances of writing are lost on the internet — things such as irony. 

That’s why satirical writing, such as Andy Borowitz’s column on the website of The New Yorker magazine, has to be labeled as satire, to make sure we know.

Scientists in recent years have become concerned: What about writing that isn’t properly understood, such as satire mistaken for the truth, or, conversely, deliberate disinformation campaigns that are disguised as innocent satire?

And so began a quest to divine some form of machine learning technology that could automatically identify satire as such and distinguish it from deliberate lies.

In truth, a machine can’t understand much of anything, really, and it certainly can’t understand satire. But it may be able to quantify aspects of satirical writing, which might help to deal with the flood of fake news on the Internet. 

Case in point: A paper presented this week at the 2019 Conference on Empirical Methods in Natural Language Processing, in Hong Kong, authored by researchers from the tech startup AdVerifai, The George Washington University in Washington, DC, and Amazon’s AWS cloud division.

The paper, Identifying Nuances in Fake News vs. Satire: Using Semantic and Linguistic Cues, builds upon years of work modeling differences between misleading, factually inaccurate news articles, on the one hand, and satire on the other hand. (There’s also a slide deck prepared for EMNLP.)

The pressing concern, as lead author Or Levi of AdVerifai and his colleagues write, is that it can be difficult in practice to tell satire from fake news. That means legitimate satire can get banned while misleading information may get undeserved attention because it masquerades as satire.

“For users, incorrectly classifying satire as fake news may deprive them from desirable entertainment content, while identifying a fake news story as legitimate satire may expose them to misinformation,” is how Levi and colleagues describe the situation. 

The idea of all this research is that, although a person should know satire given a modicum of sense and topical knowledge, society may need to more precisely articulate and measure the aspects of satirical writing in a machine-readable fashion.

Past efforts to distinguish satire from genuinely misleading news have employed simple machine learning techniques, such as a “bag of words” approach, where a “support vector machine,” or SVM, classifies a text based on very basic aspects of the writing.
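
To make the "bag of words" idea concrete, here is a minimal sketch in plain Python. It is not code from any of the studies mentioned; the two headlines are invented for illustration, and the helper names `bag_of_words` and `to_vector` are hypothetical. It shows only the counting step, which turns each text into a list of word counts that a classifier such as an SVM could then be trained on.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase word appears, ignoring word order."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def to_vector(counts, vocabulary):
    """Turn word counts into a fixed-length list aligned with the vocabulary."""
    return [counts.get(word, 0) for word in vocabulary]


# Invented example headlines (assumptions, not from the dataset in the paper).
docs = [
    "Senator vows to fight the moon",
    "Senator signs the budget bill",
]

bags = [bag_of_words(doc) for doc in docs]
vocabulary = sorted(set().union(*bags))  # every distinct word across all texts
vectors = [to_vector(bag, vocabulary) for bag in bags]

for doc, vector in zip(docs, vectors):
    print(vector, doc)
```

Each text ends up as one row of numbers of the same length, which is all the classifier sees; word order and context are thrown away.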

For example, a study in 2016 by researchers at the University of Western Ontario, cited by Levi and colleagues, aimed to produce what they called an “automatic satire detection system.” That approach looked at things like whether the final sentence of an article contained references to persons, places, and locations — what are known as “named entities” — that are at variance with the entities mentioned in the rest of the article. The hunch was that the sudden, surprising references could be a measure of “absurdity,” according to the authors, which could be a clue to satiric intent. 
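
The hunch about a surprising final sentence can be sketched roughly as follows. This is an illustration under loud assumptions, not the Western Ontario system: real named-entity recognition is replaced here by a crude stand-in (capitalized words that do not start a sentence), the function names are hypothetical, and the sample text is invented.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def crude_entities(sentence):
    """Capitalized words after the first word: a rough proxy for named entities."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return {w for w in words[1:] if w[0].isupper()}


def new_entities_in_final_sentence(text):
    """Return entities that appear in the last sentence but nowhere earlier."""
    sentences = split_sentences(text)
    if len(sentences) < 2:
        return set()
    earlier = set()
    for sentence in sentences[:-1]:
        earlier |= crude_entities(sentence)
    return crude_entities(sentences[-1]) - earlier


# Invented example text (an assumption for illustration only).
article = (
    "The mayor of Springfield opened a new bridge on Monday. "
    "Officials in Springfield praised the project. "
    "Asked for comment, Godzilla said the bridge looked delicious."
)

print(new_entities_in_final_sentence(article))
```

A non-empty result is the kind of signal the authors treated as a possible measure of absurdity; on its own it proves nothing, since ordinary news stories also introduce new names late.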

That kind of approach, in other words, involves simply counting occurrences of words, and is based on expert linguists’ theories about what makes up satire. 

In the approach of Levi and colleagues, machine learning moves a little bit beyond that kind of human feature engineering. They employ Google’s very popular “BERT” natural language processing tool, a deep learning network that has achieved impressive benchmark results on a variety of language understanding tests in recent years.

They took a “pre-trained” version of BERT, and then they “fine-tuned” it by running it through another training session based on a special corpus composed of published articles of both satire and fake news. The dataset was built last year by researchers at the University of Maryland and includes 283 fake news articles and 203 satirical articles from January 2016 to October 2017 on the topic of US politics. The articles were curated by humans and labeled as either fake or satirical. The Onion was a source of satirical texts, but they included other sources so that the system wouldn’t simply be picking up cues in the style of the source.
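
One piece of context for any accuracy figure on this dataset is the class balance. The short calculation below uses only the two article counts given above; the "always guess the larger class" baseline is an assumption added here for comparison and is not a figure reported in the paper.

```python
fake_articles = 283    # count given in the article
satire_articles = 203  # count given in the article

total = fake_articles + satire_articles

# Accuracy of a trivial classifier that labels every article as the larger
# class (a comparison point assumed here, not a result from the paper).
majority_baseline = max(fake_articles, satire_articles) / total

print(f"{total} articles; always guessing the larger class scores {majority_baseline:.1%}")
```

Because the corpus is small and somewhat lopsided, a classifier has to clear that floor by a good margin before its score says much.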

Levi and colleagues found that BERT does a pretty good job of accurately classifying articles as satire or fake news in the test set — better, in fact, than the simple SVM approach of the kind used in the earlier research. 

Problem is, how it does that is mysterious. “While the pre-trained model of BERT gives the best result, it is not easily interpretable,” they write. There is some kind of semantic pattern detection going on inside BERT, they hypothesize, but they can’t say what it is. 

To deal with that, the authors also ran another analysis, where they classified the two kinds of writing based on a set of rules put together a decade ago by psychologist Danielle McNamara and colleagues, then at the University of Memphis, called “Coh-Metrix.” The tool is meant to assess how easy or hard a given text is for a human to understand, given the level of “cohesion” and “coherence” in the text. It’s based on insights from the field of computational linguistics.
