This research surveys the current state-of-the-art technologies that are instrumental in the adoption and development of fake news detection. "Fake news detection" is defined as the task of categorizing news along a continuum of veracity, with an associated measure of certainty. Veracity is compromised by the occurrence of intentional deceptions. The nature of online news publication has changed such that traditional fact checking and vetting against potential deception cannot keep pace with the flood of content generators and the variety of formats and genres. The paper provides a typology of several varieties of veracity assessment methods emerging from two major categories: linguistic cue approaches (with machine learning) and network analysis approaches. We see promise in an innovative hybrid approach that combines linguistic cues and machine learning with network-based behavioral data. Although designing a fake news detector is not a straightforward problem, we propose operational guidelines for a feasible fake news detecting system.
Satire is an attractive subject in deception detection research: it is a type of deception that intentionally incorporates cues revealing its own deceptiveness. Whereas other types of fabrications aim to instill a false sense of truth in the reader, a successful satirical hoax must eventually be exposed as a jest. This paper provides a conceptual overview of satire and humor, elaborating and illustrating the unique features of satirical news, which mimics the format and style of journalistic reporting. Satirical news stories were carefully matched and examined in contrast with their legitimate news counterparts in 12 contemporary news topics in 4 domains (civics, science, business, and "soft" news). Building on previous work in satire detection, we proposed an SVM-based algorithm, enriched with 5 predictive features (Absurdity, Humor, Grammar, Negative Affect, and Punctuation), and tested their combinations on 360 news articles. Our best-predicting feature combination (Absurdity, Grammar, and Punctuation) detects satirical news with 90% precision and 84% recall (F-score = 87%). Our work in algorithmically identifying satirical news pieces can aid in minimizing the potential deceptive impact of satire.
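Some of the surface-level features named above can be approximated cheaply. The sketch below is an illustrative assumption, not the paper's actual feature set: Absurdity and Humor require external resources and are omitted, so only rough stdlib-only proxies for the Punctuation and Grammar cues are shown.

```python
import re

def satire_features(text):
    """Rough, hypothetical proxies for two of the paper's five features.
    Punctuation: density of punctuation marks per token.
    Grammar proxy: mean sentence length in words.
    Exclamation count: a crude signal of non-journalistic register."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    words = [t for t in tokens if t[0].isalnum()]
    punct = [t for t in tokens if not t[0].isalnum()]
    n_sentences = max(len(re.split(r"[.!?]+", text.strip())) - 1, 1)
    return {
        "punctuation": len(punct) / max(len(tokens), 1),
        "mean_sentence_len": len(words) / n_sentences,
        "exclamations": text.count("!"),
    }
```

Feature dictionaries like this would then be fed to an SVM classifier; the thresholds and the real feature definitions are the paper's, not reproduced here.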
A fake news detection system aims to assist users in detecting and filtering out varieties of potentially deceptive news. The prediction of the chances that a particular news item is intentionally deceptive is based on the analysis of previously seen truthful and deceptive news. A scarcity of deceptive news, available as corpora for predictive modeling, is a major stumbling block in this field of natural language processing (NLP) and deception detection. This paper discusses three types of fake news, each in contrast to genuine serious reporting, and weighs their pros and cons as a corpus for text analytics and predictive modeling. Filtering, vetting, and verifying online information continues to be essential in library and information science (LIS), as the lines between traditional news and online information are blurring.
This paper argues that big data can possess different characteristics, which affect its quality. Depending on its origin, data processing technologies, and the methodologies used for data collection and scientific discovery, big data can have biases, ambiguities, and inaccuracies which need to be identified and accounted for to reduce inference errors and improve the accuracy of generated insights. Big data veracity is now being recognized as a necessary property for its utilization, complementing the three previously established quality dimensions (volume, variety, and velocity), but there has been little discussion of the concept of veracity thus far. This paper provides a roadmap for theoretical and empirical definitions of veracity along with its practical implications. We explore veracity across three main dimensions: 1) objectivity/subjectivity, 2) truthfulness/deception, and 3) credibility/implausibility, and propose to operationalize each of these dimensions with either existing computational tools or potential ones, relevant particularly to textual data analytics. We combine the measures of the veracity dimensions into one composite index, the big data veracity index. This newly developed veracity index provides a useful way of assessing systematic variations in big data quality across datasets with textual information. The paper contributes to big data research by categorizing the range of existing tools to measure the suggested dimensions, and to Library and Information Science (LIS) by proposing to account for the heterogeneity of diverse big data and to identify the information quality dimensions important for each big data type.
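A composite index over the three dimensions can be sketched as a weighted combination of per-dimension scores. The equal weights and [0, 1] scaling below are illustrative assumptions, not the paper's actual operationalization:

```python
def veracity_index(objectivity, truthfulness, credibility,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three veracity dimension scores (each assumed to lie in
    [0, 1], with 1 = objective / truthful / credible) into one index.
    The weighting scheme is a placeholder, not the paper's."""
    scores = (objectivity, truthfulness, credibility)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("dimension scores must lie in [0, 1]")
    return sum(w * s for w, s in zip(weights, scores))
```

With equal weights, a dataset scored (0.6, 0.9, 0.3) on the three dimensions gets an index of 0.6; other weightings would privilege, say, truthfulness over credibility.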
Widespread adoption of internet technologies has changed the way that news is created and consumed. The current online news environment is one that incentivizes speed and spectacle in reporting, at the cost of fact-checking and verification. The line between user generated content and traditional news has also become increasingly blurred. This poster reviews some of the professional and cultural issues surrounding online news and argues for a two-pronged approach inspired by Hemingway's "automatic crap detector" (Manning, 1965) in order to address these problems: a) proactive public engagement by educators, librarians, and information specialists to promote digital literacy practices; b) the development of automated tools and technologies to assist journalists in vetting, verifying, and fact-checking, and to assist news readers by filtering and flagging dubious information.
This chapter presents a theoretical framework and preliminary results for manual categorization of explicit certainty information in 32 English newspaper articles. Our contribution is a proposed categorization model and analytical framework for certainty identification. Certainty is presented as a type of subjective information available in texts. Statements with explicit certainty markers were identified and categorized according to four hypothesized dimensions: level, perspective, focus, and time of certainty. The preliminary results reveal an overall promising picture of the presence of certainty information in texts, and establish its susceptibility to manual identification within the proposed four-dimensional certainty categorization analytical framework. Our findings are that the editorial sample group had a significantly higher frequency of markers per sentence than did the sample group of news stories. For editorials, a high level of certainty, the writer's point of view, and future and present time were the most populated categories. For news stories, the most common were high and moderate levels, a directly involved third party's point of view, and past time. These patterns have positive practical implications for automation.

Keywords: certainty, certainty identification, certainty categorization model, subjectivity, manual tagging, natural language processing, linguistics, information extraction, information retrieval; uncertainty, doubt, epistemic comments, evidentials, hedges, hedging, certainty expressions; levels of certainty, point of view, annotating opinions; newspaper article analysis, analysis of editorials.

1 Analytical Framework

Introduction: What is Certainty Identification and Why is it Important?

The fields of Information Extraction (IE) and Natural Language Processing (NLP) have not yet addressed the task of certainty identification. It presents an ongoing theoretical and implementation challenge.
Even though the linguistics literature has abundant intellectual investigations of closely related concepts, it has not yet provided NLP with a holistic certainty identification approach that would include clear definitions, theoretical underpinnings, validated analysis results, and a vision for practical applications. Unravelling the potential and demonstrating the usefulness of certainty analysis in an information-seeking situation is the driving force behind this preliminary research effort. Certainty identification is defined here as an automated process of extracting information from certainty-qualified texts or individual statements along four hypothesized dimensions of certainty, namely:

• what degree of certainty is indicated (LEVEL),
• whose certainty is involved (PERSPECTIVE),
• what the object of certainty is (FOCUS), and
• what time the certainty is expressed (TIME).

Some writers consciously strive to produce a particular effect of certainty due to training or overt instructions. Others may do it inadvertently. A writer's certainty level may remain constant in a text and be unnoticed by...
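The markers-to-dimensions mapping can be illustrated for the LEVEL dimension with a toy lexicon lookup. The marker lists below are invented examples, not the chapter's inventory, and PERSPECTIVE, FOCUS, and TIME are omitted because they require syntactic context rather than keyword matching:

```python
import re

# Hypothetical marker lexicon; the chapter's actual inventory is larger.
LEVEL_MARKERS = {
    "high": ["certainly", "undoubtedly", "clearly"],
    "moderate": ["probably", "likely"],
    "low": ["perhaps", "possibly", "might"],
}

def tag_certainty_level(sentence):
    """Return the LEVEL category of the first explicit certainty marker
    found in a sentence, or None if no marker is present."""
    lowered = sentence.lower()
    for level, markers in LEVEL_MARKERS.items():
        if any(re.search(r"\b" + m + r"\b", lowered) for m in markers):
            return level
    return None
```

Such a lexicon pass only approximates the manual tagging described above; the chapter's point is precisely that the full four-dimensional scheme needs richer analysis than keyword spotting.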
Recent improvements in the effectiveness and accuracy of the emerging field of automated deception detection, and the associated potential of language technologies, have triggered increased interest from mass media and the general public. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted computer-mediated communication and information practices, credibility assessment, and decision-making. The goal of this ongoing research is to inform the creation of such automated capabilities. In this study we elicit a sample of 90 computer-mediated personal stories with varying levels of deception. Each story has 10 associated human deception-level judgments, confidence scores, and explanations. In total, 990 unique respondents participated in the study. Three approaches are taken to the data analysis of the sample: human judges, linguistic detection cues, and machine learning. Comparable to previous research results, human judgments achieve 50-63 percent success rates, depending on what is considered deceptive. Actual deception levels correlate negatively with judges' confidence in rating stories deceptive (r = -0.35, df = 88, p = 0.008). The highest-performing machine learning algorithms reach 65 percent accuracy. Linguistic cues are extracted, calculated, and modeled with logistic regression, but are found not to be significant predictors of deception level, confidence score, or an author's ability to fool a reader. We address the associated challenges with error analysis. The respondents' stories and explanations are manually content-analyzed, resulting in a faceted deception classification (theme, centrality, realism, essence, self-distancing) and a typology of stated perceived cues. Deception detection remains a novel, challenging, and important task in natural language processing, machine learning, and the broader library and information science and technology community.
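The reported correlation (r = -0.35, df = 88) is a standard Pearson product-moment coefficient between actual deception level and judges' confidence scores. For readers unfamiliar with the statistic, it can be computed with the Python standard library alone; the vectors in the usage note are invented placeholders, not the study's data:

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences:
    covariance divided by the product of the standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `pearson_r([1, 2, 3], [3, 2, 1])` returns -1.0, a perfectly negative relationship; the study's r = -0.35 indicates a moderate negative one (df = n - 2 = 88 for the 90 stories).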
Deception in computer-mediated communication is defined as a message knowingly and intentionally transmitted by a sender to foster a false belief or conclusion by the perceiver. Stated beliefs about deception and deceptive messages or incidents are content analyzed in a sample of 324 computer-mediated communications. Relevant stated beliefs are obtained through systematic sampling and querying of the blogosphere based on 80 English words commonly used to describe deceptive incidents. Deception is conceptualized more broadly than lying and includes a variety of deceptive strategies: falsification, concealment (omitting material facts), and equivocation (dodging or skirting issues). The stated beliefs are argued to be valuable toward the creation of a unified multi-faceted ontology of deception, stratified along several classificatory facets such as (1) contextual domain (e.g., personal relations, politics, finances & insurance), (2) deception content (e.g., events, time, place, abstract notions), (3) message format (e.g., a complaint: they lied to us; a victim story: I was lied to or tricked; or a direct accusation: you're lying), and (4) deception variety, each tied to particular verbal cues (e.g., misinforming, scheming, misrepresenting, or cheating). The paper positions automated deception detection within the field of library and information science (LIS) as a feasible natural language processing (NLP) task. Key findings and important constructs in deception research from interpersonal communication, psychology, criminology, and language technology studies are synthesized into an overview. Deception research is juxtaposed with several benevolent constructs in LIS research: trust, credibility, certainty, and authority.