Review of Papers about Information Credibility

Keywords for Searching

It seems that searches with “fake” and “trust” expressions perform better than those with “credibility”, and especially better than those with “information quality”.

(information/text) quality/credibility (assessment/estimation)
fake (news)(detection)
rumor (verification)
[^a-zA-Z][tT]rust → trustful(l)ness, trustworthiness, trusted, etc.
fact checking
information pollution
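The regular expression in the list above is meant to catch words that begin with “trust” while skipping words such as “distrust” or “mistrust”. A minimal sketch of filtering paper titles with these keywords might look like this. Only the trust pattern is copied from the list. The other patterns and the `match_keywords` helper are hypothetical simplifications.

```python
import re

# Keyword patterns. The "trust" regex is copied verbatim from the keyword list;
# the others are hypothetical, case-insensitive simplifications of the keywords.
PATTERNS = {
    "trust": re.compile(r"[^a-zA-Z][tT]rust"),
    "fake": re.compile(r"fake", re.IGNORECASE),
    "rumor": re.compile(r"rumou?r", re.IGNORECASE),
    "credibility/quality": re.compile(r"credibility|quality", re.IGNORECASE),
    "fact checking": re.compile(r"fact[- ]check", re.IGNORECASE),
    "information pollution": re.compile(r"information pollution", re.IGNORECASE),
}


def match_keywords(title: str) -> list:
    """Return the names of all keyword patterns found in a paper title."""
    # Prepend a space so that a title starting with "Trust..." still has a
    # non-letter character before the word, as the trust regex requires.
    padded = " " + title
    return [name for name, pattern in PATTERNS.items() if pattern.search(padded)]


if __name__ == "__main__":
    titles = [
        "Evidence-based Trustworthiness",
        "Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning",
        "BREAKING! Presenting Fake News Corpus for Automated Fact Checking",
    ]
    for title in titles:
        print(match_keywords(title), title)
```

Note that the trust pattern only allows either case on the first letter, so an all-caps “TRUST” would not match.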

Papers

All papers I found with the keywords above are grouped by conference and year.

Papers in italics are those whose classification I am unsure about: they are related to this topic in some way, but not clearly part of it. For example, research on evidence credibility in adjudicatory decisions obviously contributes nothing directly to fake news detection. In any case, these papers need further discussion. Readability is another such case: it is part of text quality, but I am not sure whether it belongs in a review of credibility.

Credibility is also related to authority, trust and persuasion [1]. Credibility research is therefore an interdisciplinary field that draws on data science, psychology, rhetoric and even sociology, and insights from these fields may help improve feature selection.

ACL 2019

Assessing Arabic Weblog Credibility via Deep Co-learning
BREAKING! Presenting Fake News Corpus for Automated Fact Checking
Evidence-based Trustworthiness
Gradual Argumentation Evaluation for Stance Aggregation in Automated Fake News Detection
Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning

ACL 2018

A Stylometric Inquiry into Hyperpartisan and Fake News
Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour

Improving Topic Quality by Promoting Named Entities in Topic Modeling

EMNLP 2018

Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web
DeClarE: Debunking Fake News and False Claims
Stance Detection in Fake News: A Combined Feature Representation
Towards Automatic Fake News Detection: Cross-Level Stance Detection in News Articles
Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content

EMNLP 2018 Keynote I: “Truth or Lie? Spoken Indicators of Deception in Speech” Julia Hirschberg, Columbia University

A Neural Local Coherence Model for Text Quality Assessment

Evidence Types, Credibility Factors, and Patterns or Soft Rules

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection

NAACL 2019

Fake News Detection using Deep Markov Random Fields
Learning Hierarchical Discourse-level Structure for Fake News Detection
Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media

NAACL 2018

Predicting Human Trustfulness from Facebook Language

COLING 2018

A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
Attending Sentences to detect Satirical Fake News
Automatic Detection of Fake News
Multi-Source Multi-Class Fake News Detection

Matrix of Papers

Paper No.	Field	Prior Topic	Current Problem / Challenge	Solution / Main Contribution	More (explanations / conclusions)
1	blog	social media credibility	lack of sufficient training data	a semi-supervised end-to-end deep learning approach	compared with fully supervised deep learning models and ensemble models
2	fake news (compelling stories)		lack of fake news datasets	1) introduce and analyze dataset; 2) classification model	1) based on linguistic features; 2) plan to extend the dataset
3	fake news		claims are inferred from evidence provided by source	a family of probabilistic models	jointly estimate the credibility of sources and claims
4	fake news; stance detection	aggregation of multiple stance labels from different text sources		apply gradual argumentation semantics to bipolar argumentation frameworks mined using stance detection
5	fake news; stance detection			a multi-task learning approach: a neural network with a shared layer and two task-specific layers	apply an attention mechanism
6	fake news; hyperpartisan	hyperpartisan; satirical news		style-based analysis	1) partisan detection; 2) pre-screening for semi-automatic detection
7	readers’ rating of text quality	traditional textual features	relationship between gaze behavior and predicting quality	gaze behavior	model text quality with three properties: organization, coherence and cohesion
8	fake news	source credibility	web indicators have defects	classify websites into a credibility scale	reputation cues
9	fake news	methods based on supervised learning	1) ignore external evidence; 2) require substantial feature modeling and rich lexicons	an end-to-end model without human intervention	1) aggregate signals from external evidence articles; 2) ablation studies
10	stance detection; fake news	FNC-1 dataset		combine lexical, word embeddings and n-gram features	investigate the importance of different lexicons in the detection
11	fake news; stance detection	1) four-staged pipeline proposed by Zubiaga et al.; 2) FNC-1; 3) Journalism Studies		identify asymmetry in length as a key characteristic of stance detection	model the internal structure of an article and its interactions with a claim
12	rumor verification	utilizes multimedia as input features	ignore external information	find external information in other news platforms	a new features set, cross-lingual cross-platform features that leverage the semantic similarity between the rumors and the external information
13	fake news	deep-learning based models	ignore correlations among news articles (consider individually)	a graph-theoretic method	inference problem in MRF + iterative mean-field algorithm
14	fake news	discourse-level structure	1) rely on annotated corpora (not available for fake news); 2) how to extract useful information from such structures	hierarchical discourse-level structure (HDSF)	1) structure-related properties help understand fake news; 2) differences in such structures between real and fake news
15	fake news; stance detection	hyper-partisanship	how to estimate entire news outlets	multi-task ordinal regression framework	1) political ideology detection; 2) compare joint and individual model
16	stance detection	FNC-1	what problems lie in FNC-1	1) retrospective analysis; 2) a stacked LSTM model	new dataset
17	satirical news	SVM and hierarchical neural networks with hand-engineered features	ignoring differences between sentence and document	incorporating pluggable generic neural networks detecting both sentences and documents	reveal key sentences in satirical news
18	fake news		how to detect fake news	1) two novel datasets; 2) detectors of 76% accuracy	1) describe the collection, annotation, and validation process in detail; 2) compare the automatic and manual identification
19	fake news	detection merely based on news content	1) fake news contain true evidence; 2) multiple sources	Multi-source Multi-class Fake news Detection framework (MMFD)	discriminate different degrees of fakeness

Overlapping Classes

Work on assessing information credibility on the Web can be classified into a number of overlapping classes: [2]

rankSVM
decision tree

Fake News Detection

SVM: TF-IDF + absurdity, humor, grammar, negative affect, punctuation
CNN, Bi-LSTM
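The SVM setup above (TF-IDF term weights combined with hand-crafted stylistic cues) can be sketched in plain Python. This is a toy illustration, not the original paper's implementation: it uses one hypothetical cue (the density of “!” and “?”) in place of the absurdity, humor, grammar, negative affect and punctuation features, it trains the linear SVM with the Pegasos sub-gradient method while cycling through the data instead of sampling at random, and the four documents and their labels are invented. In practice scikit-learn's `TfidfVectorizer` and `LinearSVC` would be the more usual choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(docs):
    """Smoothed IDF: log((1 + n) / (1 + df)) + 1 for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + d)) + 1 for tok, d in df.items()}


def featurize(text, idf):
    """Sparse TF-IDF vector plus one hypothetical stylistic cue."""
    tf = Counter(tokenize(text))
    vec = {tok: count * idf[tok] for tok, count in tf.items() if tok in idf}
    # Stand-in for the hand-crafted cues: share of characters that are "!" or "?"
    vec["__punct__"] = sum(ch in "!?" for ch in text) / max(len(text), 1)
    return vec


def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos: sub-gradient descent on the L2-regularized hinge loss (labels +1/-1)."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * dot(w, x)  # uses the weights before this step
            for k in w:  # regularization: shrink every weight
                w[k] *= 1 - eta * lam
            if margin < 1:  # hinge loss is active: move towards the example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * label * v
    return w


docs = [
    "Shocking! Aliens secretly run the government!!",
    "You won't believe this miracle cure!!!",
    "The council approved the budget on Tuesday.",
    "The study was published in a peer-reviewed journal.",
]
labels = [1, 1, -1, -1]  # +1 = fake, -1 = real (toy labels)

idf = fit_idf(docs)
weights = train_linear_svm([featurize(d, idf) for d in docs], labels)


def predict(text):
    return 1 if dot(weights, featurize(text, idf)) > 0 else -1


print(predict("Miracle cure shocking!!"))
```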

Estimating Blog Credibility

SVM
Naïve Bayes: URL segmented into tokens as input
bias, sentiment, reasonability, objectivity
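For the Naïve Bayes variant that takes the URL segmented into tokens as input, a minimal multinomial Naïve Bayes sketch is shown below. The tokenization rule (split on every non-alphanumeric character), the add-alpha smoothing, and the example URLs and labels are all assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def url_tokens(url):
    """Segment a URL into lowercase tokens by splitting on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok]


class UrlNaiveBayes:
    """Multinomial Naive Bayes over URL tokens with add-alpha (Laplace) smoothing."""

    def fit(self, urls, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(url_tokens(url))
        self.vocab = {tok for counts in self.token_counts.values() for tok in counts}
        return self

    def log_posterior(self, url, label):
        """Unnormalized log P(label) + sum of log P(token | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in url_tokens(url):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, url):
        return max(self.class_counts, key=lambda label: self.log_posterior(url, label))


# Hypothetical training URLs and labels
urls = [
    "https://news.example-university.edu/research/climate-study",
    "https://www.example-times.com/politics/budget-report",
    "http://shocking-truth.example-blog.net/miracle-cure",
    "http://free-miracle-cure.example-blog.net/you-wont-believe",
]
labels = ["credible", "credible", "not_credible", "not_credible"]
model = UrlNaiveBayes().fit(urls, labels)
print(model.predict("http://example-blog.net/shocking-miracle"))
```

Summing log-probabilities instead of multiplying probabilities keeps long URLs from underflowing.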

Overview

Almost all of this research can be summarized as follows: train machine learning or deep learning models on a dataset, using certain features, to evaluate credibility, and then compare their performance against baselines. The general problem can therefore be tackled step by step.
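The “compare with baselines” step usually comes down to scoring a model's predictions and a simple baseline's predictions on the same held-out labels. The sketch below assumes a majority-class baseline and two common metrics, accuracy and macro-averaged F1; the labels and the model output are made up for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


def majority_baseline(y_train, n):
    """Predict the most frequent training label for all n test items."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


y_train = ["real", "real", "fake"]
y_test = ["fake", "real", "real", "real"]
model_pred = ["fake", "real", "fake", "real"]  # hypothetical model output
baseline_pred = majority_baseline(y_train, len(y_test))

for name, pred in [("baseline", baseline_pred), ("model", model_pred)]:
    print(f"{name}: accuracy={accuracy(y_test, pred):.2f}, macro-F1={macro_f1(y_test, pred):.2f}")
```

Here both reach 0.75 accuracy, but the baseline never predicts “fake”, which macro-F1 exposes (about 0.43 for the baseline against 0.73 for the model). This is why accuracy alone can mislead on imbalanced credibility data.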

source credibility

external information

Further Review

Models

Co-learning CNN 1

bi-directional LSTM 2

Datasets

Dataset	Paper
FakeNewsNet	6

Buzzfeed

Papers that introduce datasets: 2, 18

Baselines

Baseline	Paper	Evaluate
linear SVM using the TF-IDF scores	1	the effectiveness
Individual CNN (Word-CNN, Char-CNN)	1
Ensemble CNN	1

Features & Credibility

Credibility

Prior work has identified some general factors that influence credibility.

The credibility of information is mainly evaluated through the credibility of its source, which covers the credibility of the communicator, of the medium and of the content. Each of these three dimensions can be subdivided into reliability, professionalism and identifiability. [1:1]
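The framework itself is qualitative and gives no formula. If one wanted to turn the three dimensions and their three aspects into a single number, for example to experiment with feature weighting, a hypothetical weighted average could look like this. The [0, 1] rating scale, the uniform default weights and the `credibility_score` function are assumptions, not part of [1].

```python
from itertools import product

# The three dimensions and three aspects named in the text above.
DIMENSIONS = ("communicator", "medium", "content")
ASPECTS = ("reliability", "professionalism", "identifiability")


def credibility_score(ratings, weights=None):
    """Weighted mean of nine (dimension, aspect) ratings, each assumed to lie in [0, 1].

    Uniform weights are used when none are given (a hypothetical default).
    """
    keys = list(product(DIMENSIONS, ASPECTS))
    if weights is None:
        weights = {key: 1.0 for key in keys}
    total = sum(weights[key] for key in keys)
    return sum(weights[key] * ratings[key] for key in keys) / total


ratings = {key: 0.5 for key in product(DIMENSIONS, ASPECTS)}
ratings[("content", "reliability")] = 1.0  # hypothetical ratings
print(round(credibility_score(ratings), 3))
```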

Features

message-based features
user-based features
topic-based features
propagation-based features [2:1]
continuous bag of words (CBOW) word embeddings, character-level embeddings
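To make the first two feature groups concrete, here is a sketch of message-based and user-based features for a single post. The specific features and the fields of the `user` record (`followers`, `following`, `account_age_days`, `verified`) are hypothetical examples rather than the feature set of [2]. Topic- and propagation-based features are left out because they need aggregates over many posts.

```python
import math
import re


def message_features(text):
    """Hypothetical message-based cues computed from the post text alone."""
    words = text.split()
    return {
        "n_chars": len(text),
        "n_words": len(words),
        "n_exclaim": text.count("!"),
        "n_question": text.count("?"),
        "n_urls": len(re.findall(r"https?://\S+", text)),
        # Share of fully upper-case words such as "BREAKING!"
        "upper_ratio": sum(w.isupper() for w in words) / max(len(words), 1),
    }


def user_features(user):
    """Hypothetical user-based cues from an author profile record."""
    return {
        "log_followers": math.log1p(user["followers"]),
        "follow_ratio": user["followers"] / max(user["following"], 1),
        "account_age_days": user["account_age_days"],
        "verified": float(user["verified"]),
    }


post = "BREAKING! Read this: http://example.com now!"
user = {"followers": 120, "following": 800, "account_age_days": 30, "verified": False}

# Concatenate both groups, prefixing names to keep them distinct.
features = {f"msg_{k}": v for k, v in message_features(post).items()}
features.update({f"user_{k}": v for k, v in user_features(user).items()})
print(features)
```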

Information Quality

Information credibility is an important dimension in evaluating IQ (information quality).

Categories of IQ

Properties of the Information Itself	Properties from the User's Perspective
Accuracy	Accessibility
Completeness	Interpretability
Timeliness	Usefulness
Consistency	Believability

Pipeline

```mermaid
graph LR
A[source]
B[meta-information]
subgraph communication
A-->B
C[other]
C-->B
A-->D
D[text]
E[content]
D-->E
F[user]
E-->F
G[direct claims]
H[indirect claims]
I[representation]
J[source2]
A-->J
F-->J
J-->K
B-->I
E-->I
J-->I
K-->I
end
K[sourceX]
subgraph datamodel ["data & model"]
F-->I
I-->L
M-->L
end
L[model]
M[training data]
N[credibility]
E-->G
E-->H
G-->N
H-->N
L-->N
```
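Read as a directed graph, the pipeline above is acyclic, so a topological order gives one valid sequence in which its stages could be computed. A small sketch with Python's standard `graphlib` module (Python 3.9+), using the node labels and edges from the diagram; treating the diagram as a strict processing order is an assumption.

```python
from graphlib import TopologicalSorter

# Edges copied from the pipeline diagram: (upstream, downstream).
EDGES = [
    ("source", "meta-information"), ("other", "meta-information"),
    ("source", "text"), ("text", "content"), ("content", "user"),
    ("source", "source2"), ("user", "source2"), ("source2", "sourceX"),
    ("meta-information", "representation"), ("content", "representation"),
    ("source2", "representation"), ("sourceX", "representation"),
    ("user", "representation"), ("representation", "model"),
    ("training data", "model"), ("content", "direct claims"),
    ("content", "indirect claims"), ("direct claims", "credibility"),
    ("indirect claims", "credibility"), ("model", "credibility"),
]

# graphlib expects a mapping from each node to the set of its predecessors.
graph = {}
for upstream, downstream in EDGES:
    graph.setdefault(downstream, set()).add(upstream)
    graph.setdefault(upstream, set())

order = list(TopologicalSorter(graph).static_order())
print(" -> ".join(order))
```

Because every node in the diagram eventually feeds into “credibility”, it always comes last in any valid order.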

Questions

I think “credibility” has two meanings, unless I misunderstand the word:
1. the information is objectively true and reliable;
2. the information is subjectively believed (this relates to research on subjective trustfulness).
Should detecting fake information be viewed as a kind of denoising (e.g. when building a corpus)?
What is the general structure of a fake news detection model?
How should features be weighted?

[1] Rieh, S. Y., & Danielson, D. R. (2007). Credibility: A multidisciplinary framework. Annual Review of Information Science & Technology, 41(1), 307-364.
[2] Assessing Arabic Weblog Credibility via Deep Co-learning (ACL 2019).


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
