Review of Papers about Information Credibility
Archive
This is where I search and download papers.
Keywords for Searching
It seems that “fake” and “trust” expressions perform better in search than “credibility” ones, and much better than “information quality”.
- (information/text) quality/credibility (assessment/estimation)
- fake (news)(detection)
- rumor (verification)
- [^a-zA-Z][tT]rust → trustful(l)ness, trustworthiness, trusted, etc.
- fact checking
- information pollution
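
The regular expression in the keyword list can be tried out directly. Below is a minimal sketch using Python's `re` module; the first two titles come from the lists in this post, while the last two are hypothetical titles I made up only to show what the pattern skips.

```python
import re

# The pattern from the keyword list: "trust"/"Trust" preceded by a non-letter.
TRUST_PATTERN = re.compile(r"[^a-zA-Z][tT]rust")

titles = [
    "Evidence-based Trustworthiness",
    "Predicting Human Trustfulness from Facebook Language",
    "Trust in Online News",                  # hypothetical title
    "Modeling Distrust in Social Networks",  # hypothetical title
]


def mentions_trust(title: str) -> bool:
    """Return True if the title contains a trust-family word per the pattern."""
    return TRUST_PATTERN.search(title) is not None


# Keeps only the first two titles: "Distrust" is rejected because a letter
# precedes "trust", and a title that *starts* with "Trust" is also missed
# because the pattern requires one preceding character.
matches = [t for t in titles if mentions_trust(t)]
print(matches)
```

If titles that begin with “Trust” should also be found, a lookbehind variant such as `(?<![a-zA-Z])[tT]rust` would probably be a better fit, since it does not need a preceding character.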
Papers
All papers that I found under the keywords above are categorized by conference and year.
Papers in italics are ones whose classification I am unsure about, since they are only loosely related to this topic. For example, it is self-evident that research about evidence credibility in adjudicatory decisions makes no direct contribution to fake news detection. Still, they need further discussion. Another case is readability: it is a part of text quality, but I have no clue whether it belongs in a review of credibility.
Besides, credibility is also related to authority, trust and persuasion. [1] Thus, it is an interdisciplinary field integrating data science, psychology, rhetoric and even sociology, which may help improve the selection of features.
ACL 2019
- Assessing Arabic Weblog Credibility via Deep Co-learning
- BREAKING! Presenting Fake News Corpus for Automated Fact Checking
- Evidence-based Trustworthiness
- Gradual Argumentation Evaluation for Stance Aggregation in Automated Fake News Detection
- Rumor Detection by Exploiting User Credibility Information, Attention and Multi-task Learning
ACL 2018
- A Stylometric Inquiry into Hyperpartisan and Fake News
- Eyes are the Windows to the Soul: Predicting the Rating of Text Quality Using Gaze Behaviour
- *Improving Topic Quality by Promoting Named Entities in Topic Modeling*
EMNLP 2018
- Belittling the Source: Trustworthiness Indicators to Obfuscate Fake News on the Web
- DeClarE: Debunking Fake News and False Claims
- Stance Detection in Fake News: A Combined Feature Representation
- Towards Automatic Fake News Detection: Cross-Level Stance Detection in News Articles
- Cross-Lingual Cross-Platform Rumor Verification Pivoting on Multimedia Content
- *A Neural Local Coherence Model for Text Quality Assessment*
- *Evidence Types, Credibility Factors, and Patterns or Soft Rules*
- *Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection*
NAACL 2019
- Fake News Detection using Deep Markov Random Fields
- Learning Hierarchical Discourse-level Structure for Fake News Detection
- Multi-Task Ordinal Regression for Jointly Predicting the Trustworthiness and the Leading Political Ideology of News Media
NAACL 2018
- *Predicting Human Trustfulness from Facebook Language*
COLING 2018
- A Retrospective Analysis of the Fake News Challenge Stance-Detection Task
- Attending Sentences to detect Satirical Fake News
- Automatic Detection of Fake News
- Multi-Source Multi-Class Fake News Detection
Matrix of Papers
Paper No. | Field | Anterior Topic | Current Problem / Challenge | Solution / Main Contribution | More (Explanations / Conclusions) | Value |
---|---|---|---|---|---|---|
1 | blog | social media credibility | lack of sufficient training data | a semi-supervised end-to-end deep learning approach | compare with fully supervised deep learning models, ensemble models | |
2 | fake news (compelling stories) | lack of fake news datasets | 1) introduce and analyze dataset; 2) classification model | 1) based on linguistic features; 2) plan to extend the dataset | ||
3 | fake news | claims are inferred from evidence provided by source | a family of probabilistic models | jointly estimate the credibility of sources and claims | ||
4 | fake news; stance detection | aggregation of multiple stance labels from different text sources | a gradual argumentation semantics to bipolar argumentation frameworks mined using stance detection | |||
5 | fake news; stance detection | a multi-task learning approach / a neural network has a shared layer and two task specific layers | apply attention mechanism | |||
6 | fake news; hyperpartisan | hyperpartisan; satirical news | style-based analysis | 1) partisan detection; 2) pre-screening for semi-automatic detection | ||
7 | readers’ rating of text quality | traditional textual features | relationship between gaze behavior and predicting quality | gaze behavior | model text quality with three properties: organization, coherence and cohesion | |
8 | fake news | source credibility | web indicators have defects | classify websites into a credibility scale | reputation cues | |
9 | fake news | methods based on supervised learning | 1) ignore external evidence; 2) require substantial feature modeling and rich lexicons | an end-to-end model without human intervention | 1) aggregate signals from external evidence articles; 2) ablation studies | |
10 | stance detection; fake news | FNC-1 dataset | combine lexical, word embeddings and n-gram features | investigate the importance of different lexicons in the detection | ||
11 | fake news; stance detection | 1) four-staged pipeline proposed by Zubiaga et al.; 2) FNC-1; 3) Journalism Studies | identify asymmetry in length as a key characteristic of stance detection | model the internal structure of an article and its interactions with a claim | ||
12 | rumor verification | utilizes multimedia as input features | ignore external information | find external information in other news platforms | a new feature set: cross-lingual cross-platform features that leverage the **semantic similarity** between the rumors and the external information | |
13 | fake news | deep-learning based models | ignore correlations among news articles (consider individually) | a graph-theoretic method | inference problem in MRF + iterative mean-field algorithm | |
14 | fake news | discourse-level structure | 1) rely on annotated corpora (not available for fake news); 2) how to extract useful information from such structures | hierarchical discourse-level structure (HDSF) | 1) structure-related properties help understand fake news; 2) differences in such structures between real and fake news | |
15 | fake news; stance detection | hyper-partisanship | how to estimate entire news outlets | multi-task ordinal regression framework | 1) political ideology detection; 2) compare joint and individual model | |
16 | stance detection | FNC-1 | what problems lie in FNC-1 | 1) retrospective analysis; 2) a stacked LSTM model | new dataset | |
17 | satirical news | SVM and hierarchical neural networks with hand-engineered features | ignoring differences between sentence and document | incorporating pluggable generic neural networks detecting both sentences and documents | reveal key sentences in satirical news | |
18 | fake news | how to detect fake news | 1) two novel datasets; 2) detectors of 76% accuracy | 1) describe the collection, annotation, and validation process in detail; 2) compare the automatic and manual identification | ||
19 | fake news | detection merely based on news content | 1) fake news contains true evidence; 2) multiple sources | **Multi-source Multi-class** Fake news Detection framework (MMFD) | discriminate different degrees of fakeness |
Overlapping Classes
Work on assessing information credibility on the Web can be classified into a number of overlapping classes: [2]
Assessing Credibility in Social Media
- rankSVM
- decision tree
Fake News Detection
- SVM: TF-IDF + absurdity, humor, grammar, negative affect, punctuation
- CNN, Bi-LSTM
Estimating Blog Credibility
- SVM
- Naïve Bayes: URL segmented into tokens as input
- bias, sentiment, reasonability, objectivity
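
To make “URL segmented into tokens” concrete, here is a minimal sketch of one way to do the segmentation. The splitting rule (drop the scheme, split on anything that is not a letter or digit, lowercase) and the helper name `url_tokens` are my own assumptions for illustration, not necessarily what the cited work does; the URL is a placeholder.

```python
import re


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (assumed rule)."""
    # Drop a leading scheme such as "https://".
    without_scheme = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    # Split on every run of non-alphanumeric characters and drop empties.
    parts = re.split(r"[^a-zA-Z0-9]+", without_scheme)
    return [part.lower() for part in parts if part]


tokens = url_tokens("https://www.example.com/world-news/2019/story.html")
print(tokens)
```

The resulting token list could then be treated like a short document and fed to a Naïve Bayes classifier in the usual bag-of-words fashion.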
Overview
Almost all of this research can be summarized in the same way: train a machine learning or deep learning model on a dataset using certain features to evaluate credibility, then compare its performance against baselines. The general problem can therefore be tackled step by step.
- source credibility
- external information
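
The “train a model, then compare with a baseline” recipe can be sketched in a few lines. This is only a toy illustration under my own assumptions: the headlines and labels are made up, the model is a plain multinomial Naïve Bayes with add-one smoothing over bag-of-words counts, and the baseline simply predicts the majority class of the training set. None of it reproduces a specific paper.

```python
import math
from collections import Counter

# Hypothetical toy data: (headline, label), where 1 = fake and 0 = real.
TRAIN = [
    ("shocking miracle cure doctors hate", 1),
    ("you will not believe this shocking secret", 1),
    ("celebrity miracle diet secret revealed", 1),
    ("parliament passes budget bill", 0),
    ("central bank holds interest rates", 0),
    ("city council approves budget for schools", 0),
    ("court rules on election law", 0),
]
TEST = [
    ("shocking secret cure revealed", 1),
    ("bank approves budget", 0),
]


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and words per label."""
    label_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in label_counts}
    for text, label in examples:
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(predict_fn, examples):
    correct = sum(predict_fn(text) == label for text, label in examples)
    return correct / len(examples)


model = train_naive_bayes(TRAIN)
majority_label = Counter(label for _, label in TRAIN).most_common(1)[0][0]

model_acc = accuracy(lambda text: predict(model, text), TEST)
baseline_acc = accuracy(lambda text: majority_label, TEST)
print(f"naive Bayes: {model_acc:.2f}  majority baseline: {baseline_acc:.2f}")
```

In the reviewed papers the features are of course much richer (source credibility, external evidence, stance, and so on), but the evaluation loop has the same shape.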
Further Review
Models
- Co-learning CNN (paper 1)
- bi-directional LSTM (paper 2)
Datasets
Dataset | Paper |
---|---|
FakeNewsNet | 6 |
Papers that introduce datasets: 2, 18
Baselines
Baseline | Paper | Evaluate |
---|---|---|
linear SVM using the TF-IDF scores | 1 | the effectiveness |
Individual CNN (Word-CNN, Char-CNN) | 1 | |
Ensemble CNN | 1 |
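
For reference, the TF-IDF scores that feed the linear SVM baseline can be computed as in the sketch below. I assume the plain textbook variant here, with term frequency as count divided by document length and IDF as log(N / document frequency); real toolkits often use smoothed or normalized variants, so the exact numbers will differ. The three documents are made-up examples.

```python
import math
from collections import Counter

# Hypothetical mini-corpus, for illustration only.
docs = [
    "fake news spreads fast",
    "real news spreads slowly",
    "fake cure goes viral",
]


def tf_idf(documents):
    """Return one {term: tf-idf score} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


vectors = tf_idf(docs)
for doc, vector in zip(docs, vectors):
    print(doc, {term: round(score, 3) for term, score in vector.items()})
```

Terms that occur in only one document (such as “fast”) get a higher weight than terms shared across documents (such as “news”), which is the property the baseline relies on.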
Features & Credibility
Credibility
Some general factors that influence credibility have been identified.
The credibility of information is mainly evaluated through the credibility of its source, which comprises the credibility of the communicator, the credibility of the medium and the credibility of the content. Each of these three dimensions can be subdivided into reliability, professionalism and identifiability. [1:1]
Features
- message-based features
- user-based features
- topic-based features
- propagation-based features [2:1]
- continuous bag of words (CBOW) word embeddings, character-level embeddings
Information Quality
Information credibility is an important concept or dimension embedded in the evaluation of IQ (information quality).
Categories of IQ
Properties of Information Itself | Properties from User Perspective |
---|---|
Accuracy | Accessible |
Completeness | Interpretable |
Timeliness | Useful |
Consistency | Believable |
Pipeline
graph LR
A[source]
B[meta-information]
subgraph communication
A-->B
C[other]
C-->B
A-->D
D[text]
E[content]
D-->E
F[user]
E-->F
G[direct claims]
H[indirect claims]
I[representation]
J[source2]
A-->J
F-->J
J-->K
B-->I
E-->I
J-->I
K-->I
end
K[sourceX]
subgraph data & model
F-->I
I-->L
M-->L
end
L[model]
M[training data]
N[credibility]
E-->G
E-->H
G-->N
H-->N
L-->N
Questions
- If I understand the word correctly, “credibility” has two dimensions of meaning:
    - the information is objectively true and reliable;
    - the information is subjectively believed (cf. research about subjective trustfulness).
- Should detection of fake information be viewed as a kind of denoising (if in a process of building a corpus)?
- What is the general structure of a fake news detection model?
- How should the features be weighted?
Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!