Topic-Aware Hierarchical Document Representation for News Biased Detection

This is a Keras implementation of the Hierarchical Attention Network (HAN) architecture (Yang et al., 2016). Instead of using standard word embeddings alone, this work applies a topic-aware word embedding that combines word vectors with the transposed topic-word distribution; this distribution acts as a global weighting of the dimensions of the word-topic vector. Once the model has obtained the document representation, it is further combined with the document-topic distribution before being fed to the softmax layer for the final prediction.
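As a rough illustration (not the exact code in this repository), the two fusion points can be sketched in Keras as below. The sizes, the random placeholder matrices, the simple GRU document encoder, and the use of concatenation for the fusion are all assumptions made purely for illustration.

    # Minimal sketch of topic fusion at the word and document level (Keras 2.x).
    # All sizes and matrices below are placeholders, not values from this repo.
    import numpy as np
    from keras.layers import Input, Embedding, Concatenate, GRU, Dense
    from keras.models import Model

    vocab_size, embed_dim, n_topics, max_len = 20000, 300, 425, 200
    glove_matrix = np.random.rand(vocab_size, embed_dim)      # stand-in for GloVe vectors
    topic_word_matrix = np.random.rand(vocab_size, n_topics)  # stand-in for the transposed topic-word distribution

    words = Input(shape=(max_len,), dtype='int32')
    doc_topics = Input(shape=(n_topics,))                     # document-topic distribution from LDA

    glove = Embedding(vocab_size, embed_dim, weights=[glove_matrix], trainable=False)(words)
    word_topics = Embedding(vocab_size, n_topics, weights=[topic_word_matrix], trainable=False)(words)
    topic_aware = Concatenate()([glove, word_topics])         # topic-aware word embedding

    doc_repr = GRU(100)(topic_aware)                          # stand-in for the hierarchical attention encoder
    fused = Concatenate()([doc_repr, doc_topics])             # fuse the document-topic distribution
    preds = Dense(2, activation='softmax')(fused)             # final prediction

    model = Model(inputs=[words, doc_topics], outputs=preds)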

Experiments

All results are reported as the mean accuracy over five repetitions of 10-fold cross-validation on the Hyperpartisan News Detection by-article training set (645 news articles in total).

Model                Accuracy
Transformer          72.12%
LDA-Transformer      71.56%
Kim-CNN              72.95%
LDA-Kim-CNN          73.47%
RNN-Attention        73.63%
LDA-RNN-Attention    73.75%
ESRC                 71.81%
LDA-ESRC             73.69%
HAN                  75.69%
LDA-HAN              76.52%
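For reference, the evaluation protocol above (five repetitions of 10-fold cross-validation, reporting the mean accuracy) can be sketched roughly as follows; scikit-learn, the build_model factory, and the articles/labels arrays are assumptions for illustration, not part of this repository.

    # Rough sketch of the evaluation protocol: 5 x 10-fold CV, mean accuracy.
    # build_model, articles and labels are assumed to come from the data pipeline.
    import numpy as np
    from sklearn.model_selection import RepeatedStratifiedKFold

    def cross_validate(build_model, articles, labels):
        rskf = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=42)
        accuracies = []
        for train_idx, test_idx in rskf.split(articles, labels):
            model = build_model()  # fresh model per fold, compiled with metrics=['accuracy']
            model.fit(articles[train_idx], labels[train_idx], epochs=10, verbose=0)
            _, acc = model.evaluate(articles[test_idx], labels[test_idx], verbose=0)
            accuracies.append(acc)
        return np.mean(accuracies)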

Preparation / Requirements

  • Python 3.6 (Anaconda works best)
  • TensorFlow 1.13.0
  • Keras 2.2.4
  • Gensim 3.8.0
  • spaCy 2.1.16
  • Flask 1.1.1

Preparation steps:

  1. Run mkdir checkpoints data embeddings history lda_stuff to create directories for the trained models, the training data set, the GloVe & LDA embeddings, the training logs, and the Gensim models, respectively.
  2. Convert the original XML data files to TSV format (see this for how it works); a rough conversion sketch is shown after this list.
  3. Run python ./utils/lda_gen.py -train True -H True -T 425 to generate the LDA topic embeddings with 425 topics.
  4. Run python main.py -train True to train the model.
  5. Run python visulization.py to launch the web application and visualize the attention distributions and predictions.
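The conversion in step 2 can be sketched roughly as below; the tag and attribute names (article, id, title, hyperpartisan) are assumptions based on the SemEval Hyperpartisan News Detection data and may need adjusting to the actual schema.

    # Rough sketch: convert the Hyperpartisan articles XML + ground-truth XML to TSV.
    # Tag/attribute names are assumptions; adjust to the actual data schema.
    import csv
    import xml.etree.ElementTree as ET

    def xml_to_tsv(articles_xml, labels_xml, out_tsv):
        labels = {a.attrib['id']: a.attrib.get('hyperpartisan', '')
                  for a in ET.parse(labels_xml).getroot()}
        with open(out_tsv, 'w', newline='', encoding='utf-8') as f:
            writer = csv.writer(f, delimiter='\t')
            writer.writerow(['id', 'title', 'text', 'hyperpartisan'])
            for article in ET.parse(articles_xml).getroot():
                art_id = article.attrib['id']
                title = article.attrib.get('title', '')
                text = ' '.join(article.itertext()).strip()
                writer.writerow([art_id, title, text, labels.get(art_id, '')])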

Example 1:

(screenshot: attention visualization of a predicted article)

Example 2:

(screenshot: attention visualization of a predicted article)
