Skip to content

lazareGirardin/Sentiment-Analysis

Repository files navigation

Sentiment-Analysis

Sentiment Analysis on movie reviews. The aim is for a neural network to classify reviews left by users.

Workload (approximation):

  • 40% coding
  • 60% Analysis of methods and results

Dataset: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews/data (Reviews labeled in 5 classes, from Rottentomatoes)

(Eventually use IMDB dataset)

Pre-Processing in juypter-notebook Exploration.ipynb:

  • Keep only full sentences
  • Binarize Sentiment (1 or 0)
  • Tokenize sentences
  • Remove stopwords and similar (***** CHANGE THIS! KEEP IMPORTANT ONES *********)
  • ...

To Do:

  • Choose vocabulary size (tf-id? "by-hand" select few good features?...)
  • simple word2Vec, GloVe, ... pre-processing
  • Compare performance with different pre-processing
  • Compare performance with different algorithms (simple perceptron, Neural Nets, basic ML algo?)
  • Best performance: upload on Kaggle to compare with existing solutions?

Preprocessing:

  • Tokenizer, without stopwords : 15273 unique words
  • Tokenizer and Stemmer: 10493 unique words!

Binary Classification task

Sentiment Analysis on a binarized version of the RottenTomatoes dataset.

Two binarization strategies:

  • Naive: Every reviews with y <= 2 mapped to 0 otherwise 1.
  • Ambiguous reviews removed: Removal of every reviews where y = 2 (neutral). Otherwise, f(y) = 0 if y < 2 and f(y) if y > 2.

Classification Model:

  • pre-processing: Tokenization, GloVe embedding
  • Convolutional Neural Network

Related files:

  • binary_mapping_01.ipynb, binary_mapping_02.ipynb: ipython notebooks used to visualize and map the reviews sentiments into binary values.
  • Data/binary_naive.csv, Data/binary_2removed.csv: csv files containing the two binarized datasets after mapping.
  • binary_cnn.py: python script training and running the CNN model on either one of the latter dataset. See comment of scripts for execution.

About

Sentiment Analysis on movie reviews

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published