Using Sentiment Multi-label Analysis for MARVEL Character Review

This project describes a model that predicts whether a movie line belongs to one or more emotion classes. The model is trained on one dataset of movie lines and then applied to a second dataset, MARVEL movie lines, for character analysis: exploring which emotions the most important characters experience over the course of a movie. The model uses features derived from word and character n-grams, parts of speech, word embeddings, and the Opinion Lexicon.
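As a rough illustration of the word and character n-gram features mentioned above, the following minimal sketch combines two TF-IDF vectorizers with scikit-learn. The example lines, the n-gram ranges, and the choice of TF-IDF weighting are assumptions made for illustration; the project's actual settings are described in the report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical example lines; they are not taken from either dataset.
lines = [
    "I am so happy to see you",
    "This is terrible news",
    "I can't believe you did that",
]

# Word-level unigrams and bigrams plus character 2- to 4-grams.
# These ranges are illustrative choices, not the project's settings.
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])

# Sparse matrix: one row per line, word and char columns side by side.
X = features.fit_transform(lines)
print(X.shape)
```

The other feature groups (parts of speech, word embeddings, Opinion Lexicon counts) could be appended to the same union as further transformers.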

DATA

  • The XED dataset consists of emotion-annotated movie subtitles (data/en-annotated.tsv). Movie lines in this dataset have the following distribution: [image]

  • The Marvel Universe dataset is built from the transcripts of Marvel Universe movies (data/mcu.csv). It contains lines from over 600 characters; this project considers only the most important ones: [image]

  • GloVe - Global Vectors for Word Representation

METHODS

Two classification algorithms are compared: LinearRegression and LinearSVC (Support Vector Classifier). To extend them to the multi-label problem, OneVsRestClassifier is used. This estimator applies the binary relevance method, which trains one binary classifier independently for each label.
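The binary relevance setup described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the project's code: the texts and emotion labels are made up, plain TF-IDF stands in for the full feature set, and LogisticRegression is used as the linear-model counterpart because OneVsRestClassifier needs a classifier as its base estimator (the README itself names LinearRegression).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical training lines, each tagged with one or more emotions.
texts = [
    "I am so happy today",
    "I hate this, leave me alone",
    "I miss him and I am furious",
    "What a wonderful surprise",
    "Everything is lost",
    "We won, but we lost a friend",
]
labels = [
    ["joy"],
    ["anger"],
    ["anger", "sadness"],
    ["joy"],
    ["sadness"],
    ["joy", "sadness"],
]

# Turn the label lists into a binary indicator matrix, one column per emotion.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)

# OneVsRestClassifier fits one independent binary classifier per column of Y.
models = {
    "logreg": make_pipeline(
        TfidfVectorizer(),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    ),
    "svc": make_pipeline(
        TfidfVectorizer(),
        OneVsRestClassifier(LinearSVC()),
    ),
}

new_line = ["I am furious and I miss my friend"]
predictions = {}
for name, model in models.items():
    model.fit(texts, Y)
    # Each prediction is an indicator row; map it back to emotion names.
    predictions[name] = model.predict(new_line)
    print(name, mlb.inverse_transform(predictions[name]))
```

Because every label gets its own classifier, a line can be assigned several emotions at once or none at all.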

REPORT

The file Sentiment_multi_label_MARVEL.pdf contains a detailed project description, including preprocessing, feature extraction, and a presentation of the results.