Movie reviews sentiment analysis is a natural language processing (NLP) project. NLP techniques extract the useful words from each review, and a binary classifier then uses these words to predict whether the review's sentiment is positive or negative.
The following packages and modules are required for the project :
- Python 3.x
- Pandas
- NumPy
- re (part of the Python standard library, no installation needed)
- Scikit-learn
- NLTK
Install all required packages :
pip install -r requirements.txt
The movie reviews dataset contains about 50,000 records, each of which is a sample movie review,
and a target column "sentiment" which describes the viewer's sentiment about the movie as either positive or negative
Dataset features and target :
Dataset head :
In this part we will see the project code divided into sections as follows:
- Section 1 | Data Preprocessing :
In this section we perform some operations on the dataset before training the model on it,
such as :
- Loading the dataset
- Encoding output to binary (Positive : 1 , Negative : 0)
- Data cleaning : Remove HTML tags
- Data cleaning : Remove special characters
- Data cleaning : Convert everything to lowercase
- Data cleaning : Remove stopwords
- Data cleaning : Stemming
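The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the regular expressions, the label strings "positive" / "negative" and the tiny `DEMO_STOPWORDS` set are all assumptions made for the example. The stopword list and the stemmer are passed in as parameters so the sketch runs with the standard library alone.

```python
import re
from typing import Callable, Iterable

# Assumption: the raw labels are the strings "positive" and "negative".
LABEL_TO_INT = {"positive": 1, "negative": 0}

# Tiny stand-in stopword set, for illustration only (not NLTK's list).
DEMO_STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "i"}


def encode_sentiment(label: str) -> int:
    """Map a sentiment label to 1 (positive) or 0 (negative)."""
    return LABEL_TO_INT[label.strip().lower()]


def remove_html_tags(text: str) -> str:
    """Replace anything that looks like an HTML tag (e.g. <br />) with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def remove_special_characters(text: str) -> str:
    """Keep only ASCII letters, digits and whitespace."""
    return re.sub(r"[^A-Za-z0-9\s]", "", text)


def remove_stopwords(text: str, stopwords: Iterable[str]) -> str:
    """Drop every whitespace-separated word found in the stopword list."""
    stop = set(stopwords)
    return " ".join(word for word in text.split() if word not in stop)


def stem_words(text: str, stem: Callable[[str], str]) -> str:
    """Apply a stemming function to every word."""
    return " ".join(stem(word) for word in text.split())


def clean_review(
    text: str,
    stopwords: Iterable[str] = DEMO_STOPWORDS,
    stem: Callable[[str], str] = lambda word: word,  # identity: no stemming
) -> str:
    """Run the cleaning steps in the order listed in Section 1."""
    text = remove_html_tags(text)
    text = remove_special_characters(text)
    text = text.lower()
    text = remove_stopwords(text, stopwords)
    return stem_words(text, stem)


sample = "This movie was <br /><b>GREAT</b>, I loved it!"
print(encode_sentiment("positive"), clean_review(sample))

# With NLTK installed, its resources could be plugged in like this:
#   import nltk; nltk.download("stopwords")
#   from nltk.corpus import stopwords
#   from nltk.stem import PorterStemmer
#   clean_review(sample, stopwords.words("english"), PorterStemmer().stem)
```

Lowercasing happens before stopword removal because stopword lists are lowercase, so "This" would otherwise slip through.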
- Section 2 | Model Creation :
The dataset is ready for training, so we create a Naive Bayes model using scikit-learn and then fit it to the data.
- Section 3 | Model Evaluation :
Finally we evaluate the model by getting accuracy, classification report and confusion matrix.
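Sections 2 and 3 could look roughly like the sketch below. The text above only says that a Naive Bayes model is built with scikit-learn, so the choice of `MultinomialNB` over bag-of-words counts from `CountVectorizer`, the 75/25 split and the toy reviews are assumptions made for illustration, not the project's confirmed setup.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy, already-cleaned reviews invented for this example (1 = positive, 0 = negative).
reviews = [
    "great movi love act",
    "wonder stori great cast",
    "love everi minut",
    "brilliant funni great",
    "terribl plot bore act",
    "wast time bad movi",
    "bore bad stori",
    "aw act terribl cast",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out part of the data for evaluation, keeping both classes in each split.
train_text, test_text, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.25, random_state=42, stratify=labels
)

# Turn text into word-count vectors; the vocabulary is learned from training data only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

# Section 2: create the Naive Bayes model and fit it to the training data.
model = MultinomialNB()
model.fit(X_train, y_train)

# Section 3: evaluate on the held-out reviews.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])

print("Accuracy:", accuracy)
print(report)
print(matrix)
```

In the confusion matrix, rows are the true classes and columns the predicted ones, in the order given by `labels=[0, 1]`. The scores on eight toy reviews say nothing about performance on the real dataset.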
- Clone the repo
git clone https://github.com/omaarelsherif/Movie-Reviews-Sentiment-Analysis-Using-Machine-Learning.git
- Run the code from cmd
python movie_reviews_sentiment_analysis.py
Now let's see the project output after running the code :
Dataset after output encoding :
Review sample after removing HTML tags :
Review sample after removing special characters :
Review sample after converting words to lowercase :
Review sample after removing stopwords :
Review sample after stemming words :
These links may help you better understand the project idea and the techniques used :
- Natural Language Processing (NLP) : https://ibm.co/38bN03T
- Sentiment analysis : https://bit.ly/3yi9BGq
- Naive Bayes classifier : https://bit.ly/3zhoWIO
- Model evaluation : https://bit.ly/3B12VOO
- E-mail : [email protected]
- LinkedIn : https://www.linkedin.com/in/omaarelsherif/
- Facebook : https://www.facebook.com/omaarelshereif