SENTIMENT ANALYSIS
(NLP and Text Mining)
Problem Statement
The main objective of this internship project is to predict the sentiment of movie reviews obtained from the Internet Movie Database (IMDb). The dataset contains 50,000 movie reviews that have been pre-labeled with “positive” and “negative” sentiment class labels based on the review content. Besides these, there are additional unlabeled movie reviews.
The dataset can be obtained from http://ai.stanford.edu/~amaas/data/sentiment/, courtesy of Stanford University and Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. It is available both as raw text and in an already-processed bag-of-words format. We use only the raw labeled movie reviews for our analyses.
Our task is therefore to train supervised models on 35,000 of the labeled reviews and to predict the sentiment of the remaining 15,000 labeled reviews, which serve as the test set.
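The Stanford archive unpacks into an `aclImdb` folder with `train/pos`, `train/neg`, `test/pos` and `test/neg` subfolders (plus an unlabeled `train/unsup` folder), one review per `.txt` file. The sketch below pools the 50,000 labeled reviews and splits them 35,000/15,000. The function names, the shuffle and the seed are assumptions for illustration; the post does not say how the original notebook loaded or ordered the data.

```python
import random
from pathlib import Path


def load_labeled_reviews(root):
    """Return (text, label) pairs for every labeled review under an aclImdb-style root.

    Assumes the layout <root>/{train,test}/{pos,neg}/*.txt; the unlabeled
    <root>/train/unsup folder is skipped.
    """
    reviews = []
    for split in ("train", "test"):
        for folder_name, label in (("pos", "positive"), ("neg", "negative")):
            folder = Path(root) / split / folder_name
            for path in sorted(folder.glob("*.txt")):
                reviews.append((path.read_text(encoding="utf-8"), label))
    return reviews


def split_reviews(reviews, train_size=35_000, seed=42):
    """Shuffle with a fixed seed, then return (train, test) lists.

    With the full 50,000-review set this yields 35,000 training
    and 15,000 test reviews.
    """
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:train_size], shuffled[train_size:]
```

Usage: `train, test = split_reviews(load_labeled_reviews("aclImdb"))`. The shuffle matters because the folders are read one label at a time: slicing the unshuffled list would leave the last 15,000 reviews mostly negative and skew the test set.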
Sentiment analysis is also popularly known as opinion analysis or opinion mining. The key idea is to use techniques from text analytics, NLP, Machine Learning, and linguistics to extract important information or data points from unstructured text. This in turn can help us derive qualitative outputs like the overall sentiment being on a positive, neutral, or negative scale and quantitative outputs like the sentiment polarity, subjectivity, and objectivity proportions.
In this coding internship project by Suven Consultants and Technology Pvt. Ltd., we analyze a large corpus of movie reviews and derive their sentiment.
We cover two families of techniques for analyzing sentiment:
- Traditional supervised Machine Learning models
- Unsupervised lexicon-based models
Supervised Learning
1. Setting up Dependencies
2. Text Normalisation (using Text_normalizer.py) & Feature Engineering
A text corpus consists of multiple text documents, and each document can range from a single sentence to a complete document with multiple paragraphs. Textual data, despite being highly unstructured, can be classified into two major types of documents. Factual documents typically state facts with no specific feelings or emotion attached to them; these are also known as objective documents. Subjective documents, on the other hand, contain text that expresses feelings, moods, emotions, and opinions.
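Text_normalizer.py is not reproduced in this post. As a rough picture of what normalising IMDb reviews usually involves, here is a hypothetical `normalize_review` function: it unescapes HTML entities, strips tags such as the `<br />` breaks that appear throughout the raw reviews, lowercases, removes special characters and drops stopwords. The function name, the regexes and the tiny stopword list are all assumptions, not the contents of Text_normalizer.py.

```python
import html
import re

# Tiny illustrative stopword list (hypothetical); a real pipeline would use a
# fuller one. Negations such as "not" are deliberately NOT included because
# removing them would flip the meaning of a review.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "were",
             "of", "to", "in", "on", "it", "this", "that", "i"}

TAG_RE = re.compile(r"<[^>]+>")             # HTML tags such as <br />
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")   # anything but lowercase letters, digits, whitespace


def normalize_review(text):
    """Unescape entities, strip HTML tags, lowercase, drop special characters and stopwords."""
    text = html.unescape(text)
    text = TAG_RE.sub(" ", text)
    text = NON_ALNUM_RE.sub(" ", text.lower())
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)
```

For example, `normalize_review("This movie was <br /><br />NOT bad &amp; fun!")` returns `"movie not bad fun"`.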
3. Model Training, Prediction and Evaluation (using Model_evaluation_util.py)
Here is the Jupyter Notebook for the supervised learning approach.
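The post does not name the classifier used in the notebook. To make the train/predict loop concrete, the sketch below swaps in a multinomial Naive Bayes model over bag-of-words counts with add-one (Laplace) smoothing, written with the standard library only. The class and method names are hypothetical, and this model is not expected to reproduce the reported scores.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a simple stand-in for the normalisation step.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary = every token seen in any class during training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        # Highest log-posterior wins.
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]
```

For instance, fitting on `["a great and moving film", "great acting, loved it", "boring and awful plot", "awful, a waste of time"]` with labels `["positive", "positive", "negative", "negative"]` and predicting `["loved the great acting", "boring waste"]` yields `["positive", "negative"]`. Summing log-probabilities rather than multiplying raw probabilities avoids numeric underflow on long reviews.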
4. Summary :-
The model trained with traditional supervised learning achieves an F1-score of approximately 89.68% and an accuracy of approximately 89.69%.
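Accuracy is the fraction of test reviews whose predicted label matches the true one; F1 is the harmonic mean of precision and recall, computed per class. The post does not say which averaging produced the 89.68% figure, so the sketch below assumes a support-weighted average over the two classes. The helper names are hypothetical stand-ins for whatever Model_evaluation_util.py provides.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's support
    (the number of true instances of that class)."""
    support = Counter(y_true)
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        total += f1 * n
    return total / len(y_true)
```

For `y_true = ["positive", "positive", "negative", "negative"]` and `y_pred = ["positive", "negative", "negative", "negative"]`, accuracy is 0.75 and weighted F1 is about 0.733 (per-class F1 of 2/3 and 0.8, each with support 2). On a roughly balanced test set the two metrics often land close together, which fits the near-identical figures above.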
Unsupervised Lexicon Model
There are several popular lexicon models used for sentiment analysis. We use the three lexicon models listed below:
· AFINN Lexicon
· SentiWordNet Lexicon
· VADER Lexicon
1. Setting up Dependencies
2. Sentiment Analysis using AFINN
- Model Training, Prediction and Evaluation
Sentiment polarity is typically a numeric score assigned to both the positive and negative aspects of a text document based on subjective parameters, such as specific words and phrases expressing feelings and emotion. Neutral sentiment typically has a polarity of 0 since it does not express any specific sentiment, positive sentiment has a polarity > 0, and negative sentiment a polarity < 0. Of course, you can always change these thresholds based on the type of text you are dealing with; there are no hard constraints on this.
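AFINN is a word list that rates English words with integers from -5 (very negative) to +5 (very positive); a review's score is the sum of the ratings of its words. The sketch below pairs that summing rule with the threshold rule described above (> 0 positive, < 0 negative, 0 neutral). The six-word `TOY_AFINN` dictionary and its values are made up for illustration; in practice the `afinn` Python package (`Afinn().score(text)`) supplies the full list.

```python
import re

# Made-up excerpt standing in for the AFINN word list (real AFINN ratings run
# from -5 to +5); these particular values are NOT copied from AFINN.
TOY_AFINN = {"good": 3, "great": 3, "love": 3, "bad": -3, "awful": -3, "boring": -2}


def lexicon_score(text, lexicon=TOY_AFINN):
    """Sum the rating of every token found in the lexicon; unknown words count 0."""
    return sum(lexicon.get(tok, 0) for tok in re.findall(r"[a-z']+", text.lower()))


def polarity_label(score, threshold=0.0):
    """> threshold is positive, < -threshold is negative, anything in between is neutral."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

For example, `lexicon_score("Great cast but a boring, awful script")` is 3 - 2 - 3 = -2, so `polarity_label` returns `"negative"`. Raising `threshold` widens the neutral band, which is one way to adjust the cut-offs for a different kind of text.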
3. Sentiment Analysis using SentiWordNet
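SentiWordNet attaches three scores to every WordNet synset: positivity, negativity and objectivity, which sum to 1. One simple way to score a document is to look up a sense for each word, add up the positive and negative scores separately, and compare the totals. The sketch below does that with a hand-made `TOY_SENTIWORDNET` table whose values are invented; with NLTK, real scores come from `nltk.corpus.sentiwordnet.senti_synsets(word)`, whose entries expose `pos_score()`, `neg_score()` and `obj_score()`. Word-sense selection and part-of-speech tagging are left out.

```python
import re

# Hypothetical (positivity, negativity) scores for one sense of a few words.
# Objectivity would be 1 - pos - neg. Values are invented for illustration.
TOY_SENTIWORDNET = {
    "excellent": (0.75, 0.0),
    "fun": (0.5, 0.0),
    "dull": (0.0, 0.625),
    "mess": (0.0, 0.5),
}


def sentiwordnet_label(text, table=TOY_SENTIWORDNET):
    """Accumulate positive and negative scores separately and compare them."""
    pos_total = neg_total = 0.0
    for tok in re.findall(r"[a-z']+", text.lower()):
        pos, neg = table.get(tok, (0.0, 0.0))
        pos_total += pos
        neg_total += neg
    net = pos_total - neg_total
    # Ties, including texts with no scored words, are treated as neutral.
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"
```

With this table, `"An excellent, fun film"` nets +1.25 and is labeled positive, while `"fun but a dull mess"` nets 0.5 - 1.125 = -0.625 and is labeled negative.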
4. Sentiment Analysis using VADER
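VADER sums rule-adjusted valences for the words in a text and squashes the total into a compound score between -1 and 1 with x / sqrt(x*x + alpha), where alpha = 15 by default; its README suggests treating compound >= 0.05 as positive and <= -0.05 as negative. The sketch below reproduces only that normalisation and thresholding step over a made-up `TOY_VADER` lexicon. It ignores VADER's handling of negation, intensifiers, capitalisation and punctuation; for real scores use `SentimentIntensityAnalyzer().polarity_scores(text)` from the `vaderSentiment` package (also available in NLTK as `nltk.sentiment.vader`).

```python
import math
import re

# Made-up valences (VADER's real lexicon rates words on roughly a -4 to +4
# scale, but these numbers are NOT taken from it).
TOY_VADER = {"good": 2.0, "great": 3.0, "bad": -2.5, "terrible": -3.0}


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1] via x / sqrt(x*x + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def vader_like_label(text, lexicon=TOY_VADER):
    """Return (compound, label) using VADER-style normalisation and cut-offs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    compound = normalize(sum(lexicon.get(tok, 0.0) for tok in tokens))
    # Cut-offs suggested in the VADER README.
    if compound >= 0.05:
        return compound, "positive"
    if compound <= -0.05:
        return compound, "negative"
    return compound, "neutral"
```

For example, `vader_like_label("good")` gives a compound of 2 / sqrt(19) ≈ 0.459 and the label positive, while `"great plot, terrible acting"` sums to 0 and comes out neutral.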
Here is the Jupyter Notebook for the unsupervised lexicon models.
5. SUMMARY :-
Method | F1 Score (%) | Accuracy (%)
AFINN | 70.6 | 72.8
SentiWordNet | 68.3 | 68.7
VADER | 70.6 | 72.4
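AFINN and VADER tie on F1 score (70.6%), but AFINN has a slightly higher accuracy (72.8% vs. 72.4%), so AFINN is the best of the three unsupervised lexicon models.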
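The ranking rule behind that choice (F1 first, accuracy as a tie-breaker) can be written out explicitly with the numbers from the summary table. Treating accuracy as the tie-breaker is an editorial assumption about how the choice was made.

```python
# F1 and accuracy (%) for each lexicon model, copied from the summary table.
results = {
    "AFINN": (70.6, 72.8),
    "SentiWordNet": (68.3, 68.7),
    "VADER": (70.6, 72.4),
}

# Tuples compare element by element, so max() ranks by F1 first and only
# falls back to accuracy when F1 ties (as it does for AFINN and VADER).
best = max(results, key=results.get)
print(best)  # AFINN
```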
CONCLUSION
Comparing the F1-score and accuracy of the supervised ML model (about 89.7% on both) with those of the best unsupervised lexicon model, AFINN (70.6% F1, 72.8% accuracy), we conclude that supervised learning gives a considerably more accurate model than the unsupervised lexicon models on this dataset.