% Defines indentation at the beginning of a new paragraph.
\pagestyle{plain}
% Empty header line, page number in the center of the footer line.
\newcommand{\bs}{\boldsymbol}
% Shortcut to produce fat symbols in the math environment
\usepackage{blindtext}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (3.) Beginning of Document
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\onehalfspacing
% Sets the line spacing to 1.5
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (4.) Title page
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% In principle you can design the title page on your own. The following requirements should however
% be met: Centered title and description, information about the author and date listed on the title page.
\pagenumbering{roman}
% Use roman numbers for page numbering
\begin{titlepage}
% Wrapper for title page definitions
\thispagestyle{empty}
% No page numbering on the title page
% Title page text that shall be displayed in the center of the page
\begin{center}
\vspace*{2.5cm}
{\Large\bfseries Sentiment and Emotion Analysis of User Reviews\\for the Game Elden Ring}\\
\vspace*{3cm}
Mandatory Assignment 02 \\ Winter Term 2023--24\\ Introduction to Applied Data Science\\
at the \\ Faculty of Business, Economics and Social Sciences\\
MSc. International Business and Economics\\
University of Hohenheim
% adapt to your needs and give the appropriate information
\end{center}
% Information about the author of the thesis
\vfill
\hfill\begin{minipage}{0.5\linewidth}
First examiner: Prof.\ Dr.\ Thomas Dimpfl \\
Second examiner: Sophia Koch \\
Third examiner: Dr.\ Johannes Bleher \\
% examiners only needed for master thesis
Submitted by: \\
Hetvi Ariwala (996729) \\
Sarish Aklujkar (991260)\\
Date of Submission: \today
\end{minipage}
\end{titlepage}
\newpage
% Enforces a page break
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (5.) Table of Contents
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Includes a table of contents and a list of your figures and tables.
%\tableofcontents
%\listoffigures
%%\listoftables
%% In most journal publications, none of these is actually used.
%\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (6.) Main Body
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\pagenumbering{arabic}
% Set page numbering back to arabic numbers
\setcounter{page}{1}
% Set counter back to 1
% You can use different levels of sections to structure the body of your work.
\section{Abstract}
% The label allows you to later reference this Section using \ref{Chapter:intro}.
% If at any stage you insert another section before, the reference is
% automatically updated.
%\subsection{Formal Guidelines}
This report presents a sentiment and emotion analysis of user reviews for the game Elden Ring. It builds on the data gathered in the first assignment, in which the top 100 games were extracted from the Steam Spy API and the corresponding game details and user reviews were retrieved from the SteamPowered API. Using R and dictionary-based methods, the analysis assigns a sentiment score to each review in order to gauge overall player satisfaction and to identify the aspects of the game that elicit positive or negative reactions. The primary goal is to uncover the emotional undertones of the reviews and thereby to shed light on how the gaming community has received Elden Ring.
\section{Data Cleaning}
The user reviews in the `gamereviews' dataset were cleaned in several steps to prepare them for analysis. First, regular expressions were used to remove non-ASCII characters and URLs, which standardized the text and eliminated irrelevant external references. In the same step, shortened words were expanded to their full form to keep the language consistent, and repeated letters or symbols within words were reduced. Next, numerical digits, punctuation, and single-character responses were removed to improve the relevance and readability of the data. \\
The text was then normalized by converting it to lowercase and removing excess whitespace with regular expressions, which ensured uniformity across the dataset. Stop words were filtered out to focus on meaningful content, with the exception of the term ``not'', which was retained because negation is crucial for sentiment analysis. In addition, the words were spell-checked against an English dictionary to improve the accuracy and reliability of the text. The cleaning process concluded with the generation of a word cloud that visualizes the most frequent words in the cleaned reviews. It provides a brief overview of the dominant themes and forms the basis for the subsequent textual analysis.
\section{Methodology}
The sentiment analysis of the `gamereviews' dataset combined several approaches to capture the sentiments and emotions conveyed in the reviews. First, the dataset was reduced to the columns relevant for sentiment analysis. A key part of the process was a manual sentiment analysis, which classified each review as positive, negative, or neutral based on the occurrence of specific positive and negative words while accounting for negation. The consistency of the resulting sentiments was also checked against an external list of positive and negative words by Hu and Liu, obtained from Kaggle. \\
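A minimal formalization of such a classification rule is sketched here. Since the exact rule is not spelled out in this report, the sketch assumes that a review $r$ is labelled by comparing its number of positive matches $P(r)$ with its number of negative matches $N(r)$, and that a sentiment word preceded by a negation (for example ``not good'') is counted toward the opposite polarity. The symbols $P$, $N$ and $m$ are introduced only for illustration:
\[
m(r) = \left\{
\begin{array}{rl}
1 & \mathrm{if}\ P(r) > N(r),\\
0 & \mathrm{if}\ P(r) = N(r),\\
-1 & \mathrm{if}\ P(r) < N(r).
\end{array}
\right.
\]
Under this reading, a review without any dictionary match, or with equally many positive and negative matches, is classified as neutral. \\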
To complement the manual approach, automated sentiment analysis methods based on several sentiment dictionaries were employed. The results of the manual and automated methods were then compared to assess the consistency of sentiment scoring across techniques. Two additional dictionaries, AFINN and NRC, broadened the scope of the analysis: AFINN provides a direct sentiment score, while NRC offers a detailed view of various emotions. Together they allow for a richer, more layered understanding of user sentiments and emotions. The emotion scores from the NRC dictionary were aggregated and compiled into a data frame, which made it possible to rank the emotions in the user reviews from most to least prevalent and to identify dominant emotional patterns.\\
The analysis concluded with a bar plot of the emotions, drawn with a color palette chosen for clarity, which illustrates the range of emotions present in the reviews. In addition, the weight of each emotion was expressed as a percentage, giving a clear view of the most and least prevalent emotions and of the overall emotional landscape of the user reviews.\\
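The percentage calculation can be written compactly. As a sketch, let $c_{i,e}$ denote the NRC count of emotion $e$ in review $i$ and $n$ the number of reviews (this notation is introduced here for illustration). Assuming that the shares are taken relative to the total of all emotion counts, and that the two polarity categories of the NRC dictionary are left out, the share of emotion $e$ is
\[
p_e = 100 \cdot \frac{\sum_{i=1}^{n} c_{i,e}}{\sum_{e'} \sum_{i=1}^{n} c_{i,e'}},
\]
where $e'$ runs over the eight NRC emotion categories (anger, anticipation, disgust, fear, joy, sadness, surprise and trust). By construction, the shares $p_e$ sum to 100.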
\section{Sentiment Analysis}
The manual sentiment analysis of the user reviews in the ``rev.sentiment'' dataset classified 4481 reviews as positive, 1905 as negative, and 3232 as neutral. The higher count of positive reviews suggests a generally favorable sentiment among users, while the substantial number of negative and neutral reviews indicates a diverse range of opinions. \\
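For reference, the three counts add up to $4481 + 1905 + 3232 = 9618$ reviews, which translates into the following shares (rounded to two decimals):
\[
\frac{4481}{9618} \approx 46.59\,\%, \qquad
\frac{1905}{9618} \approx 19.81\,\%, \qquad
\frac{3232}{9618} \approx 33.60\,\%.
\]
Positive reviews therefore outnumber negative ones by a factor of roughly $2.35$. \\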
The manual sentiment analysis, conducted with a Kaggle-sourced dictionary, and the automated analysis, performed with the Bing dictionary from the syuzhet package, agree to a large extent: 74.02\% of the results match. This level of agreement increases confidence in the reliability of the automated results and indicates that the syuzhet package, used with the Bing dictionary, largely reproduces the sentiment assessments of the manual analysis. The remaining 25.98\% of the results, however, differ between the two approaches. \\
The two methods also differ in their scoring mechanisms. The syuzhet package assigns a graded sentiment score that depends on the intensity of the review: positive reviews receive values such as 3, 7, or 17, negative reviews receive values such as $-4$, $-7$, or $-16$, and neutral reviews are consistently scored as 0. The manual sentiment analysis, in contrast, uses a simpler scoring system: $-1$ for negative reviews, 0 for neutral ones, and 1 for positive ones.\\
This discrepancy in scoring mechanisms reflects the differences in the underlying algorithms and dictionaries used by the syuzhet package and the manual approach. The broader range of syuzhet scores may reflect a more granular evaluation of sentiment that captures variations in the intensity of positive or negative expressions, whereas the manual analysis, with its simplified scoring, yields a purely categorical classification.\\
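The comparison of the two methods can be made explicit as follows. As a sketch, and assuming that the graded syuzhet score $b_i$ of review $i$ is reduced to its sign before it is compared with the manual label $m_i \in \{-1, 0, 1\}$ (this notation is introduced here for illustration), the percentage of identical classifications among $n$ reviews is
\[
A = \frac{100}{n} \sum_{i=1}^{n} \mathbf{1}\bigl[\mathrm{sign}(b_i) = m_i\bigr],
\]
where $\mathbf{1}[\cdot]$ equals 1 if the condition holds and 0 otherwise. A review scored $7$ by syuzhet and $1$ manually thus counts as a match, while a review scored $-4$ by syuzhet and $0$ manually does not. The reported agreement corresponds to $A = 74.02$, which leaves $100 - 74.02 = 25.98$ percent of the reviews classified differently. \\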
% Including a graphic
\begin{figure}[!htbp] % You can force the position of the graphic using parameters in brackets: current setting is ``exactly here''
\caption{Graphic Title}
\vspace{5mm}
\label{fig:firstGraphic}% set label to reference the graphic
\centering
\includegraphics[scale=0.6]{Autocorrelation.eps}% width defines the width of the graphic
\begin{minipage}{\textwidth}
\vspace*{3pt}
\footnotesize{The figure should be described in a way that it is possible to understand it without reading the main body of your text first.}
\end{minipage}
\end{figure}
% Include another graphic
\begin{figure}% or you let tex determine where it fits best
\begin{minipage}{\textwidth}
\footnotesize{The figure should be described in a way that it is possible to understand it
without reading the main body of your text first.}
\end{minipage}
\end{figure}
\addcontentsline{toc}{section}{References}% Adds references to table of contents
\bibliography{bibexample}% Creates a bibliography at the end of the LaTeX document. The bib-file loaded here is what you would have created with jabref. You can also give an absolute path if the bib-file is not in the folder where your tex-file lies. Same holds true for anything that is loaded by the way (figures, for example)
\newpage
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% (11.) Appendix
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Appendix}
\begin{table}[!htbp]
\centering
\begin{threeparttable}
\caption{Small Sample Table}
\label{tab:table}
\begin{tabular}{lc|r}
\toprule
A very & small sample & table\\
\hline
first column left & second column centered & third column right \\
& underlined second column &\\
\cline{2-2}
\multicolumn{2}{c|}{Write across two columns}& Third column \\
\bottomrule
\end{tabular}
\begin{tablenotes}
\item\footnotesize{Table~\ref{tab:table} should be described in a way that it is possible to understand it without reading the main body of your text first.}