Commit d2cc051c authored by Sarish Aklujkar
Tex file of report.

parent 943efee3

.DS_Store

0 → 100644
+6 KiB

File added.

No diff preview for this file type.

.Rapp.history

0 → 100644
+0 −0

Empty file added.

02_code/Report.tex

0 → 100644
+439 −0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	Institute of Financial Management
%	Department of Business Mathematics and Data Science
%   Exemplary LaTeX Document for Writing a Seminar Paper,
%   Bachelor or Master Thesis
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% 	This is an exemplary document that shall explain the 
%   professional use of Latex in a scientific application. 
%   Latex has the following advantages:
%
% 	1.) Dealing with mathematical notation:
% 	    Layout and writing equations are generally easier using LaTeX 
%       compared to other editors.
%
% 	2.) Consistent handling of intra-document references and bibliography: 
%	    While the major WYSIWYG editors can perform similar tasks, handling
%   	and consistency of numbering, cross-references, and bibliographic items
%       is easier and more flexible in LaTeX.
%
%	3.) Separation of content and style:
%		In principle this means that you can write your document without
%       caring about how it is formatted, and at the end of the day wrap 
%       it in the style-file provided by a journal publisher or University to
%       conform to the required style. 
%        
%   4.) Tables and illustrations:
%       LaTeX makes it easy to include high-quality graphics (.eps), and many
%       software packages (e.g. Stata) can produce output tables in LaTeX format
%       so that they can be included without further formatting.
%
%       We highly recommend using LaTeX, as it is a de facto scientific
%       standard. The earlier you get used to it, the easier it will be for you
%       to hand in professional-looking assignments and thesis papers. The
%       following packages and commands are only a limited selection of what is
%       possible, but they will get you started. You may want to adapt the header
%       to your needs. If you have an idea but do not know how to implement it in
%       LaTeX, do not hesitate to search for it online. You will see that nearly any
%       problem that you may face has been discussed before and that there are many
%       solutions available.
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(1.) Set Up a Document 		
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%   Base LaTeX offers five classes of document: book, report, article,
%   letter and slides. For each class, LaTeX provides a class file. The user
%   arranges to use it via a \documentclass command at the top of the
%   document. Additionally, the user can specify the paper format and
%   may change the font size.

\documentclass[a4paper,12pt]{article}

% 	After the document is set up, a variety of packages is loaded to 
%   customize the environment.

\usepackage[longnamesfirst, round]{natbib}  
%   The bundle provides a package that implements both author-year and
%   numbered references, as well as much detailed support for other
%   bibliography use.

\usepackage[latin1]{inputenc}   
%   The package translates various standard and other input encodings
%   into a ‘LaTeX internal language’. The internal language is expressed
%   entirely in TeX's base encoding (standard ASCII printable characters,
%   carriage control tokens and TeX control sequences, the latter
%   mostly defined by LaTeX).
%   If German special characters are needed and do not work, try utf8
%   instead of latin1. The correct setting depends on the encoding in which
%   the file is saved (often determined by your editor and operating system).

\usepackage[T1]{fontenc}        
%   The package allows the user to select font encodings, and for each
%   encoding provides an interface to ‘font-encoding-specific’ commands
%   for each font. Its most powerful effect is to enable hyphenation
%   to operate on texts containing any character (especially umlauts)
%   in the font.

\usepackage{color}              
%   The color package provides both foreground (text, rules, etc.) and
%   background colour management; it uses the device driver configuration
%   mechanisms of the graphics package to determine how to control its output.

\usepackage{amsmath,amsfonts,amssymb}   
%   The principal package in the AMS-LaTeX distribution. It adapts for
%   use in LaTeX most of the mathematical features found in AMS-TeX; it
%   is highly recommended as an adjunct to serious mathematical typesetting
%   in LaTeX. When amsmath is loaded, AMS-LaTeX packages amsbsy (for bold
%   symbols), amsopn (for operator names) and amstext (for text embedded in
%   mathematics) are also loaded.

%\usepackage{ngerman}           
%   Supports the new German orthography (neue deutsche Rechtschreibung).

\usepackage[english]{babel}     
%   The package provides the language definition file for support of English
%   in babel. Care is taken to select British hyphenation patterns for British
%   English and Australian text, and default (‘american’) patterns for Canadian
%   and USA text.

\usepackage{ae}                 
%   A set of virtual fonts which emulates T1 coded fonts using the standard CM
%   fonts. The package name, AE fonts, supposedly stands for “Almost European”.
%   The main use of the package was to produce PDF files using Adobe Type 1
%   versions of the CM fonts instead of bitmapped EC fonts.

\usepackage{graphicx}           
%   The package builds upon the graphics package, providing a key-value
%   interface for optional arguments to the \includegraphics command. Which
%   graphics formats can be included depends on the engine; pdfLaTeX, for
%   example, accepts pdf, png and jpg.

\usepackage{epstopdf}
%   Allows .eps graphics to be included; they are converted to pdf on the fly.

\usepackage{longtable}          
%   Longtable allows you to write tables that continue to the next page.
%   You can write captions within the table (typically at the start of the
%   table), and headers and trailers for pages of table. Longtable arranges
%   that the columns on successive pages have the same widths.

\usepackage{booktabs}
%   Provides well-spaced horizontal rules (\toprule, \midrule, \bottomrule)
%   for tables.

\usepackage[flushleft]{threeparttable}
%   Allows a description (table notes) to be placed neatly at the bottom of a table.

\usepackage{multirow}         
%   Allows table cells to span several rows.

\usepackage{url}
%   The command \url is a form of verbatim command that allows linebreaks
%   at certain characters or combinations of characters, accepts
%   reconfiguration, and can usually be used in the argument to another
%   command. The command is intended for email addresses, hypertext links,
%   directories/paths, etc., which normally have no spaces, so by default
%   the package ignores spaces in its argument. However, a package option
%   “allows spaces”, which is useful for operating systems where spaces
%   are a common part of file names.

\usepackage{setspace}
%   Provides commands to adjust line spacing.

\usepackage{pdfpages}
%   Allows the PDF form of the examinations office
%   (Eigenstaendigkeitserklaerung) to be included.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(2.) Further Document Definitions 		
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bibliographystyle{plainnat}    
%	Defines the style of the bibliography.

\oddsidemargin 0.1in \evensidemargin 0.1in \textwidth 15.5cm \topmargin -0.4in \textheight 24.5cm   
% 	Defines width of margins. 

\parindent 0cm  
% Defines the indentation at the beginning of a new paragraph.

\pagestyle{plain}          
% 		Empty header line, page number in the center of the footer line.

\newcommand{\bs}{\boldsymbol}  
%   Shortcut to produce bold symbols in the math environment

\usepackage{blindtext}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(3.) Beginning of Document		
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}

\onehalfspacing
% Sets the line spacing to 1.5

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(4.) Title page		
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% In principle you can design the title page on your own. The following requirements should however
% be met: Centered title and description, information about the author and date listed on the title page.

\pagenumbering{roman}   
% 		Use roman numbers for page numbering 

\begin{titlepage}       
% 		Wrapper for title page definitions

\thispagestyle{empty}   
% 		No page numbering on the title page

% 		Title page text that shall be displayed in the center of the page
\begin{center}
\vspace*{2.5cm}
{\bfseries \Large Sentiment and Emotion Analysis of User Reviews\\Elden Ring Game } \\
\vspace*{3cm} 
Mandatory Assignment 02 \\ Winter Term 2023-24\\ Introduction to Applied Data Science\\
at the \\  Faculty of Business, Economics and Social Sciences\\ 
MSc. International Business and Economics\\
University of Hohenheim
% adapt to your needs and give the appropriate information 
\end{center}

%		Information about the author of the thesis
\vfill

\hfill \begin{minipage}{0.5\linewidth}
 First examiner: Prof.\ Dr.\ Thomas Dimpfl \\
 Second examiner: Sophia Koch \\
 Third examiner: Dr.\ Johannes Bleher\\
 % examiners only needed for master thesis
 
 Submitted by: \\
 Hetvi Ariwala (996729) \\
 Sarish Aklujkar (991260)\\
 
 Date of Submission: \today 
\end{minipage}


\end{titlepage}

\newpage                
% Enforces a page break

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(5.) Table of Contents		
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%	Includes a table of contents and a list of your figures and tables.
%\tableofcontents
%\listoffigures
%%\listoftables
%%   In most journal publications, none of these is actually used.
%\newpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(6.) Main Body
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\pagenumbering{arabic}      
% 		Set page numbering back to arabic numbers
\setcounter{page}{1}  
% 		Set counter back to 1  

% 	You can use different levels of sections to structure the body of your work. 
\section{Abstract}  
%   A \label placed directly after a \section command allows you to reference
%   the section later using \ref. If at any stage you insert another section
%   before it, the reference is automatically updated.

%\subsection{Formal Guidelines}

This report presents a sentiment and emotion analysis of user reviews for the game Elden Ring. The data were gathered in the first assignment, in which the top 100 games were extracted from the Steam Spy API and game details and user reviews were retrieved from the SteamPowered API. Using R and dictionary-based methods, the analysis assigns a sentiment score to each review in order to gauge overall player satisfaction and to identify aspects of the game that elicit positive or negative reactions. The primary goal is to capture the emotional undertones of the reviews and thereby provide an analytical view of how the gaming community has received Elden Ring.

\section{Data Cleaning}

The \texttt{gamereviews} dataset was cleaned in several steps to prepare the user reviews for analysis. First, regular expressions were used to remove non-ASCII characters and URLs, which standardizes the text and eliminates irrelevant external references. In the same step, shortened words were expanded to their full form to keep the language consistent, and repeated letters or symbols within words were reduced. Further cleaning steps removed numerical digits, punctuation, and single-character responses, which improves the relevance and readability of the data. \\

Text normalization was a key aspect: all text was converted to lowercase and excess whitespace was removed using regular expressions, ensuring uniformity across the dataset. Stop words were filtered out to focus on meaningful content, with the exception of ``not'', which was retained because negation matters for sentiment analysis. In addition, the words were spell-checked against an English dictionary to improve textual accuracy. The cleaning process concluded with the generation of a word cloud showing the most frequent words in the cleaned reviews. This gave a brief overview of dominant themes and formed a basis for the subsequent textual analysis.

\section{Methodology}

The sentiment analysis of the \texttt{gamereviews} dataset combined several approaches to capture the sentiments and emotions conveyed in the reviews. First, the dataset was reduced to the columns relevant for sentiment analysis. A key part of the process was a manual sentiment analysis, in which each review was classified as positive, negative, or neutral based on the occurrence of specific positive and negative words, taking negation into account. The analysis also included a consistency check of the sentiment classification using an external list of positive and negative words by Hu and Liu, obtained from Kaggle. \\
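
As an illustrative sketch of such a rule (the notation is an assumption introduced here and is not taken from the analysis code), let $P(r)$ and $N(r)$ denote the numbers of positive and negative dictionary words in review $r$, where a word preceded by a negation such as ``not'' is counted with the opposite polarity. A three-class label consistent with the description above could then be written as
\begin{equation*}
s(r) = \operatorname{sgn}\bigl(P(r) - N(r)\bigr) =
\begin{cases}
 \phantom{-}1 & \text{if } P(r) > N(r) \quad \text{(positive)},\\
 \phantom{-}0 & \text{if } P(r) = N(r) \quad \text{(neutral)},\\
 -1 & \text{if } P(r) < N(r) \quad \text{(negative)}.
\end{cases}
\end{equation*}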

To complement the manual approach, automated sentiment analysis was carried out with several sentiment dictionaries, which capture different dimensions of the sentiment expressed in the reviews. The results of the manual and automated methods were then compared to assess the consistency of sentiment scoring across techniques. \\

Two further dictionaries, AFINN and NRC, broadened the scope of the analysis: AFINN provides a direct sentiment score, while NRC gives a detailed view of individual emotions. The emotion scores from the NRC dictionary were aggregated into a data frame, so that the emotions could be ranked from most to least prevalent in the user reviews. This step was essential for identifying dominant emotional patterns.\\

The analysis concluded with a bar plot of the emotions, drawn with a dedicated color palette for clarity, which illustrates the range of emotions present in the reviews. In addition, the weight of each emotion was expressed as a percentage of the total, giving a clear view of the most and least prevalent emotions and thus of the overall emotional landscape of the user reviews.\\
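
The percentage computation can be made explicit as follows; the symbols are assumptions introduced here for illustration only. If $c_e$ denotes the total NRC score of emotion $e$ summed over all reviews and $\mathcal{E}$ the set of emotion categories considered, the share of emotion $e$ in percent is
\begin{equation*}
 w_e = 100 \cdot \frac{c_e}{\sum_{e' \in \mathcal{E}} c_{e'}},
 \qquad \text{so that} \qquad \sum_{e \in \mathcal{E}} w_e = 100.
\end{equation*}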

\section{Sentiment Analysis}

The manual sentiment analysis of the user reviews in the \texttt{rev.sentiment} dataset classified 4481 reviews as positive, 1905 as negative, and 3232 as neutral. The larger number of positive reviews suggests a generally favourable sentiment among users, while the substantial numbers of negative and neutral reviews indicate a diverse range of opinions. \\
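
As a worked illustration of these counts (assuming that the three classes together cover all analysed reviews), the total is $4481 + 1905 + 3232 = 9618$, and the class shares, rounded to two decimals, are
\begin{align*}
\text{positive:} \quad & \frac{4481}{9618} \approx 46.59\,\%, \\
\text{negative:} \quad & \frac{1905}{9618} \approx 19.81\,\%, \\
\text{neutral:} \quad & \frac{3232}{9618} \approx 33.60\,\%.
\end{align*}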

The manual sentiment analysis, conducted using the Kaggle-sourced dictionary, was compared with the automated analysis based on the Bing dictionary from the \texttt{syuzhet} package. The two methods agree for 74.02\% of the reviews, which indicates substantial agreement in how they classify sentiment. This level of agreement increases confidence in the automated results and suggests that the \texttt{syuzhet} package with the Bing dictionary can largely reproduce the manual assessments. However, the remaining 25.98\% of the reviews are classified differently by the two approaches.\\

The sentiment scores also differ in scale. The \texttt{syuzhet} package yields a wider spectrum of scores: positive reviews receive values such as 3, 7, or 17, and negative reviews receive values such as $-4$, $-7$, or $-16$, depending on their intensity, while neutral reviews are consistently scored as 0. The manual analysis, in contrast, uses a simple three-valued scale: $-1$ for negative reviews, 0 for neutral ones, and 1 for positive ones.\\

This difference in scoring reflects the different mechanisms of the two approaches. The broader range of scores produced by \texttt{syuzhet} may reflect a more granular evaluation that captures variations in the intensity of positive or negative expressions, whereas the manual analysis, with its simplified scoring, provides a purely categorical classification.\\
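
One way to write down the agreement figure is sketched here, under the assumption that the Bing scores are compared with the manual labels through their sign; the symbols are introduced for illustration and are not taken from the analysis code. With $m_i \in \{-1, 0, 1\}$ the manual label and $b_i$ the Bing score of review $i$, for $i = 1, \dots, n$, the share of matching classifications is
\begin{equation*}
 A = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\bigl\{\operatorname{sgn}(b_i) = m_i\bigr\}.
\end{equation*}
Under this reading, reviews scored $3$ and $17$ both count as positive, the reported agreement corresponds to $A = 0.7402$, and the remaining share is $1 - A = 0.2598$.\\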

% Including a graphic
\begin{figure}[!htbp] % Placement options in brackets: here (h), top (t), bottom (b), float page (p); ! relaxes LaTeX's placement restrictions
\caption{Graphic Title} 
\vspace{5mm}
\label{fig:firstGraphic}	% set label to reference the graphic
\centering
\includegraphics[scale=0.6]{Autocorrelation.eps}  % scale sets the size relative to the original graphic
\begin{minipage}{\textwidth}
\vspace*{3pt}
\footnotesize{The figure should be described in a way that it is possible to understand it without reading the main body of your text first.}
\end{minipage}
\end{figure}

% Include another graphic
\begin{figure}  % or you let tex determine where it fits best
\caption{Graphic Title}
\vspace{5mm}
\label{fig:secondGraphic}
\centering
\includegraphics[width=0.9\linewidth]{condvgarch1.eps}
\begin{minipage}{\textwidth}
\vspace*{3pt}
\footnotesize{The figure should be described in a way that it is possible to understand it
 without reading the main body of your text first.}
\end{minipage}
\end{figure}
 
The tilde in \verb|Figure~\ref{fig:secondGraphic}| prevents the number from being placed at the beginning of the next line in case of a line break. Table~\ref{tab:table} can be referenced analogously.

In the following you may find some ideas about how to efficiently use the math environment:

\begin{align}
\lim_{x \to \infty} \exp(-x) &= 0\\
\frac{n!}{k!(n-k)!} &= \binom{n}{k}\\
\sqrt[n]{1+x+x^2+x^3+\dots+x^n}&=\text{$n^{th}$ root}\\
( \big( \Big( \bigg( \Bigg( \sum_{i=1}^{10} t_i &\ne \int_0^\infty \mathrm{e}^{-x}\,\mathrm{d}x \Bigg) 
\bigg) \Big) \big) )  \\
\Rightarrow A_{m,n} &= 
 \begin{pmatrix}
  a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
  a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
  \vdots  & \vdots  & \ddots & \vdots  \\
  a_{m,1} & a_{m,2} & \cdots & a_{m,n} 
 \end{pmatrix}
\end{align}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%										(7.) Required programs
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Required programs}

Windows:
\begin{itemize}
   \item MiKTeX (\url{http://miktex.org/})
   \item an editor of your choice, e.g.\ WinEdt (\url{http://www.winedt.com/}; fee-based student version) 
or freeware such as TeXnicCenter (\url{www.texniccenter.org/})
   \item Ghostview and Ghostscript (\url{http://pages.cs.wisc.edu/~ghost/})
\end{itemize}
Linux:
\begin{itemize}
   \item LaTeX is available in most distributions, e.g.\ tetex in Suse (if it is not installed, install it via yast)
   \item as an editor we recommend Kile
\end{itemize}
For bibliography management you may use, for example, JabRef (\url{http://jabref.sourceforge.net/}).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(8.) Presentations
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Presentations}

Examples and templates for the document class \texttt{beamer} can be found at:

\url{http://www.informatik.uni-freiburg.de/~frank/latex-kurs/latex-kurs-3/Latex-Kurs-3.html}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(9.) End of main body
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\newpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(10.) Bibliography
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\addcontentsline{toc}{section}{References}        % Adds references to table of contents
\bibliography{bibexample}                         % Creates a bibliography at the end of the LaTeX document. The bib-file loaded here is what you would have created with jabref. You can also give an absolute path if the bib-file is not in the folder where your tex-file lies. Same holds true for anything that is loaded by the way (figures, for example)

\newpage

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%	(11.) Appendix
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Appendix} 

This could be the appendix if you really need one. 





\newpage

\blindtext[5]
\begin{table}[!htbp] 
\centering
\begin{threeparttable}
\caption{Small Sample Table}
\label{tab:table}
 \begin{tabular}{lc|r}
\toprule 
   A very & small sample & table\\
\hline
   first column left & second column centered & third column right \\
   & underlined second column  & \\
\cline{2-2}
   \multicolumn{2}{c|}{Write across two columns} & Third column \\
\bottomrule
\end{tabular}   
\begin{tablenotes}
\item \footnotesize{Table \ref{tab:table} should be described in a way that it is possible to understand it without reading the main body of your text first.}
\end{tablenotes}
\end{threeparttable}
\end{table}

\end{document}