
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% Institute of Financial Management
		% Department of Business Mathematics and Data Science
		% Exemplary LaTeX Document for Writing a Seminar Paper,
		% Bachelor or Master Thesis
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% This is an exemplary document that shall explain the
		% professional use of Latex in a scientific application.
		% Latex has the following advantages:
		%
		% 1.) Dealing with mathematical notation:
		% Layout and writing equations are generally easier using LaTeX
		% compared to other editors.
		%
		% 2.) Consistent handling of intra-document references and bibliography:
		% While the major WYSIWYG editors can perform similar tasks, handling
		% and consistency of numbering, cross-references, and bibliographic items
		% is easier and more flexible in LaTeX.
		%
		% 3.) Separation of content and style:
		% In principle this means that you can write your document without
		% caring about how it is formatted, and at the end of the day wrap
		% it in the style-file provided by a journal publisher or University to
		% conform to the required style.
		%
		% 4.) Tables and illustrations:
		% LaTeX makes it easy to include high quality graphics (.eps), and many
		% software packages (e.g. STATA) can produce output tables in LaTeX format
		% such that they can be included without further formatting.
		%
		% We highly recommend the usage of LaTeX as it is some kind of scientific
		% standard. The earlier you get used to it, the easier it will be for you
		% to hand in professional looking assignments and thesis papers. The
		% following packages and commands are only a limited selection of what is
		% possible, but it will get you started. You may want to adapt the header
		% to your needs. If you have an idea but do not know how to implement it in
		% LaTeX, don't hesitate and try to google it. You will see that nearly any
		% problem that you may face has been discussed before and there are many
		% solutions available online.
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (1.) Set Up a Document
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		% Base LaTeX offers five standard classes of document: book, report,
		% article, letter and slides. For each class, LaTeX provides a class file.
		% The user selects it via a \documentclass command at the top of the
		% document. Additionally the user can specify the paper format and
		% may change the font size.

		\documentclass[a4paper,12pt]{article}

		% After the document is set up, a variety of packages is loaded to
		% customize the environment.

		\usepackage[longnamesfirst, round]{natbib}
		% The bundle provides a package that implements both author-year and
		% numbered references, as well as much detailed support for other
		% bibliography use.

		\usepackage[latin1]{inputenc}
		% The package translates various standard and other input encodings
		% into a ‘LATEX internal language’. The internal language is expressed
		% entirely in TEX's base encoding (standard ASCII printable characters,
		% carriage control tokens and TEX control sequences, the latter
		% mostly defined by LATEX).
		% If German special characters are needed and do not work, try utf8
		% instead of latin1. Settings depend on your operating system.

		\usepackage[T1]{fontenc}
		% The package allows the user to select font encodings, and for each
		% encoding provides an interface to ‘font-encoding-specific’ commands
		% for each font. Its most powerful effect is to enable hyphenation
		% to operate on texts containing any character (especially umlaut)
		% in the font.

		\usepackage{color}
		% The color package provides both foreground (text, rules, etc.) and
		% background colour management; it uses the device driver configuration
		% mechanisms of the graphics package to determine how to control its output.

		\usepackage{amsmath,amsfonts,amssymb}
		% The principal package in the AMS-LATEX distribution. It adapts for
		% use in LATEX most of the mathematical features found in AMS-TEX; it
		% is highly recommended as an adjunct to serious mathematical typesetting
		% in LATEX. When amsmath is loaded, AMS-LATEX packages amsbsy (for bold
		% symbols), amsopn (for operator names) and amstext (for text embedded in
		% mathematics) are also loaded.

		%\usepackage{ngerman}
		% Supports the new German orthography (neue deutsche Rechtschreibung).

		\usepackage[english]{babel}
		% The package provides the language definition file for support of English
		% in babel. Care is taken to select british hyphenation patterns for British
		% English and Australian text, and default (‘american’) patterns for Canadian
		% and USA text.

		\usepackage{ae}
		% A set of virtual fonts which emulates T1 coded fonts using the standard CM
		% fonts. The package name, AE fonts, supposedly stands for “Almost European”.
		% The main use of the package was to produce PDF files using Adobe Type 1
		% versions of the CM fonts instead of bitmapped EC fonts.

		\usepackage{graphicx}
		% The package builds upon the graphics package, providing a key-value
		% interface for optional arguments to the \includegraphics command. It allows
		% graphics to be included in common formats (pdf, jpg, png, ...).

		\usepackage{epstopdf}
		% Allows .eps graphics to be included; they are converted on the fly to pdf.

		\usepackage{longtable}
		% Longtable allows you to write tables that continue to the next page.
		% You can write captions within the table (typically at the start of the
		% table), and headers and trailers for pages of table. Longtable arranges
		% that the columns on successive pages have the same widths.

		\usepackage{booktabs}
		% Provides well-spaced horizontal rules (\toprule, \midrule, \bottomrule) for tables.

		\usepackage[flushleft]{threeparttable}
		% Allows a description (table notes) to be placed at the bottom of a table.

		\usepackage{multirow}
		% Allows a cell to span several rows in a table.

		\usepackage{url}
		% The command \url is a form of verbatim command that allows linebreaks
		% at certain characters or combinations of characters, accepts
		% reconfiguration, and can usually be used in the argument to another
		% command. The command is intended for email addresses, hypertext links,
		% directories/paths, etc., which normally have no spaces, so by default
		% the package ignores spaces in its argument. However, a package option
		% “allows spaces”, which is useful for operating systems where spaces
		% are a common part of file names.

		\usepackage{setspace}
		% Provides commands to adjust line spacing.

		\usepackage{pdfpages}
		% Allows the pdf of the examinations office to be included
		% (Eigenstaendigkeitserklaerung, i.e. the declaration of authorship)


		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (2.) Further Document Definitions
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\bibliographystyle{plainnat}
		% Defines the style of the bibliography.

		\oddsidemargin 0.1in \evensidemargin 0.1in \textwidth 15.5cm \topmargin -0.4in \textheight 24.5cm
		% Defines width of margins.

		\parindent 0cm
		% Defines indentation at the beginning of a new paragraph.

		\pagestyle{plain}
		% Empty header line, page number in the center of the footer line.

		\newcommand{\bs}{\boldsymbol}
		% Shortcut to produce bold symbols in the math environment

		\usepackage{blindtext}

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (3.) Beginning of Document
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\begin{document}

		\onehalfspacing
		% Sets the line spacing to 1.5

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (4.) Title page
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		% In principle you can design the title page on your own. The following requirements should however
		% be met: Centered title and description, information about the author and date listed on the title page.

		\pagenumbering{roman}
		% Use roman numbers for page numbering

		\begin{titlepage}
		% Wrapper for title page definitions

		\thispagestyle{empty}
		% No page numbering on the title page

		% Title page text that shall be displayed in the center of the page
		\begin{center}
		\vspace*{2.5cm}
		{\bf \Large Sentiment and Emotion Analysis of User Reviews\\for the Game Elden Ring } \\
		\vspace*{3cm}
		Mandatory Assignment 02 \\ Winter Term 2023-24\\ Introduction to Applied Data Science\\
		at the \\ Faculty of Business, Economics and Social Sciences\\
		MSc. International Business and Economics\\
		University of Hohenheim
		% adapt to your needs and give the appropriate information
		\end{center}

		% Information about the author of the thesis
		\vfill

		\hfill \begin{minipage}{0.5\linewidth}
		First examiner: Prof.\ Dr.\ Thomas Dimpfl \\
		Second examiner: Sophia Koch \\
		Third examiner: Dr.\ Johannes Bleher\\
		% examiners only needed for master thesis

		Submitted by: \\
		Hetvi Ariwala (996729) \\
		Sarish Aklujkar (991260)\\

		Date of Submission: \today
		\end{minipage}


		\end{titlepage}

		\newpage
		% Enforces a page break

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (5.) Table of Contents
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		% Includes a table of contents and a list of your figures and tables.
		%\tableofcontents
		%\listoffigures
		%%\listoftables
		%% In most journal publications, none of these is actually used.
		%\newpage

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (6.) Main Body
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\pagenumbering{arabic}
		% Set page numbering back to arabic numbers
		\setcounter{page}{1}
		% Set counter back to 1

		% You can use different levels of sections to structure the body of your work.
		\section{Abstract}
		% A label placed after \section, e.g. \label{Chapter:intro}, allows you to
		% reference this section later using \ref{Chapter:intro}. If at any stage
		% you insert another section before, the reference is automatically updated.

		%\subsection{Formal Guidelines}

		This report presents a sentiment analysis of user reviews for Elden Ring. It builds on the data gathered in the first assignment, in which the top 100 games were extracted from the Steam Spy API and the game details and user reviews were sourced from the SteamPowered API. Using R and dictionary-based methods, the analysis assigns a sentiment score to each review in order to gauge overall player satisfaction and to pinpoint aspects of the game that elicit positive or negative reactions. The primary goal is to decode the emotional undertones in the reviews and thereby shed light on how the gaming community has received Elden Ring.

		\section{Data Cleaning}

		The user reviews in the \texttt{gamereviews} dataset were cleaned in several steps to prepare them for analysis. First, regular expressions were used to remove non-ASCII characters and URLs, which standardized the text and eliminated irrelevant external references. In the same step, contractions were expanded to their full form to keep the language consistent, and repeated letters or symbols within words were reduced. Numerical digits, punctuation, and single-character responses were then removed to improve the relevance and readability of the data.

		Text normalization was a key aspect: all text was converted to lowercase and excess whitespace was removed using regular expressions, ensuring uniformity across the dataset. Stop words, except for the term ``not'', were filtered out to focus on meaningful content; ``not'' was kept because it carries negation, which is crucial for sentiment analysis. The remaining words were additionally spell-checked against an English dictionary. The cleaning process concluded with a word cloud of the most frequent words in the cleaned reviews, which gave a brief overview of dominant themes and formed a foundation for the subsequent textual analysis.
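
The cleaning steps described above can be sketched as follows. The report's analysis was carried out in R; this is only an illustrative Python sketch, and the function name \texttt{clean\_review}, the tiny stop-word list and the contraction table are hypothetical stand-ins rather than the code used for the report. Spell checking and the word cloud are left out.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption for illustration).
# "not" is intentionally absent so that negations survive the cleaning.
STOP_WORDS = {"the", "a", "an", "is", "it", "this", "and", "of", "to", "i"}

# Hypothetical contraction table; a real run would need a fuller mapping.
CONTRACTIONS = {
    "don't": "do not",
    "can't": "cannot",
    "isn't": "is not",
    "it's": "it is",
}


def clean_review(text: str) -> str:
    """Apply the cleaning steps from this section to a single review."""
    text = text.lower()                                   # normalise case
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)    # drop URLs
    text = re.sub(r"[^\x00-\x7f]", " ", text)             # drop non-ASCII characters
    for short, full in CONTRACTIONS.items():              # expand contractions
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)            # "sooooo" -> "soo"
    text = re.sub(r"[0-9]", " ", text)                    # drop digits
    text = re.sub(r"[^a-z\s]", " ", text)                 # drop punctuation
    # split() also collapses excess whitespace; drop single characters and stop words
    words = [w for w in text.split() if len(w) > 1 and w not in STOP_WORDS]
    return " ".join(words)


if __name__ == "__main__":
    print(clean_review("I don't like this!!! Sooooo hard 10/10 https://store.steampowered.com"))
```

The order of the steps matters in this sketch: URLs are removed before repeated characters are reduced and before punctuation is stripped, since otherwise fragments of the address would remain in the text as ordinary words.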

		\section{Methodology}

		The sentiment analysis of the \texttt{gamereviews} dataset combined several approaches to capture the sentiments and emotions conveyed in the reviews. First, the dataset was reduced to the columns relevant for sentiment analysis. A key part of the process was a manual sentiment analysis, which classified each review as positive, negative, or neutral based on the occurrence of specific positive and negative words while accounting for negation. The analysis also incorporated a consistency check of the assigned sentiment using an external list of positive and negative words by Minqing Hu and Bing Liu, obtained from Kaggle.
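
A minimal sketch of such a word-count classifier is given below, again in Python for illustration only. The two word sets are hypothetical miniature lists, and the negation rule (a polarity word directly preceded by ``not'' has its sign flipped) is an assumption, since the exact rule used in the report is not spelled out here.

```python
# Hypothetical mini word lists (assumption); the report uses a full lexicon.
POSITIVE = {"good", "great", "fun", "amazing", "love"}
NEGATIVE = {"bad", "boring", "hard", "terrible", "hate"}


def classify_review(cleaned: str) -> int:
    """Return 1 (positive), -1 (negative) or 0 (neutral) for a cleaned review."""
    words = cleaned.split()
    score = 0
    for i, word in enumerate(words):
        # +1 for a positive word, -1 for a negative word, 0 otherwise
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        # Assumed negation rule: "not" directly before the word flips its sign
        if i > 0 and words[i - 1] == "not":
            polarity = -polarity
        score += polarity
    # Reduce the summed score to a categorical label
    return (score > 0) - (score < 0)


if __name__ == "__main__":
    for review in ["great fun", "not good", "good bad", "not bad great"]:
        print(review, "->", classify_review(review))
```

Because the final label is only the sign of the summed score, a review with equally many positive and negative hits is labelled neutral, just like a review without any hits.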

		To complement the manual approach, automated sentiment analysis was carried out with several sentiment dictionaries, which capture different dimensions of the sentiment expressed in the reviews. The results of the manual and automated methods were then compared to assess how consistently the different techniques score sentiment. Two additional dictionaries, AFINN and NRC, broadened the scope of the analysis: AFINN provides a direct sentiment score, while NRC offers a detailed view of various emotions. The emotion scores from the NRC dictionary were aggregated into a data frame and ordered from the most to the least prevalent emotion, which was critical for identifying dominant emotional trends and patterns in the user reviews.

		The analysis concluded with a bar plot of the emotions, drawn with a color palette chosen for clarity, which illustrates the range of emotions present in the reviews. In addition, the weight of each emotion was expressed as a percentage, giving a clear view of the most and least prevalent emotions and of the overall emotional landscape of the user reviews.
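
The aggregation of emotion counts into percentages can be illustrated with the following Python sketch. The dictionary \texttt{EMOTION\_LEXICON} is a hypothetical four-word stand-in for the NRC lexicon, and \texttt{emotion\_percentages} is an illustrative name, not a function from the report's R code.

```python
from collections import Counter

# Hypothetical word-to-emotion map standing in for the NRC lexicon (assumption).
EMOTION_LEXICON = {
    "love": ["joy", "trust"],
    "amazing": ["joy", "surprise"],
    "hate": ["anger", "disgust"],
    "scary": ["fear"],
}


def emotion_percentages(reviews):
    """Count emotion hits over all reviews and convert them to percentages,
    sorted from the most to the least prevalent emotion."""
    counts = Counter()
    for review in reviews:
        for word in review.split():
            counts.update(EMOTION_LEXICON.get(word, []))
    total = sum(counts.values())
    if total == 0:
        return []  # no emotion words found at all
    return [(emotion, round(100 * n / total, 2)) for emotion, n in counts.most_common()]


if __name__ == "__main__":
    sample = ["love amazing game", "hate scary boss", "love it"]
    for emotion, share in emotion_percentages(sample):
        print(f"{emotion:10s} {share:6.2f} %")
```

The sorted list of pairs corresponds to the ordered data frame described above and could be passed directly to a bar plot.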

		\section{Sentiment Analysis}

		The manual sentiment analysis of the user reviews in the \texttt{rev.sentiment} dataset yields the following distribution: 4481 reviews were classified as positive, 1905 as negative, and 3232 as neutral. The higher count of positive reviews suggests a generally favourable sentiment among users, while the substantial number of negative and neutral reviews indicates a diverse range of opinions.
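
Expressed as shares of all classified reviews, these counts work out as follows (the symbols $N$ and $p$ are introduced here only for this calculation):
\begin{align*}
N &= 4481 + 1905 + 3232 = 9618,\\
p_{\text{positive}} &= \frac{4481}{9618} \approx 46.6\,\%,\\
p_{\text{negative}} &= \frac{1905}{9618} \approx 19.8\,\%,\\
p_{\text{neutral}} &= \frac{3232}{9618} \approx 33.6\,\%.
\end{align*}
Positive reviews are thus the largest group, but they make up slightly less than half of all reviews.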

		The manual sentiment analysis, conducted using a Kaggle-sourced dictionary, and the automated analysis, performed with the Bing dictionary from the \texttt{syuzhet} package, agree to a large extent: 74.02\% of the results match. This substantial agreement increases confidence in the reliability of the automated results and indicates that the \texttt{syuzhet} package with the Bing dictionary can largely replicate the sentiment assessments of the manual analysis. The remaining 25.98\% of the results differ, however, which points to discrepancies between the two approaches. One difference lies in the scoring mechanism. The \texttt{syuzhet} function returns a wider spectrum of sentiment scores: depending on their intensity, positive reviews receive values such as 3, 7, or 17, negative reviews receive values such as $-4$, $-7$, or $-16$, and neutral reviews are consistently scored as 0. The manual sentiment analysis, in contrast, applies a simpler scoring system: $-1$ for negative reviews, 0 for neutral ones, and 1 for positive ones.

		This difference in scoring reflects the different algorithms and dictionaries underlying the \texttt{syuzhet} package and the manual approach. The broader range of scores from \texttt{syuzhet} may reflect a more granular evaluation of sentiment that captures variations in the intensity of positive or negative expressions, whereas the manual analysis, with its simplified scoring, provides a purely categorical classification.
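
One way to compare a categorical label with a graded score is to reduce the score to its sign before counting matches. The Python sketch below illustrates this idea; it is an assumption about how such a comparison can be made, not a description of how the 74.02\% reported above was computed, and the function names are hypothetical.

```python
def sign(score: float) -> int:
    """Map a graded sentiment score to -1, 0 or 1."""
    return (score > 0) - (score < 0)


def agreement_rate(manual_labels, lexicon_scores) -> float:
    """Percentage of reviews whose manual label equals the sign of the graded score."""
    if len(manual_labels) != len(lexicon_scores):
        raise ValueError("both inputs must have the same length")
    if not manual_labels:
        raise ValueError("at least one review is required")
    matches = sum(m == sign(s) for m, s in zip(manual_labels, lexicon_scores))
    return 100 * matches / len(manual_labels)


if __name__ == "__main__":
    manual = [1, -1, 0, 1]    # categorical labels from the manual approach
    graded = [3, -7, 0, -4]   # graded scores, e.g. from a lexicon-based method
    print(f"{agreement_rate(manual, graded):.2f} % identical")
```

Reducing the graded score to its sign discards the intensity information, so this comparison only checks whether both methods agree on the direction of the sentiment.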

		% Including a graphic
		\begin{figure}[!htbp] % You can influence the position of the graphic using parameters in brackets: the current setting asks LaTeX to try ``here'' first, then top, bottom, or a separate float page
		\caption{Graphic Title}
		\vspace{5mm}
		\label{fig:firstGraphic} % set label to reference the graphic
		\centering
		\includegraphics[scale=0.6]{Autocorrelation.eps} % scale sets the scaling factor of the graphic
		\begin{minipage}{\textwidth}
		\vspace*{3pt}
		\footnotesize{The figure should be described in a way that it is possible to understand it without reading the main body of your text first.}
		\end{minipage}
		\end{figure}

		% Include another graphic
		\begin{figure} % or you let tex determine where it fits best
		\caption{Graphic Title}
		\vspace{5mm}
		\label{fig:secondGraphic}
		\centering
		\includegraphics[width=0.9\linewidth]{condvgarch1.eps}
		\begin{minipage}{\textwidth}
		\vspace*{3pt}
		\footnotesize{The figure should be described in a way that it is possible to understand it
		without reading the main body of your text first.}
		\end{minipage}
		\end{figure}

		The tilde between ``\verb|Figure|'' and ``\verb|\ref{fig:secondGraphic}|'', as in Figure~\ref{fig:secondGraphic}, prevents the number from being placed at the beginning of the next line in case of a line break. Table~\ref{tab:table} can be handled analogously.

		In the following you may find some ideas about how to efficiently use the math environment:

		\begin{align}
		\lim_{x \to \infty} \exp(-x) &= 0\\
		\frac{n!}{k!(n-k)!} &= \binom{n}{k}\\
		\sqrt[n]{1+x+x^2+x^3+\dots+x^n}&=\text{$n^{th}$ root}\\
		( \big( \Big( \bigg( \Bigg( \sum_{i=1}^{10} t_i &\ne \int_0^\infty \mathrm{e}^{-x}\,\mathrm{d}x \Bigg)
		\bigg) \Big) \big) ) \\
		\Rightarrow A_{m,n} &=
		\begin{pmatrix}
		a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
		a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
		\vdots & \vdots & \ddots & \vdots \\
		a_{m,1} & a_{m,2} & \cdots & a_{m,n}
		\end{pmatrix}
		\end{align}


		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (7.) Required programs
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\section{Required programs}

		Windows:
		\begin{itemize}
		\item Miktex (\url{http://miktex.org/})
		\item an editor, according to taste e.g. WinEdt (\url{http://www.winedt.com/}; fee-based student version)
		or other freeware, e.g. TeXnicCenter (\url{www.texniccenter.org/})
		\item ghostview and ghostscript (\url{http://pages.cs.wisc.edu/~ghost/})
		\end{itemize}
		Linux:
		\begin{itemize}
		\item LaTeX is available in most distributions, e.g. tetex in Suse (in case it is not, install it via yast)
		\item as an editor we recommend Kile
		\end{itemize}
		For bibliography management you may use, for example, JabRef (\url{http://jabref.sourceforge.net/}).

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (8.) Presentations
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\section{Presentations}

		You can find examples and templates for the document class \texttt{beamer} at:

		\url{http://www.informatik.uni-freiburg.de/~frank/latex-kurs/latex-kurs-3/Latex-Kurs-3.html}

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (9.) End of main body
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


		\newpage

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (10.) Bibliography
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


		\addcontentsline{toc}{section}{References} % Adds references to table of contents
		\bibliography{bibexample} % Creates a bibliography at the end of the LaTeX document. The bib-file loaded here is what you would have created with jabref. You can also give an absolute path if the bib-file is not in the folder where your tex-file lies. Same holds true for anything that is loaded by the way (figures, for example)

		\newpage

		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
		%
		% (11.) Appendix
		%
		%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

		\section{Appendix}

		This could be the appendix if you really need one.





		\newpage

		\blindtext[5]
		\begin{table}[!htbp]
		\centering
		\begin{threeparttable}
		\caption{Small Sample Table}
		\label{tab:table}
		\begin{tabular}{lc|r}
		\toprule
		A very & small sample & table\\
		\hline
		first column left & second column centered & third column right \\
		& underlined second column & \\
		\cline{2-2}
		\multicolumn{2}{c|}{Write across two columns} & Third column \\
		\bottomrule
		\end{tabular}
		\begin{tablenotes}
		\item \footnotesize{Table~\ref{tab:table} should be described in a way that it is possible to understand it without reading the main body of your text first.}
		\end{tablenotes}
		\end{threeparttable}
		\end{table}

		\end{document}