Commit 0533914d authored by Erik Senn

Upload New File

parent cfe8afe4

data/notes_on_data.txt

0 → 100644
+10 −0
Binary classification:
Subset of the IMDB dataset of movie reviews (positive, negative). Download the full dataset here: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Email spam detection (Spam or Ham): https://www.kaggle.com/datasets/venky73/spam-mails-dataset

Multi-class classification:
Financial phrase bank (positive, neutral, negative): https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news

Short story for educational pretraining:
the-verdict.txt (as found and used in https://github.com/rasbt/LLM-workshop-2024/tree/main/02_data)