Subset of the IMDB dataset of movie reviews (positive, negative): download the full dataset at https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews
Email spam detection (Spam or Ham): https://www.kaggle.com/datasets/venky73/spam-mails-dataset
Multi-class classification
Financial phrase bank (positive, neutral, negative): https://www.kaggle.com/datasets/ankurzing/sentiment-analysis-for-financial-news
Short story for educational pretraining
the-verdict.txt (as found and used in https://github.com/rasbt/LLM-workshop-2024/tree/main/02_data)