Upload New File (d3d958c0) · Commits · Erik Senn / llm_class_public

notebooks/5_optional_bert_classification_pipeline_huggingface.ipynb

0 → 100644

+76 −0

		%% Cell type:markdown id: tags:

		# Setup and data

		%% Cell type:code id: tags:

		``` python
		from transformers import BertTokenizer, BertForSequenceClassification
		from transformers import pipeline
		```

		%% Output

		c:\Users\ESenn\Miniconda3\envs\llm_class\Lib\site-packages\tqdm\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
		from .autonotebook import tqdm as notebook_tqdm

		%% Cell type:code id: tags:

		``` python
		# load your data here
		```

		%% Cell type:markdown id: tags:

		# Transformers Library for Sentiment Classification using BERT

		%% Cell type:markdown id: tags:

		Task: Review classification

		Model: A pretrained BERT model that already includes a sentiment classification head. The head is used as is and has not been trained on our task.

		The classification head takes the hidden states of BERT as input and predicts the sentiment.
		The classifier is not trained in this notebook. See the notebooks for chapter 2 for training, and chapter 5 for a step-by-step description of how to train it.
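
To make the role of the classification head concrete, the following minimal sketch shows how raw head outputs (logits) can be turned into a label and a score with softmax and argmax. It does not load the model. The logits are made up for illustration, and the index-to-label mapping `ID2LABEL` is an assumption based on the star labels this model prints, not something read from the model configuration.

```python
import math

# Assumed label order of the classification head (index 0 -> "1 star").
ID2LABEL = {0: "1 star", 1: "2 stars", 2: "3 stars", 3: "4 stars", 4: "5 stars"}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Pick the most probable class and return it in a pipeline-like dict."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits for one review (made up for illustration).
example_logits = [-2.1, -1.3, 0.2, 1.9, 3.4]
prediction = logits_to_prediction(example_logits)
print(prediction["label"])
```

The returned dictionary mirrors the `label` and `score` keys that a Hugging Face text classification pipeline returns for each input.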

		%% Cell type:code id: tags:

		``` python
		# Load pre-trained BERT tokenizer and model for sentiment classification
		tokenizer = BertTokenizer.from_pretrained(
		    "nlptown/bert-base-multilingual-uncased-sentiment"
		)
		model = BertForSequenceClassification.from_pretrained(
		    "nlptown/bert-base-multilingual-uncased-sentiment"
		)

		# Load the classification pipeline
		classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

		# Example texts for classification
		texts = [
		    "This movie was fantastic! I loved it.",
		    "The food was terrible, I will never come back.",
		    "The service was just okay, nothing special.",
		]

		# Perform classification
		results = classifier(texts)

		# Display the results
		for text, result in zip(texts, results):
		    print(f"Text: {text}")
		    print(f"Sentiment: {result['label']}")
		```

		%% Output

		c:\Users\ESenn\Miniconda3\envs\llm_class\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
		warnings.warn(

		Text: This movie was fantastic! I loved it.
		Sentiment: 5 stars
		Text: The food was terrible, I will never come back.
		Sentiment: 1 star
		Text: The service was just okay, nothing special.
		Sentiment: 3 stars
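
The labels above are strings such as `1 star` and `5 stars`. For further analysis it can help to convert them to integers. The sketch below is one possible way to do this. The helper `stars_from_label` is a hypothetical name, and `example_results` is a hand-written stand-in for the pipeline output with made-up scores, so the snippet runs without loading the model.

```python
def stars_from_label(label):
    """Parse a label like '1 star' or '5 stars' into the integer rating."""
    return int(label.split()[0])


# Stand-in for the pipeline output: labels taken from the run above,
# scores are made up for illustration.
example_results = [
    {"label": "5 stars", "score": 0.85},
    {"label": "1 star", "score": 0.70},
    {"label": "3 stars", "score": 0.60},
]

ratings = [stars_from_label(r["label"]) for r in example_results]
average_rating = sum(ratings) / len(ratings)

print(ratings)
print(average_rating)
```

With the real pipeline, `results` from the cell above could be passed in place of `example_results`.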