text = "Does Donald J. Trump have a better golf handicap than Biden?"
print("Original text:", text)

tokens = tokenizer.tokenize(text)
print("Tokens:", tokens)

ids = tokenizer.encode(text)  # Automatically adds special tokens at start and end
print("Token IDs:", ids)

reconstructed_text = tokenizer.decode(ids)
print("Reconstructed text:", reconstructed_text)
```
%% Cell type:markdown id: tags:
## Token Embeddings
%% Cell type:markdown id: tags:
### From scratch
%% Cell type:markdown id: tags:
Initialize the token embedding layer randomly using a torch embedding layer.
*Note*: This basically builds a standard matrix with random entries that can be used by the deep learning components later (e.g., gradients can be computed for it). If you are curious, check out the optional notebook "optional_tensor_intro".
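A minimal sketch of such an initialization (the sizes `vocab_size` and `embed_dim` below are placeholders; use the values matching your tokenizer and model):

``` python
import torch

torch.manual_seed(123)  # reproducible random initialization

# Placeholder sizes; vocab_size should match the tokenizer's vocabulary.
vocab_size = 50257
embed_dim = 256

# nn.Embedding holds a (vocab_size, embed_dim) weight matrix with random entries;
# it is a trainable layer, so gradients can flow through it later.
token_embedding_layer = torch.nn.Embedding(vocab_size, embed_dim)
print(token_embedding_layer.weight.shape)  # torch.Size([50257, 256])
```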
Experiment with the visualization of the token embeddings of some words.
- Do similar words have more similar token embeddings?
- Look at the embeddings of the word pairs "king"/"queen" and "man"/"woman". What do you notice?
- Test some tokens you are interested in (make sure they are part of the vocabulary).
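One way to compare embeddings numerically is cosine similarity. A sketch with a hand-made toy vocabulary (with a real tokenizer you would look the token ids up instead); note that with a *randomly initialized* layer the similarities are arbitrary — semantic structure such as "king"/"queen" resembling "man"/"woman" only emerges after training:

``` python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Toy vocabulary mapping words to ids (with a real tokenizer, look the ids up instead).
vocab = {"king": 0, "queen": 1, "man": 2, "woman": 3}
embedding = torch.nn.Embedding(len(vocab), 8)

def cosine(w1: str, w2: str) -> float:
    """Cosine similarity between the embeddings of two words."""
    e1 = embedding(torch.tensor(vocab[w1]))
    e2 = embedding(torch.tensor(vocab[w2]))
    return F.cosine_similarity(e1, e2, dim=0).item()

print("king/queen:", cosine("king", "queen"))
print("man/woman:", cosine("man", "woman"))
```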
%% Cell type:markdown id: tags:
#### Task*
Manually compute the input embeddings for a text document in two ways (without positional embeddings):
1) **By an index lookup**: Implement manually what the `torch.nn.Embedding` layer does: for each token id, look up the corresponding row in the token embedding matrix to construct the token embeddings.
2) **By matrix multiplication**: Transform each token id into a one-hot encoded vector of length $v$ (the vocabulary size), with 0s everywhere and a single 1 at the index of the corresponding token. Multiply the one-hot encoding of the document (it's a matrix!) with the embedding matrix.
Are the results of 1. and 2. equivalent? Which approach do you prefer?
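The two approaches can be sketched on a toy example (the sizes and ids here are made up for illustration; `embedding.weight` is the token embedding matrix):

``` python
import torch

torch.manual_seed(123)
v, d = 10, 4                              # toy vocabulary size and embedding dimension
embedding = torch.nn.Embedding(v, d)
ids = torch.tensor([3, 1, 7])             # token ids of a toy document

# 1) Index lookup: take the row of the embedding matrix for each token id.
emb_lookup = embedding.weight[ids]

# 2) Matrix multiplication: one-hot encode the ids, then multiply with the matrix.
one_hot = torch.nn.functional.one_hot(ids, num_classes=v).float()  # shape (3, v)
emb_matmul = one_hot @ embedding.weight                            # shape (3, d)

print(torch.allclose(emb_lookup, emb_matmul))  # True
```

Both approaches yield the same result, but the lookup avoids the multiplications with all the zeros in the one-hot matrix, which is why embedding layers are implemented as lookups.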
%% Cell type:markdown id: tags:
## Positional Embeddings
%% Cell type:markdown id: tags:
### From scratch
%% Cell type:markdown id: tags:
Randomly initialize a positional embedding layer:
%% Cell type:code id: tags:
``` python
max_length = 4  # max length of a text document (small for illustration purposes)
```

%% Cell type:markdown id: tags:

We can now further investigate the influence of positional embeddings on the final input embedding of a token by plotting the input representation of a token at different positions.

For the words that switch position between two sentences, the final embedding is slightly different due to the positional embeddings. This is how positional embeddings preserve information about word order.
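A minimal sketch of this effect (toy sizes; the layer names are placeholders): the same token id yields a different input embedding at each position, because a different positional vector is added.

``` python
import torch

torch.manual_seed(123)
max_length, embed_dim = 4, 8
token_embedding = torch.nn.Embedding(10, embed_dim)        # toy vocabulary of 10 tokens
pos_embedding = torch.nn.Embedding(max_length, embed_dim)  # one vector per position

token_id = torch.tensor(5)
# Input embedding = token embedding + positional embedding of the position.
at_pos_0 = token_embedding(token_id) + pos_embedding(torch.tensor(0))
at_pos_2 = token_embedding(token_id) + pos_embedding(torch.tensor(2))

# Same token, different positions -> different input embeddings
# (the randomly initialized positional vectors differ).
print(torch.equal(at_pos_0, at_pos_2))
```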