More powerful deep learning with transformers (Ep. 84) - a podcast by Francesco Gadaleta
Published 2019-10-27
Some of the most powerful NLP models, such as BERT and GPT-2, have one thing in common: they all use the transformer architecture. This architecture is built on top of another important concept already well known to the community: self-attention.
In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
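As a taste of what the episode covers, below is a minimal sketch of the scaled dot-product self-attention mechanism at the core of the transformer, following the formulation in the referenced paper "Attention is all you need". The function and variable names, the toy dimensions, and the use of NumPy are illustrative assumptions, not code discussed in the episode.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X: (seq_len, d_model) input embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    """
    Q = X @ W_q  # queries
    K = X @ W_k  # keys
    V = X @ W_v  # values
    d_k = Q.shape[-1]
    # Every token attends to every other token; scaling by sqrt(d_k)
    # keeps the dot products from growing with the key dimension.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # attention weights, each row sums to 1
    return weights @ V                  # weighted sum of value vectors

# Toy usage: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 8)
```

Because the attention weights are computed from the input itself, each output position is a context-dependent mixture of the whole sequence, which is what lets transformers model long-range dependencies without recurrence.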
Don't forget to subscribe to our newsletter or join the discussion on our Discord server.
References
- Attention is all you need: https://arxiv.org/abs/1706.03762
- The illustrated transformer: https://jalammar.github.io/illustrated-transformer
- Self-attention for generative models: http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf