98 - Analyzing Information Flow In Transformers, With Elena Voita - a podcast by Allen Institute for Artificial Intelligence
from 2019-12-09T17:51:08
What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads in several tasks, and describes a gating mechanism, trained with an auxiliary loss, that prunes the number of effective heads. Then, we discuss Lena's work on studying how the representations of individual tokens evolve across the layers of transformer models.
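The head-pruning idea mentioned above can be sketched roughly as follows: each head's output is scaled by a learnable scalar gate, and an auxiliary loss penalizes the number of open gates. This is only a minimal illustration, assuming a plain sigmoid relaxation in place of the stochastic hard-concrete gates used in the actual paper; the shapes and names are hypothetical.

```python
import numpy as np

def gate_attention_heads(head_outputs, gate_logits):
    """Scale each head's output by a learnable gate and return an
    L0-style auxiliary penalty (a sketch, not the paper's exact method).

    head_outputs: array of shape (num_heads, seq_len, d_head)
    gate_logits:  array of shape (num_heads,), one logit per head
    """
    # Sigmoid relaxation: each gate lies in (0, 1); the paper instead
    # uses stochastic hard-concrete gates that can reach exactly 0.
    gates = 1.0 / (1.0 + np.exp(-gate_logits))

    # Multiply every head's output by its gate (broadcast over seq/dim).
    gated = gates[:, None, None] * head_outputs

    # Auxiliary loss: the sum of gate values approximates the expected
    # number of open heads, encouraging most gates toward zero.
    aux_loss = gates.sum()
    return gated, aux_loss
```

A head whose logit is driven very negative contributes (almost) nothing to the output, so it can be removed at inference time.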
Lena’s homepage:
https://lena-voita.github.io/
Blog posts:
https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html
Papers:
https://arxiv.org/abs/1905.09418
https://arxiv.org/abs/1909.01380
Further episodes of NLP Highlights