
We look for induction heads by feeding the model a random sequence of tokens repeated twice and identifying heads that attend from a token in the second copy back to the token just after that token's first occurrence. The goal is to test how well this detection method, which inspects attention scores on a linear scale, generalizes across models, and in particular whether the method itself is reproducible. We report observations about the mean attention score used to decide whether an attention head is an induction head in SoLU-8l-old, compared with GPT2-small. The experiment is reproducible from the notebook at https://github.com/poppingtonic/transformer-visualization/blob/main/SOLU-8l-old-Observations.ipynb
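The core check can be sketched in a few lines. The sketch below assumes the TransformerLens library, its "solu-8l-old" model alias, and an illustrative detection threshold of 0.4; none of these specifics are taken from the notebook above.

import torch
from transformer_lens import HookedTransformer

# Model alias is an assumption; substitute the checkpoint name you use.
model = HookedTransformer.from_pretrained("solu-8l-old")

seq_len, batch = 50, 4
bos = torch.full((batch, 1), model.tokenizer.bos_token_id, dtype=torch.long)
rand = torch.randint(100, model.cfg.d_vocab, (batch, seq_len))
# Random tokens repeated twice: [BOS, x_1..x_n, x_1..x_n]
tokens = torch.cat([bos, rand, rand], dim=-1).to(model.cfg.device)

_, cache = model.run_with_cache(tokens, return_type=None)

for layer in range(model.cfg.n_layers):
    # Attention pattern: [batch, head, query_pos, key_pos]
    pattern = cache["pattern", layer]
    # An induction head attends from a token in the second copy back to the
    # position just after that token's first occurrence; with the BOS offset,
    # that is the stripe seq_len - 1 positions below the attention diagonal.
    stripe = pattern.diagonal(offset=-(seq_len - 1), dim1=-2, dim2=-1)
    scores = stripe.mean(dim=(0, -1))  # mean attention score per head
    for head, score in enumerate(scores.tolist()):
        if score > 0.4:  # threshold is an illustrative assumption
            print(f"L{layer}H{head}: mean induction score = {score:.3f}")

Averaging over the whole stripe is a simplification: a couple of query positions near the copy boundary are not true induction positions, which slightly dilutes the mean, but it keeps the score to a single tensor operation per layer.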

Download

Brian Muhia - Observing and validating Induction heads in SOLU-8l-old (1).pdf 206 kB
