Keyword Transformer: A Self-Attention Model for Keyword Spotting

Published in Proc. Interspeech 2021, 2021

Recommended citation: Berg, A., O’Connor, M., Cruz, M. T. (2021). "Keyword Transformer: A Self-Attention Model for Keyword Spotting." Proc. Interspeech 2021, pp. 4249-4253. doi: 10.21437/Interspeech.2021-1286. https://www.isca-speech.org/archive/pdfs/interspeech_2021/berg21_interspeech.pdf

In this paper, we apply the popular Transformer architecture to keyword spotting: classifying short audio snippets into a fixed set of keyword classes. By partitioning the audio spectrogram into patches along the time axis and applying self-attention across them, we show that the Keyword Transformer outperforms prior network architectures on the Google Speech Commands dataset while maintaining low latency at inference time.
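As a rough illustration of this idea, here is a minimal sketch in PyTorch (not the authors' implementation; the hyperparameters, such as 40 mel bins, 98 time frames, and 35 classes, are illustrative assumptions). It treats each spectrogram time frame as a patch, prepends a learnable class token, adds positional embeddings, and applies a single self-attention layer standing in for the full Transformer encoder stack:

```python
import torch
import torch.nn as nn

class TimePatchAttention(nn.Module):
    """Illustrative sketch: self-attention over time-axis spectrogram patches."""

    def __init__(self, n_mels=40, max_frames=98, embed_dim=192,
                 num_heads=3, num_classes=35):
        super().__init__()
        # Each time frame (one column of the spectrogram) becomes one patch.
        self.proj = nn.Linear(n_mels, embed_dim)
        # Learnable class token, prepended to the patch sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Learnable positional embeddings for class token + time patches.
        self.pos_embed = nn.Parameter(torch.zeros(1, 1 + max_frames, embed_dim))
        # One attention layer; the real model stacks full Transformer blocks.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, spec):
        # spec: (batch, time, n_mels)
        x = self.proj(spec)                            # (batch, time, embed_dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)                 # prepend class token
        x = x + self.pos_embed[:, : x.size(1)]         # add positional info
        x, _ = self.attn(x, x, x)                      # self-attention over patches
        return self.head(x[:, 0])                      # classify from class token

model = TimePatchAttention()
logits = model(torch.randn(2, 98, 40))                 # two 1-second clips
print(logits.shape)                                    # torch.Size([2, 35])
```

Because self-attention attends across all time patches at once, a single forward pass scores the whole clip, which is what keeps inference latency low compared to recurrent models.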

Code: https://github.com/ARM-software/keyword-transformer

Blog post: Fast and accurate keyword spotting using Transformers

Video presentation available on YouTube here