Publications

Keyword Transformer: A Self-Attention Model for Keyword Spotting

Published in Interspeech, 2021

In this paper, we apply the popular Transformer architecture to keyword spotting, where the task is to classify short audio snippets into a set of keyword classes. By partitioning the audio spectrogram into time windows and applying self-attention across them, we show that the Keyword Transformer outperforms other network architectures while maintaining low latency at inference time.
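The core idea above can be sketched in a few lines: slice the spectrogram along the time axis into fixed-size windows, flatten each window into a token, and let the tokens attend to one another. The sketch below is a minimal toy illustration of that partition-plus-attention step, not the paper's implementation; it uses identity query/key/value projections in place of learned weight matrices, and the window size and spectrogram shape are made up for the example.

```python
import math

def partition(spectrogram, window):
    """Split a (T x F) spectrogram into tokens of `window` frames each,
    flattening each window into a single vector (one token)."""
    return [sum(spectrogram[t:t + window], [])
            for t in range(0, len(spectrogram), window)]

def self_attention(tokens):
    """Scaled dot-product self-attention over the tokens.
    Identity Q/K/V projections stand in for learned weights."""
    d = len(tokens[0])
    # Pairwise similarity scores, scaled by sqrt(d).
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d)
               for kj in tokens] for qi in tokens]
    out = []
    for row in scores:
        # Numerically stable softmax over this token's scores.
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Each output token is a convex combination of all tokens.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])
    return out

# Toy spectrogram: 4 time frames x 2 mel bins, grouped into 2-frame windows.
spec = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
tokens = partition(spec, window=2)   # 2 tokens, each of dimension 4
attended = self_attention(tokens)
print(len(tokens), len(attended[0]))  # → 2 4
```

In the full model, the attended tokens would pass through further Transformer layers and a classification head; this toy version only shows why the window partition yields a short token sequence, which keeps the attention cost, and hence inference latency, low.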

Recommended citation: Berg, A., O’Connor, M., Cruz, M.T. (2021) Keyword Transformer: A Self-Attention Model for Keyword Spotting. Proc. Interspeech 2021, 4249-4253, doi: 10.21437/Interspeech.2021-1286 https://www.isca-speech.org/archive/pdfs/interspeech_2021/berg21_interspeech.pdf