Keyword Transformer: A Self-Attention Model for Keyword Spotting
Published in Proc. Interspeech, 2021
Recommended citation: Berg, A., O’Connor, M., Tairum Cruz, M. (2021) "Keyword Transformer: A Self-Attention Model for Keyword Spotting." Proc. Interspeech 2021, 4249-4253. doi: 10.21437/Interspeech.2021-1286. https://www.isca-speech.org/archive/pdfs/interspeech_2021/berg21_interspeech.pdf
In this paper, we apply the popular Transformer architecture to keyword spotting, the task of classifying short audio snippets into a fixed set of keyword categories. By partitioning the audio spectrogram into patches along the time axis and applying self-attention across the resulting sequence, we show that the Keyword Transformer matches or outperforms other network architectures on the Google Speech Commands dataset while maintaining low latency at inference time.
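To make the idea concrete, here is a minimal PyTorch sketch of this kind of model: each time frame of the spectrogram is linearly projected into a token, a learnable class token and positional embeddings are added, and a standard Transformer encoder classifies from the class token. The hyperparameters (40 mel features, 98 frames, embedding dimension, depth, heads) are illustrative assumptions, not necessarily the paper's exact configuration; see the linked repository for the reference implementation.

```python
import torch
import torch.nn as nn

class KeywordTransformerSketch(nn.Module):
    """Sketch of a time-axis patch Transformer for keyword spotting.
    Hyperparameter defaults are assumptions for illustration only."""

    def __init__(self, n_mels=40, n_frames=98, dim=192, depth=12,
                 heads=3, num_classes=12):
        super().__init__()
        # Each time frame (one row of the time x mel spectrogram) becomes a token.
        self.proj = nn.Linear(n_mels, dim)
        # Learnable class token and positional embeddings (one extra position for the class token).
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_frames + 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, spec):
        # spec: (batch, n_frames, n_mels)
        x = self.proj(spec)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos
        x = self.encoder(x)
        # Classify from the class token's final representation.
        return self.head(x[:, 0])

model = KeywordTransformerSketch()
logits = model(torch.randn(2, 98, 40))  # e.g. 98 frames of 40 mel features
print(logits.shape)                     # torch.Size([2, 12])
```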
Code: https://github.com/ARM-software/keyword-transformer
Blog post: Fast and accurate keyword spotting using Transformers
Video presentation: available on YouTube here