iNNterpret - Representation of speech, articulatory dynamics, prosody and language in layers

What do models know?


Special Session, S2 at ICON 2021: 18th International Conference on Natural Language Processing

16th-19th December, 2021

Motivation

Neural architectures have accelerated the growth of speech applications in a wide variety of fields beyond traditional automatic speech recognition (ASR) and text-to-speech (TTS) synthesis. These include deducing and increasing the interpretability of neural networks, articulatory-acoustic mapping, neural language models, and extensive and precise modeling of speech prosody. This session is dedicated to talks from a wide variety of such perspectives, bringing together advances that highlight how computational and linguistic approaches converge in providing both technological and scholarly solutions to classical problems in speech: acoustics, articulation, prosody, and technology.

The non-linear dynamics between speech articulation and acoustics is both elegant and a source of confounds for building viable applications and knowledge representations. We will begin this session with an overview of how neural architectures have been used for learning phonological patterns. Following that, we will look into how speech prosody and phonation can help build holistic representations. We will also investigate how the non-linearity between articulation and acoustics can be examined to offer insights into the acoustic-articulatory mapping, and how this mapping can be used for prediction in several domains. Our special invited lecture will shed light on how Generative Adversarial Networks are used in forming generalizations akin to phonological representations.

Speakers

Organizer

Indranil Dutta, School of Languages and Linguistics, Jadavpur University

Program

19th December, 2021. All times are in Indian Standard Time (GMT+5:30).

Title	Time	Presenter
Probing Phonological Encoding with Small RNNs	9:00-9:25 am	Fred Mailhot
Representations for multiple dependencies in prosodic structures	9:30-9:55 am	Kristine Yu
Invariant and variant characteristics in speech articulation	10:00-10:25 am	Prasanta Kumar Ghosh
Automated Speech Processing of Bengali using SPPAS software	10:30-10:55 am	Brigitte Bigi, Shakuntala Mahanta, Moumita Pakrashi
Interpreting internal representations of deep convolutional neural networks trained on raw speech - Special invited lecture	11:00-11:40 am	Gašper Beguš
Panel discussion	11:50 am-12:30 pm	Q&A with all panelists