Abstract
A model of gestural sequencing in speech is proposed that aims to produce biologically plausible, fluent, and efficient movement in generating an utterance. We have previously proposed a modification of the well-known task dynamic implementation of articulatory phonology such that any given articulatory movement can be assigned a quantitative measure of effort (Simko & Cummins, 2010). To this we add a quantitative cost that decreases as speech gestures become more precise, and hence more intelligible, and a third cost component that places a premium on the duration of an utterance. Together, these three cost elements allow us to algorithmically derive optimal sequences of gestures and dynamical parameters for generating articulator movement. We show that the optimized movement displays many timing characteristics representative of real speech movement, capturing subtle details of relative timing between gestures. Optimal movement sequences also display invariances in timing that suggest syllable-level coordination for CV sequences. We explore the behavior of the model as prosodic context is manipulated along two dimensions: clarity of articulation and speech rate. Smooth, fluid, and efficient movements result.
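A minimal sketch of how the three cost elements might be combined, assuming a simple weighted sum; the weights $\alpha_E$, $\alpha_P$, $\alpha_T$ and the exact functional form are illustrative, not taken from the source:

$$
C \;=\; \alpha_E \, E \;+\; \alpha_P \, P \;+\; \alpha_T \, T
$$

where $E$ denotes articulatory effort, $P$ the precision (intelligibility) cost, and $T$ the duration cost; optimal gestural sequences and dynamical parameters are those that minimize $C$. On this reading, the prosodic manipulations reported above correspond to varying the relative weights, e.g. increasing $\alpha_P$ for clearer articulation or $\alpha_T$ for faster speech.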