The overall objective is a physically validated voice simulation system, suited to modern massively parallel computer architectures and capable of state-of-the-art voice simulations, that other researchers, educators and clinicians can operate remotely over the Internet. The mechanical replicas built in the project will strengthen the validity of the simulations, and will also have high educational value as demonstrators.
Several novel methods will need to be developed and/or applied and combined in order to meet the main objective, and these methods are the main research objectives of Eunison. We could try to pin down success criteria for each of these methods, but making such criteria measurable and verifiable is not easy: if we knew beforehand what would be found, it would hardly be research. Also, such general criteria would probably not give the Work Package leaders and the Scientific Coordinator a strong incentive, and there is always the risk that a proposed approach fails to meet its targets in time. Instead, we have chosen to define the intermediate objectives in terms of what the prototype simulation model should be able to do at the respective milestones. This gives the Technical Board (of WP leaders) a strong incentive to review progress continually, and to select technical steps that implement a demonstrable capability by each milestone. This leads us to the following Milestones/Objectives.
Objective #1 (month 12): Static sounds
The Vocal Fold (VF) model should demonstrate (a) self-oscillating behaviour with incompressible fluid-structure interaction and a collision model; (b) consistency with the mechanical replica(s); and (c) configurable posturing, i.e. control of subglottal pressure, prephonatory position, and vocal fold length and tension. The Vocal Tract (VT) model should demonstrate the ability to simulate acoustic propagation and resonance properties that are consistent with the static physical VT replicas. The VF output should be connectable to the VT input so that vowel sounds can be produced, though not necessarily in a completely unified way. A rudimentary control interface should be implemented.
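As a simple illustration of the kind of resonance check that a static VT replica allows, the sketch below computes the textbook quarter-wavelength resonances of a lossless uniform tube that is closed at the glottis end and open at the lips. It is not the Eunison VT model: the function name, the tube length of 17.5 cm and the speed of sound of 350 m/s are assumptions chosen only for illustration.

```python
def uniform_tube_resonances(length_m, n_resonances=3, speed_of_sound=350.0):
    """Resonance frequencies (Hz) of a lossless uniform tube, closed at one
    end and open at the other: f_n = (2n - 1) * c / (4 * L).

    The default speed of sound (350 m/s) is an assumed value for warm,
    humid air; it is not taken from the project description.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [
        (2 * n - 1) * speed_of_sound / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]


# Assumed tube length of 17.5 cm, a common idealisation of an adult vocal tract.
resonances = uniform_tube_resonances(0.175)
print([round(f) for f in resonances])  # [500, 1500, 2500]
```

A simulated or measured replica with a more realistic geometry would of course deviate from these values; the uniform tube merely gives a reference case whose answer is known analytically.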
Objective #2 (month 24): Changing sounds
Using a prototype gestural score interface (or equivalent), a combined VF+VT model should be able to produce diphthongs. The simulated material properties should now be selectable to match those of human tissues as well as those of the silicone/acrylic mechanical replicas; ideally, the differences will be minor. A FEM solver should be ready for the compressible case. In parallel, a less computationally expensive ALE-based approach has been used to implement variable VT acoustics. A scheme for controlling the gross geometry of the combined model using simulations of neural muscle activation signals is presented.
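To make the idea of a gestural score more concrete, the following minimal sketch represents a diphthong as a set of timed gestures, each moving one control parameter linearly from a start value to an end value. Everything in it is hypothetical: the `Gesture` class, the parameter names (`tongue_height`, `jaw_opening`), the normalised values and the linear interpolation are illustrative assumptions, not the interface that the project will define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gesture:
    """One hypothetical gesture: a linear ramp of a single control parameter."""
    parameter: str
    start_s: float
    end_s: float
    start_value: float
    end_value: float


def evaluate_score(score, t):
    """Return the value of every controlled parameter at time t (seconds).

    Before a gesture starts the parameter holds its start value, after it
    ends it holds its end value, and in between it is interpolated linearly.
    """
    state = {}
    for g in score:
        if t <= g.start_s:
            frac = 0.0
        elif t >= g.end_s:
            frac = 1.0
        else:
            frac = (t - g.start_s) / (g.end_s - g.start_s)
        state[g.parameter] = g.start_value + frac * (g.end_value - g.start_value)
    return state


# Illustrative score for an /ai/-like glide; the numbers are made up.
diphthong = [
    Gesture("tongue_height", 0.05, 0.25, 0.2, 0.9),
    Gesture("jaw_opening", 0.05, 0.25, 0.8, 0.3),
]

for t in (0.0, 0.15, 0.3):
    print(t, evaluate_score(diphthong, t))
```

In a real system the parameter trajectories would drive the VT geometry frame by frame, and smoother interpolation than a linear ramp would probably be preferred.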
Objective #3 (month 32): Syllables and unification
The combined model as well as the corresponding physical replicas are now able to render selected fricative and plosive sounds, both voiced and unvoiced. The FEM codes are unified such that an acoustic waveform is represented in the same domain as the solid and fluid dynamics. A phonetic interface can control the basic rendering of syllables using IPA text strings as input. This interface can generate control output both directly for the geometrical interface and for the neural activation interface.
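The first step of such a phonetic interface could look roughly like the sketch below, which splits an IPA string into segments and tags each one with its manner of articulation and voicing. This is an assumption-laden illustration only: the lookup table covers a handful of single-character symbols, the names `IPA_FEATURES` and `segment_ipa` are hypothetical, and the mapping from segments to geometrical or neural control signals is deliberately left out.

```python
# Hypothetical, deliberately tiny feature table: symbol -> (manner, voiced).
IPA_FEATURES = {
    "p": ("plosive", False),
    "b": ("plosive", True),
    "t": ("plosive", False),
    "d": ("plosive", True),
    "f": ("fricative", False),
    "v": ("fricative", True),
    "s": ("fricative", False),
    "z": ("fricative", True),
    "a": ("vowel", True),
    "i": ("vowel", True),
    "u": ("vowel", True),
}


def segment_ipa(text):
    """Split an IPA string into a list of tagged segments.

    Only the single-character symbols in IPA_FEATURES are supported;
    diacritics and multi-character symbols are outside this sketch.
    """
    segments = []
    for symbol in text:
        if symbol not in IPA_FEATURES:
            raise ValueError(f"unsupported IPA symbol: {symbol!r}")
        manner, voiced = IPA_FEATURES[symbol]
        segments.append({"symbol": symbol, "manner": manner, "voiced": voiced})
    return segments


for segment in segment_ipa("sa"):
    print(segment)
for segment in segment_ipa("ba"):
    print(segment)
```

A later stage would translate each tagged segment into control output for either the geometrical interface or the neural activation interface.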
Objective #4 (month 36): Remoting and release
A research workshop has been held with a group of prospective end users, helping us to define the remote user interface. The combined numerical voice model can be driven at any of the three levels of control representation. While we do not expect the model to be able to produce all conceivable phonetic classes of sounds (e.g., probably not trills such as a rolled ‘r’), it is complete enough to speak and sing when appropriately controlled. There is a procedure by which external users can request Eunison simulations on the parallel computers at KTH, using their own control data (certain limitations will apply). There is a public website with numerous examples of pre-computed simulations with sound and video.