Multimodal speech synthesis architecture for unsupervised speaker adaptation

Speech samples for the paper "Multimodal speech synthesis architecture for unsupervised speaker adaptation" presented at Interspeech 2018, Hyderabad, India.

More information and early access to preprints of related works can be found on my homepage.

The supervised approach adapts the model to a target speaker using both speech and its transcription, while the unsupervised approach uses only speech data.
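As a toy sketch of the distinction (not the paper's implementation; the model and function names here are purely illustrative): the two settings differ only in which target-speaker data drives the update.

```python
# Illustrative sketch only: "model" is a stand-in object, not the paper's
# multimodal architecture. Supervised adaptation consumes (speech, text)
# pairs; unsupervised adaptation consumes speech alone.
def adapt(model, speech, transcripts=None):
    """Record adaptation updates on a dummy model.

    If `transcripts` is given (supervised), each update uses a
    (speech, text) pair; otherwise (unsupervised) it uses speech only.
    """
    if transcripts is not None:  # supervised adaptation
        for s, t in zip(speech, transcripts):
            model["updates"].append(("speech+text", s, t))
    else:                        # unsupervised adaptation
        for s in speech:
            model["updates"].append(("speech-only", s))
    return model


model = {"updates": []}
adapt(model, ["utt1.wav", "utt2.wav"], ["hello", "world"])  # supervised
adapt(model, ["utt3.wav"])                                  # unsupervised
```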

1st sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |
2nd sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |

3rd sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |

4th sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |
