Multimodal speech synthesis architecture for unsupervised speaker adaptation

Speech samples for the paper "Multimodal speech synthesis architecture for unsupervised speaker adaptation" presented at Interspeech 2018, Hyderabad, India.

More information and early access to preprints of related works can be found on my homepage.

The supervised approach adapts the model to a target speaker using both speech and its transcription, while the unsupervised approach uses only speech data.
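As a toy sketch of the distinction (not the paper's implementation; the model and function names here are purely illustrative): the two settings differ only in which target-speaker data drives the update.

```python
# Illustrative sketch only: "model" is a stand-in object, not the paper's
# multimodal architecture. Supervised adaptation consumes (speech, text)
# pairs; unsupervised adaptation consumes speech alone.
def adapt(model, speech, transcripts=None):
    """Record adaptation updates on a dummy model.

    If `transcripts` is given (supervised), each update uses a
    (speech, text) pair; otherwise (unsupervised) it uses speech only.
    """
    if transcripts is not None:  # supervised adaptation
        for s, t in zip(speech, transcripts):
            model["updates"].append(("speech+text", s, t))
    else:                        # unsupervised adaptation
        for s in speech:
            model["updates"].append(("speech-only", s))
    return model


model = {"updates": []}
adapt(model, ["utt1.wav", "utt2.wav"], ["hello", "world"])  # supervised
adapt(model, ["utt3.wav"])                                  # unsupervised
```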

1st sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |
2nd sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |

3rd sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |

4th sample

| | Supervised (10) | Supervised (320) | Unsupervised (10) | Unsupervised (320) |
|---|---|---|---|---|
| Natural | ► Play | | | |
| Vanilla | ► Play | ► Play | | |
| Step-by-step | ► Play | ► Play | ► Play | ► Play |
| Joint-Goals | ► Play | ► Play | ► Play | ► Play |
| Tied-Layers | ► Play | ► Play | ► Play | ► Play |
| JG+TL | ► Play | ► Play | ► Play | ► Play |
