Overview
We propose Hierarchical Expressive Vector (HE-Vector), a two-stage method for emotional dialectal TTS. In the first stage, we construct separate task vectors that model dialectal and emotional styles independently, and we enhance single-style synthesis by adjusting their weights; we refer to this method as Expressive Vector (E-Vector). In the second stage, we hierarchically integrate these vectors to achieve controllable, emotionally expressive dialect synthesis without requiring jointly labeled data; we refer to this method as Hierarchical Expressive Vector (HE-Vector).
(a) Construction of the E-Vector and enhancement of F5-TTS
(b) Full merging strategy for dialect and emotion E-Vectors
(c) Hierarchical merging strategy for dialect and emotion E-Vectors
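The E-Vector construction described above can be sketched with plain task-vector arithmetic. This is a minimal sketch, not the authors' implementation: it assumes the vector is the parameter-wise difference between a fine-tuned checkpoint and the base model, and that enhancement scales that difference by a weight before adding it back. The function names, the toy parameter dictionaries, and the `weight` argument are all hypothetical.

```python
def task_vector(base, finetuned):
    """Parameter-wise difference: fine-tuned weights minus base weights."""
    return {
        name: [f - b for b, f in zip(base[name], finetuned[name])]
        for name in base
    }


def apply_vector(base, vector, weight=1.0):
    """Add a scaled task vector back onto the base weights."""
    return {
        name: [b + weight * v for b, v in zip(base[name], vector[name])]
        for name in base
    }


# Toy stand-ins for model checkpoints (hypothetical names and values).
base = {"block0": [0.0, 0.0], "block1": [1.0, 1.0]}
finetuned = {"block0": [0.5, 0.5], "block1": [1.5, 1.5]}

e_vector = task_vector(base, finetuned)

# A weight above 1.0 pushes the parameters past the fine-tuned checkpoint,
# which is one way to read "enhancement" in this sketch.
enhanced = apply_vector(base, e_vector, weight=1.5)
```

In a real setting the dictionaries would be model state dictionaries holding tensors rather than lists of floats; the arithmetic is the same.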
Dialect Synthesis
Synthesizing dialectal speech from a Mandarin prompt.
| Target Text | Target Speaker | Target Dialect | CosyVoice2 | FT | FT-last | Enhanced | LoRA Enhanced |
|---|---|---|---|---|---|---|---|
| 所以啊这个豆子就生不出来哈所以后头他们打兵的时候他就就那个就非常的非常的 | 于是军士归心,因此这雄主二字可谓是搔到了他的痒处。 | 四川话 | | | | | |
| 当时真害怕风雨过来揭掉屋顶铁皮 | 我说你这只大鸟,真是不讲理,我对你做什么了呀,你就要吞了我! | 陕西话 | | | | | |
| 你唔讲系人都知你系神经质儿童嚟噶啦 | 前段时间我面试了六个年轻人,我是倒吸了一口凉气。 | 广东话 | | | | | |
| 给俺设置上下管加风扇一百二十度六小时 | 于是军士归心,因此这雄主二字可谓是搔到了他的痒处。 | 山东话 | | | | | |
| 不用拍马屁帮我挑时间 | 主播说联播,今天我来说 | 郑州话 | | | | | |
| 读书的辰光学堂里从小就教英文所以讲也会的讲英文 | 人生就像一场马拉松比赛,重要的是坚持不懈地向前跑,而不仅仅是关注眼前的一小段路程。 | 上海话 | | | | | |
| 关于此事的报道成了人们生活中议论的焦点 | 前段时间我面试了六个年轻人,我是倒吸了一口凉气。 | 天津话 | | | | | |
| 被黑滴童鞋们要团结起来 | 于是军士归心,因此这雄主二字可谓是搔到了他的痒处。 | 长沙话 | | | | | |
Controllable Degree of Emotional Speech Synthesis
Synthesizing speech from a neutral prompt together with the target emotion label.
By adjusting the enhancement coefficient β, we can control the degree of emotional expression in the synthesized speech. The following examples illustrate the effect of varying β from 0.0 to 2.5 for each target emotion:
| Target Text | Target Speaker | Target Emotion | β = 0.0 | β = 0.5 | β = 1.0 | β = 1.5 | β = 2.0 | β = 2.5 |
|---|---|---|---|---|---|---|---|---|
| 我也想去看可爱的熊猫。 | 我们乘船漂游了三峡,真是刺激。 | happy | | | | | | |
| 跳舞好难呀,我还正在练基本功。 | 我老家在北京。 | sad | | | | | | |
| 我们俩合不来,还经常吵架,我拿她真没办法。 | 前几天我碰见了一件有趣的事儿。 | angry | | | | | | |
| 真想不到,游泳竟有如此多的好处,我下周还想来。 | 除非你打飞碟球,但这是不可能的。 | surprise | | | | | | |
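One way to read the β sweep in this section is through standard task-vector arithmetic. The page does not state the formula, so the relation below is an assumption, and the symbols $\theta_{\text{base}}$ (base model weights), $\theta_{\text{emo}}$ (weights fine-tuned on one emotion), and $\tau_{\text{emo}}$ (the emotion vector) are introduced here for illustration only:

$$
\tau_{\text{emo}} = \theta_{\text{emo}} - \theta_{\text{base}}, \qquad
\theta(\beta) = \theta_{\text{base}} + \beta \, \tau_{\text{emo}}
$$

Under this reading, $\beta = 0.0$ gives $\theta(0) = \theta_{\text{base}}$, so the base model is unchanged; $\beta = 1.0$ gives $\theta(1) = \theta_{\text{emo}}$, the fine-tuned weights; and values such as $\beta = 1.5$ to $2.5$ extrapolate beyond the fine-tuned weights along the same direction, which would correspond to progressively stronger emotional expression.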
Emotionally Expressive Dialectal Speech Synthesis
Note: In real-world scenarios, when the same speaker expresses different emotions, the perceived timbre often changes as well, as illustrated in the following examples:
| Example | Text | Neutral | Happy | Sad | Angry | Surprise |
|---|---|---|---|---|---|---|
| Male | 英国的哲学家曾经说过 | | | | | |
| Female | 不管怎么说,主队好像志在夺魁。 | | | | | |
Synthesizing speech from a Mandarin prompt together with the target emotion and dialect labels.
| Target Text | Target Speaker | Target Labels | CosyVoice2 | Two-stage | Direct Merge | LoRA Merge (LoRA rank = 8) | LoRA Merge (LoRA rank = 64) |
|---|---|---|---|---|---|---|---|
| 别吓我啊怎么解决 | 我老猪本是上界的天蓬元帅,不想下界之后错投了猪胎。 | 河南话+happy | | | | | |
| 抑或产业群聚集度高导致的成本低 | 哈尔滨亚冬会中国体育代表团正式成立了 | 天津话+sad | | | | | |
| 早上一只苹果和桃子我和豆豆一人一半 | 主播说联播,今天我来说。 | 上海话+angry | | | | | |
| 当时真害怕风雨过来揭掉屋顶铁皮 | 我说你这只大鸟,真是不讲理,我对你做什么了呀,你就要吞了我! | 陕西话+surprise | | | | | |
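To make the difference between directly merging the two E-Vectors and merging them hierarchically concrete, the sketch below contrasts the two on toy parameters. It is an illustration under stated assumptions, not the authors' procedure: this page does not say which layers receive which vector or with what weights, so the per-block `(dialect_weight, emotion_weight)` table, the block names, and both function names are hypothetical.

```python
def merge_direct(base, dialect_vec, emotion_vec, alpha=1.0, beta=1.0):
    """Add both task vectors to every parameter group with the same weights."""
    return {
        name: [
            b + alpha * d + beta * e
            for b, d, e in zip(base[name], dialect_vec[name], emotion_vec[name])
        ]
        for name in base
    }


def merge_hierarchical(base, dialect_vec, emotion_vec, block_weights):
    """Add the two task vectors with a separate (alpha, beta) pair per block."""
    merged = {}
    for name in base:
        alpha, beta = block_weights[name]
        merged[name] = [
            b + alpha * d + beta * e
            for b, d, e in zip(base[name], dialect_vec[name], emotion_vec[name])
        ]
    return merged


# Toy parameters and vectors (hypothetical names and values).
base = {"block0": [0.0, 0.0], "block1": [1.0, 1.0]}
dialect_vec = {"block0": [0.5, 0.5], "block1": [0.25, 0.25]}
emotion_vec = {"block0": [0.25, -0.25], "block1": [0.5, 0.5]}

direct = merge_direct(base, dialect_vec, emotion_vec)

# Hypothetical split: block0 takes only the dialect vector,
# block1 takes half of the dialect vector and all of the emotion vector.
block_weights = {"block0": (1.0, 0.0), "block1": (0.5, 1.0)}
hierarchical = merge_hierarchical(base, dialect_vec, emotion_vec, block_weights)
```

Giving each block its own pair of weights lets one style dominate where the other would interfere, which is the general motivation for a layer-aware merge; the actual assignment used by HE-Vector should be taken from the paper rather than from this sketch.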