1. Overview
1.1 Service Capabilities
Text-to-Speech (TTS) converts written text into natural, fluent, human-like speech. It supports multiple languages and can convey emotional nuance and natural pauses, so the generated speech is clear, expressive, and lifelike. This makes it a flexible, high-quality voice generation solution for use cases ranging from audiobook narration and video dubbing to commercial ad voiceovers.
1.2 Sample Prompts and Outputs
| Text | Output Speech |
|---|---|
| Although the peak summer travel season has yet to arrive, airfares of flights from interior provinces to Xinjiang have already quietly surged. | |
| From June 15 to 17, leaders of the G7 countries, Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States, will gather in Kananaskis, Canada, for the 51st G7 Summit. | |
2. Prompt Engine
N/A
3. API Requests
3.1 Request URL
https://open-api.wondershare.cc/v1/open/capacity/application/tm_text2speech
3.2 Request Parameters
Method: POST
Headers
| Parameter Name | Value | Required | Example | Description |
|---|---|---|---|---|
| Content-Type | application/json | Yes | | |
| Authorization | | Yes | Basic xxx | Security verification information in the format Basic {access_token}, where access_token is generated from the assigned app_key and app_crit as base64(app_key:app_crit). |
| X-App-Key | | Yes | Assigned appkey | |
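
The header rules above can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name build_headers is hypothetical, and it assumes the token is the Base64 encoding of the UTF-8 string app_key:app_crit with no whitespace around the colon.

```python
import base64


def build_headers(app_key: str, app_crit: str) -> dict:
    """Build the three request headers listed in the Headers table."""
    # Assumption: access_token = base64("app_key:app_crit"), UTF-8 encoded,
    # with no space around the colon.
    raw = f"{app_key}:{app_crit}".encode("utf-8")
    access_token = base64.b64encode(raw).decode("ascii")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Basic {access_token}",
        "X-App-Key": app_key,
    }
```

Calling build_headers("demo_key", "demo_crit") with placeholder credentials returns a dictionary that can be passed to any HTTP client as the request headers.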
Body
| Parameter Name | Type | Required | Default Value | Description | Other Info |
|---|---|---|---|---|---|
| text | string | Yes | | Text to synthesize. Supports Chinese. Must be fewer than 1024 tokens (tokens include Chinese characters, English words, punctuation, etc.). | |
| wsid | integer | Yes | | User WSID. | |
| drive | string | No | | Required if you use cloud storage for video/image output. JSON format. Example: { "space_id": 11111, "file_dest_path": "/path/sss", "file_tag": [ { "key": "key1", "value": "value1" }, { "key": "key2", "value": "value2" } ] }, where space_id is the cloud storage space ID, file_dest_path is the cloud storage destination path (directory), and file_tag is the list of file tags. | |
| emotion_choice | string | No | Neutral | Emotional tone. Valid values: Neutral, Happy, Sad, Surprise, and Angry. | |
| speaker_choice | string | No | | Voice template. The default is a female voice. The following 15 voice types are supported: GEN_ZH_F_001, GEN_ZH_F_002, GEN_ZH_F_003, GEN_ZH_F_004, GEN_ZH_F_005, GEN_ZH_F_006, GEN_ZH_F_007, GEN_ZH_M_001, GEN_ZH_M_002, GEN_ZH_M_003, GEN_ZH_M_004, GEN_ZH_M_005, GEN_ZH_M_006, CHAR_ZH_M_001, CHAR_ZH_M_002. | |
| ref_audio | string | No | | Reference audio required for voice modeling. Recommended duration: 5 s to 10 s (minimum 3 s, maximum 15 s). Format: WAV. | |
| loudness_adjustment | integer | Yes | -23 dB | Adjusts the output volume. Range: -60 dB to 0 dB. Recommended: -35 dB to -10 dB. Step: 1. | |
| key_adjustment | integer | Yes | 0 | Adjusts the pitch. Unit: semitones. Range: -12 to 12. Step: 1. | |
| speed_adjustment | number | Yes | 1.0 | Adjusts the playback speed. Range: 0.5x to 2.0x. | |
| file_type | integer | No | | 0: OSS; 5: cloud storage. | |
| is_clone | boolean | No | false | Whether to model a voice. false: standard TTS; true: models the voice. | |
| callback | string | No | | Callback URL. | |
| params | string | No | | Pass-through parameter for the callback. | |
| priority | number | No | | Task priority. | |
| lang_code | string | No | zh-CN | Language code (currently supports only Chinese). | |
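
As an illustration of the body parameters above, the following Python sketch builds a request body, checks the documented ranges on the client side, and posts it with the standard library. The function names build_tts_body and submit_tts are hypothetical, the exact token count for text is not checked, and the assumption that the service replies with UTF-8 JSON is not confirmed by this document.

```python
import json
import urllib.request

TTS_URL = "https://open-api.wondershare.cc/v1/open/capacity/application/tm_text2speech"
EMOTIONS = {"Neutral", "Happy", "Sad", "Surprise", "Angry"}


def build_tts_body(text, wsid, *, loudness_adjustment=-23, key_adjustment=0,
                   speed_adjustment=1.0, emotion_choice="Neutral",
                   speaker_choice=None, lang_code="zh-CN"):
    """Build a request body using the documented defaults and ranges."""
    if not text:
        raise ValueError("text is required")
    if emotion_choice not in EMOTIONS:
        raise ValueError(f"emotion_choice must be one of {sorted(EMOTIONS)}")
    if not -60 <= loudness_adjustment <= 0:
        raise ValueError("loudness_adjustment must be within -60..0 dB")
    if not -12 <= key_adjustment <= 12:
        raise ValueError("key_adjustment must be within -12..12 semitones")
    if not 0.5 <= speed_adjustment <= 2.0:
        raise ValueError("speed_adjustment must be within 0.5..2.0")
    body = {
        "text": text,
        "wsid": wsid,
        "loudness_adjustment": loudness_adjustment,
        "key_adjustment": key_adjustment,
        "speed_adjustment": speed_adjustment,
        "emotion_choice": emotion_choice,
        "lang_code": lang_code,
    }
    # Optional field: omit it so the service applies its own default voice.
    if speaker_choice is not None:
        body["speaker_choice"] = speaker_choice
    return body


def submit_tts(body, headers, timeout=30):
    """POST the body; headers must carry the three headers from section 3.2."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    request = urllib.request.Request(TTS_URL, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the response body is UTF-8 encoded JSON.
        return json.loads(response.read().decode("utf-8"))
```

For example, build_tts_body("你好，世界", 12345, emotion_choice="Happy", speaker_choice="GEN_ZH_F_001") produces a body that submit_tts can send; the wsid value here is a placeholder.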
3.3 Response
| Parameter Name | Type | Required | Default Value | Description | Other Info |
|---|---|---|---|---|---|
| code | number | Yes | | Error code. | |
| msg | string | Yes | | Error message. | |
| data | object | No | | | |
| ├─ task_id | string | No | | Task ID. | |
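
A minimal Python sketch of reading these response fields is shown below. The helper name parse_tts_response is hypothetical. Because this document does not list the code values, the sketch returns code and msg unchanged instead of deciding whether the request succeeded.

```python
import json


def parse_tts_response(response_text: str):
    """Return (code, msg, task_id); task_id is None when data or task_id is absent."""
    payload = json.loads(response_text)
    code = payload["code"]  # documented as required
    msg = payload["msg"]    # documented as required
    data = payload.get("data") or {}  # documented as optional
    return code, msg, data.get("task_id")
```

With a made-up response such as {"code": 0, "msg": "ok", "data": {"task_id": "abc123"}}, the helper returns the code, the message, and the task ID string; the values in this sample are illustrative only.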