ToMoviee Audio Models - Text-to-Speech | Skydome Creation Engine

1. Overview

1.1 Service Capabilities

Text-to-Speech (TTS) converts written text into natural, human-like speech. It supports multiple languages and conveys emotional nuance and natural pauses, so the generated speech is clear and expressive. Typical use cases include audiobook narration, video dubbing, and commercial ad voiceovers.

1.2 Sample Prompts and Outputs

Text	Output Speech
Although the peak summer travel season has yet to arrive, airfares of flights from interior provinces to Xinjiang have already quietly surged.
From June 15 to 17, leaders of the G7 countries, Canada, France, Germany, Italy, Japan, the United Kingdom, and the United States, will gather in Kananaskis, Canada, for the 51st G7 Summit.

2. Prompt Engine

N/A

3. API Requests

3.1 Request URL

https://open-api.wondershare.cc/v1/open/capacity/application/tm_text2speech

3.2 Request Parameters

Method: POST

Headers

Parameter Name	Value	Required	Example	Description
Content-Type	application/json	Yes
Authorization		Yes	Basic xxx	Credentials in the format Basic {access_token}, where access_token is the Base64 encoding of app_key:app_crit, built from the assigned app_key and app_crit.
X-App-Key		Yes		The app_key assigned to your application.
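
The header rules above can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name build_headers is hypothetical, and it assumes app_key and app_crit are joined by a single colon with no surrounding spaces, as in standard HTTP Basic authentication.

```python
import base64


def build_headers(app_key: str, app_crit: str) -> dict:
    """Build the three request headers listed in the Headers table.

    Assumption: the token is base64("app_key:app_crit") with no spaces
    around the colon.
    """
    raw = f"{app_key}:{app_crit}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
        "X-App-Key": app_key,
    }
```

Calling build_headers("my_key", "my_crit") with placeholder credentials returns a dictionary that can be passed to any HTTP client.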

Body

Parameter Name	Type	Required	Default Value	Description	Other Info
text	string	Yes		Text to synthesize. Supports Chinese. Must be fewer than 1024 tokens (tokens include Chinese characters, English words, punctuation, etc.).
wsid	integer	Yes		User WSID.
drive	string	No		If you use cloud storage for video/image output, this field is required in JSON format. Example: { "space_id": 11111, // Cloud storage space ID "file_dest_path": "/path/sss", // Cloud storage destination path (directory) "file_tag": [ // File tags { "key": "key1", "value": "value1" }, { "key": "key2", "value": "value2" } ] }
emotion_choice	string	No		Emotional tone. Valid values: Neutral (default), Happy, Sad, Surprise, and Angry.
speaker_choice	string	No		Voice template. The default is a female voice. The following 15 voice types are supported: ['GEN_ZH_F_001', 'GEN_ZH_F_002', 'GEN_ZH_F_003', 'GEN_ZH_F_004', 'GEN_ZH_F_005', 'GEN_ZH_F_006', 'GEN_ZH_F_007', 'GEN_ZH_M_001', 'GEN_ZH_M_002', 'GEN_ZH_M_003', 'GEN_ZH_M_004', 'GEN_ZH_M_005', 'GEN_ZH_M_006', 'CHAR_ZH_M_001', 'CHAR_ZH_M_002']
ref_audio	string	No		Reference audio required for voice modeling. Recommended duration: 5s–10s (min 3s, max 15s). Format: WAV.
loudness_adjustment	integer	Yes		Adjusts the output volume. Default: -23 dB. Range: -60 dB to 0 dB. Recommended: -35 dB to -10 dB. Step: 1 dB.
key_adjustment	integer	Yes		Adjusts the pitch. Unit: semitones. Default: 0. Range: -12 to 12. Step: 1.
speed_adjustment	number	Yes		Adjusts the playback speed. Default: 1.0. Range: 0.5x to 2.0x.
file_type	integer	No		0: OSS; 5: cloud storage.
is_clone	boolean	No		Whether to model voice. false (default): standard TTS; true: models voice.
callback	string	No		Callback URL.
params	string	No		Pass-through parameters returned with the callback.
priority	number	No		Task priority.
lang_code	string	No		Language code (currently supports only Chinese). Default: zh-CN.
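
As an illustration of the body parameters above, the following Python sketch assembles a request body and wraps it in a POST request without sending it. The helper names build_body and build_request are hypothetical, the range checks simply mirror the values listed in the table, and the 1024-token limit is not checked because the tokenization is not specified here.

```python
import json
import urllib.request

URL = "https://open-api.wondershare.cc/v1/open/capacity/application/tm_text2speech"
EMOTIONS = {"Neutral", "Happy", "Sad", "Surprise", "Angry"}


def build_body(text, wsid, emotion_choice="Neutral", speaker_choice=None,
               loudness_adjustment=-23, key_adjustment=0, speed_adjustment=1.0):
    """Assemble the JSON body, checking the ranges listed in the table."""
    if not text:
        raise ValueError("text is required")
    if emotion_choice not in EMOTIONS:
        raise ValueError(f"unsupported emotion_choice: {emotion_choice}")
    if not -60 <= loudness_adjustment <= 0:
        raise ValueError("loudness_adjustment must be in [-60, 0] dB")
    if not -12 <= key_adjustment <= 12:
        raise ValueError("key_adjustment must be in [-12, 12] semitones")
    if not 0.5 <= speed_adjustment <= 2.0:
        raise ValueError("speed_adjustment must be in [0.5, 2.0]")
    body = {
        "text": text,
        "wsid": wsid,
        "emotion_choice": emotion_choice,
        "loudness_adjustment": loudness_adjustment,
        "key_adjustment": key_adjustment,
        "speed_adjustment": speed_adjustment,
    }
    # Omit speaker_choice so the service falls back to its default voice.
    if speaker_choice is not None:
        body["speaker_choice"] = speaker_choice
    return body


def build_request(headers, body):
    """Wrap the body in a POST request; nothing is sent here."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(URL, data=data, headers=headers, method="POST")
```

The resulting request object can be sent with urllib.request.urlopen once valid headers are supplied.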

3.3 Response

Parameter Name	Type	Required	Default Value	Description	Other Info
code	number	Yes		Error code.
msg	string	Yes		Error message.
data	object	No
├─ task_id	string	No		Task ID.
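
The response fields above could be read as in the following Python sketch. The helper name parse_response is hypothetical, and because this section does not state which code value means success, the function only extracts the fields and leaves interpretation of code to the caller.

```python
import json


def parse_response(raw: str) -> dict:
    """Extract code, msg, and task_id from a JSON response string.

    data is optional, so task_id is None when data is absent or null.
    """
    payload = json.loads(raw)
    data = payload.get("data") or {}
    return {
        "code": payload["code"],
        "msg": payload["msg"],
        "task_id": data.get("task_id"),
    }
```

For example, parsing a made-up response such as {"code": 0, "msg": "", "data": {"task_id": "abc"}} yields a task_id of "abc"; the values in that sample are illustrative only.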