Speaker Dataset Creation
Module Description

Speaker Dataset Creation reads the names shown in on-screen text inserts and associates them with the corresponding person. Use it to build a unique, audio-based dataset from your own repository of personalities.
Module ID: speaker_dataset_creation
Module Parameters
| Name | Type | Default | Description |
|---|---|---|---|
| dataset_id | string | null | The dataset ID to store the extracted voices (See Dataset Resource). |
| detect_single_name | boolean | false | Speaker Dataset Creation is optimized for detecting full names. Enable this to detect single names as well. |
| apply_generic | boolean | true | Use the generic method to find names. |
| name_dictionary_list | List of strings | null | A custom dictionary of names to detect. If provided, only names from this list will be detected. Example: ["Angela Merkel", "Markus Söder"] |
| min_face_size | integer | 112 | Minimum size, in pixels, of the shorter side of the face. |
| sharpness_threshold | integer | 40 | Minimum quality of the face (lower values are blurrier). |
| offset_start | number | 0 | Additional audio, in seconds, to include in the voice sample at the beginning of the detection. |
| offset_end | number | 0 | Additional audio, in seconds, to include in the voice sample at the end of the detection. |
| segment_merge_threshold | number | 1 | Voice segments are merged if the gap between them is shorter than this threshold, in seconds. |
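
To illustrate what segment_merge_threshold describes, here is a minimal Python sketch of gap-based merging. It is not the service's implementation: the function name merge_segments and the representation of a segment as a (start, end) tuple in seconds are assumptions made for this example only.

```python
from typing import List, Tuple

# Hypothetical representation: (start_seconds, end_seconds)
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], threshold: float = 1.0) -> List[Segment]:
    """Merge segments whose gap to the previous segment is below `threshold` seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        # A gap strictly lower than the threshold joins the two segments.
        if merged and start - merged[-1][1] < threshold:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


segments = [(0.0, 2.0), (2.5, 4.0), (6.0, 7.0)]
print(merge_segments(segments, threshold=1.0))
# prints [(0.0, 4.0), (6.0, 7.0)]
```

With the default threshold of 1, the first two segments (gap of 0.5 seconds) become one voice sample, while the third (gap of 2 seconds) stays separate.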
Example
Send the following JSON as the request body via POST to the /jobs/ endpoint:
{
  "sources": [
    "{url-to-your-image}"
  ],
  "modules": {
    "speaker_dataset_creation": {
      "dataset_id": "{ID-of-your-dataset}"
    }
  }
}
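
The same request can be assembled programmatically. The following Python sketch builds the JSON body and a POST request for the /jobs/ endpoint using only the standard library. The base URL, the API key, the Authorization header format, and the helper names build_job_payload and build_job_request are assumptions for illustration; check your account's API documentation for the actual values.

```python
import json
import urllib.request

# Hypothetical values: replace with your real API base URL and credentials.
API_BASE_URL = "https://api.example.com/v1"
API_KEY = "your-api-key"


def build_job_payload(source_url: str, dataset_id: str, **module_params) -> dict:
    """Build the request body; extra keyword arguments become module parameters."""
    module_config = {"dataset_id": dataset_id}
    module_config.update(module_params)
    return {
        "sources": [source_url],
        "modules": {"speaker_dataset_creation": module_config},
    }


def build_job_request(payload: dict) -> urllib.request.Request:
    """Wrap the payload in a POST request to the /jobs/ endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed header format; the real authentication scheme may differ.
            "Authorization": f"Key {API_KEY}",
        },
        method="POST",
    )


payload = build_job_payload(
    "{url-to-your-source}",
    "{ID-of-your-dataset}",
    detect_single_name=True,
    name_dictionary_list=["Angela Merkel", "Markus Söder"],
)
request = build_job_request(payload)

# Uncomment to actually send the job once the placeholders are filled in:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Optional parameters from the table above, such as detect_single_name or name_dictionary_list, are passed as keyword arguments and end up next to dataset_id in the module configuration.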