Speaker Dataset Creation
Module Description
Speaker Dataset Creation reads the names displayed in on-screen text inserts and associates them with the corresponding person. Use it to create an individual, unique audio dataset covering your own repository of personalities.
Module ID: speaker_dataset_creation
Module Parameters
Name | Type | Default | Description |
---|---|---|---|
dataset_id | string | null | ID of the dataset in which the extracted voice samples are stored (see Dataset Resource). |
detect_single_name | boolean | false | Speaker Dataset Creation is optimized for detecting full names. Enable this to detect single names as well. |
apply_generic | boolean | true | Use the generic method to find names. |
name_dictionary_list | List of strings | null | Custom dictionary of names to detect. If provided, only names from this list are detected. Example: ["Angela Merkel", "Markus Söder"] |
min_face_size | integer | 112 | Minimum length, in pixels, of the shorter side of a detected face. |
sharpness_threshold | integer | 40 | Minimum sharpness of a detected face (lower values are blurrier). |
offset_start | number | 0 | Additional audio, in seconds, to include in the voice sample at the start of the detection. |
offset_end | number | 0 | Additional audio, in seconds, to include in the voice sample at the end of the detection. |
segment_merge_threshold | number | 1 | Voice segments are merged if the gap between them is shorter than this threshold, in seconds. |
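The interplay of offset_start, offset_end, and segment_merge_threshold can be illustrated with a minimal Python sketch. This is not the module's actual implementation; it assumes that detections are (start, end) pairs in seconds, that offset_start extends a segment backwards and offset_end extends it forwards, that padded starts are clamped at 0 s, and that offsets are applied before merging. The function name build_voice_segments is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def build_voice_segments(
    detections: List[Segment],
    offset_start: float = 0.0,
    offset_end: float = 0.0,
    segment_merge_threshold: float = 1.0,
) -> List[Segment]:
    """Hypothetical sketch: pad each detection, then merge segments whose gap is below the threshold."""
    # Assumption: offset_start extends the segment backwards, offset_end extends
    # it forwards, and padded starts are clamped at 0 s.
    padded = sorted(
        (max(0.0, start - offset_start), end + offset_end)
        for start, end in detections
    )

    merged: List[Segment] = []
    for start, end in padded:
        # Merge when the gap to the previous segment is lower than the threshold;
        # overlapping segments have a negative gap and are merged as well.
        if merged and start - merged[-1][1] < segment_merge_threshold:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Usage with the defaults: the 0.5 s gap is merged, the 3 s gap is not.
print(build_voice_segments([(10.0, 12.0), (12.5, 14.0), (17.0, 18.0)]))
# → [(10.0, 14.0), (17.0, 18.0)]
```

Under these assumptions, a gap exactly equal to the threshold is not merged, matching "lower than this threshold" in the table.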
Example
Send the following JSON as the request body of a POST request to the /jobs/ endpoint:
```json
{
  "sources": [
    "{url-to-your-video}"
  ],
  "modules": {
    "speaker_dataset_creation": {
      "dataset_id": "{ID-of-your-dataset}"
    }
  }
}
```
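The same request can be prepared from Python. The following is a minimal sketch using only the standard library. The base URL https://api.example.com/v1, the API key, the Bearer authorization header, the sample video URL, and the helper names build_job_payload and make_job_request are hypothetical placeholders. Check your API documentation for the actual host and authentication scheme.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical placeholders: replace with your actual API base URL and credentials.
API_BASE_URL = "https://api.example.com/v1"
API_KEY = "your-api-key"


def build_job_payload(source_url: str, dataset_id: str, **params: Any) -> Dict[str, Any]:
    """Build the /jobs/ request body for the speaker_dataset_creation module.

    Extra keyword arguments (e.g. detect_single_name=True) are passed through
    as module parameters; parameters left out fall back to the module defaults.
    """
    module_params: Dict[str, Any] = {"dataset_id": dataset_id, **params}
    return {
        "sources": [source_url],
        "modules": {"speaker_dataset_creation": module_params},
    }


def make_job_request(payload: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the /jobs/ endpoint."""
    return urllib.request.Request(
        url=f"{API_BASE_URL}/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the actual authentication scheme may differ.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


payload = build_job_payload(
    "https://example.com/interview.mp4",  # hypothetical source URL
    "my-dataset-id",                      # hypothetical dataset ID
    detect_single_name=True,
    name_dictionary_list=["Angela Merkel", "Markus Söder"],
)
request = make_job_request(payload)
# To actually submit the job: urllib.request.urlopen(request)
```

Keeping payload construction separate from sending makes it easy to inspect or log the request body before submitting the job.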