Speaker Dataset Creation

Module Description

Speaker Dataset Creation reads the names displayed in on-screen text inserts and associates them with the corresponding person. Use it to build your own unique repository of personalities as an audio-based dataset of voice samples.

Module ID: speaker_dataset_creation

Module Parameters

Name | Type | Default | Description
dataset_id | string | null | The ID of the dataset in which the extracted voice samples are stored (see Dataset Resource).
detect_single_name | boolean | false | Speaker Dataset Creation is optimized for detecting full names. Enable this to detect single names as well.
apply_generic | boolean | true | Use the generic method to find names.
name_dictionary_list | List of strings | null | A custom dictionary of names to detect. If provided, only names from this list are detected. Example: ["Angela Merkel", "Markus Söder"]
min_face_size | integer | 112 | Minimum length, in pixels, of the shorter side of a face.
sharpness_threshold | integer | 40 | Minimum sharpness (quality) of a face; lower values mean blurrier faces.
offset_start | number | 0 | Additional audio, in seconds, included in the voice sample at the start of the detection.
offset_end | number | 0 | Additional audio, in seconds, included in the voice sample at the end of the detection.
segment_merge_threshold | number | 1 | Voice segments are merged if the gap between them is shorter than this threshold, in seconds.
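The merge rule described for segment_merge_threshold can be illustrated with a short sketch. It is an assumption about how the rule behaves, not the module's actual implementation: segments are taken to be (start, end) pairs in seconds, a gap strictly smaller than the threshold triggers a merge, and the helper name merge_segments is hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def merge_segments(segments: List[Segment], threshold: float = 1.0) -> List[Segment]:
    """Merge voice segments whose gap is smaller than `threshold` seconds.

    Hypothetical helper illustrating the documented segment_merge_threshold
    rule; it is not the module's actual implementation.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < threshold:
            # Gap to the previous segment is below the threshold: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # First segment, or the gap is at least the threshold: keep separate.
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    voice = [(0.0, 2.5), (3.0, 5.0), (7.0, 8.0)]
    print(merge_segments(voice, threshold=1.0))
    # [(0.0, 5.0), (7.0, 8.0)]
```

With the default threshold of 1 second, the 0.5-second gap between the first two segments is closed, while the 2-second gap before the last segment is kept.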

Example

Send the following JSON as the request body of a POST request to the /jobs/ endpoint:

{
  "sources": [
    "{url-to-your-video}"
  ],
  "modules": {
    "speaker_dataset_creation": {
      "dataset_id": "{ID-of-your-dataset}"
    }
  }
}
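For scripted use, the same request body can be built and posted from Python with only the standard library. This is a minimal sketch under stated assumptions: the base URL https://api.example.com is a hypothetical placeholder, authentication is left out because this page does not describe it, and build_job_body and build_job_request are illustrative helper names, not part of the API. Parameters that are omitted are assumed to take the defaults from the parameter table.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

# Hypothetical base URL: replace it with the actual API host of your service.
BASE_URL = "https://api.example.com"


def build_job_body(source_url: str, dataset_id: str,
                   names: Optional[List[str]] = None,
                   detect_single_name: bool = False) -> Dict[str, Any]:
    """Build the /jobs/ request body for the speaker_dataset_creation module."""
    module_params: Dict[str, Any] = {"dataset_id": dataset_id}
    if names:
        # Only names in this list will be detected (name_dictionary_list).
        module_params["name_dictionary_list"] = names
    if detect_single_name:
        module_params["detect_single_name"] = True
    return {
        "sources": [source_url],
        "modules": {"speaker_dataset_creation": module_params},
    }


def build_job_request(body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the /jobs/ endpoint."""
    data = json.dumps(body).encode("utf-8")
    # Authentication headers are not shown; add whatever your account requires.
    return urllib.request.Request(
        f"{BASE_URL}/jobs/",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    body = build_job_body(
        "https://example.com/video.mp4",
        "my-dataset-id",
        names=["Angela Merkel", "Markus Söder"],
    )
    request = build_job_request(body)
    print(request.get_method(), request.full_url)
    # Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Keeping request construction separate from sending makes it easy to inspect the body before submitting the job.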