Speaker Dataset Creation

Module Description

Speaker Dataset Creation reads the names displayed in on-screen text inserts and associates them with the corresponding person. Use it to build a unique, audio-based dataset of the personalities in your own collection.

Module ID: speaker_dataset_creation

Module Parameters

Name	Type	Default	Description
dataset_id	string	null	ID of the dataset in which the extracted voice samples are stored (see Dataset Resource).
detect_single_name	boolean	false	Speaker Dataset Creation is optimized for detecting full names. Enable this option to detect single names as well.
apply_generic	boolean	true	Use the generic method to detect names.
name_dictionary_list	list of strings	null	Custom dictionary of names to detect. If provided, only names from this list are detected. Example: `["Angela Merkel", "Markus Söder"]`
min_face_size	integer	112	Minimum length, in pixels, of the shorter side of a face.
sharpness_threshold	integer	40	Minimum sharpness of a face; lower values correspond to blurrier faces.
offset_start	number	0	Additional audio, in seconds, extracted as part of the voice sample at the beginning of the detection.
offset_end	number	0	Additional audio, in seconds, extracted as part of the voice sample at the end of the detection.
segment_merge_threshold	number	1	Voice segments are merged if the gap between them is shorter than this threshold, in seconds.
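The timing parameters offset_start, offset_end and segment_merge_threshold can be illustrated with a small Python sketch. It is not the module's actual implementation: it assumes that offset_start widens each detection window earlier in time, that offset_end widens it later, and that this padding is applied before merging. The function names are hypothetical.

```python
from typing import List, Tuple

# A segment is a (start, end) pair in seconds.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], merge_threshold: float = 1.0) -> List[Segment]:
    """Merge segments whose gap to the previous segment is strictly below merge_threshold."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < merge_threshold:
            prev_start, prev_end = merged[-1]
            # Extend the previous segment instead of starting a new one.
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def voice_sample_windows(
    detections: List[Segment],
    offset_start: float = 0.0,
    offset_end: float = 0.0,
    segment_merge_threshold: float = 1.0,
) -> List[Segment]:
    """Hypothetical helper: pad each detection by the offsets, then merge close segments."""
    # Assumption: offsets extend the window outward; start is clamped at 0 seconds.
    padded = [(max(0.0, s - offset_start), e + offset_end) for s, e in detections]
    return merge_segments(padded, segment_merge_threshold)
```

With the default threshold of 1 second, two segments separated by exactly 1 second stay separate, because merging only happens when the gap is strictly lower than the threshold.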

Example

Send the following JSON as the request body of a POST request to the /jobs/ endpoint:

{
  "sources": [
    "{url-to-your-video}"
  ],
  "modules": {
    "speaker_dataset_creation": {
      "dataset_id": "{ID-of-your-dataset}"
    }
  }
}
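For reference, the request above could be assembled from Python using only the standard library. This is a sketch under stated assumptions: the base URL is a placeholder, authentication is omitted because this section does not describe it, and build_job_request is a hypothetical helper. It rejects parameter names that do not appear in the parameter table, which catches typos before the job is submitted.

```python
import json
import urllib.request
from typing import Any, Dict

# Optional parameters listed in the Module Parameters table (dataset_id is passed separately).
KNOWN_PARAMS = {
    "detect_single_name", "apply_generic", "name_dictionary_list", "min_face_size",
    "sharpness_threshold", "offset_start", "offset_end", "segment_merge_threshold",
}


def build_job_request(base_url: str, source_url: str, dataset_id: str,
                      **module_params: Any) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST /jobs/ request."""
    unknown = set(module_params) - KNOWN_PARAMS
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    config: Dict[str, Any] = {"dataset_id": dataset_id, **module_params}
    body = {
        "sources": [source_url],
        "modules": {"speaker_dataset_creation": config},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/jobs/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption: no auth header shown
        method="POST",
    )
```

Calling `urllib.request.urlopen(req)` on the returned request sends it; any authentication your deployment requires would have to be added to the headers.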