Speaker Identification

Module Description

Speaker Identification detects and identifies the speakers in audio and video files (also known as Speaker Diarization).

Module ID: speaker_identification

Module Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | celebrities | Name or ID of the model to use (see Model Resource). |
| min_similarity | number | 0.75 | Similarity threshold: the minimum cosine similarity between the recognized identity and the closest sample in the training data (range 0.0 to 1.0). |
| cluster_unknowns | boolean | false | Whether to report unknown identities individually (Unknown #1, Unknown #2, ...). |
| index_unknowns | boolean | false | Whether to index unknown identities with a unique ID (the index is accessible from the left menu bar). This parameter overrides cluster_unknowns, because indexing requires clustering of unknown identities. |
| segment_merge_threshold | number | 1.0 | Voice segments are merged if the gap between them is shorter than this threshold, in seconds (range 0.0 to 10.0). |
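
To make the meaning of min_similarity and segment_merge_threshold more concrete, here is a minimal Python sketch of cosine-similarity thresholding and gap-based segment merging. It is an illustration only, not the service's implementation: the function names, the representation of segments as (start, end) tuples in seconds, and the choice of `>=` for the similarity comparison are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def passes_min_similarity(similarity, min_similarity=0.75):
    """Assumed rule: accept an identity when similarity >= min_similarity."""
    return similarity >= min_similarity


def merge_segments(segments, segment_merge_threshold=1.0):
    """Merge (start, end) segments of one speaker whose gap is below the threshold."""
    merged = []
    for start, end in sorted(segments):
        # Gap between the previous merged segment's end and this segment's start.
        if merged and start - merged[-1][1] < segment_merge_threshold:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(passes_min_similarity(cosine_similarity([1.0, 0.0], [1.0, 0.0])))
    print(merge_segments([(0.0, 2.0), (2.5, 4.0), (6.0, 7.0)]))
```

With the default threshold of 1.0, the first two segments in the usage example are 0.5 seconds apart and are merged, while the third starts 2.0 seconds later and stays separate.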

Pre-trained models

| Name | Description |
| --- | --- |
| celebrities | Several personalities, including the world's most famous people and a vast majority of German politicians and athletes |

Example

Send the following JSON as request body via POST to the /jobs/ endpoint:

{
  "sources": [
    "{url-to-your-audio-or-video-file}"
  ],
  "modules": {
    "speaker_identification": {
      "model": "celebrities"
    }
  }
}
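
The request body above can also be assembled programmatically. The following Python sketch uses only the standard library to build the JSON body and a POST request for the /jobs/ endpoint. The base URL `https://api.example.com`, the helper names, and the absence of authentication headers are assumptions made for illustration; substitute the values that apply to your deployment.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of your API instance.
API_BASE_URL = "https://api.example.com"


def build_job_body(source_url, model="celebrities", **module_params):
    """Build the job body; extra keyword arguments become module parameters."""
    params = {"model": model}
    params.update(module_params)
    return {
        "sources": [source_url],
        "modules": {"speaker_identification": params},
    }


def build_job_request(body, base_url=API_BASE_URL, headers=None):
    """Create (but do not send) a POST request to the /jobs/ endpoint."""
    all_headers = {"Content-Type": "application/json"}
    all_headers.update(headers or {})  # e.g. authentication headers, if required
    return urllib.request.Request(
        base_url + "/jobs/",
        data=json.dumps(body).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )


if __name__ == "__main__":
    body = build_job_body(
        "https://example.com/interview.mp3",  # placeholder media URL
        min_similarity=0.8,
        segment_merge_threshold=2.0,
    )
    request = build_job_request(body)
    print(request.get_method(), request.full_url)
    print(json.dumps(body, indent=2))
```

Passing the resulting request to `urllib.request.urlopen` would submit the job. Any required authentication is not covered on this page and would have to be supplied through the `headers` argument.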

When requesting the job via GET on the /jobs/{JOB_ID}/ endpoint, the response looks like this:

coming soon