Object & Scene Recognition
Module Description
Object and scene recognition detects and labels objects and scenes in images and videos, from general categories to more specific ones. Use this module to quickly summarize the content of pictures or videos and to categorize and archive visual data reliably across more than 1,500 object classes.
Module ID: object_scene_recognition
Module Parameters
Name | Type | Default | Description |
---|---|---|---|
model | string | general-c | The name or the ID of the model to use (See Model Resource). |
min_confidence | number | 70 | Only return predictions with a confidence of at least this threshold (range from 0 to 100). |
language | number | 0 | Language of the predicted labels (0 = English, 1 = German). |
dictionaries | list of Dictionary Specification objects | null | A list of dictionaries with custom labels (See Dictionary Resource). Only applied if the zero-shot model is selected. |
words | list of strings | null | A list of custom labels as an alternative to a dictionary. Only applied if the zero-shot model is selected. |
include_preset_labels | boolean | true | Include the general labels from the pre-trained model in the result. |
enable_captioning | boolean | false | Enable the prediction of scene captions (description of the scene) included in the summarized results. |
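
As an illustration of how these parameters fit together, the following is a minimal client-side sketch that assembles the `modules` entry of a job request. The helper `build_module_config` and its validation rules are hypothetical and not part of the API; the parameter names, defaults, and value ranges are taken from the table above.

```python
from typing import Any, Optional

MODULE_ID = "object_scene_recognition"


def build_module_config(
    model: str = "general-c",
    min_confidence: float = 70,
    language: int = 0,
    words: Optional[list[str]] = None,
    include_preset_labels: bool = True,
    enable_captioning: bool = False,
) -> dict[str, Any]:
    """Return the "modules" entry for a job request (hypothetical helper)."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("min_confidence must be between 0 and 100")
    if language not in (0, 1):
        raise ValueError("language must be 0 (English) or 1 (German)")
    # Custom labels are only applied by the zero-shot model, so passing them
    # with another model is treated as a mistake here. This check is a
    # client-side assumption; the table does not say the API rejects it.
    if words and model != "zero-shot":
        raise ValueError("words are only applied when model is 'zero-shot'")

    config: dict[str, Any] = {
        "model": model,
        "min_confidence": min_confidence,
        "language": language,
        "include_preset_labels": include_preset_labels,
        "enable_captioning": enable_captioning,
    }
    if words:
        config["words"] = list(words)
    return {MODULE_ID: config}


# Example: zero-shot model with custom labels and a lower threshold.
modules = build_module_config(
    model="zero-shot", words=["tractor", "hay bale"], min_confidence=50
)
```

Failing early on `words` without `zero-shot` is a design choice of this sketch: according to the table the labels would simply not be applied, which is easy to overlook.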
Pre-trained models
Name | Description |
---|---|
zero-shot | A large set of pre-trained labels with powerful zero-shot generalization capabilities which is customizable with your own dictionary of labels |
general-c | Various objects and scenes, from general to more specific ones |
The models general-a and general-b cover subsets of these labels; general-c is recommended.
Example
Send the following JSON as the request body via POST to the /jobs/ endpoint:
{
  "sources": [
    "{url-to-your-image}"
  ],
  "modules": {
    "object_scene_recognition": {
      "model": "general-c"
    }
  }
}
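
The request above could be sent from Python roughly as follows. This is a sketch under stated assumptions: the base URL is copied from the links in the example responses on this page, the `Authorization` header name and `Key` scheme are placeholders for whatever authentication your account uses, and `build_job_request` and `submit_job` are hypothetical helper names.

```python
import json
import urllib.request

# Base URL as it appears in the example responses; adjust the host or
# scheme if your deployment differs.
API_BASE = "http://api.deepva.com/api/v1"


def build_job_request(
    source_url: str, api_key: str, model: str = "general-c"
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a job."""
    body = {
        "sources": [source_url],
        "modules": {"object_scene_recognition": {"model": model}},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/jobs/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # ASSUMPTION: header name and scheme are placeholders.
            "Authorization": f"Key {api_key}",
        },
        method="POST",
    )


def submit_job(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.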
When requesting the job via GET on the /jobs/{JOB_ID}/ endpoint, the response looks like this:
{
  "id": "878e6e61-6fa5-4cac-8d1e-dd4066d902df",
  "tag": "",
  "state": "completed",
  "errors": [],
  "progress": 1,
  "duration": 49.409,
  "time_created": "2021-05-20 09:43:14.525000",
  "time_started": "2021-05-20 09:43:14.615000",
  "time_completed": "2021-05-20 09:44:04.024000",
  "sources": [
    "storage://WQM2S9L9O0tDYacfNQN7"
  ],
  "modules": {
    "object_scene_recognition": {
      "model": "general-c",
      "state": "completed",
      "progress": 1
    }
  },
  "media_type": "video",
  "result": {
    "detailed_link": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results",
    "summary": [
      {
        "source": "storage://WQM2S9L9O0tDYacfNQN7",
        "media_type": "video",
        "info": {
          "fps": 25.0,
          "resolution": [
            960,
            540
          ],
          "total_frames": 3636,
          "duration": 145.44
        },
        "items": [
          {
            "type": "object",
            "label": "Person",
            "module": "object_scene_recognition"
          },
          {
            "type": "object",
            "label": "Cow",
            "module": "object_scene_recognition"
          }
        ]
      }
    ]
  }
}
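
A job response like the one above can be polled and reduced to a list of labels per source. The sketch below is one possible approach, not an official client: `wait_for_job` and `summary_labels` are hypothetical helpers, `fetch_job` is any callable that performs the GET and returns the decoded JSON, and the only fields relied on are `state`, `errors`, and `result.summary`, as shown in the example. Treating a non-empty `errors` list as a failure is an assumption.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_job: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    max_attempts: int = 60,
) -> dict[str, Any]:
    """Call fetch_job repeatedly until the job state is "completed"."""
    for _ in range(max_attempts):
        job = fetch_job()
        # ASSUMPTION: a non-empty "errors" list means the job failed.
        if job.get("errors"):
            raise RuntimeError(f"job reported errors: {job['errors']}")
        if job.get("state") == "completed":
            return job
        time.sleep(interval)
    raise TimeoutError("job did not complete in time")


def summary_labels(job: dict[str, Any]) -> dict[str, list[str]]:
    """Map each source to the labels listed in its summary items."""
    labels: dict[str, list[str]] = {}
    for entry in job["result"]["summary"]:
        labels[entry["source"]] = [item["label"] for item in entry["items"]]
    return labels
```

Applied to the example response, `summary_labels` would map `storage://WQM2S9L9O0tDYacfNQN7` to the labels `Person` and `Cow`.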
To get detailed information about a predicted label (for example, its time code or confidence), request the /jobs/{JOB_ID}/detailed-results/ endpoint. The response looks like this:
{
  "total": 60,
  "offset": 0,
  "limit": 10,
  "next": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=10",
  "prev": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=0",
  "data": [
    {
      "id": "12b38867-7d4d-4c30-8222-2e55e3ca4e68",
      "media_type": "video",
      "frame_start": 1032,
      "frame_end": 1283,
      "source": "storage://WQM2S9L9O0tDYacfNQN7",
      "module": "object_scene_recognition",
      "meta": {
        "label": "Person",
        "mean_confidence": 1.0,
        "parents": []
      },
      "thumbnail": null,
      "detections": [],
      "time_start": 41.28,
      "time_end": 51.32,
      "tc_start": "00:00:41:07",
      "tc_end": "00:00:51:08"
    },
    {
      "id": "51b3d4d5-e178-4b08-bf9f-290b94821b21",
      "media_type": "video",
      "frame_start": 1284,
      "frame_end": 1379,
      "source": "storage://WQM2S9L9O0tDYacfNQN7",
      "module": "object_scene_recognition",
      "meta": {
        "label": "Cow",
        "mean_confidence": 0.9986,
        "parents": [
          {
            "label": "Cattle",
            "parents": [
              {
                "label": "Mammal",
                "parents": [
                  {
                    "label": "Animal",
                    "parents": []
                  }
                ]
              }
            ]
          }
        ]
      },
      "thumbnail": null,
      "detections": [],
      "time_start": 51.36,
      "time_end": 55.16,
      "tc_start": "00:00:51:08",
      "tc_end": "00:00:55:03"
    }
  ]
}
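
The detailed results are paginated and each label carries a nested `parents` hierarchy. The following sketch shows one way to walk both. It assumes that `next` is null or absent on the last page and that following the first entry of each `parents` list is sufficient; neither is confirmed by this page. `iter_detailed_results` and `label_path` are hypothetical helper names, and `fetch_page` is any callable that performs the GET for a URL and returns the decoded JSON.

```python
from typing import Any, Callable, Iterator


def iter_detailed_results(
    fetch_page: Callable[[str], dict[str, Any]], first_url: str
) -> Iterator[dict[str, Any]]:
    """Yield every item in "data", following "next" links page by page."""
    url = first_url
    seen: set[str] = set()
    # The "seen" guard stops the loop if a link ever points back to a
    # page that was already fetched.
    while url and url not in seen:
        seen.add(url)
        page = fetch_page(url)
        yield from page["data"]
        # ASSUMPTION: "next" is null or missing on the last page.
        url = page.get("next")


def label_path(meta: dict[str, Any]) -> list[str]:
    """Return the label hierarchy from the most general label downwards."""
    path = [meta["label"]]
    parents = meta.get("parents", [])
    while parents:
        # ASSUMPTION: only the first parent at each level is followed.
        parent = parents[0]
        path.append(parent["label"])
        parents = parent.get("parents", [])
    return list(reversed(path))
```

For the `Cow` entry in the example, `label_path(item["meta"])` gives `["Animal", "Mammal", "Cattle", "Cow"]`; for `Person`, which has no parents, it gives `["Person"]`.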