
Object & Scene Recognition

Module Description


Object and scene recognition detects and labels objects and scenes, from general to very specific ones. With this module you can quickly summarize the content of images or videos, and conveniently and reliably categorize and archive visual data using more than 1,500 object classes.

Module ID: object_scene_recognition

Module Parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | general-c | The name or the ID of the model to use (see Model Resource). |
| min_confidence | number | 70 | Only return predictions with a confidence higher than this threshold (range 0 to 100). |
| language | number | 0 | Language of the predicted labels (0 = English, 1 = German). |
| dictionaries | list of Dictionary Specification objects | null | A list of dictionaries with custom labels (see Dictionary Resource). Only applied if the zero-shot model is selected. |
| words | list of strings | null | A list of custom labels as an alternative to a dictionary. Only applied if the zero-shot model is selected. |
| include_preset_labels | boolean | true | Include the general labels from the pre-trained model in the result. |
| enable_captioning | boolean | false | Enable the prediction of scene captions (descriptions of the scene) in the summarized results. |
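
For illustration, the module configuration below combines several of these parameters in one request body. The values are placeholders chosen for this sketch (the custom labels in words, the threshold of 80, and the flags), not values taken from the reference above:

{
  "sources": [
    "{url-to-your-image}"
  ],
  "modules": {
    "object_scene_recognition": {
      "model": "zero-shot",
      "min_confidence": 80,
      "language": 0,
      "words": ["tractor", "barn", "fence"],
      "include_preset_labels": false,
      "enable_captioning": true
    }
  }
}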

Pre-trained models

| Name | Description |
| --- | --- |
| zero-shot | A large set of pre-trained labels with powerful zero-shot generalization capabilities, customizable with your own dictionary of labels. |
| general-c | Various objects and scenes, from general to more specific ones. |

general-a and general-b are subsets. general-c is recommended.

Example

Send the following JSON as request body via POST to the /jobs/ endpoint:

{
  "sources": [
    "{url-to-your-image}"
  ],
  "modules": {
    "object_scene_recognition": {
      "model": "general-c"
    }
  }
}
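
A minimal way to submit this request is sketched below in Python with the requests library. The base URL is taken from the links in the responses further down; the Authorization header is a placeholder and depends on your account's authentication scheme, and the assumption that the create response echoes the job id mirrors the GET response shown below:

import requests

API_BASE = "http://api.deepva.com/api/v1"          # base URL as seen in the response links below
HEADERS = {"Authorization": "Key {your-api-key}"}  # placeholder; use your account's auth scheme

payload = {
    "sources": ["{url-to-your-image}"],
    "modules": {
        "object_scene_recognition": {"model": "general-c"}
    },
}

response = requests.post(f"{API_BASE}/jobs/", json=payload, headers=HEADERS)
response.raise_for_status()
job_id = response.json()["id"]  # assumed: the create response contains the job id, as in the GET response below
print("Created job", job_id)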

When requesting the job via GET on the /jobs/{JOB_ID}/ endpoint, the response looks like this:

{
    "id": "878e6e61-6fa5-4cac-8d1e-dd4066d902df",
    "tag": "",
    "state": "completed",
    "errors": [],
    "progress": 1,
    "duration": 49.409,
    "time_created": "2021-05-20 09:43:14.525000",
    "time_started": "2021-05-20 09:43:14.615000",
    "time_completed": "2021-05-20 09:44:04.024000",
    "sources": [
        "storage://WQM2S9L9O0tDYacfNQN7"
    ],
    "modules": {
        "object_scene_recognition": {
            "model": "general-c",
            "state": "completed",
            "progress": 1
        }
    },
    "media_type": "video",
    "result": {
        "detailed_link": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results",
        "summary": [
            {
                "source": "storage://WQM2S9L9O0tDYacfNQN7",
                "media_type": "video",
                "info": {
                    "fps": 25.0,
                    "resolution": [
                        960,
                        540
                    ],
                    "total_frames": 3636,
                    "duration": 145.44
                },
                "items": [
                    {
                        "type": "object",
                        "label": "Person",
                        "module": "object_scene_recognition"
                    },
                    {
                        "type": "object",
                        "label": "Cow",
                        "module": "object_scene_recognition"
                    }
                ]
            }
        ]
    }
}
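
Since the job runs asynchronously (note the state and progress fields), a simple polling loop can wait for completion and then read the summarized labels. This is only a sketch: the stop condition on a non-empty errors list and the 5-second interval are assumptions, not prescribed by the API:

import time
import requests

API_BASE = "http://api.deepva.com/api/v1"          # base URL as in the links above
HEADERS = {"Authorization": "Key {your-api-key}"}  # placeholder; use your account's auth scheme
job_id = "878e6e61-6fa5-4cac-8d1e-dd4066d902df"    # id returned when the job was created

job = requests.get(f"{API_BASE}/jobs/{job_id}/", headers=HEADERS).json()
while job["state"] != "completed" and not job["errors"]:
    time.sleep(5)  # arbitrary polling interval
    job = requests.get(f"{API_BASE}/jobs/{job_id}/", headers=HEADERS).json()

if job["state"] == "completed":
    for item in job["result"]["summary"][0]["items"]:
        print(item["type"], item["label"])  # e.g. "object Person", "object Cow"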

To get detailed information about each predicted label (for example its time code or confidence), request the /jobs/{JOB_ID}/detailed-results/ endpoint. The response looks like this:

{
    "total": 60,
    "offset": 0,
    "limit": 10,
    "next": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=10",
    "prev": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=0",
    "data": [
        {
            "id": "12b38867-7d4d-4c30-8222-2e55e3ca4e68",
            "media_type": "video",
            "frame_start": 1032,
            "frame_end": 1283,
            "source": "storage://WQM2S9L9O0tDYacfNQN7",
            "module": "object_scene_recognition",
            "meta": {
                "label": "Person",
                "mean_confidence": 1.0,
                "parents": []
            },
            "thumbnail": null,
            "detections": [],
            "time_start": 41.28,
            "time_end": 51.32,
            "tc_start": "00:00:41:07",
            "tc_end": "00:00:51:08"
        },
        {
            "id": "51b3d4d5-e178-4b08-bf9f-290b94821b21",
            "media_type": "video",
            "frame_start": 1284,
            "frame_end": 1379,
            "source": "storage://WQM2S9L9O0tDYacfNQN7",
            "module": "object_scene_recognition",
            "meta": {
                "label": "Cow",
                "mean_confidence": 0.9986,
                "parents": [
                    {
                        "label": "Cattle",
                        "parents": [
                            {
                                "label": "Mammal",
                                "parents": [
                                    {
                                        "label": "Animal",
                                        "parents": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            "thumbnail": null,
            "detections": [],
            "time_start": 51.36,
            "time_end": 55.16,
            "tc_start": "00:00:51:08",
            "tc_end": "00:00:55:03"
        }
    ]
}
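
Because the detailed results are paginated (total, limit, offset and a next link), a client typically walks through all pages. The sketch below follows the next links from the response above; it assumes that next is null or absent on the last page:

import requests

API_BASE = "http://api.deepva.com/api/v1"          # base URL as in the links above
HEADERS = {"Authorization": "Key {your-api-key}"}  # placeholder; use your account's auth scheme
job_id = "878e6e61-6fa5-4cac-8d1e-dd4066d902df"

url = f"{API_BASE}/jobs/{job_id}/detailed-results/"
while url:
    page = requests.get(url, headers=HEADERS).json()
    for entry in page["data"]:
        meta = entry["meta"]
        # print label, mean confidence and the time code range of each detection
        print(meta["label"], meta["mean_confidence"], entry["tc_start"], entry["tc_end"])
    url = page.get("next")  # assumed: no "next" on the last page ends the loop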