Object & Scene Recognition

Module Description

Object and scene recognition detects and labels objects and scenes, from general to more specific categories. With this module you can quickly summarize the content of images or videos and reliably categorize and archive visual data using more than 1,500 object classes.

Module ID: object_scene_recognition

Module Parameters

| Name | Type | Default | Description |
|------|------|---------|-------------|
| model | string | general-c | The name or ID of the model to use (see Model Resource). |
| min_confidence | number | 70 | Only return predictions whose confidence is at least this threshold (range 0 to 100). |
| language | number | 0 | Language of the predicted labels (0 = English, 1 = German). |
| dictionaries | list of Dictionary Specification objects | null | A list of dictionaries with custom labels (see Dictionary Resource). Only applied if the zero-shot model is selected. |
| words | list of strings | null | A list of custom labels as an alternative to a dictionary. Only applied if the zero-shot model is selected. |
| include_preset_labels | boolean | true | Include the general labels from the pre-trained model in the result. |

Pre-trained models

| Name | Description |
|------|-------------|
| zero-shot | A large set of pre-trained labels with strong zero-shot generalization, customizable with your own dictionary of labels. |
| general-c | Various objects and scenes, from general to more specific ones. |

general-a and general-b are subsets; general-c is recommended.
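
For the zero-shot model, custom labels can be supplied through the words or dictionaries parameters described above. The following is a minimal sketch of such a request body, written as a Python dict; the label values and the source URL placeholder are illustrative only.

zero_shot_job = {
    "sources": ["{url-to-your-image}"],
    "modules": {
        "object_scene_recognition": {
            "model": "zero-shot",
            # Custom labels; illustrative values only.
            "words": ["tractor", "hay bale", "barn"],
            # Also keep the labels of the pre-trained model in the result.
            "include_preset_labels": True,
            # Drop predictions below 70% confidence.
            "min_confidence": 70
        }
    }
}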

Example

Send the following JSON as the request body via POST to the /jobs/ endpoint:

{
  "sources": [
    "{url-to-your-image}"
  ],
  "modules": {
    "object_scene_recognition": {
      "model": "general-c"
    }
  }
}
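
A minimal client-side sketch for submitting this request with Python and the requests library is shown below. The base URL is taken from the links in the responses further down; the authentication header is a placeholder that depends on your account setup, and it is assumed that the create response returns the job object including its id.

import requests

BASE_URL = "http://api.deepva.com/api/v1"        # taken from the links in the responses below
HEADERS = {"Authorization": "Key YOUR_API_KEY"}  # placeholder; use your account's auth scheme

payload = {
    "sources": ["{url-to-your-image}"],
    "modules": {"object_scene_recognition": {"model": "general-c"}},
}

# Create the job; assuming the response body echoes the job object with its id.
response = requests.post(f"{BASE_URL}/jobs/", json=payload, headers=HEADERS)
response.raise_for_status()
job_id = response.json()["id"]
print("Created job", job_id)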

When requesting the job via GET on the /jobs/{JOB_ID}/ endpoint, the response looks like this:

{
    "id": "878e6e61-6fa5-4cac-8d1e-dd4066d902df",
    "tag": "",
    "state": "completed",
    "errors": [],
    "progress": 1,
    "duration": 49.409,
    "time_created": "2021-05-20 09:43:14.525000",
    "time_started": "2021-05-20 09:43:14.615000",
    "time_completed": "2021-05-20 09:44:04.024000",
    "sources": [
        "storage://WQM2S9L9O0tDYacfNQN7"
    ],
    "modules": {
        "object_scene_recognition": {
            "model": "general-c",
            "state": "completed",
            "progress": 1
        }
    },
    "media_type": "video",
    "result": {
        "detailed_link": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results",
        "summary": [
            {
                "source": "storage://WQM2S9L9O0tDYacfNQN7",
                "media_type": "video",
                "info": {
                    "fps": 25.0,
                    "resolution": [
                        960,
                        540
                    ],
                    "total_frames": 3636,
                    "duration": 145.44
                },
                "items": [
                    {
                        "type": "object",
                        "label": "Person",
                        "module": "object_scene_recognition"
                    },
                    {
                        "type": "object",
                        "label": "Cow",
                        "module": "object_scene_recognition"
                    }
                ]
            }
        ]
    }
}
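
Since jobs run asynchronously, a client typically polls this endpoint until the state changes to completed and then reads the summary. The sketch below assumes the states and fields shown in the response above (state, errors, result.summary); error handling is kept minimal.

import time
import requests

BASE_URL = "http://api.deepva.com/api/v1"
HEADERS = {"Authorization": "Key YOUR_API_KEY"}  # placeholder; see the note above

def wait_for_job(job_id, poll_interval=5.0):
    """Poll GET /jobs/{JOB_ID}/ until the job is completed."""
    while True:
        job = requests.get(f"{BASE_URL}/jobs/{job_id}/", headers=HEADERS).json()
        if job["state"] == "completed":
            return job
        if job["errors"]:
            raise RuntimeError(f"Job failed: {job['errors']}")
        time.sleep(poll_interval)

job = wait_for_job("878e6e61-6fa5-4cac-8d1e-dd4066d902df")
for entry in job["result"]["summary"]:
    labels = [item["label"] for item in entry["items"]]
    print(entry["source"], "->", ", ".join(labels))  # e.g. Person, Cow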

To get detailed information about the predicted labels (for example their time codes or confidences), you can request the /jobs/{JOB_ID}/detailed-results/ endpoint. The response looks like this:

{
    "total": 60,
    "offset": 0,
    "limit": 10,
    "next": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=10",
    "prev": "http://api.deepva.com/api/v1/jobs/878e6e61-6fa5-4cac-8d1e-dd4066d902df/detailed-results/?limit=10&offset=0",
    "data": [
        {
            "id": "12b38867-7d4d-4c30-8222-2e55e3ca4e68",
            "media_type": "video",
            "frame_start": 1032,
            "frame_end": 1283,
            "source": "storage://WQM2S9L9O0tDYacfNQN7",
            "module": "object_scene_recognition",
            "meta": {
                "label": "Person",
                "mean_confidence": 1.0,
                "parents": []
            },
            "thumbnail": null,
            "detections": [],
            "time_start": 41.28,
            "time_end": 51.32,
            "tc_start": "00:00:41:07",
            "tc_end": "00:00:51:08"
        },
        {
            "id": "51b3d4d5-e178-4b08-bf9f-290b94821b21",
            "media_type": "video",
            "frame_start": 1284,
            "frame_end": 1379,
            "source": "storage://WQM2S9L9O0tDYacfNQN7",
            "module": "object_scene_recognition",
            "meta": {
                "label": "Cow",
                "mean_confidence": 0.9986,
                "parents": [
                    {
                        "label": "Cattle",
                        "parents": [
                            {
                                "label": "Mammal",
                                "parents": [
                                    {
                                        "label": "Animal",
                                        "parents": []
                                    }
                                ]
                            }
                        ]
                    }
                ]
            },
            "thumbnail": null,
            "detections": [],
            "time_start": 51.36,
            "time_end": 55.16,
            "tc_start": "00:00:51:08",
            "tc_end": "00:00:55:03"
        }
    ]
}
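
The detailed results are paginated via limit and offset and expose a next link, so a client can simply follow next until no further page is available. The sketch below iterates over all result segments and prints each label with its time codes and mean confidence; the field names are taken from the response above, while the assumption that next is null or absent on the last page is not confirmed by this page.

import requests

BASE_URL = "http://api.deepva.com/api/v1"
HEADERS = {"Authorization": "Key YOUR_API_KEY"}  # placeholder; see the note above

def iter_detailed_results(job_id):
    """Follow the paginated next links of /jobs/{JOB_ID}/detailed-results/."""
    url = f"{BASE_URL}/jobs/{job_id}/detailed-results/"
    while url:
        page = requests.get(url, headers=HEADERS).json()
        yield from page["data"]
        url = page.get("next")  # assumed to be null/absent on the last page

for segment in iter_detailed_results("878e6e61-6fa5-4cac-8d1e-dd4066d902df"):
    meta = segment["meta"]
    print(f'{meta["label"]:<10} {segment["tc_start"]} - {segment["tc_end"]} '
          f'(mean confidence {meta["mean_confidence"]:.2f})')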