Visual Mining Modules

Overview

In the modules field, you specify which of the modules should be applied to your sources during creating a job. The following modules are available in the second beta period of DeepVA Cloud:

Visual Mining Module	ID	Description
Face Recognition	face_recognition	Face Recognition detects and identifies the faces of public figures in a variety of categories such as politics, sports, business and entertainment. This module covers more than 20.000 personalities, including the world's most famous people and a vast majority of German politicians and athletes.
Object & Scene Recognition	object_scene_recognition	Object and scene recognition detects and labels various objects and scenes, from general to more specific ones. With this module you can immediately summarize the content of pictures or videos. It can be used to conveniently and reliably categorize and archive visual data with more than 1,500 object classes.
Lower Third Recognition	lower_third_recognition	Lower Third Recognition reads the names displayed on screen text-inserts and associates them to the corresponding person. Create an individual and unique dataset of your own repository of personalities.
Face Dataset Creation	dataset_creation	Face Dataset Creation reads the names displayed on screen text-inserts and associates them to the corresponding person. Create an individual and unique dataset of your own repository of personalities for image-based datasets.
Speaker Dataset Creation	speaker_dataset_creation	Speaker Dataset Creation reads the names displayed on screen text-inserts and associates them to the corresponding person. Create an individual and unique dataset of your own repository of personalities for audio-based datasets.
Landmark Recognition	landmark_recognition	Landmark Recognition identifies all important sights, architectural structures and natural monuments across Europe. Easily archive and retrieve visual material showing places of interest for content creation.
Advanced Diversity Analysis	advanced_diversity_analysis	Advanced Diversity Analysis offers the possibility to determine the percentage of gender and age occurrence in images or videos. Ensure your desired ratio between male and female and age ranges in any content.
QR Code Detection	qrcode_detection	QR Code Detection offers the possibility to find and decode QR Codes but also EAN13 Codes (European Article Number) and their corresponding product names in your videos and images.
Face Attributes	face_attributes	Face Attributes recognizes emotions, ethnicity, gender or facial characteristics such as "beard", "eyes closed" or "glasses" of all persons appearing in pictures or videos.
Speech Recognition	speech_recognition	The Speech Recognition module transcribes spoken language into text (speech-to-text), it can detect the spoken language automatically, detect Named Entities and custom entities from a Dictionary and offers the translation of the transcript.
Subtitle Detection	subtitle_detection	The Subtitle Detection module detects the appearance of burned-in subtitles, its position, language and the actual text content of the subtitle.
Color Detection	color_detection	The Color Detection Module identifies and extracts dominant colors from an image based on the proportion of pixels they occupy.
Text Recognition	text_recognition	Text Recognition detects and extracts printed text from images and video frames. It supports multiple languages, outputs structured OCR results (including layout and positional metadata), and can be used for indexing, searching and analytics of visual documents.
Visual Understanding	visual_understanding	Performing visual language comprehension tasks, such as answering visual questions, understanding scenes, and making advanced deductions.

Module configurations

You can configure each module via the module parameters. The parameters can be set in the modules field of the job. See Job resource.

The Visal Mining Module is defined as key (the module ID is used). The value is a JSON dictionary object holding the module parameters.

All available parameters for each module you can find on the Visual Mining Module sub-page if you click on the module in the overview above.

The following example shows the content of the modules field specifying two modules and their parameters:

{
  "modules": {
      "face_recognition": {
        "model": "celebrities"
      },
      "object_scene_recognition": {
        "model": "general"
      }
  }
}