
Dictionary Resource

Definition

This object represents a text-based dictionary of words and phrases. Dictionaries can be used in various mining modules to customize their results. For zero-shot models, they typically provide additional labels for recognition.

Each dictionary object has a file associated with it. An example of a Dictionary object is provided below.

The following mining modules support dictionaries:

ENDPOINTS

GET /v1/dictionaries/

GET /v1/dictionaries/{DICTIONARY_ID}/

POST /v1/dictionaries/

DELETE /v1/dictionaries/{DICTIONARY_ID}/
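The read and delete endpoints can be sketched as prepared requests using only the Python standard library. This is a minimal, non-authoritative sketch: the base URL is hypothetical, the DELETE path is assumed to live under /v1/dictionaries/ like the other endpoints, and authentication is not covered because this page does not describe it.

```python
import urllib.parse
import urllib.request

# Hypothetical base URL; replace with the actual DeepVA API host.
BASE_URL = "https://api.example.com"


def _dictionary_url(dictionary_id: str) -> str:
    # Quote the ID so it is always a single path segment.
    return f"{BASE_URL}/v1/dictionaries/{urllib.parse.quote(dictionary_id, safe='')}/"


def list_dictionaries_request() -> urllib.request.Request:
    """GET /v1/dictionaries/ : list dictionaries."""
    return urllib.request.Request(f"{BASE_URL}/v1/dictionaries/", method="GET")


def get_dictionary_request(dictionary_id: str) -> urllib.request.Request:
    """GET /v1/dictionaries/{DICTIONARY_ID}/ : retrieve one dictionary."""
    return urllib.request.Request(_dictionary_url(dictionary_id), method="GET")


def delete_dictionary_request(dictionary_id: str) -> urllib.request.Request:
    """DELETE /v1/dictionaries/{DICTIONARY_ID}/ : delete a user-uploaded dictionary."""
    return urllib.request.Request(_dictionary_url(dictionary_id), method="DELETE")


if __name__ == "__main__":
    req = get_dictionary_request("9538e44c-6f30-40b7-8d7c-73bda1d41a9e")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it, once authentication is added.
```

Note that a delete request for a public dictionary is expected to be rejected, since public dictionaries cannot be updated or deleted.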

Attributes

Name | Type | Description
id | string | Global identifier used to access the resource
name | string | Name of the dictionary
type | string | Type of the dictionary. For more information, see below.
language | string | Language of the dictionary. For more information, see below.
time_created | string | Creation time of the dictionary (ISO time string)
is_public | boolean | Flag indicating whether the dictionary is public. It is always false for user-uploaded dictionaries and true for dictionaries provided by DeepVA. Public dictionaries are available to all users and cannot be updated or deleted.

Type

The dictionary type defines the format of the dictionary file and determines how the dictionary behaves when used in a mining module. The type is not specified explicitly when creating a dictionary; it is inferred from the format of the uploaded file.

To understand the details of how a specific mining module uses dictionaries, please refer to its documentation.

Simple dictionary (simple)

Simple dictionaries are used to provide category information for predicted labels. They are defined by a UTF-8 encoded text file where each line represents a single entry in the dictionary. Empty lines in the file are ignored.
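Reading a simple dictionary file on the client side, for example to check it before uploading, can be sketched as follows. This is an illustration, not the service's parser: it assumes that leading and trailing whitespace on each line is trimmed and that whitespace-only lines count as empty, which this page does not state.

```python
def parse_simple_dictionary(text: str) -> list[str]:
    """Return the entries of a simple dictionary, one per non-empty line.

    Assumption: surrounding whitespace is trimmed and whitespace-only
    lines are treated as empty; the service may behave differently.
    """
    entries = []
    for line in text.splitlines():
        entry = line.strip()
        if entry:  # empty lines are ignored
            entries.append(entry)
    return entries


def read_simple_dictionary(path: str) -> list[str]:
    # Simple dictionary files are UTF-8 encoded text files.
    with open(path, encoding="utf-8") as f:
        return parse_simple_dictionary(f.read())


if __name__ == "__main__":
    sample = "Berlin\n\nMünchen\nHamburg\n"
    print(parse_simple_dictionary(sample))  # ['Berlin', 'München', 'Hamburg']
```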

When uploading the file in a multipart/form-data request, the expected content type is text/plain.

Example: dictionaries/simple.txt

Map dictionary (map)

Map dictionaries provide label substitution during inference. Each map dictionary is defined by a UTF-8 encoded CSV file that contains, at minimum, a header line and two columns without any missing values: source and target. Whenever a mining module that supports dictionaries predicts a label from the source column, the label is replaced by the corresponding value in the target column.
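The substitution rule can be illustrated with a short client-side sketch that validates a map dictionary and applies it to a list of labels. It assumes exact, case-sensitive matching of labels against the source column and that, if a source value appears twice, the last row wins; neither detail is specified on this page, and the function names are hypothetical.

```python
import csv
import io


def parse_map_dictionary(text: str) -> dict[str, str]:
    """Build a source -> target mapping from map dictionary CSV content.

    Requires a header with at least "source" and "target"; extra columns
    are ignored. Raises ValueError on a missing column or value.
    """
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    if "source" not in fieldnames or "target" not in fieldnames:
        raise ValueError("header must contain 'source' and 'target' columns")
    mapping = {}
    for row in reader:
        source, target = row.get("source"), row.get("target")
        if not source or not target:
            raise ValueError(f"missing value on line {reader.line_num}")
        mapping[source] = target  # assumption: a later duplicate overrides an earlier one
    return mapping


def substitute_labels(labels: list[str], mapping: dict[str, str]) -> list[str]:
    # Labels listed in the source column are replaced; all others pass through unchanged.
    return [mapping.get(label, label) for label in labels]


if __name__ == "__main__":
    csv_text = "source,target\nNYC,New York City\nLA,Los Angeles\n"
    mapping = parse_map_dictionary(csv_text)
    print(substitute_labels(["NYC", "Paris", "LA"], mapping))
    # ['New York City', 'Paris', 'Los Angeles']
```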

When uploading the file in a multipart/form-data request, the expected content type is text/csv.
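An upload to POST /v1/dictionaries/ could be prepared as below, choosing text/plain or text/csv from the file extension so that the service can infer a simple or map dictionary. This is a hedged sketch only: the base URL and the form field names "name", "language", and "file" are hypothetical, since this page does not list the request fields, and the multipart body is assembled by hand to stay within the standard library.

```python
import os
import uuid
import urllib.request
from typing import Optional

# Hypothetical base URL; replace with the actual DeepVA API host.
BASE_URL = "https://api.example.com"

# Simple dictionaries are uploaded as text/plain, map dictionaries as text/csv.
CONTENT_TYPES = {".txt": "text/plain", ".csv": "text/csv"}


def build_multipart(fields: dict, file_field: str, filename: str,
                    content: bytes, content_type: str) -> tuple:
    """Encode form fields plus one file part as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def upload_dictionary_request(name: str, filename: str, content: bytes,
                              language: Optional[str] = None) -> urllib.request.Request:
    """Prepare POST /v1/dictionaries/; all form field names are assumptions."""
    ext = os.path.splitext(filename)[1].lower()
    content_type = CONTENT_TYPES.get(ext)
    if content_type is None:
        raise ValueError(f"unsupported dictionary file extension: {ext!r}")
    fields = {"name": name}  # hypothetical field name
    if language is not None:
        fields["language"] = language  # hypothetical field name; "any" is the default
    body, header = build_multipart(fields, "file", filename, content, content_type)
    return urllib.request.Request(
        f"{BASE_URL}/v1/dictionaries/",
        data=body,
        method="POST",
        headers={"Content-Type": header},
    )
```

In practice an HTTP client library with built-in multipart support would replace build_multipart; the part that matters here is the per-part Content-Type of the file.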

Example: dictionaries/map.csv

Language

Optionally, a language can be specified for each dictionary. During mining, a dictionary in a specific language is applied only to results in that same language.

The following values are supported for the language field:

  • any - Represents a dictionary that can be applied to any language (default)
  • auto - Indicates that the language of the dictionary should be detected automatically. Immediately after upload, the language is set to any and language detection is triggered. Once DeepVA detects the language, the field is updated to the corresponding language. If the language cannot be recognized, the value remains any.
  • Any of the supported languages
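The language rules above can be modeled in a few lines. This sketch only mirrors the documented behavior on the client side (matching actually happens inside DeepVA); language names such as "english" follow the JSON example below, and all function names are hypothetical.

```python
from typing import Optional


def initial_language(requested: Optional[str]) -> str:
    # No value -> "any" (the default); "auto" -> "any" until detection finishes.
    if requested is None or requested == "auto":
        return "any"
    return requested


def language_after_detection(detected: Optional[str]) -> str:
    # For "auto" dictionaries: if detection fails, the value stays "any".
    return detected if detected else "any"


def dictionary_applies(dictionary_language: str, result_language: str) -> bool:
    # An "any" dictionary applies to every result; otherwise the languages must match.
    return dictionary_language == "any" or dictionary_language == result_language


def select_dictionaries(dictionaries: list[dict], result_language: str) -> list[dict]:
    return [d for d in dictionaries if dictionary_applies(d["language"], result_language)]
```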

JSON Example

The following JSON snippet shows a Dictionary object.

{
    "id": "9538e44c-6f30-40b7-8d7c-73bda1d41a9e",
    "name": "City names",
    "type": "simple",
    "language": "english",
    "time_created": "2020-01-23 09:34:31.422000",
    "is_public": false
}
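A client might load this object into a typed structure as sketched below. The class and its can_modify helper are hypothetical conveniences, not part of the API; time_created carries no time zone in the example, so it is parsed as a naive datetime.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Dictionary:
    id: str
    name: str
    type: str          # "simple" or "map"
    language: str      # "any" or a supported language, e.g. "english"
    time_created: datetime
    is_public: bool

    @classmethod
    def from_json(cls, data: dict) -> "Dictionary":
        return cls(
            id=data["id"],
            name=data["name"],
            type=data["type"],
            language=data["language"],
            # fromisoformat accepts the space-separated timestamp used in the example.
            time_created=datetime.fromisoformat(data["time_created"]),
            is_public=data["is_public"],
        )

    @property
    def can_modify(self) -> bool:
        # Public (DeepVA-provided) dictionaries cannot be updated or deleted.
        return not self.is_public


if __name__ == "__main__":
    raw = """{"id": "9538e44c-6f30-40b7-8d7c-73bda1d41a9e", "name": "City names",
              "type": "simple", "language": "english",
              "time_created": "2020-01-23 09:34:31.422000", "is_public": false}"""
    d = Dictionary.from_json(json.loads(raw))
    print(d.name, d.type, d.time_created, d.can_modify)
```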