Dictionary Resource
Definition
This object represents a text-based dictionary that contains words and phrases. Dictionaries can be used in various mining modules to customize their results. In the case of zero-shot models, they are typically used to provide additional labels for recognition.
A dictionary object has a file associated with it. An example of a Dictionary object is provided below.
The following mining modules provide the support for dictionaries:
ENDPOINTS
GET /v1/dictionaries/
GET /v1/dictionaries/{DICTIONARY_ID}/
POST /v1/dictionaries/
DELETE /v1/dictionaries/{DICTIONARY_ID}/
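As an illustration of how the read endpoints above are addressed, the following minimal sketch builds (but does not send) the two GET requests with Python's standard library. The host in BASE_URL and the optional headers argument are placeholders assumed for this example; this section does not specify the base URL or the authentication scheme.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical host, assumed for illustration only; use the real API base URL.
BASE_URL = "https://api.example.com"


def list_dictionaries_request(headers=None):
    """Build (but do not send) a request for GET /v1/dictionaries/."""
    return Request(f"{BASE_URL}/v1/dictionaries/", headers=headers or {}, method="GET")


def get_dictionary_request(dictionary_id, headers=None):
    """Build (but do not send) a request for GET /v1/dictionaries/{DICTIONARY_ID}/."""
    # Percent-encode the identifier so it always stays a single path segment.
    url = f"{BASE_URL}/v1/dictionaries/{quote(dictionary_id, safe='')}/"
    return Request(url, headers=headers or {}, method="GET")
```

A request built this way can be passed to urllib.request.urlopen once the required credentials have been added to the headers.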
Attributes
Name | Type | Description |
---|---|---|
id | string | Global identifier to access the actual resource |
name | string | Name of the dictionary |
type | string | Type of the dictionary. For more information, see below. |
language | string | Language of the dictionary. For more information, see below. |
time_created | string | Creation time of the dictionary (ISO Time String) |
is_public | boolean | Flag indicating whether the dictionary is public. The value is always false for user-uploaded dictionaries and true for dictionaries provided by DeepVA. Public dictionaries are available to all users and cannot be updated or deleted. |
Type
Different dictionary types define the format of the dictionary file and provide different behavior when used in a mining module. Please note that the type is not explicitly provided when creating a dictionary but it is inferred from the uploaded file's format.
To understand the details of how a specific mining module uses dictionaries, please refer to its documentation.
Simple dictionary (simple)
Simple dictionaries are used to provide category information for predicted labels. They are defined by a UTF-8 encoded text file where each line represents a single entry in the dictionary. Empty lines in the file are ignored.
When uploading the file in a multipart/form-data request, the expected content type is text/plain.
Example: dictionaries/simple.txt
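The file format described above can be read with a few lines of Python. This is a minimal sketch, not DeepVA's own parser: the function name is hypothetical, and trimming surrounding whitespace and treating whitespace-only lines as empty are assumptions made here.

```python
def parse_simple_dictionary(data: bytes) -> list[str]:
    """Return the entries of a simple dictionary file, one per non-empty line."""
    # The file is UTF-8 encoded; invalid bytes raise UnicodeDecodeError.
    text = data.decode("utf-8")
    entries = []
    for line in text.splitlines():
        entry = line.strip()  # assumption: surrounding whitespace is not significant
        if entry:  # empty lines are ignored
            entries.append(entry)
    return entries
```

For example, a file containing "Berlin", an empty line, and "New York" yields two entries.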
Map dictionary (map)
Map dictionaries provide label substitution during inference. Each map dictionary is defined by a UTF-8 encoded CSV file that should contain, at minimum, a header line and two columns without any missing values: source and target. Every time a label from the source column is predicted by a mining module that supports dictionaries, it is replaced by the value provided in the target column.
When uploading the file in a multipart/form-data request, the expected content type is text/csv.
Example: dictionaries/map.csv
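The substitution behavior described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not DeepVA's implementation: the function names are hypothetical, matching is assumed to be exact and case-sensitive, and a later row is assumed to override an earlier row with the same source value.

```python
import csv
import io


def load_map_dictionary(data: bytes) -> dict[str, str]:
    """Parse a map dictionary CSV into a source -> target mapping."""
    reader = csv.DictReader(io.StringIO(data.decode("utf-8"), newline=""))
    fields = reader.fieldnames or []
    # The header line must name both required columns.
    if "source" not in fields or "target" not in fields:
        raise ValueError("CSV header must contain 'source' and 'target' columns")
    mapping = {}
    for row in reader:
        source, target = row.get("source"), row.get("target")
        # Missing values are not allowed in either column.
        if not source or not target:
            raise ValueError(f"missing value near line {reader.line_num}")
        mapping[source] = target
    return mapping


def substitute_labels(labels: list[str], mapping: dict[str, str]) -> list[str]:
    """Replace each predicted label found in the mapping; keep others unchanged."""
    return [mapping.get(label, label) for label in labels]
```

With a file that maps NYC to New York, the predicted labels NYC and Berlin become New York and Berlin; labels without an entry pass through untouched.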
Language
Optionally, a language can be provided for each dictionary. During mining, a dictionary in a specific language is used only for results in the same language.
The following values are supported for the language field:
- any: Represents a dictionary that can be applied to any language (default).
- auto: Indicates that the language of the dictionary should be detected automatically. Immediately after the upload, the language is set to any and language detection is triggered. After DeepVA detects the language, the field is updated to the corresponding language. If the language cannot be recognized, the value remains any.
- Any of the supported languages.
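The matching rule described in this section can be expressed as a small predicate. This is a sketch of the documented behavior rather than DeepVA's code: the function name is hypothetical, and it assumes that a stored dictionary never carries the value auto, since auto resolves either to a detected language or to any.

```python
def dictionary_applies(dictionary_language: str, result_language: str) -> bool:
    """Return True if a dictionary should be used for a result in the given language."""
    # A dictionary marked "any" applies to results in every language;
    # otherwise the dictionary language must match the result language.
    return dictionary_language == "any" or dictionary_language == result_language
```

Under this rule, a dictionary with language english is used for English results and skipped for German ones, while a dictionary with language any is used for both.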
JSON Example
The following JSON snippet shows a Dictionary object.
{
"id": "9538e44c-6f30-40b7-8d7c-73bda1d41a9e",
"name": "City names",
"type": "simple",
"language": "english",
"time_created": "2020-01-23 09:34:31.422000",
"is_public": false
}
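A client might load such an object into a typed structure. The following sketch mirrors the attributes table; the class and method names are hypothetical, and parsing time_created with datetime.fromisoformat assumes the timestamp format shown in the example above.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Dictionary:
    id: str
    name: str
    type: str  # "simple" or "map"
    language: str
    time_created: datetime
    is_public: bool

    @classmethod
    def from_json(cls, payload: str) -> "Dictionary":
        """Build a Dictionary from the JSON text of a single object."""
        data = json.loads(payload)
        return cls(
            id=data["id"],
            name=data["name"],
            type=data["type"],
            language=data["language"],
            # Assumes the format shown in the example, e.g. "2020-01-23 09:34:31.422000".
            time_created=datetime.fromisoformat(data["time_created"]),
            is_public=bool(data["is_public"]),
        )
```

Keeping time_created as a datetime rather than a string makes it straightforward to sort or filter dictionaries by creation time on the client side.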