Changelog

The changelog is a record of changes to the DeepVA software.

16. June 2025

Adding two new models to the Visual Understanding mining module (Qwen-2.5-VL and SmolVLM)
Adding support for structured output in the Visual Understanding mining module

07. June 2025

Release of a new revised Text Recognition mining module with improved accuracy and performance
Adding support for the audio/x-wav content type for Speaker Identification training
Fixing an issue where requesting a dictionary in Speech Recognition was failing

08. Apr 2025

Adding a new mining module Color Detection for detecting dominant colors in videos and images

25. Nov 2024

Releasing celebrities v35 model for Face Recognition mining module

15. Nov 2024

Releasing celebrities v34 model for Face Recognition mining module
UI improvements
Minor bugfixes

05. Nov 2024

Releasing celebrities v33 model for Face Recognition mining module

09. Oct 2024

Hotfix: Fixing an issue where transcript exports were failing due to incorrect paragraph offsets

02. Oct 2024

Adding support for formatting VTT and SRT files when exporting transcripts (max_line_width and max_line_count) to align with industry standards
Adding a tag field to the export resource
Adding a transcript_tag field to the on_created event of the transcript version webhook
Restricting the language field of a transcript variant to a dropdown of ISO languages
Adding support for *.ogg audio files (audio/ogg and application/ogg)
Adding the functionality to adjust the playback speed in the video player on the Transcript Editor page
Fixing an issue where saving a transcript would jump to the first page in the Transcript Editor
Adding a message for the HTTP 403 on the external Transcript Editor when permissions were restricted
Fixing an issue with the pagination functionality of the dictionaries page
Adding a global 404 page to the frontend
Minor UI improvements
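
The max_line_width and max_line_count options named in this release can be pictured with the following minimal sketch. It is an illustration only, not the DeepVA export implementation: the helper name wrap_cue, its default values and the greedy word-wrapping strategy are assumptions, and only the two parameter names come from the changelog.

```python
import textwrap
from typing import List


def wrap_cue(text: str, max_line_width: int = 42, max_line_count: int = 2) -> List[List[str]]:
    """Split subtitle text into cues (hypothetical helper).

    Each cue holds at most max_line_count lines, and each line is at most
    max_line_width characters long.
    """
    # Greedy word wrap; words longer than the width are broken by textwrap.
    lines = textwrap.wrap(text, width=max_line_width)
    # Group consecutive lines into cues of max_line_count lines each.
    return [lines[i:i + max_line_count] for i in range(0, len(lines), max_line_count)]


if __name__ == "__main__":
    sample = "The quick brown fox jumps over the lazy dog near the river bank"
    for cue in wrap_cue(sample, max_line_width=20, max_line_count=2):
        print(cue)
```

A real exporter would also have to split the cue timing across the resulting cues, which this sketch leaves out.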

27. Sep 2024

Changing default threshold for Dictionary Specification to a value of 0.90 due to better quality of matching
Fixing an issue where the order of paragraphs in the Transcript Editor was not correct after saving
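
To make the role of a matching threshold such as 0.90 concrete, here is a minimal sketch of threshold-based dictionary matching. The changelog does not describe how DeepVA scores matches, so the similarity measure (difflib.SequenceMatcher from the Python standard library) and the helper name correct_word are stand-in assumptions.

```python
import difflib
from typing import Iterable


def correct_word(word: str, dictionary: Iterable[str], threshold: float = 0.90) -> str:
    """Return the best dictionary entry if its similarity reaches the threshold,
    otherwise return the word unchanged (hypothetical helper)."""
    best, best_score = word, 0.0
    for entry in dictionary:
        # Similarity ratio in [0, 1]; an assumed stand-in for the real scoring.
        score = difflib.SequenceMatcher(None, word.lower(), entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    return best if best_score >= threshold else word


if __name__ == "__main__":
    print(correct_word("Kubernets", ["Kubernetes", "DeepVA"]))
    print(correct_word("cat", ["Kubernetes", "DeepVA"]))
```

A higher threshold replaces fewer words and so reduces the risk of false corrections, at the cost of missing near-matches.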

18. Sep 2024

Adding the option to restrict the JWT token for authentication of the stand-alone version of the Transcript Editor to a specific transcript via a custom permissions claim
Changing the exp claim from optional to mandatory in the JWT token for authentication of the stand-alone version of the Transcript Editor
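
As a rough illustration of a token that carries the now-mandatory exp claim together with a custom permissions claim, consider the sketch below. The signing algorithm (HS256), the secret, the claim name "permissions" and its inner layout are all assumptions made for this example; the DeepVA documentation defines the actual claim structure and signing key.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret: bytes, transcript_id: str, lifetime_s: int = 3600,
               now: Optional[int] = None) -> str:
    """Build an HS256-signed JWT (assumed algorithm) restricted to one transcript."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "exp": now + lifetime_s,  # expiry time in seconds since the epoch
        # Hypothetical claim layout: the real structure may differ.
        "permissions": {"transcript_id": transcript_id},
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


if __name__ == "__main__":
    print(make_token(b"example-secret", "transcript-123", lifetime_s=600))
```

Keeping the lifetime short limits how long a leaked token remains usable, which is the usual motivation for making exp mandatory.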

16. Sep 2024

Fixing an issue where empty segments from Voice Activity Detection were causing the whole Speech Recognition job to fail
Improving the robustness of downloads of external URLs as job sources by implementing retry logic and optimizing timeout handling
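
The general idea of retry logic for flaky downloads can be sketched as follows. This is a generic pattern with exponential backoff, not DeepVA's internal code: the helper name with_retries, the retried exception types, the number of attempts and the delays are assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() until it succeeds or the attempts are exhausted.

    Waits base_delay_s, then twice that, and so on between failed attempts.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # last attempt failed: surface the error to the caller
            sleep(base_delay_s * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

The sleep function is injectable so that the backoff can be tested without actually waiting.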

02. Sep 2024

Improving fuzzy dictionaries for word correction in Speech Recognition
Small bugfixes

19. Aug 2024

Optimization of the loading of transcript segments in the Transcript Editor

10. Aug 2024

Displaying more job-related information on the job overview and the job detail page
Adding filter for tag field on the job overview page
Displaying the tag field of transcripts on the transcript overview page
Adding filter functionality to the transcripts page
Adding a spinner while loading the transcript paragraphs for a better user experience
Improving the color-coding for transcript word-level confidence in the Transcript Editor

22. July 2024

Displaying a descriptive network error instead of the generic unknown error when a service times out
Adding an action to the source list context menu to start a job from the source on the right info card when a job is selected
Small bugfixes

12. July 2024

Adding the fields export_id and job_id to the Artifacts resource for exports and job-based generated artifacts

11. July 2024

Adding three new webhook events for exports (deepva.export.on_started, deepva.export.on_completed and deepva.export.artifact.on_created)
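
A receiver for these three events might route them as in the sketch below. Only the event names are taken from the changelog; the payload shape (a plain dict), the handler functions and their return values are hypothetical and serve purely as an illustration.

```python
from typing import Callable, Dict


def on_export_started(data: dict) -> str:
    return "export started"


def on_export_completed(data: dict) -> str:
    return "export completed"


def on_export_artifact_created(data: dict) -> str:
    return "export artifact created"


# Map each event name from the changelog to a (hypothetical) handler.
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "deepva.export.on_started": on_export_started,
    "deepva.export.on_completed": on_export_completed,
    "deepva.export.artifact.on_created": on_export_artifact_created,
}


def dispatch(event_name: str, data: dict) -> str:
    """Route a webhook event to its handler; unknown events are ignored."""
    handler = HANDLERS.get(event_name)
    if handler is None:
        return "ignored"
    return handler(data)
```

Ignoring unknown event names rather than failing keeps the receiver working when further events are introduced later.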

08. July 2024

Fixing a bug where time_created field of the TranscriptVariantVersion was not updated correctly

04. July 2024

Adding a page to manage Access Keys to the Preferences

28. June 2024

Introducing Access Keys, enabling the integration of various frontend components into user workflows
Adding a stand-alone version of the Transcript Editor accessible through JWT token authentication to enable integration into custom solutions and workflows

20. June 2024

Adding a tag field to the Transcript resource, which allows tagging a transcript with a custom ID or label for reference

17. June 2024

Adding DOCX (.docx) as a new export format for the Transcript Editor
Fixing a bug where the creation of a transcript from a job with multiple mining modules was failing
Introducing an API-wide webhook system to subscribe to certain events. See Webhooks for more information.

14. June 2024

Releasing celebrities v31 model for Face Recognition mining module (EURO 2024 update)

10. June 2024

Adding export functionality for the transcript editor to export the transcript in SRT and WebVTT subtitle format
Releasing celebrities v2 model for Speaker Identification mining module (350+ new identities added)
Introducing a new argument for the Speech Recognition module to disable the paragraph formatting which combines multiple segments into a paragraph of sentences (format_paragraph). This is useful for subtitles where short segments are expected.
Fixing a bug where the transcript view errors out when the linked job has been removed
Fixing a bug where an empty source was accepted for the job creation
Fixing a bug where the transcript editor did not jump to the next page while playing the video
UI improvements for the transcript editor
Improving the status visualization of exports
Minor bugfixes

17. May 2024

Releasing celebrities v30 model for Face Recognition mining module

03. May 2024

Release of the Transcript Editor (Beta 1) that allows users to edit and manage transcripts generated by the Speech Recognition mining module
Adding cancellation feature for ongoing uploads in the storage uploader dialog to ensure smoother user experience
Fixing various minor UI bugs across dictionaries, storage page, dataset class clusters and indexed identities
Fixing an issue related to dataset type loading based on selected model type in the training wizard
Fixing an issue where re-running jobs did not pre-select the correct model
Minor UI bugfixes and usability improvements

26. Apr 2024

Releasing celebrities v29 model for Face Recognition mining module (100+ new identities added)

19. Mar 2024

Adding indexing capabilities to the Speaker Identification module, which allows fingerprinting and indexing a speaker's voice
Introducing API endpoints for the upcoming Transcript Editor
Minor bugfixes

08. Mar 2024

Releasing celebrities v28 model for Face Recognition mining module (new identities added)

28. Feb 2024

Fixing a bug to prevent possible class name folder collision during dataset exports

12. Dec 2023

Releasing the Training and Evaluation for speaker datasets
Changing the default sorting of the job results in the UI with a video source to be by frame_start
Minor bugfixes

28. Nov 2023

New Visual Mining Module: Personal Data Anonymization which blurs faces and license plates in images and videos

16. Nov 2023

Releasing celebrities v27 model for Face Recognition mining module (new identities added)

14. Nov 2023

Adding a feature to the Object and Scene Recognition module which allows captioning to be enabled (enable_captioning module parameter)
Fixing a bug where IAIS Face Dataset Export and IAIS Audio Dataset Export did not export more than 100 classes
Adding a summary as CSV file for the IAIS Face Dataset
Minor bugfixes

27. Oct 2023

Adding MapDictionary as a new type of Dictionary
Adding the capability to allow users to specify a start and end range for partial video processing
Internal database upgrades
Migrate to explicit UUID for Knowledge Graph Node IDs as a preparation for the upcoming Knowledge Graph release
Improving the inference time for batch of images in Face Recognition mining module
Fixing a bug in custom IP address whitelisting on API-Keys
Minor bugfixes

24. Oct 2023

Releasing celebrities v26 model for Face Recognition mining module (new identities added)

10. Oct 2023

Releasing a new model (zero-shot) for Object and Scene Recognition including a large set of pre-trained labels with powerful zero-shot generalization capabilities which is customizable with your own dictionary of labels.
Adding face dataset export for IAIS format
Display the detected main language on Subtitle Detection result page
Correcting some translation errors for the German UI
Adding punctuation information to word-level timestamps for Speech Recognition

28. Sep 2023

Fixing an issue where unicode decoding errors were not handled properly for custom dictionaries
Introducing a new field for dataset samples, used_in_training, which indicates whether the sample was used for training. The field is optional and is part of the response object if the include_used_in_training query parameter is given when requesting the /samples, /audios or /images endpoints
Allowing partners to specify and permit specific CORS origins for API requests

05. Sep 2023

Releasing celebrities v25 model for Face Recognition mining module (new identities added and providing class IDs for our pre-trained model too)
Fixing an issue where empty results were causing the Speaker Dataset Creation job to fail
Minor UI bugfixes for Speech Recognition and Subtitle Detection result page

22. Aug 2023

Add Translation section to Speech Recognition results page
Small usability improvements for the Job result section

08. Aug 2023

Adding the support of multiple models (chain of models) for Face Recognition
New Visual Mining Module: Subtitle Detection which detects the appearance of burned-in subtitles, their position, language and the actual text content of the subtitle
Fixing an issue with Speech Recognition where the punctuation ended up in the next token instead of the current one
Improving some minor performance issues in the DeepVA Worker deployments by limiting the resources
Minor UI bugfixes

25. July 2023

Improving the scrolling and user interaction for Speech Recognition transcripts while playing the video or audio on the job result page
Avoid showing redundant parent labels for Object and Scene Recognition in the UI
Adding E-Mail verification and password reset functionality
Fixing an issue where Face Recognition jobs were failing for specific edge cases

11. July 2023

Enable the support of live streams for Face Recognition
Automatic translation for Speech Recognition transcripts to several languages
Small UI bug fixes

27. June 2023

Adding the management of text-based dictionaries to the UI
Improve the performance of thumbnail loading in the UI
Overhauling of the webhook event names by introducing namespaces. See Webhook Resource for more information.
Removing the argument word_level_timestamps for Speech Recognition; word-level timestamps are now always enabled by default
New status page to monitor the health of our services at status.deepva.com
Adding an audio player to the job result page for audio-based sources
Fixing an issue where the session was not updated when changing the password
Joining text segments of Speech Recognition to a paragraph in order to improve Named Entity Recognition (NER) results
Handling and reporting an error when the source video has no audio track
Introducing a new mode for Speech Recognition in order to choose quality over speed of the processing (mode)
Fixing an issue where filtering on dataset evaluation feedback was not working in the UI
Fixing an issue where white spaces and umlauts in the file name did not allow the video player to play the source from the storage
Minor bug fixes

13. June 2023

Releasing celebrities v24 model for Face Recognition mining module

26. May 2023

Improving performance of Speech Recognition mining module (improved timestamps and transcript quality)
Adding word-level timestamps for Speech Recognition mining module
Introducing Named Entity Recognition (NER) for Speech Recognition mining module
Introducing text-based dictionaries to customize the result of mining modules such as Speech Recognition and Lower Third Recognition which for example enables customized named entity recognition
Introducing editing mode of transcripts for Speech Recognition mining module
Adding the spoken language to a speech segment for Speech Recognition mining module
Releasing celebrities v23 model for Face Recognition mining module
Dropping video transcoding for files supported by the browser (direct play)
Introducing Job Batches to group Jobs
Introducing Diversity Reporting across jobs in a batch hierarchy
Increasing an internal network request timeout that was exceeded occasionally for result submission of large jobs
Fixing an issue where listing training sample sources from a prediction was not working ("Go to training source")
Fixing an issue where updates to the value of custom fields of a dataset class were not handled correctly from inside the class view
Fixing an issue where the video player did not show up for failed jobs
Adding reverse chronological sorting on the storage page
Improving some default thresholds for Advanced Diversity Analysis mining module
Internal improvements and minor bug fixes

23. Feb 2023

Releasing celebrities v22 model for Face Recognition mining module
Hotfix: Fixing broken video player for public URLs and YouTube videos

22. Feb 2023

Hotfix: Fixing an issue where jobs with a large number of detailed-results were failing due to a limitation of the payload size
Minor fixes

17. Feb 2023

Adding a feature to the Face Recognition mining module which returns the top k most similar identities (enable_top_k module parameter).
Fixing an issue regarding the max. file name length on the storage (increased from 100 to 255)
Improving the performance of processing a batch of images in a job
Providing OpenAPI/Swagger specifications for all endpoints at https://api.deepva.com/swagger
Fixing an issue regarding the request limit on webserver level
Renaming the value for the field media_type from wav and mp3 to audio (following values are supported: image, video, audio, videostream, pdf, xml)
Introducing a ttl field to the Job object which allows setting a time-to-live in seconds after which the job will be deleted
Improving the performance of the file upload to the storage
Improving the Speech Recognition mining module by adding Voice Activity Detection (enable_vad module parameter)
Fixing an issue where custom fields were not saved for datasets of type audio
Fixing the calculation of the job progress for audio based modules such as Speech Recognition
Fixing an issue where the filtering for "Go to training source" from a Face Recognition result did not work properly
Fixing the broken folder dropdown in the Visual Mining Job wizard
Adding support for Instagram Reels and TikTok video URLs as job sources
Adding the support for m4a files
UI improvements
Minor fixes and improvements

13. Dec 2022

New Visual Mining Module: Speech Recognition which enables speech-to-text functionality
Fixing an issue where detailed_link in the Job and Detailed Results object were broken (wrong HTTP scheme)
Improving the handling of static files
Add expiration date of users
Prevent stopping of jobs in the state waiting
UI improvements
Minor fixes

30. Nov 2022

Introducing a stop operation for jobs (jobs can be stopped at any progress without losing their results)
Adding support for audio files on the storage
Adding Dataset Management for audio datasets
Introducing abstraction for training samples by adding a general /samples endpoint + endpoints for /images and /audios
Adding the ability for annotation of audio segments via the UI
Improving the fairness of job queuing by introducing a "fair queue"

30. Sep 2022

New Visual Mining Module: Speaker Dataset Creation which automates the retrieval of audio-based datasets (similar to Face Dataset Creation)

29. Mar 2022

New Visual Mining Module: Advanced Diversity Analysis which gives more detailed results than the previous Diversity Analysis. The previous module will still be available for backward compatibility reasons.
Introducing summarized results for jobs which is only used by Advanced Diversity Analysis so far. The ResultSummary of the job object will become deprecated and is going to be removed in the v2 of the API.
Fixing an issue when showing a large list of custom fields on the preferences page
Fixing an issue which broke the ability to update the value of a custom field on the class level
Fixing an issue where the timeline chart was not updated properly on the job result page

07. Mar 2022

UI improvements
Adding a preferences page for general account settings
Adding German language to the UI (accessible via the preferences page)
Adding custom fields for dataset classes to the preferences page
Faster processing of YouTube links
Fixing an issue where some endpoints were redirected if no trailing slash was given
Adding the ability to use more than one model for face recognition in on-prem environment
Refactor UI for the storage section
Adding search and filter to storage picker in the Visual Mining Wizard
Introducing v2 of the API (BETA!, not recommended for production yet)
Minor fixes

26. July 2021

Show fallback image if image url is not available anymore
Fix an issue when showing landmark recognition result
Show spinner while results are loading
Minor fixes

22. Jun 2021

Existing fields were updated in the API response:
- The existing fields frame_start and frame_end of the Detailed Result object will have the zero-based index of the source when a batch of images is passed to a job (before both fields had a null value for image jobs)
New fields will be added to the API response by Friday, 25. Jun 2021:
- A new string field called type will be added to the Class object representing the inherited type of the dataset (e.g. "face" or "landmark")

16. Jun 2021

Fixing an issue where Dataset Evaluation was failing for large datasets

04. Jun 2021

Management of index collections
UI improvements
Sorting for number of images per class added
Small performance improvements on storage level
Face Recognition: +1k identities added to our pre-trained model
Object & Scene Recognition: Improved model 'general-c' added
Landmark Recognition: Improved model 'general-b' added
Improved performance for Dataset Evaluation
Minor bugfixes

12. Feb 2021

UI improvements
Small bugfixes on face recognition result visualization

01. Feb 2021

Thumbnails added to Face Recognition result
Indexing of unknown identities for Face Recognition added (See module parameters)
Showing Diversity Analysis chart on module result and dataset level

20. Nov 2020

The Help Center is now available with some tutorial videos
Performance improvements for operations on datasets
New Visual Mining Module: Aspect Ratio Detection
Mining Module Gender Neutrality Estimation renamed into Diversity Analysis since it has a new ability to detect the age as well (Module ID has changed to diversity_analysis).
Bugfixes and improvements for Dataset Evaluation

29. Oct 2020

Hotfix: Broken video player for videos on DeepVA Storage

23. Oct 2020

UI Job page re-designed
Dataset Evaluation available to check the quality of your datasets (see FAQ)
Support of MOV (QuickTime) video format added
Class page loading time improved
Minor bugfixes

28. Aug 2020

UI Dataset, Class and Image page re-designed
Page loading time improvements
Several bugfixes

14. Aug 2020

Landmark Recognition: Improvements for general model
New Visual Mining Module: QR Code Detection offering the possibility to find and decode QR Codes but also EAN13 codes (European Article Number) and their corresponding product names in your videos and images.

11. Aug 2020

List of Detections added to DetailedResult resource (enable the user to get bounding-boxes of a face when applying Face Recognition)
Support of MXF video format added
Support of M3U8 stream URLs added
Landmark Recognition: Europe + North America pre-trained model released (general)

10. Jul 2020

Custom training of Landmark Recognition models

29. May 2020

Model versioning
UI Model page re-designed
Multi-file upload feature added

15. Apr 2020

New Visual Mining Module: Landmark Recognition offering the possibility to identify all important sights, architectural structures and natural monuments across the world

28. Apr 2020

Video player for job result added

01. Apr 2020

Custom training of Face Recognition models
Dataset management via UI
New Visual Mining Module: Gender Neutrality Estimation offering the possibility to determine the percentage of gender occurrence in images or videos. Ensure your desired ratio between male and female in any content.