Optical character recognition (OCR) FAQ

Optical character recognition, or optical character reader (OCR), is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document or subtitle text superimposed on an image.
To enable the OCR service on an individual site, turn on the feature in the site administration section.
Once enabled, all existing and new documents uploaded to the site will be sent to the OCR server for processing, following a set schedule. Please note that it might take some time for all documents to go through the OCR process.
The OCR status of documents can be viewed within the site administration interface. The OCR functionality also includes page numbering: once a document has gone through the OCR process, its page numbers become visible in the document details.

FAQ

Which document types are supported by OCR?
The supported document types are a combination of:
  • Files that are supported by the document viewer
  • Files that are whitelisted for the instance within the system configuration settings
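
As a rough illustration of the rule above, the sketch below checks a file name against two sets of extensions. It assumes that "a combination of" means a file must satisfy both conditions, which this FAQ does not state outright. The extension lists and the function name is_ocr_eligible are hypothetical; the real lists come from the document viewer and the instance's system configuration.

```python
from pathlib import PurePath

# Hypothetical values for illustration only. The real lists are defined by the
# document viewer and by the instance's system configuration settings.
VIEWER_SUPPORTED = {"pdf", "docx", "tif", "png"}
INSTANCE_WHITELIST = {"pdf", "docx", "xlsx", "png"}


def is_ocr_eligible(filename, viewer_supported=VIEWER_SUPPORTED,
                    whitelist=INSTANCE_WHITELIST):
    """Return True if the file extension is in both sets (assumed rule)."""
    # Normalise the extension: ".PDF" -> "pdf"; files without one give "".
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return ext in viewer_supported and ext in whitelist
```

Under these assumed lists, a file such as Contract.PDF would qualify, while a .xlsx file (whitelisted but not viewer-supported) or a .tif file (viewer-supported but not whitelisted) would not.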

Is there any impact of OCR on system performance?
OCR is managed by a separate service and runs at a fixed rate, so that enabling OCR has little to no impact on the performance of the HighQ instance. The configuration allows you to change how many documents are sent for OCR in a given time.
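
To make the fixed-rate idea concrete, here is a minimal simulation, not the actual scheduler. It assumes the rate is expressed as a maximum number of documents per scheduled run; the function name run_ocr_cycles and the parameter batch_size are hypothetical and do not correspond to real configuration keys.

```python
from collections import deque


def run_ocr_cycles(pending, batch_size, cycles):
    """Simulate fixed-rate OCR: each cycle sends at most batch_size documents.

    Returns the batches sent per cycle and the documents still waiting.
    """
    queue = deque(pending)
    sent_per_cycle = []
    for _ in range(cycles):
        # Take no more than batch_size documents, oldest first.
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        sent_per_cycle.append(batch)
    return sent_per_cycle, list(queue)
```

With five pending documents and a batch size of two, two cycles would send four documents and leave one waiting. This is why a newly enabled site with many existing documents can take some time to be fully processed.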

Can you force the OCR of an individual document?
No, there is no option to force the OCR of an individual document. Once OCR is enabled, all documents are sent for processing. If the OCR quality is poor, the only way to send the document for OCR again is to download it and add it as a new version of the document, which triggers the OCR process again.

Supported languages
  • English
  • Dutch
  • German
  • French
  • Portuguese (Standard)
  • Portuguese (Brazilian)
  • Italian
  • Chinese (simplified)
  • Chinese (traditional)
  • Japanese
  • Arabic
  • Danish
  • Norwegian
  • Swedish
  • Finnish