Generating OCR/Searchable Text

With the OCR/Language Analysis function, you can generate searchable text from images and PDF documents. Several DWR features use this searchable text, such as:

Keyword searching

Language Analysis

Productions

Predictive Analysis (DWR Gist)

There are several ways to access the OCR functions in DWR:

In the Process -> Collections interface, right-click a collection and select "Generate OCR".

In the Process -> Collections interface, right-click an individual import and select "Generate OCR".

In Review, right-click the filter tree and select "OCR/Language Analysis...".

In Review, right-click one or more selected documents and select "OCR/Language Analysis...".

The interface:

[Figure: OCR/Language Analysis interface]

Choose whether to overwrite existing OCR, then click Run to start the OCR process.

You can view the progress of your job in the View Jobs interface. See Viewing Jobs for more information. After OCR is generated, the "OCR" column in the Document List indicates whether a document has OCR.

[Figure: "OCR" column in the Document List]

Note: Any errors that occur while generating OCR/extracted text are logged and can be viewed on the Exceptions tab in the Process -> View Collections interface after processing completes.