
Data extraction

Extract data for registration in system applications

With the data extraction module *) for PixEdit® Desktop, data can be extracted from existing files or during document scanning. Both single-page and multi-page documents, with or without colour, can be processed.

Data model

The work starts by defining a data model. The data model specifies which types of information are to be extracted from the documents, and in what order. It can be used to extract data from both structured documents (forms) and unstructured documents. The data can be, for example, a name, an agreement number or a social security number.
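
To make the idea concrete, a data model can be thought of as an ordered list of named fields. The sketch below is purely illustrative: the `Field` class, the `DATA_MODEL` list and the `missing_fields` helper are hypothetical names chosen for this example and do not reflect how PixEdit stores its data models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One piece of information to extract (hypothetical structure)."""
    name: str
    required: bool = True


# The list order is the order in which the fields appear in the extract.
DATA_MODEL = [
    Field("name"),
    Field("agreement_number"),
    Field("social_security_number", required=False),
]


def missing_fields(record: dict) -> list:
    """Return the names of required fields that are absent or empty."""
    return [f.name for f in DATA_MODEL if f.required and not record.get(f.name)]
```

A record that only contains a name would be reported as missing `agreement_number`, while the optional social security number would not be flagged.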

Export to XML or CSV

The scanned documents are saved as PDF and the associated data extracts are saved as data files in a standard exchange format (XML or CSV). File name and storage location can be defined in the process. The data extracts can then be used in other systems or imported into Microsoft Excel.
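
As a rough illustration of what such data files can look like, the following sketch writes the same records as CSV and as XML using only the Python standard library. The field names and the `documents`/`document` element names are assumptions made for this example; the actual layout of the files PixEdit produces may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names, in the order defined by the data model.
FIELDS = ["name", "agreement_number"]


def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Serialize records as XML text, one <document> element per record."""
    root = ET.Element("documents")
    for rec in records:
        doc = ET.SubElement(root, "document")
        for field in FIELDS:
            ET.SubElement(doc, field).text = rec.get(field, "")
    return ET.tostring(root, encoding="unicode")
```

A CSV file of this shape opens directly in Microsoft Excel, while the XML form is usually easier for another system to validate and import.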

Structured documents (forms)

When the documents have a standard appearance and the data to be extracted is always in the same location in the document, data extraction can be automated by creating form templates. The form templates define the location of the data and any required properties to detect deviations.
 
By defining a form template for each form type, PixEdit will automatically identify the form type so that you can scan different forms in the same batch. The data is extracted according to the various form templates. 
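
The principle of template-based extraction can be sketched as follows. This is a simplified illustration under stated assumptions, not PixEdit's implementation: it assumes the page has already been converted to a list of `(text, x, y)` words, that each form type is recognised by a distinctive anchor word, and that each field is a fixed rectangle on the page. The names `Region`, `FormTemplate`, `identify` and `extract` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A rectangle on the page, in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


@dataclass
class FormTemplate:
    form_type: str
    anchor_text: str  # word assumed to appear only on this form type
    fields: dict      # field name -> Region


def identify(templates, words):
    """Return the first template whose anchor word occurs on the page."""
    texts = {text for text, _x, _y in words}
    for template in templates:
        if template.anchor_text in texts:
            return template
    return None  # unknown form type


def extract(template, words):
    """Collect the words that fall inside each field's region."""
    return {
        name: " ".join(text for text, x, y in words if region.contains(x, y))
        for name, region in template.fields.items()
    }
```

With one template per form type, each page in a mixed batch is first passed to `identify` and then to `extract` with the matching template; a page for which `identify` returns `None` would need to be handled manually.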

*) The functionality is available as an add-on/extension module to PixEdit Desktop.

Unstructured documents

When the documents vary in appearance, the document type cannot be identified automatically. Determining what kind of information a document contains, and which data extracts are relevant, therefore becomes a manual task.
 
The data extraction is done by registering (typing or copying) data in a separate registration window with defined fields. The extract can then be exported to XML or CSV.