Extracts content from Adobe PDF files and saves it as XML, using a built-in code parser with high conversion performance. Supports command-line use and automation.
Converting a PDF document into a tagged, text-based XML file is not without its problems. PDF-to-XML is an extremely easy-to-use utility whose intelligent parsing algorithm detects all lines, page breaks, and paragraphs and tags them accordingly. Likewise, it can identify tables and tag all cells, columns, and rows efficiently to produce an XML file compatible with the XML 1.0 specification and above.
For the conversion to succeed, the original PDF document needs to be a text-based file or to have gone through an OCR process beforehand, as PDF-to-XML comes without OCR capabilities. The conversion itself is as simple as it gets, thanks to the program’s wizard-driven interface. In the first step, you’ll be asked to select a PDF file and to define the page range to be converted. In the second, you are given the option to customize the XML tags supported, namely cell, column, line, page, par (for paragraphs), and row. This short list of supported XML elements gives you an idea of the types of PDF documents the tool can deal with successfully: plain text and simple tables. The third step of the wizard is the conversion process itself. You will need to go through these three steps again for every PDF document you wish to convert, as the tool offers no batch conversion capabilities.
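To give an idea of what working with the converted file might look like, here is a minimal Python sketch that reads paragraphs and table rows back out of the XML. Only the element names (page, par, line, row, cell) come from the tag list above; the `document` root element, the nesting of the elements, the sample text, and the `read_pages` helper are assumptions made for illustration, and the tool's real output may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical output. The element names match the tags listed in the review,
# but the root element and the nesting are assumptions, not the tool's schema.
SAMPLE = """<document>
  <page>
    <par>
      <line>Quarterly report</line>
      <line>Figures in thousands</line>
    </par>
    <row><cell>Region</cell><cell>Sales</cell></row>
    <row><cell>North</cell><cell>120</cell></row>
  </page>
</document>
"""


def read_pages(xml_text):
    """Return one dict per <page>, holding its paragraphs and table rows."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        # Join the <line> elements of each <par> into a single string.
        paragraphs = [
            " ".join((line.text or "").strip() for line in par.iter("line"))
            for par in page.iter("par")
        ]
        # Collect the text of each <cell> within every <row>.
        rows = [
            [(cell.text or "").strip() for cell in row.iter("cell")]
            for row in page.iter("row")
        ]
        pages.append({"paragraphs": paragraphs, "rows": rows})
    return pages


if __name__ == "__main__":
    for number, page in enumerate(read_pages(SAMPLE), start=1):
        print(f"Page {number}")
        for paragraph in page["paragraphs"]:
            print("  paragraph:", paragraph)
        for row in page["rows"]:
            print("  row:", row)
```

If you customize the tag names in the second wizard step, the strings passed to `iter()` would need to change to match.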