Mitigate Procurement

Document parsing

How your uploaded files get converted into AI-readable text.

When you upload a PDF or Word document, the AI can't just "see" it like you do. It needs structured text. Parsing is the conversion step that makes this possible.

What parsing does

Your documents go through a specialized parsing service that:

  1. Extracts text from the document format (PDF, Word, etc.)
  2. Preserves structure - headings, paragraphs, lists, tables
  3. Handles tables - converts them into a format the AI can interpret
  4. Reads scanned documents - uses OCR (optical character recognition) for image-based PDFs
  5. Processes EDOC archives - extracts all files from the archive and parses each one
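For readers curious what step 5 involves, here is a minimal sketch of "extract all files from the archive and parse each one". It assumes an EDOC file is a ZIP-based ASiC-E container, where the `mimetype` entry and the `META-INF/` folder hold container and signature data rather than your documents. The helpers `extract_edoc_files`, `parse_document` and `parse_edoc` are hypothetical names for illustration. This is not the parsing service's actual code.

```python
import zipfile
from pathlib import PurePosixPath

# Assumption: ASiC-E layout. These entries describe the container itself,
# not the documents inside it, so they are skipped.
CONTAINER_ENTRIES = {"mimetype"}
CONTAINER_DIRS = ("META-INF/",)


def extract_edoc_files(path):
    """Return {filename: bytes} for every document stored in the container.

    Hypothetical helper; assumes the EDOC file is a ZIP-based package.
    """
    documents = {}
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            name = info.filename
            # Skip folders, the mimetype marker and signature metadata.
            if info.is_dir() or name in CONTAINER_ENTRIES or name.startswith(CONTAINER_DIRS):
                continue
            documents[name] = archive.read(info)
    return documents


def parse_document(name, data):
    """Hypothetical stand-in for the real per-file parsing step."""
    return f"{PurePosixPath(name).suffix or 'unknown'}: {len(data)} bytes"


def parse_edoc(path):
    # Extract every document first, then parse each one independently.
    return {name: parse_document(name, data) for name, data in extract_edoc_files(path).items()}
```

Because each extracted file is parsed on its own, one unreadable file inside the container does not have to block the others.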

The output is clean, structured text that the AI agents can search and read efficiently.

What affects parsing quality

Document type matters. A Word document or a native PDF (created from software, not scanned) gives the cleanest results. The text is already digital - parsing just restructures it.

Scanned documents are trickier. The system has to "read" images of text, so accuracy depends on scan quality. A clear, straight, high-resolution scan works well. A faded, skewed, or low-resolution scan may produce misread or missing text.

Tables can be complex. Simple tables parse well. Complex ones with merged cells, nested tables, or unusual formatting may lose some structure. If a key requirement is hidden in a complex table, check the parsed result.
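To see why merged cells lose structure, consider a Markdown table, one common plain-text way to present a table to an AI. It has no way to say "this cell spans two columns", so a converter must either repeat the merged value or leave a blank. The sketch below illustrates that trade-off. It assumes a table arrives as rows of strings, with `None` marking a position covered by the merged cell to its left (a hypothetical convention for this example). It does not describe how the parsing service handles tables internally.

```python
def to_markdown(rows, fill_merged=True):
    """Render rows (lists of str or None) as a Markdown table.

    None marks a position covered by a merged cell to its left
    (hypothetical convention). With fill_merged=True the merged value is
    repeated; otherwise the position is left blank and the grouping is lost.
    """
    rendered = []
    for row in rows:
        cells = []
        for value in row:
            if value is None:
                # Markdown has no colspan, so repeat the neighbour or leave blank.
                value = cells[-1] if (fill_merged and cells) else ""
            cells.append(value)
        rendered.append("| " + " | ".join(cells) + " |")
    # Markdown requires a separator line after the header row.
    separator = "| " + " | ".join("---" for _ in rows[0]) + " |"
    return "\n".join([rendered[0], separator, *rendered[1:]])


rows = [
    ["Lot", "Price (EUR)", None],  # "Price (EUR)" is merged across two columns
    ["", "Net", "Gross"],
    ["Lot 1", "1000", "1210"],
]
print(to_markdown(rows))
print(to_markdown(rows, fill_merged=False))
```

Either way, the fact that "Net" and "Gross" both sit under one "Price (EUR)" heading is no longer stated explicitly. That is why a key requirement inside a complex table is worth checking.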

Formatting-heavy documents - lots of text boxes, watermarks, multi-column layouts, embedded images with text - can sometimes lose content during parsing. The simpler the layout, the more reliable the result.

The technology

Parsing is handled by LlamaParse, a service specifically built for converting documents into AI-ready text. It's good at handling the variety of formats you'd see in procurement - long PDFs, multi-sheet Excel files, PowerPoint presentations.

When parsing fails

See the troubleshooting guide for specific issues. The most common culprits are password protection, file corruption, and very poor scan quality.

You can retry parsing for any file. If it keeps failing, try converting the source file to a different format and re-uploading.
