0
Manual data entry for standard invoices
94%
Extraction accuracy on first pass
−22h
Accounts payable time saved per week

The problem we were solving

The client's accounts payable team was processing 400–600 supplier invoices per month entirely by hand. Invoices arrived through several channels: email attachments, scanned PDFs on a shared drive, manually uploaded files, and occasionally documents forwarded from the procurement system. A team member would open each one, read the key fields, type the data into Xero, and then chase the relevant manager for approval if the amount exceeded a threshold.

The process took an average of 8–12 minutes per invoice. Errors were common — transposed numbers, wrong vendor codes, invoices posted against the wrong purchase order. Month-end close took three days longer than it should because the AP team was still catching up with the backlog.

They needed a system that could handle documents from any source, extract data accurately regardless of invoice format, validate it against their Xero supplier and PO records, route for approval where needed, and post automatically without a human typing a single field.


Architecture overview — the 8-stage pipeline

We designed an eight-stage pipeline covering everything from document arrival to Xero posting. Each stage is independent — documents can enter at any point, and failures at any stage are caught, logged, and routed for exception handling rather than silently dropped.

End-to-end AI invoice pipeline

Stage 1: Document Intake (4 sources)
Email inbox (IMAP) · Web portal upload · REST API · Google Drive / Dropbox polling

Stage 2: Pre-Processing (auto-classify)
File type detection · Image deskew & noise removal · Document classification (invoice / PO / delivery note)

Stage 3: AI Data Extraction (94% accuracy)
OCR text layer · LLM entity extraction (vendor, invoice no., date, line items, totals) · Structured JSON output

Stage 4: Data Validation (auto-flag)
Vendor match · Total/tax/duplicate checks · PO vs Invoice cross-check · Delivery note vs PO · Flag mismatches

Stage 5: Approval Routing (audit trail)
Rule-based auto-approval for matched invoices · Exception routing to finance team · Full audit trail

Stage 6: Xero Integration (Xero API)
Post bills via Xero API · Create purchase entries · Auto-create vendor if new · Link to PO in Xero

Stage 7: Storage & Traceability (linked)
Document saved in cloud · Linked to Xero transaction · Full processing log maintained

Stage 8: Reporting & Insights (live dashboard)
Processing status dashboard · Exception reports · Vendor analytics · Throughput metrics
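
The "caught, logged, and routed" behaviour described for the pipeline can be sketched roughly as follows. This is a minimal illustration, not the production code: `run_pipeline`, `PipelineResult`, and the in-memory `exception_queue` list are hypothetical names, and a real deployment would presumably use a message queue rather than a Python list.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger("invoice_pipeline")

# A stage takes the document dict and returns the (possibly enriched) dict.
Stage = Callable[[dict], dict]


@dataclass
class PipelineResult:
    document: dict
    completed: List[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    error: Optional[str] = None


def run_pipeline(document: dict,
                 stages: List[Tuple[str, Stage]],
                 exception_queue: List[PipelineResult]) -> PipelineResult:
    """Run stages in order; on failure, log it and park the document for review."""
    result = PipelineResult(document=document)
    for name, stage in stages:
        try:
            result.document = stage(result.document)
            result.completed.append(name)
        except Exception as exc:  # deliberately broad: nothing is dropped silently
            logger.exception("Stage %s failed", name)
            result.failed_stage = name
            result.error = str(exc)
            exception_queue.append(result)
            break
    return result
```

Because the result records which stages completed, a reviewer can see where a document stopped and what state it was in at that point.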

Stage 1 — Document intake from four sources

The client received invoices through four distinct channels and had no single intake point. We built a unified ingestion layer that monitors all four simultaneously and normalises documents into a standard processing queue regardless of source.
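
As a rough sketch of what "normalising into a standard processing queue" could look like, the record below gives every document the same shape whatever channel it arrived through. The `QueueItem` fields, the `Source` values, and the choice of a SHA-256 content hash as the document ID are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    EMAIL = "email"
    PORTAL = "portal"
    API = "api"
    CLOUD_DRIVE = "cloud_drive"


@dataclass(frozen=True)
class QueueItem:
    document_id: str   # SHA-256 of the file bytes, so a re-sent file maps to the same ID
    source: Source
    filename: str
    received_at: str   # ISO 8601 timestamp, UTC
    size_bytes: int


def normalise(source: Source, filename: str, content: bytes) -> QueueItem:
    """Turn a raw file from any channel into one standard queue record."""
    return QueueItem(
        document_id=hashlib.sha256(content).hexdigest(),
        source=source,
        filename=filename.strip(),
        received_at=datetime.now(timezone.utc).isoformat(),
        size_bytes=len(content),
    )
```

Hashing the content means the same PDF sent by email and uploaded through the portal produces the same `document_id`, which gives later stages a cheap first signal for duplicates.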


Stage 3 — AI extraction: how it works

The extraction stage is where the AI does the heavy lifting. We use a two-layer approach: a traditional OCR engine to produce a raw text layer from the document, then an LLM (GPT-4o via Azure OpenAI) to interpret the text and extract structured fields regardless of the invoice's layout or format.

This matters because supplier invoices are not standardised. One vendor's invoice puts the total at the bottom right; another puts it in a summary table on page two. A rule-based extraction system requires a template for each supplier format — which breaks every time a supplier redesigns their template. The LLM approach reads the document semantically and extracts the right value regardless of where it appears.
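
The second layer can be sketched as a function that hands the OCR text to a model and checks that the reply is usable. This is an illustration under stated assumptions: `complete` is a hypothetical callable wrapping whichever LLM client is in use, and the prompt wording and `REQUIRED_FIELDS` list are invented here rather than taken from the production system.

```python
import json
from typing import Callable

# Assumed minimum set of keys; the real schema has more fields.
REQUIRED_FIELDS = ("vendor_name", "invoice_number", "invoice_date", "line_items", "total")

PROMPT_TEMPLATE = (
    "Extract the invoice fields from the OCR text below and reply with JSON only, "
    "using these keys: {keys}.\n\nOCR text:\n{ocr_text}"
)


def extract_fields(ocr_text: str, complete: Callable[[str], str]) -> dict:
    """Ask the model for structured fields and reject replies that cannot be used.

    `complete` (hypothetical) takes a prompt string and returns the model's text reply.
    """
    prompt = PROMPT_TEMPLATE.format(keys=", ".join(REQUIRED_FIELDS), ocr_text=ocr_text)
    reply = complete(prompt)
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model reply was JSON but not an object")
    missing = [key for key in REQUIRED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"model reply is missing fields: {missing}")
    return data
```

Passing the model call in as a parameter keeps the parsing and validation logic testable without a network connection.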

// Extraction schema — structured JSON output per invoice
{
  "vendor_name": "Acme Supplies Ltd",
  "vendor_id": "XERO-CONTACT-4821",
  "invoice_number": "INV-2024-00847",
  "invoice_date": "2024-11-14",
  "due_date": "2024-12-14",
  "line_items": [
    {
      "description": "Industrial Gasket Set x 50",
      "quantity": 50,
      "unit_price": 4.80,
      "total": 240.00,
      "account_code": "5000"
    }
  ],
  "subtotal": 240.00,
  "tax": 24.00,
  "total": 264.00,
  "currency": "GBP",
  "po_reference": "PO-2024-0391",
  "confidence": 0.97
}

Every extracted document gets a confidence score. Documents below 0.85 confidence are automatically routed to the exception queue for human review rather than proceeding to auto-posting. In practice, confidence falls below this threshold mainly on low-quality scans and handwritten documents — about 6% of the total volume.
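
The threshold rule is simple enough to show directly. In this sketch the 0.85 cut-off comes from the text above, while the queue names and the handling of a missing score are assumptions.

```python
AUTO_THRESHOLD = 0.85  # documents below this go to human review


def route_by_confidence(extraction: dict, threshold: float = AUTO_THRESHOLD) -> str:
    """Return the name of the next queue for an extracted invoice."""
    confidence = extraction.get("confidence")
    # Assumption: a missing or non-numeric score is treated as low confidence.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "exception_queue"
    return "validation" if confidence >= threshold else "exception_queue"
```

For the sample document above, `route_by_confidence({"confidence": 0.97})` returns `"validation"`.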


Stage 4 — Validation logic

Extraction accuracy alone isn't enough: the data still needs to be correct relative to what was ordered and what was received. We built five validation checks that run automatically on every extracted invoice.

| Validation check | Logic | Pass | Fail action |
|---|---|---|---|
| Vendor match | Extracted vendor name matched against Xero contacts via fuzzy string match + VAT number | Auto-map to Xero ID | Flag: create new or match manually |
| Duplicate check | Invoice number + vendor ID checked against last 12 months of Xero bills | Proceed | Block: alert AP team |
| PO vs Invoice | Invoice line items and totals cross-checked against matched purchase order | Proceed to approval | Flag discrepancy amount |
| Tax validation | VAT/tax recalculated and compared to extracted tax field | Proceed | Flag for review |
| Three-way match | PO quantity vs Invoice quantity vs Delivery note quantity | Auto-approve if ≤2% variance | Route to approver with diff highlighted |
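
Several of these checks reduce to a few lines each. The sketch below is illustrative only: it uses the standard library's `difflib` for the fuzzy match (the source does not say which matcher was used), leaves out the VAT-number comparison, and assumes a 0.85 similarity cut-off and a one-penny tax tolerance. The 2% variance limit is the one stated in the table.

```python
from decimal import Decimal
from difflib import SequenceMatcher
from typing import Dict, Optional, Set, Tuple


def vendor_match(name: str, contacts: Dict[str, str],
                 min_ratio: float = 0.85) -> Optional[str]:
    """Return the ID of the most similar contact name, or None if nothing is close."""
    best_id, best_ratio = None, 0.0
    for contact_id, contact_name in contacts.items():
        ratio = SequenceMatcher(None, name.lower(), contact_name.lower()).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = contact_id, ratio
    return best_id if best_ratio >= min_ratio else None


def is_duplicate(invoice_number: str, vendor_id: str,
                 recent_bills: Set[Tuple[str, str]]) -> bool:
    """recent_bills holds (vendor_id, invoice_number) pairs from the lookback window."""
    return (vendor_id, invoice_number) in recent_bills


def tax_ok(subtotal: Decimal, tax: Decimal, rate: Decimal,
           tolerance: Decimal = Decimal("0.01")) -> bool:
    """Recalculate tax from the subtotal and compare it with the extracted value."""
    expected = (subtotal * rate).quantize(Decimal("0.01"))
    return abs(expected - tax) <= tolerance


def three_way_ok(po_qty: Decimal, invoice_qty: Decimal, delivered_qty: Decimal,
                 max_variance: Decimal = Decimal("0.02")) -> bool:
    """True when invoice and delivery quantities are both within 2% of the PO quantity."""
    if po_qty <= 0:
        return False
    return all(abs(qty - po_qty) / po_qty <= max_variance
               for qty in (invoice_qty, delivered_qty))
```

`Decimal` is used for money and quantities so that comparisons at the tolerance boundary are not affected by binary floating-point rounding.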

Stage 5 — Approval routing

Not every invoice needs a human. We designed the approval routing to minimise human touchpoints while maintaining control where it matters. The rules were defined with the client's finance manager and encoded as configurable business rules — they can be updated without a code change.
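
One way to keep routing rules editable without a code change is to store them as data and evaluate them in order. The sketch below assumes that approach; the rule names, the 5000 threshold, and the route labels are hypothetical placeholders, not the client's actual rules.

```python
import json

# Hypothetical rule set; in practice this would live in a database or config store.
RULES_JSON = """
[
  {"name": "validation_failed", "when": {"validation_passed": false}, "route": "finance_team"},
  {"name": "over_threshold",    "when": {"min_total": 5000},          "route": "finance_manager"},
  {"name": "default",           "when": {},                           "route": "auto_approve"}
]
"""


def matches(condition: dict, invoice: dict) -> bool:
    """True when every clause in the rule's condition holds for this invoice."""
    for key, expected in condition.items():
        if key == "min_total":
            if invoice["total"] < expected:
                return False
        elif invoice.get(key) != expected:
            return False
    return True


def route(invoice: dict, rules: list) -> str:
    """Return the route of the first matching rule; order in the list is priority."""
    for rule in rules:
        if matches(rule["when"], invoice):
            return rule["route"]
    raise ValueError("no rule matched; add a catch-all rule with an empty condition")
```

With these placeholder rules, `route({"total": 264.0, "validation_passed": True}, json.loads(RULES_JSON))` returns `"auto_approve"`.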

Every approval action is logged — who approved, when, from which IP, and what the invoice state was at the time. This produces a complete audit trail for each transaction without any manual record-keeping.
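
An audit entry covering those four facts might look like the sketch below. The field names and the in-memory list are assumptions; the point it illustrates is that the invoice state is copied at the moment of the action, so later edits to the invoice cannot alter the record.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    invoice_number: str
    action: str          # e.g. "approved" or "rejected"
    actor: str           # who took the action
    ip_address: str      # where the action came from
    at: str              # when, as an ISO 8601 UTC timestamp
    invoice_state: dict  # snapshot of the invoice at that moment


def record_action(log: List[AuditEntry], invoice: dict, action: str,
                  actor: str, ip_address: str) -> AuditEntry:
    """Append a snapshot-based audit entry to the log and return it."""
    entry = AuditEntry(
        invoice_number=invoice["invoice_number"],
        action=action,
        actor=actor,
        ip_address=ip_address,
        at=datetime.now(timezone.utc).isoformat(),
        invoice_state=copy.deepcopy(invoice),
    )
    log.append(entry)
    return entry
```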


Stage 6 — Xero API integration

Once approved, the system posts to Xero automatically via the Xero Accounting API. We use OAuth 2.0 for authentication with token refresh handled automatically — the integration runs continuously without manual re-authentication.
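
Automatic token refresh usually comes down to checking the expiry time before each call. The sketch below shows that pattern in isolation. `TokenManager` and the injected `refresh` callable are hypothetical; `refresh` stands in for the HTTP request to the provider's token endpoint and is assumed to return the standard OAuth 2.0 fields `access_token`, `expires_in`, and optionally a new `refresh_token`.

```python
import time
from typing import Callable, Optional


class TokenManager:
    """Hands out a valid access token, refreshing shortly before it expires."""

    def __init__(self, refresh_token: str, refresh: Callable[[str], dict],
                 clock: Callable[[], float] = time.time, skew_seconds: float = 60):
        self._refresh_token = refresh_token
        self._refresh = refresh          # hypothetical: performs the token request
        self._clock = clock              # injectable so expiry can be tested
        self._skew = skew_seconds        # refresh this long before actual expiry
        self._access_token: Optional[str] = None
        self._expires_at = 0.0

    def access_token(self) -> str:
        if self._access_token is None or self._clock() >= self._expires_at - self._skew:
            payload = self._refresh(self._refresh_token)
            self._access_token = payload["access_token"]
            # Keep a rotated refresh token if the provider returns one.
            self._refresh_token = payload.get("refresh_token", self._refresh_token)
            self._expires_at = self._clock() + payload["expires_in"]
        return self._access_token
```

In a long-running service the latest refresh token would also need to be persisted, so that a restart does not leave the integration holding a stale one.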

For each approved invoice, the system creates a Bill in Xero with all line items, account codes, tax rates, and due date. If the vendor doesn't exist in Xero, a new Contact is created from the extracted supplier data before the bill is posted. The original document is attached to the Xero bill so the AP team can access the source invoice directly from Xero without switching tools.
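
Mapping the extraction JSON to a bill payload could look roughly like this. It is a sketch, not the production mapping: the field names follow our reading of the Xero Accounting API's Invoices resource, where a bill is an invoice of type `ACCPAY`, and should be checked against the current Xero documentation. The `AUTHORISED` default status and tax-exclusive line amounts are assumptions, and the PO reference and tax type are left unmapped.

```python
def build_bill_payload(extraction: dict, status: str = "AUTHORISED") -> dict:
    """Map an extracted invoice to a bill payload (field names to be verified)."""
    return {
        "Type": "ACCPAY",                                   # accounts payable, i.e. a bill
        "Contact": {"ContactID": extraction["vendor_id"]},
        "InvoiceNumber": extraction["invoice_number"],
        "Date": extraction["invoice_date"],
        "DueDate": extraction["due_date"],
        "CurrencyCode": extraction["currency"],
        "LineAmountTypes": "Exclusive",                     # assumption: line totals exclude tax
        "Status": status,
        "LineItems": [
            {
                "Description": item["description"],
                "Quantity": item["quantity"],
                "UnitAmount": item["unit_price"],
                "AccountCode": item["account_code"],
            }
            for item in extraction["line_items"]
        ],
    }
```

Keeping the mapping in one pure function makes it easy to compare the outgoing payload with the extraction record when a posting is disputed.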

Purchase orders that have been fully invoiced are automatically marked as billed in the client's procurement system via a webhook callback.
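
The "fully invoiced" decision can be sketched as a comparison between ordered quantities and the running total billed across invoices. Everything specific here is hypothetical: the item keys, the `po.billed` event name, and the payload shape are placeholders, since the source does not describe the procurement system's callback format.

```python
from collections import Counter
from typing import Dict, List, Optional


def is_fully_invoiced(po_lines: Dict[str, int], invoices: List[Dict[str, int]]) -> bool:
    """True when every PO line has been billed for at least its ordered quantity."""
    billed = Counter()
    for invoice_lines in invoices:
        billed.update(invoice_lines)   # adds quantities per item across invoices
    return all(billed[item] >= qty for item, qty in po_lines.items())


def billed_callback(po_reference: str, po_lines: Dict[str, int],
                    invoices: List[Dict[str, int]]) -> Optional[dict]:
    """Build the callback body once the PO is fully invoiced, else return None."""
    if not is_fully_invoiced(po_lines, invoices):
        return None
    return {"event": "po.billed", "po_reference": po_reference}
```

Summing across invoices matters because a purchase order is often fulfilled by several partial deliveries, each with its own invoice.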


Results after 90 days in production

0
Manual keystrokes for auto-approved invoices

94%
First-pass extraction accuracy

78%
Invoices auto-approved without human touch

−22h
AP team hours saved per week

<90s
Average time from intake to Xero posting

−3d
Reduction in month-end close time

The 22% of invoices that don't auto-approve still benefit from the system — they arrive in the approver's inbox with all data pre-filled, the PO match highlighted, and a discrepancy note explaining exactly what needs review. Approval time dropped from an average of 3.2 days to same-day for most exceptions.


Tech stack

Python · FastAPI
Azure OpenAI (GPT-4o)
Azure Document Intelligence
Tesseract OCR
Xero Accounting API
PostgreSQL
Azure Blob Storage
Azure Service Bus
React (approval portal)
OAuth 2.0
Docker · Azure Container Apps

Similar invoice processing challenge? Let's talk.

We build this for Xero, Zoho Books, Sage, SAP, and QuickBooks. Free consultation — we'll assess your invoice volume and document mix, and give you a realistic automation rate estimate before any commitment.
