The problem we were solving
The client's accounts payable team was processing 400–600 supplier invoices per month entirely by hand. Invoices arrived through several channels: email attachments, scanned PDFs from a shared drive, manually uploaded files, and occasionally documents forwarded from their procurement system. A team member would open each one, read the key fields, type the data into Xero, and then chase the relevant manager for approval if the amount exceeded a threshold.
The process took an average of 8–12 minutes per invoice. Errors were common — transposed numbers, wrong vendor codes, invoices posted against the wrong purchase order. Month-end close took three days longer than it should because the AP team was still catching up with the backlog.
They needed a system that could handle documents from any source, extract data accurately regardless of invoice format, validate it against their Xero supplier and PO records, route for approval where needed, and post automatically without a human typing a single field.
Architecture overview — the 8-stage pipeline
We designed an eight-stage pipeline covering everything from document arrival to Xero posting. Each stage is independent — documents can enter at any point, and failures at any stage are caught, logged, and routed for exception handling rather than silently dropped.
Stage 1 — Document Intake
Stage 2 — Pre-Processing
Stage 3 — AI Data Extraction
Stage 4 — Data Validation
Stage 5 — Approval Routing
Stage 6 — Xero Integration
Stage 7 — Storage & Traceability
Stage 8 — Reporting & Insights
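The "caught, logged, and routed" behaviour can be illustrated with a small stage runner. This is a minimal sketch rather than the production code: the `Document` class, the `run_pipeline` function, the logger name, and the two toy stages are all hypothetical names introduced for illustration.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger("ap_pipeline")  # hypothetical logger name


@dataclass
class Document:
    doc_id: str
    data: dict = field(default_factory=dict)


# A stage takes a document and returns the (possibly enriched) document.
Stage = Callable[[Document], Document]


def run_pipeline(doc, stages, exception_queue, start_at=0):
    """Run (name, stage) pairs in order, starting at index start_at.

    Returns True if every stage succeeded. On the first failure the error is
    logged, recorded in exception_queue, and processing stops.
    """
    for name, stage in stages[start_at:]:
        try:
            doc = stage(doc)
        except Exception as exc:
            logger.exception("stage %s failed for %s", name, doc.doc_id)
            exception_queue.append(
                {"doc_id": doc.doc_id, "stage": name, "error": str(exc)}
            )
            return False
    return True


# Toy stages: the second one always fails, to show the exception path.
def pre_process(doc):
    doc.data["pages"] = 1
    return doc


def extract(doc):
    raise ValueError("unreadable scan")


exceptions = []
ok = run_pipeline(
    Document("doc-1"),
    [("pre_processing", pre_process), ("extraction", extract)],
    exceptions,
)
```

The `start_at` parameter is one way to let a document enter part-way through, for example when a corrected document is re-queued after human review.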
Stage 1 — Document intake from four sources
The client received invoices through four distinct channels and had no single intake point. We built a unified ingestion layer that monitors all four simultaneously and normalises documents into a standard processing queue regardless of source.
- Email inbox: IMAP listener monitors a dedicated AP email address. Attachments are extracted, validated as document types (PDF, image, Word), and queued. The email body is preserved for audit trail.
- Web portal: A simple upload interface for suppliers and internal staff. Drag-and-drop with file type validation before upload. Suppliers can track status of their submitted invoices.
- REST API: For documents arriving from the client's procurement system and partner portals. Structured intake with authentication and webhook acknowledgement.
- Cloud storage: Google Drive folder polling (every 15 minutes) picks up documents dropped by the warehouse and procurement teams. Processed files are moved to an archive folder automatically.
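As a rough illustration of what "normalising into a standard processing queue" might look like, the sketch below wraps a file from any of the four channels in one common record. The `QueuedDocument` shape, the `normalise` function, the source labels, and the exact extension list are assumptions made for this example; the article only states that PDF, image, and Word files are accepted.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PurePath

# Assumed extension list covering "PDF, image, Word".
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".doc", ".docx"}
# Hypothetical labels for the four intake channels.
SOURCES = {"email", "portal", "api", "drive"}


@dataclass
class QueuedDocument:
    source: str        # which channel the file arrived through
    filename: str
    sha256: str        # content hash, useful for spotting re-submitted files
    received_at: str   # ISO 8601 timestamp in UTC
    metadata: dict     # channel-specific extras, e.g. the email body


def normalise(source, filename, content, metadata=None):
    """Validate an incoming file and wrap it in the common queue record."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source}")
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    return QueuedDocument(
        source=source,
        filename=filename,
        sha256=hashlib.sha256(content).hexdigest(),
        received_at=datetime.now(timezone.utc).isoformat(),
        metadata=dict(metadata or {}),
    )


queued = normalise(
    "email", "INV-2024-00847.PDF", b"%PDF-1.7 ...", {"email_body": "Invoice attached"}
)
```

Each channel-specific listener would then only need to call `normalise` and push the result onto the queue, so later stages never need to know where a document came from.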
Stage 3 — AI extraction: how it works
The extraction stage is where the AI does the heavy lifting. We use a two-layer approach: a traditional OCR engine to produce a raw text layer from the document, then an LLM (GPT-4o via Azure OpenAI) to interpret the text and extract structured fields regardless of the invoice's layout or format.
This matters because supplier invoices are not standardised. One vendor's invoice puts the total at the bottom right; another puts it in a summary table on page two. A rule-based extraction system requires a template for each supplier format — which breaks every time a supplier redesigns their template. The LLM approach reads the document semantically and extracts the right value regardless of where it appears.
    {
      "vendor_name": "Acme Supplies Ltd",
      "vendor_id": "XERO-CONTACT-4821",
      "invoice_number": "INV-2024-00847",
      "invoice_date": "2024-11-14",
      "due_date": "2024-12-14",
      "line_items": [
        {
          "description": "Industrial Gasket Set x 50",
          "quantity": 50,
          "unit_price": 4.80,
          "total": 240.00,
          "account_code": "5000"
        }
      ],
      "subtotal": 240.00,
      "tax": 24.00,
      "total": 264.00,
      "currency": "GBP",
      "po_reference": "PO-2024-0391",
      "confidence": 0.97
    }
Every extracted document gets a confidence score. Documents below 0.85 confidence are automatically routed to the exception queue for human review rather than proceeding to auto-posting. In practice, confidence falls below this threshold mainly on low-quality scans and handwritten documents — about 6% of the total volume.
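The confidence gate can be sketched as a small routing function over the extracted JSON. The 0.85 threshold comes from the text; the `route_extraction` name, the list of required fields, and the decision to send malformed or incomplete output to the exception queue are assumptions for this example.

```python
import json

CONFIDENCE_THRESHOLD = 0.85  # from the article

# Assumed minimum set of fields needed before validation can run.
REQUIRED_FIELDS = ("vendor_name", "invoice_number", "invoice_date", "total", "confidence")


def route_extraction(raw_json):
    """Return (destination, data) for one extraction result.

    destination is "validation" or "exception_queue".
    """
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return "exception_queue", {}
    if not isinstance(data, dict):
        return "exception_queue", {}
    if any(name not in data for name in REQUIRED_FIELDS):
        return "exception_queue", data
    confidence = data["confidence"]
    if not isinstance(confidence, (int, float)):
        return "exception_queue", data
    # Only documents strictly below the threshold go to human review.
    if confidence < CONFIDENCE_THRESHOLD:
        return "exception_queue", data
    return "validation", data


sample = json.dumps({
    "vendor_name": "Acme Supplies Ltd",
    "invoice_number": "INV-2024-00847",
    "invoice_date": "2024-11-14",
    "total": 264.00,
    "confidence": 0.97,
})
destination, extracted = route_extraction(sample)
```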
Stage 4 — Validation logic
Extraction accuracy alone isn't enough: the data still needs to be correct relative to what was ordered and what was received. We built five validation checks that run automatically on every extracted invoice.
| Validation Check | Logic | Pass Action | Fail Action |
|---|---|---|---|
| Vendor match | Extracted vendor name matched against Xero contacts via fuzzy string match + VAT number | Auto-map to Xero ID | Flag — create new or match manually |
| Duplicate check | Invoice number + vendor ID checked against last 12 months of Xero bills | Proceed | Block — alert AP team |
| PO vs Invoice | Invoice line items and totals cross-checked against matched purchase order | Proceed to approval | Flag discrepancy amount |
| Tax validation | VAT/tax recalculated and compared to extracted tax field | Proceed | Flag for review |
| Three-way match | PO quantity vs Invoice quantity vs Delivery note quantity | Auto-approve if ≤2% variance | Route to approver with diff highlighted |
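Three of the checks in the table can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the fuzzy match uses Python's standard `difflib` (the article does not name a matching library) and leaves out the VAT-number comparison, the 0.85 similarity cut-off and the one-penny tax tolerance are placeholders, and "variance" is taken to mean the largest relative deviation from the PO quantity.

```python
from decimal import Decimal
from difflib import SequenceMatcher


def to_decimal(value):
    # Going through str() avoids binary floating-point noise in money values.
    return Decimal(str(value))


def best_vendor_match(name, contacts, min_ratio=0.85):
    """Return the contact ID whose name is most similar, or None if too weak.

    contacts maps contact ID to contact name.
    """
    best_id, best_ratio = None, 0.0
    for contact_id, contact_name in contacts.items():
        ratio = SequenceMatcher(None, name.lower(), contact_name.lower()).ratio()
        if ratio > best_ratio:
            best_id, best_ratio = contact_id, ratio
    return best_id if best_ratio >= min_ratio else None


def tax_matches(subtotal, tax, rate, tolerance="0.01"):
    """Recalculate tax from the subtotal and compare with the extracted tax."""
    expected = (to_decimal(subtotal) * to_decimal(rate)).quantize(Decimal("0.01"))
    return abs(expected - to_decimal(tax)) <= to_decimal(tolerance)


def three_way_variance(po_qty, invoice_qty, delivered_qty):
    """Largest relative deviation of invoice or delivery quantity from the PO."""
    po = to_decimal(po_qty)
    if po == 0:
        raise ValueError("PO quantity must be non-zero")
    return max(abs(to_decimal(q) - po) / po for q in (invoice_qty, delivered_qty))


def three_way_match(po_qty, invoice_qty, delivered_qty, max_variance="0.02"):
    variance = three_way_variance(po_qty, invoice_qty, delivered_qty)
    return variance <= to_decimal(max_variance)


contacts = {"contact-1": "Acme Supplies Ltd", "contact-2": "Bolt and Nut Co"}
vendor_id = best_vendor_match("ACME SUPPLIES LTD", contacts)
# The sample invoice shown earlier implies a 10% rate (24.00 on 240.00).
tax_ok = tax_matches(240.00, 24.00, "0.10")
qty_ok = three_way_match(50, 50, 50)
```

Using `Decimal` rather than floats keeps the threshold comparisons exact, which matters when a variance sits right at the 2% boundary.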
Stage 5 — Approval routing
Not every invoice needs a human. We designed the approval routing to minimise human touchpoints while maintaining control where it matters. The rules were defined with the client's finance manager and encoded as configurable business rules — they can be updated without a code change.
- Auto-approve: Invoice passes all 5 validation checks, total below £2,500, vendor is an approved supplier, PO exists and matches within 2%.
- Line manager approval: Total £2,500–£10,000 or minor PO discrepancy (<5%). Approval request sent by email with a one-click approve/reject link. Reminder after 24 hours.
- Finance director approval: Total above £10,000, new vendor, or validation failure. Full invoice review required in the portal.
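The three tiers above can be expressed as data plus a small decision function, which is one way to keep thresholds configurable without a code change. This is a sketch, not the client's rule engine: `ApprovalRules`, `InvoiceState`, and `route` are hypothetical names, and the handling of cases the rules leave open (a missing PO, a discrepancy of 5% or more, an existing vendor that is not yet on the approved list) is an assumption noted in the comments.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class ApprovalRules:
    auto_limit: Decimal = Decimal("2500")         # auto-approve strictly below this
    manager_limit: Decimal = Decimal("10000")     # finance director strictly above this
    auto_po_variance: Decimal = Decimal("0.02")   # PO match required for auto-approval
    minor_po_variance: Decimal = Decimal("0.05")  # upper bound of a "minor" discrepancy


@dataclass
class InvoiceState:
    total: Decimal
    validations_passed: bool          # outcome of the non-PO validation checks
    approved_vendor: bool
    new_vendor: bool
    po_variance: Optional[Decimal]    # None when no purchase order was matched


def route(invoice, rules=ApprovalRules()):
    """Return "auto_approve", "line_manager", or "finance_director"."""
    if invoice.new_vendor or not invoice.validations_passed:
        return "finance_director"
    if invoice.total > rules.manager_limit:
        return "finance_director"
    # Assumption: a missing PO or a discrepancy of 5% or more counts as a
    # validation failure and escalates.
    if invoice.po_variance is None or invoice.po_variance >= rules.minor_po_variance:
        return "finance_director"
    if (
        invoice.total < rules.auto_limit
        and invoice.approved_vendor
        and invoice.po_variance <= rules.auto_po_variance
    ):
        return "auto_approve"
    # Assumption: everything else, including known but unapproved vendors,
    # goes to the line manager.
    return "line_manager"


small = InvoiceState(Decimal("264.00"), True, True, False, Decimal("0"))
decision = route(small)
# Tightening a threshold only requires a different rules object.
strict_decision = route(small, ApprovalRules(auto_limit=Decimal("100")))
```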
Every approval action is logged — who approved, when, from which IP, and what the invoice state was at the time. This produces a complete audit trail for each transaction without any manual record-keeping.
Stage 6 — Xero API integration
Once approved, the system posts to Xero automatically via the Xero Accounting API. We use OAuth 2.0 for authentication with token refresh handled automatically — the integration runs continuously without manual re-authentication.
For each approved invoice, the system creates a Bill in Xero with all line items, account codes, tax rates, and due date. If the vendor doesn't exist in Xero, a new Contact is created from the extracted supplier data before the bill is posted. The original document is attached to the Xero bill so the AP team can access the source invoice directly from Xero without switching tools.
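To make the mapping from extracted fields to a Xero bill concrete, the sketch below builds the request body only; it makes no API call and omits authentication, contact creation, and the attachment upload. The field names follow the Xero Accounting API Invoices resource as we understand it (a bill is an invoice of type `ACCPAY`) and should be checked against the current Xero documentation. The `build_xero_bill` name, the placeholder contact ID, the `AUTHORISED` status, and the tax-exclusive line amounts are assumptions for this example, and tax types are left out because the mapping from extracted tax to Xero tax codes is not described here.

```python
def build_xero_bill(extracted, contact_id):
    """Map an extraction result onto a Xero bill (ACCPAY invoice) body."""
    return {
        "Type": "ACCPAY",  # accounts payable, i.e. a supplier bill
        "Contact": {"ContactID": contact_id},
        "InvoiceNumber": extracted["invoice_number"],
        "Date": extracted["invoice_date"],
        "DueDate": extracted["due_date"],
        "CurrencyCode": extracted["currency"],
        "LineAmountTypes": "Exclusive",  # assumption: line totals exclude tax
        "Status": "AUTHORISED",          # assumption: approved invoices post as authorised
        "LineItems": [
            {
                "Description": item["description"],
                "Quantity": item["quantity"],
                "UnitAmount": item["unit_price"],
                "AccountCode": item["account_code"],
            }
            for item in extracted["line_items"]
        ],
    }


extracted_invoice = {
    "invoice_number": "INV-2024-00847",
    "invoice_date": "2024-11-14",
    "due_date": "2024-12-14",
    "currency": "GBP",
    "line_items": [
        {
            "description": "Industrial Gasket Set x 50",
            "quantity": 50,
            "unit_price": 4.80,
            "total": 240.00,
            "account_code": "5000",
        }
    ],
}
# Placeholder ID; a real ContactID is a GUID returned by Xero.
bill = build_xero_bill(extracted_invoice, "00000000-0000-0000-0000-000000000000")
```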
Purchase orders that have been fully invoiced are automatically marked as billed in the client's procurement system via a webhook callback.
Results after 90 days in production
- 0
- 94%
- 78% of invoices auto-approved
- −22h
- <90s
- −3d on month-end close
The 22% of invoices that don't auto-approve still benefit from the system — they arrive in the approver's inbox with all data pre-filled, the PO match highlighted, and a discrepancy note explaining exactly what needs review. Approval time dropped from an average of 3.2 days to same-day for most exceptions.