Manual data entry for standard invoices

94%

Extraction accuracy on first pass

−22h

Accounts payable time saved per week

The problem we were solving

The client's accounts payable team was processing 400–600 supplier invoices per month entirely by hand. Invoices arrived through several channels: email attachments, scanned PDFs from a shared drive, manually uploaded files, and occasionally documents forwarded from their procurement system. A team member would open each one, read the key fields, type the data into Xero, and then chase the relevant manager for approval if the amount exceeded a threshold.

The process took an average of 8–12 minutes per invoice. Errors were common — transposed numbers, wrong vendor codes, invoices posted against the wrong purchase order. Month-end close took three days longer than it should because the AP team was still catching up with the backlog.

They needed a system that could handle documents from any source, extract data accurately regardless of invoice format, validate it against their Xero supplier and PO records, route for approval where needed, and post automatically without a human typing a single field.

Architecture overview — the 8-stage pipeline

We designed an eight-stage pipeline covering everything from document arrival to Xero posting. Each stage is independent — documents can enter at any point, and failures at any stage are caught, logged, and routed for exception handling rather than silently dropped.

— End-to-end AI invoice pipeline

📥

Stage 1 — Document Intake

Email inbox (IMAP) · Web portal upload · REST API · Google Drive / Dropbox polling

4 sources

⚙️

Stage 2 — Pre-Processing

File type detection · Image deskew & noise removal · Document classification (invoice / PO / delivery note)

Auto-classify

🔍

Stage 3 — AI Data Extraction

OCR text layer · LLM entity extraction (vendor, invoice no., date, line items, totals) · Structured JSON output

94% accuracy

✅

Stage 4 — Data Validation

Vendor match · Total/tax/duplicate checks · PO vs Invoice cross-check · Delivery note vs PO · Flag mismatches

Auto-flag

🔀

Stage 5 — Approval Routing

Rule-based auto-approval for matched invoices · Exception routing to finance team · Full audit trail

Audit trail

🔗

Stage 6 — Xero Integration

Post bills via Xero API · Create purchase entries · Auto-create vendor if new · Link to PO in Xero

Xero API

☁️

Stage 7 — Storage & Traceability

Document saved in cloud · Linked to Xero transaction · Full processing log maintained

Linked

📊

Stage 8 — Reporting & Insights

Processing status dashboard · Exception reports · Vendor analytics · Throughput metrics

Live dashboard

Stage 1 — Document intake from four sources

The client received invoices through four distinct channels and had no single intake point. We built a unified ingestion layer that monitors all four simultaneously and normalises documents into a standard processing queue regardless of source.

Email inbox:
IMAP listener monitors a dedicated AP email address. Attachments are extracted, validated as document types (PDF, image, Word), and queued. The email body is preserved for audit trail.
Web portal:
A simple upload interface for suppliers and internal staff. Drag-and-drop with file type validation before upload. Suppliers can track status of their submitted invoices.
REST API:
For documents arriving from the client's procurement system and partner portals. Structured intake with authentication and webhook acknowledgement.
Cloud storage:
Google Drive folder polling (every 15 minutes) picks up documents dropped by the warehouse and procurement teams. Processed files are moved to an archive folder automatically.
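
To illustrate what normalising into a standard queue can look like, here is a minimal sketch. The `IntakeDocument` record, the `ALLOWED_EXTENSIONS` allow-list and the `enqueue` helper are hypothetical names chosen for this example, and an in-memory `queue.Queue` stands in for the production message queue.

```python
import hashlib
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from queue import Queue

# Hypothetical allow-list; the text only says "PDF, image, Word".
ALLOWED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tiff", ".docx"}


@dataclass
class IntakeDocument:
    """One document in a source-independent shape."""
    source: str      # "email", "portal", "api" or "cloud_storage"
    filename: str
    content: bytes
    metadata: dict = field(default_factory=dict)  # e.g. preserved email body
    received_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    @property
    def sha256(self) -> str:
        # Content hash, useful for spotting the same file arriving twice.
        return hashlib.sha256(self.content).hexdigest()


def enqueue(doc: IntakeDocument, queue: Queue) -> bool:
    """Queue the document if its file type is accepted; report the outcome."""
    extension = os.path.splitext(doc.filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False
    queue.put(doc)
    return True
```

Each of the four listeners would build an `IntakeDocument` and call `enqueue`, so everything downstream sees one shape regardless of where the file came from.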

Stage 3 — AI extraction: how it works

The extraction stage is where the AI does the heavy lifting. We use a two-layer approach: a traditional OCR engine to produce a raw text layer from the document, then an LLM (GPT-4o via Azure OpenAI) to interpret the text and extract structured fields regardless of the invoice's layout or format.

This matters because supplier invoices are not standardised. One vendor's invoice puts the total at the bottom right; another puts it in a summary table on page two. A rule-based extraction system requires a template for each supplier format — which breaks every time a supplier redesigns their template. The LLM approach reads the document semantically and extracts the right value regardless of where it appears.

// Extraction schema — structured JSON output per invoice

{
  "vendor_name":    "Acme Supplies Ltd",
  "vendor_id":      "XERO-CONTACT-4821",
  "invoice_number": "INV-2024-00847",
  "invoice_date":   "2024-11-14",
  "due_date":       "2024-12-14",
  "line_items": [
    {
      "description": "Industrial Gasket Set x 50",
      "quantity":    50,
      "unit_price":  4.80,
      "total":       240.00,
      "account_code": "5000"
    }
  ],
  "subtotal": 240.00,
  "tax":      24.00,
  "total":    264.00,
  "currency": "GBP",
  "po_reference": "PO-2024-0391",
  "confidence":   0.97
}

Every extracted document gets a confidence score. Documents below 0.85 confidence are automatically routed to the exception queue for human review rather than proceeding to auto-posting. In practice, confidence falls below this threshold mainly on low-quality scans and handwritten documents — about 6% of the total volume.
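
The confidence gate can be sketched in a few lines. The 0.85 threshold comes from the text above; the function name `route_extraction` and the two queue labels are assumptions for illustration only. Unparseable output or a missing score is treated as an exception here, which is a design choice of this sketch rather than something stated about the production system.

```python
import json

CONFIDENCE_THRESHOLD = 0.85  # threshold stated in the text


def route_extraction(raw_json: str,
                     threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Return "validation" or "exception_queue" for one extraction result."""
    try:
        invoice = json.loads(raw_json)
    except json.JSONDecodeError:
        # Model output that is not valid JSON goes to a human.
        return "exception_queue"
    if not isinstance(invoice, dict):
        return "exception_queue"
    confidence = invoice.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return "exception_queue"
    if confidence < threshold:
        return "exception_queue"
    return "validation"
```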

Stage 4 — Validation logic

Extraction accuracy alone isn't enough: the data still needs to be correct relative to what was ordered and what was received. We built five validation checks that run automatically on every extracted invoice.

Validation Check	Logic	Pass	Fail Action
Vendor match	Extracted vendor name matched against Xero contacts via fuzzy string match + VAT number	Auto-map to Xero ID	Flag — create new or match manually
Duplicate check	Invoice number + vendor ID checked against last 12 months of Xero bills	Proceed	Block — alert AP team
PO vs Invoice	Invoice line items and totals cross-checked against matched purchase order	Proceed to approval	Flag discrepancy amount
Tax validation	VAT/tax recalculated and compared to extracted tax field	Proceed	Flag for review
Three-way match	PO quantity vs Invoice quantity vs Delivery note quantity	Auto-approve if ≤2% variance	Route to approver with diff highlighted
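
The checks in the table can be sketched as small pure functions. This is an illustration under stated assumptions, not the production code: `difflib.SequenceMatcher` stands in for the unspecified fuzzy matcher, the tax rate is passed in because the text does not state one, and the 2% variance is assumed to be measured relative to the PO quantity. All function names are hypothetical.

```python
from decimal import Decimal
from difflib import SequenceMatcher


def vendor_similarity(extracted: str, xero_name: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    a = extracted.casefold().strip()
    b = xero_name.casefold().strip()
    return SequenceMatcher(None, a, b).ratio()


def is_duplicate(invoice_number: str, vendor_id: str,
                 existing: set[tuple[str, str]]) -> bool:
    """Check (vendor_id, invoice_number) against bills already seen."""
    return (vendor_id, invoice_number.strip().upper()) in existing


def tax_matches(subtotal: Decimal, tax: Decimal, rate: Decimal,
                tolerance: Decimal = Decimal("0.01")) -> bool:
    """Recalculate tax from the subtotal and compare with the extracted tax."""
    expected = (subtotal * rate).quantize(Decimal("0.01"))
    return abs(expected - tax) <= tolerance


def three_way_match(po_qty: Decimal, invoice_qty: Decimal,
                    delivered_qty: Decimal,
                    max_variance: Decimal = Decimal("0.02")) -> bool:
    """True when invoice and delivery quantities are each within
    max_variance of the PO quantity (assumed basis for the 2% rule)."""
    if po_qty <= 0:
        return False
    return all(
        abs(qty - po_qty) / po_qty <= max_variance
        for qty in (invoice_qty, delivered_qty)
    )
```

Using `Decimal` rather than `float` keeps the money and variance comparisons exact, which matters when a result sits right on a threshold.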

Stage 5 — Approval routing

Not every invoice needs a human. We designed the approval routing to minimise human touchpoints while maintaining control where it matters. The rules were defined with the client's finance manager and encoded as configurable business rules — they can be updated without a code change.

Auto-approve:
Invoice passes all 5 validation checks, total below £2,500, vendor is an approved supplier, PO exists and matches within 2%.
Line manager approval:
Total £2,500–£10,000 or minor PO discrepancy (<5%). Approval request sent by email with a one-click approve/reject link. Reminder after 24 hours.
Finance director approval:
Total above £10,000, new vendor, or validation failure. Full invoice review required in the portal.
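
The three tiers above can be expressed as a single routing function. The sketch below is one possible reading of the rules: the `ValidatedInvoice` fields and tier labels are hypothetical, `check_failed` is assumed to cover validation failures other than PO variance, a PO discrepancy of 5% or more is assumed to escalate to the finance director, and any case the rules do not mention falls through to the line manager.

```python
from dataclasses import dataclass
from decimal import Decimal

AUTO_APPROVE_LIMIT = Decimal("2500")   # from the rules above
DIRECTOR_LIMIT = Decimal("10000")      # from the rules above
AUTO_VARIANCE = Decimal("0.02")        # PO must match within 2%
MINOR_VARIANCE = Decimal("0.05")       # "minor" discrepancy is below 5%


@dataclass
class ValidatedInvoice:
    total: Decimal
    vendor_approved: bool
    is_new_vendor: bool
    po_matched: bool
    po_variance: Decimal     # e.g. Decimal("0.03") for a 3% discrepancy
    check_failed: bool       # any failed check other than PO variance


def approval_tier(inv: ValidatedInvoice) -> str:
    """Return "auto_approve", "line_manager" or "finance_director"."""
    if (inv.total > DIRECTOR_LIMIT or inv.is_new_vendor or inv.check_failed
            or inv.po_variance >= MINOR_VARIANCE):
        return "finance_director"
    if (inv.total < AUTO_APPROVE_LIMIT and inv.vendor_approved
            and inv.po_matched and inv.po_variance <= AUTO_VARIANCE):
        return "auto_approve"
    # Mid-range totals, minor discrepancies, and unstated cases.
    return "line_manager"
```

In a configurable setup the four constants would be loaded from settings rather than hard-coded, so the finance team can change thresholds without a code change.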

Every approval action is logged — who approved, when, from which IP, and what the invoice state was at the time. This produces a complete audit trail for each transaction without any manual record-keeping.

Stage 6 — Xero API integration

Once approved, the system posts to Xero automatically via the Xero Accounting API. We use OAuth 2.0 for authentication with token refresh handled automatically — the integration runs continuously without manual re-authentication.
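
Automatic token refresh usually comes down to checking expiry before each call. The following is a generic OAuth 2.0 sketch, not the project's actual code: `TokenSet`, `TokenManager` and the injected `refresh_fn` are hypothetical, and the HTTP call to the identity provider is deliberately left to `refresh_fn` so the example stays self-contained.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp when the access token expires


class TokenManager:
    """Hands out a valid access token, refreshing shortly before expiry."""

    def __init__(self, tokens: TokenSet,
                 refresh_fn: Callable[[str], TokenSet],
                 skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._tokens = tokens
        self._refresh_fn = refresh_fn   # performs the real refresh request
        self._skew = skew_seconds       # refresh this long before expiry
        self._clock = clock

    def access_token(self) -> str:
        if self._clock() >= self._tokens.expires_at - self._skew:
            # Replace the whole token set, so a rotated refresh token
            # returned by the provider is kept for the next refresh.
            self._tokens = self._refresh_fn(self._tokens.refresh_token)
        return self._tokens.access_token
```

Every outbound API call would ask the manager for a token first, which is what lets the integration run without manual re-authentication.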

For each approved invoice, the system creates a Bill in Xero with all line items, account codes, tax rates, and due date. If the vendor doesn't exist in Xero, a new Contact is created from the extracted supplier data before the bill is posted. The original document is attached to the Xero bill so the AP team can access the source invoice directly from Xero without switching tools.
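
As an illustration of the mapping, the sketch below turns the extraction JSON shown earlier into a request body for a Xero bill (an invoice of type `ACCPAY`). The helper name `build_xero_bill` is hypothetical, `vendor_id` is assumed to already hold the Xero ContactID, and the `AUTHORISED` status and tax-exclusive line amounts are assumptions; the write-up does not say which values the production system sends.

```python
def build_xero_bill(extracted: dict, status: str = "AUTHORISED") -> dict:
    """Map one extraction result to a Xero bill (ACCPAY invoice) body."""
    return {
        "Type": "ACCPAY",  # accounts payable, i.e. a bill
        "Contact": {"ContactID": extracted["vendor_id"]},
        "InvoiceNumber": extracted["invoice_number"],
        "Date": extracted["invoice_date"],
        "DueDate": extracted["due_date"],
        "CurrencyCode": extracted["currency"],
        "LineAmountTypes": "Exclusive",  # assumption: line totals exclude tax
        "Status": status,                # assumption: see lead-in
        "LineItems": [
            {
                "Description": item["description"],
                "Quantity": item["quantity"],
                "UnitAmount": item["unit_price"],
                "AccountCode": item["account_code"],
            }
            for item in extracted["line_items"]
        ],
    }
```

The resulting dictionary would be serialised to JSON and sent to the Invoices endpoint with the current access token.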

Purchase orders that have been fully invoiced are automatically marked as billed in the client's procurement system via a webhook callback.

Results after 90 days in production

0

Manual keystrokes for auto-approved invoices

94%

First-pass extraction accuracy

78%

Invoices auto-approved without human touch

−22h

AP team hours saved per week

<90s

Average time from intake to Xero posting

−3d

Reduction in month-end close time

The 22% of invoices that don't auto-approve still benefit from the system — they arrive in the approver's inbox with all data pre-filled, the PO match highlighted, and a discrepancy note explaining exactly what needs review. Approval time dropped from an average of 3.2 days to same-day for most exceptions.

Tech stack

Python · FastAPI · Azure OpenAI (GPT-4o) · Azure Document Intelligence · Tesseract OCR · Xero Accounting API · PostgreSQL · Azure Blob Storage · Azure Service Bus · React (approval portal) · OAuth 2.0 · Docker · Azure Container Apps

AI Invoice Processing · Xero Integration · Document Processing AI · Accounts Payable Automation · OCR · PO Matching · AP Automation · Azure OpenAI

AI Invoice Automation
with Xero Integration
