If you've ever sat in a meeting where someone mentioned "ETL pipelines" and nodded along without being entirely sure what they meant — this is for you.
ETL stands for Extract, Transform, Load. It describes the process of taking data from where it lives, cleaning and reshaping it, and putting it somewhere useful. It's the plumbing that makes business intelligence work — and like most plumbing, you only notice it when it's missing or broken.
The three stages — explained plainly
Get the data out
Clean and reshape it
Put it somewhere useful
That's it. Extract the data from where it lives, transform it into a consistent and useful shape, load it into the place your dashboards read from. Every ETL system — from a simple scheduled script to a complex enterprise data platform — is doing these three things in some form.
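The three stages can be sketched in a few lines of code. This is a minimal, hypothetical example — the CSV source, the column names, and the SQLite destination are stand-ins for whatever real systems and warehouse your business uses — but the shape is the same at any scale.

```python
import csv
import io
import sqlite3

# Hypothetical source data: note the two date formats and the location codes,
# the kind of inconsistency Transform exists to fix.
RAW_CSV = """location,sale_date,amount
LON-01,2024-03-01,42.50
LON-01,2024-03-01,19.99
MAN-02,01/03/2024,7.25
"""

def extract(raw: str) -> list:
    """Extract: pull rows out of the source as-is."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list) -> list:
    """Transform: normalise dates to ISO format and amounts to floats."""
    out = []
    for row in rows:
        date = row["sale_date"]
        if "/" in date:  # convert dd/mm/yyyy to yyyy-mm-dd
            d, m, y = date.split("/")
            date = f"{y}-{m}-{d}"
        out.append((row["location"], date, float(row["amount"])))
    return out

def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(location TEXT, sale_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

A production pipeline adds scheduling, monitoring, and validation around this skeleton, but the extract-transform-load sequence is unchanged.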
A real example — the restaurant chain
Abstract explanations only go so far. Here's what ETL looked like in a real project we delivered for a restaurant chain with multiple locations.
They had four systems: a POS system tracking every sale, a finance system with P&L data, a supply chain platform tracking ingredient costs, and kitchen-level spreadsheets tracking waste. Each system used different codes for the same locations and different conventions for dates and product categories.
Before ETL: the finance team's revenue number didn't match the POS team's revenue number because they were measuring different things in different ways. After ETL: one agreed number for every metric, applied consistently across all five dashboards.
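The "different codes for the same locations" problem usually resolves into a mapping table maintained as part of the pipeline. The codes below are invented for illustration, not the client's real data, but the pattern — map every system's code to one canonical ID, and fail loudly on anything unmapped — is the core of it.

```python
# Illustrative mapping from each source system's location code
# to one agreed canonical ID. Codes here are hypothetical.
LOCATION_MAP = {
    "POS-LDN-1":    "LOC001",  # POS system's code
    "FIN_LONDON_A": "LOC001",  # finance system's code
    "SC/LON/01":    "LOC001",  # supply chain platform's code
}

def canonical_location(source_code: str) -> str:
    """Map any system's location code to the canonical ID.
    Unknown codes raise rather than passing through silently."""
    try:
        return LOCATION_MAP[source_code]
    except KeyError:
        raise ValueError(f"Unmapped location code: {source_code!r}")
```

Raising on unknown codes matters: a new location added to the POS system but not the mapping table should stop the load and alert someone, not quietly create an orphan row.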
What a business looks like without ETL
Without ETL: each department exports its own data, combines it by hand in Excel, and reports a number that doesn't match anyone else's. Meetings start with an argument about whose figure is right.
With ETL: every system feeds one pipeline with one set of agreed definitions, and every dashboard reads from the same place. Meetings start with the numbers.
The Transform step is where most ETL projects fail
Extract is straightforward — connect to the source, pull the data. Load is straightforward — write it to the destination. Transform is where the real work is, and where most ETL projects that look good in demos fall apart in production.
The Transform step is where you have to answer questions like:
- When the CRM says "revenue" and the finance system says "revenue," are they measuring the same thing? (Almost always: no.)
- What do you do when a record arrives with a null in a field that should have a value?
- When the same customer appears in two systems with slightly different names, how do you know they're the same customer?
- When a location changes its name, do historical records get updated or preserved under the old name?
- What happens when an API returns a format you didn't expect?
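Two of those questions — null handling and matching the same customer across systems — can be sketched concretely. The policies below (substitute an agreed default or reject the row; similarity threshold of 0.9) are illustrative choices, not universal answers; real entity resolution also leans on email addresses, account IDs, and postcodes, not names alone.

```python
import difflib

def require(value, field, default=None):
    """Null policy: use an agreed default if one exists,
    otherwise reject the row rather than loading it silently."""
    if value is not None:
        return value
    if default is not None:
        return default
    raise ValueError(f"Required field {field!r} is null")

def probably_same_customer(a: str, b: str, threshold: float = 0.9) -> bool:
    """Crude name matching: normalise case and whitespace, then
    compare string similarity against a chosen threshold."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    ratio = difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()
    return ratio >= threshold
```

The point is not the specific threshold — it's that someone has to decide these rules explicitly, in writing, before the pipeline is built.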
This is why we always work with finance, sales, and operations leadership to define metric definitions in writing before building any ETL pipeline. The technical implementation is the easy part. Getting three departments to agree on one definition of "revenue recognition timing" is the hard part — and if you skip it, the ETL pipeline delivers consistent wrong answers rather than inconsistent right ones.
Six signs your business needs ETL
Different systems show different numbers
Finance says £4.2M, CRM says £4.4M. Both are technically right by their own definitions. ETL creates one agreed number.
Someone builds reports manually every week
Data exported, combined in Excel, formatted, emailed. Hours of work producing a report that's already out of date. ETL automates all of this.
You can't answer cross-system questions
"Which customers have both an open invoice and an active support ticket?" If your data is in separate systems with no common link, ETL is what makes this answerable.
You can't see trends — only current state
Your ERP shows today's stock levels. But what were they six months ago? ETL preserves historical snapshots so trend analysis is always possible.
BI tool querying production databases directly
Analytical queries running on transactional databases slow both down. ETL moves the analytical workload off production systems entirely.
Direct database access isn't available
IT won't allow direct BI connections to production. ETL via structured file exports or PowerApps capture (as we built for Atlantic LNG) solves this without touching production.
ETL vs ELT — is there a difference?
You may hear ELT (Extract, Load, Transform) mentioned alongside ETL. The difference is when the transformation happens: ETL transforms the data before loading it, ELT loads raw data first and transforms it inside the destination system.
ELT has become more common with powerful cloud data warehouses (Snowflake, BigQuery, Azure Synapse) that can handle transformation inside the warehouse at scale. For most mid-market businesses, the distinction is less important than whether the transformation logic is well-designed and the metric definitions are properly agreed.
Whether you call it ETL or ELT, the business requirement is the same: data from multiple sources, cleaned and standardised, in a place your dashboards can reliably read from.
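The ordering difference is easy to see in miniature. Here SQLite stands in for the cloud warehouse — in real ELT the same SQL would run inside Snowflake, BigQuery, or Synapse — and the raw table and pence-to-pounds conversion are invented for illustration.

```python
import sqlite3

# ELT in miniature: load raw data first, then transform it
# *inside* the destination, expressed as SQL the warehouse runs.
conn = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse

# Load: raw data lands untransformed, exactly as the source sent it.
conn.execute("CREATE TABLE raw_sales (amount_pence TEXT)")
conn.executemany("INSERT INTO raw_sales VALUES (?)", [("4250",), ("1999",)])

# Transform: runs in the destination, after loading.
conn.execute("""
    CREATE TABLE sales AS
    SELECT CAST(amount_pence AS REAL) / 100.0 AS amount_gbp
    FROM raw_sales
""")
total = conn.execute("SELECT SUM(amount_gbp) FROM sales").fetchone()[0]
```

In classic ETL the pence-to-pounds conversion would happen in pipeline code before the insert; in ELT it happens in the warehouse afterwards. Either way, the transformation logic and the metric definitions behind it are the part that has to be right.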
What a well-built ETL pipeline looks like in practice
A well-built ETL pipeline isn't just a script that runs and moves data. It has:
- Monitoring and alerting — when a pipeline fails or produces unexpected results, the right people are notified immediately. Not discovered three days later when someone notices the dashboard hasn't updated.
- Data quality validation — checks that run at every load. Row count within expected range. Required fields not null. Referential integrity maintained. Bad data stopped at the door rather than loaded silently.
- Audit logging — every pipeline run recorded. What was extracted, when, from where, what was loaded. When a data discrepancy surfaces three weeks later, the audit log tells you exactly what happened.
- Idempotent processing — running the same data twice produces the same result. No duplicate records from retry runs or accidental re-processing.
- Historical preservation — append-only loads where appropriate. The data warehouse grows with each run, preserving history that the source system may not keep.
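Two of those properties — validation at the door and idempotent processing — look roughly like this in practice. The table name, key, and row-count range are illustrative; SQLite again stands in for the real destination.

```python
import sqlite3

def validate(rows, min_rows=1, max_rows=100_000):
    """Data quality checks at load time: row count within an
    expected range, required fields not null. Bad batches raise
    and are stopped before they reach the warehouse."""
    if not (min_rows <= len(rows) <= max_rows):
        raise ValueError(f"Row count {len(rows)} outside expected range")
    for r in rows:
        if r["sale_id"] is None or r["amount"] is None:
            raise ValueError(f"Null in required field: {r}")
    return rows

def load_idempotent(rows, conn):
    """Upsert keyed on sale_id: replaying the same batch — a retry
    run or accidental re-processing — produces no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(sale_id TEXT PRIMARY KEY, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (:sale_id, :amount) "
        "ON CONFLICT(sale_id) DO UPDATE SET amount = excluded.amount",
        rows,
    )

batch = [{"sale_id": "S1", "amount": 42.5},
         {"sale_id": "S2", "amount": 19.99}]
conn = sqlite3.connect(":memory:")
load_idempotent(validate(batch), conn)
load_idempotent(validate(batch), conn)  # retry run: still two rows
count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Keying the upsert on a stable business identifier is what makes retries safe; without it, every failed-and-retried run would quietly inflate the numbers.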
The Atlantic LNG engagement is a good example of ETL under constraint: no direct database access was available, so we built a two-stream ingestion architecture using structured Excel exports and Microsoft PowerApps as a data capture layer. The ETL pipeline ran on top of those two streams rather than on production databases. Same principles, different ingestion mechanism.
The honest summary
ETL is the infrastructure that makes business intelligence reliable. Without it, dashboards show data that came from somewhere, cleaned somehow, at some point — and nobody is entirely sure which of those is true.
With a properly designed ETL pipeline, every metric on every dashboard traces back to an agreed definition, applied consistently, from data that was validated before it arrived. That's what "single source of truth" actually means in practice.
If your business is still combining data manually in Excel every week, or if your CRM and finance system are showing different numbers for the same metric, you need ETL — and you probably already know it.