Bottom line
This is a good first step, but it is not a true airline load-factor estimate.
It is better described as an airport-demand exposure model.
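Your proportional allocation method gives each airline its seat share of the airport's passengers (notation here is illustrative: $P_i$ is airport $i$'s passengers, $S_i$ its total scheduled seats, $S_{a,i}$ airline $a$'s seats there):

$$\widehat{P}_{a,i} = P_i \cdot \frac{S_{a,i}}{S_i}$$

then sums across airports and divides by the airline's seats:

$$\widehat{LF}_a = \frac{\sum_i \widehat{P}_{a,i}}{\sum_i S_{a,i}}$$

which algebraically becomes a seat-weighted average of airport-level load factors:

$$\widehat{LF}_a = \sum_i w_{a,i}\,\frac{P_i}{S_i}, \qquad w_{a,i} = \frac{S_{a,i}}{\sum_j S_{a,j}}$$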
So it mostly says:
Is this airline exposed to airports where passenger demand is strong or weak relative to scheduled seats?
That is useful. But it does not recover airline-specific load factor inside each airport.
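A quick numeric check of that point, as a minimal sketch. The airport codes, passenger counts and seat counts below are made up for illustration. It shows that allocating airport passengers by seat share and then dividing by the airline's seats gives exactly the airline's seat-weighted average of airport-level pax/seat ratios, so no airline-specific information survives.

```python
# Hypothetical inputs: airport passengers, total airport seats,
# and one airline's seats at each airport.
airports = {
    "AAA": {"pax": 800.0, "seats": 1000.0, "airline_seats": 300.0},
    "BBB": {"pax": 450.0, "seats": 500.0, "airline_seats": 100.0},
}


def allocated_lf(airports):
    """Allocate airport pax by seat share, then divide by the airline's seats."""
    pax = sum(a["pax"] * a["airline_seats"] / a["seats"] for a in airports.values())
    seats = sum(a["airline_seats"] for a in airports.values())
    return pax / seats


def seat_weighted_airport_lf(airports):
    """Seat-weighted average of airport-level pax/seat ratios."""
    total = sum(a["airline_seats"] for a in airports.values())
    return sum(
        (a["airline_seats"] / total) * (a["pax"] / a["seats"])
        for a in airports.values()
    )


lf_alloc = allocated_lf(airports)
lf_weighted = seat_weighted_airport_lf(airports)
print(lf_alloc, lf_weighted)  # the two numbers agree up to floating-point rounding
```

Any airline with the same seat distribution across these airports would get the same number, whatever its real load factor.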
Assessment of Step 1: airport actual pax vs OAG seats
What works
This can be useful for YoY trend nowcasting, especially if the question is:
Is airline X exposed to airports where actual demand is running ahead of, or behind, scheduled capacity?
That is investable if used as a relative signal, not as a precise load-factor estimate.
| Use case | Verdict |
|---|---|
| Detecting airport-level demand strength vs scheduled capacity | Good |
| Ranking airlines by exposure to strong/weak airports | Good |
| Estimating airline LF YoY direction | Maybe |
| Estimating absolute airline LF | Weak |
| Estimating carrier-specific LF at hubs | Dangerous |
| Replacing reported LF / RPM / ASM / RPK / ASK data | No |
For the US, BTS T-100 is the obvious delayed truth set. It includes carrier-level passengers, seats, departures, aircraft hours and load factor. Use it to backtest the proxy.
For Europe, Eurostat and airport operator data can help, but they are less timely and less consistent across countries.
The big flaws
1. You erase the airline signal inside each airport
If Airline A has 30% of seats at Heathrow and you allocate 30% of Heathrow passengers to Airline A, you are assuming Airline A has airport-average load factor.
That misses:
| Distortion | Example |
|---|---|
| Hub carrier effects | BA at LHR, Delta at ATL, American at DFW, United at DEN |
| ULCC vs legacy LF differences | Ryanair / Wizz / easyJet vs Lufthansa / BA / Air France |
| Long-haul vs short-haul mix | Widebody long-haul behaves differently from domestic or intra-Europe |
| Business vs leisure mix | Corporate-heavy routes can have weaker pax but stronger fares |
| Regional feed | American Eagle, SkyWest, Air Dolomiti, CityJet, etc. |
| Slot-constrained airports | LHR, AMS, FRA, MUC, CDG |
So the output is not:
BA load factor at Heathrow
It is more like:
BA’s Heathrow-weighted exposure to Heathrow’s airport-level demand pressure
That distinction matters.
2. Airport passenger definitions must match the seat denominator
You need to align passenger data with the OAG denominator.
| Passenger data type | Matching OAG seat denominator |
|---|---|
| Departing / enplaned passengers only | Departing seats only |
| Arriving + departing passengers | Arriving + departing seats |
| Terminal passengers | Usually arriving + departing seats; check whether transit passengers are excluded |
| TSA screened passengers | Departing seats, but only for originating (screened) traffic |
| Transfer passengers | No clean match; airports count them inconsistently |
For load factor, the clean concept is onboard segment passengers divided by operated seats on the same segments, by airline.
Airport passenger totals are often airport throughput, not onboard segment passengers by airline.
3. Scheduled seats are not operated seats
OAG schedules are useful, but for load factor you want operated seats.
You should adjust scheduled seats using:
| Adjustment | Dataset |
|---|---|
| Cancelled flights | FlightAware, Flightradar24, OAG flight status |
| Aircraft swaps | Cirium Fleets Analyzer, OAG actual equipment, ADS-B tail matching |
| Wet lease / substitution | OAG operating carrier, Cirium, flight status |
| Regional / operator mapping | BTS T-100 in US, OAG operating carrier, parent mapping |
| Charter / irregular ops | FlightAware, FR24, airport movement data |
If fuel prices trigger cancellations, the denominator is exactly where the error lives.
Using raw schedules would overstate seats and make LF look artificially weak.
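A minimal sketch of the denominator effect, using made-up flights. The field names `scheduled_seats`, `cancelled` and `actual_seats` are hypothetical and would need to be mapped to whatever your flight-status source provides.

```python
def operated_seats(flights):
    """Sum seats over flights that actually flew, using actual equipment when known."""
    total = 0
    for f in flights:
        if f["cancelled"]:
            continue  # cancelled flights contribute no seats
        total += f.get("actual_seats", f["scheduled_seats"])
    return total


# Hypothetical day: 10 scheduled flights of 180 seats each,
# 2 cancelled and 1 swapped to a 150-seat aircraft.
flights = [{"scheduled_seats": 180, "cancelled": False} for _ in range(7)]
flights += [{"scheduled_seats": 180, "cancelled": True} for _ in range(2)]
flights.append({"scheduled_seats": 180, "cancelled": False, "actual_seats": 150})

pax = 1200  # hypothetical onboard passengers
scheduled = sum(f["scheduled_seats"] for f in flights)
operated = operated_seats(flights)

lf_scheduled = pax / scheduled  # denominator too large, LF looks weak
lf_operated = pax / operated    # denominator reflects what actually flew
print(round(lf_scheduled, 3), round(lf_operated, 3))
```

With the same passengers, the schedule-based ratio is far lower than the operated-seat ratio, which is the bias described above.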
Step 2: TSA throughput nowcast
This is more interesting, but only if used carefully.
TSA throughput is useful for near-real-time US demand direction.
But TSA throughput is not the same as airport passengers.
It measures screened passengers, which is closer to:
Originating departing passengers plus some re-screened passengers
It is not total onboard segment passengers.
This creates a major hub problem.
| Airport type | TSA usefulness |
|---|---|
| Origin-heavy leisure airport | Good |
| Large connecting hub | Distorted |
| International gateway | Mixed |
| ULCC-heavy airport | Better |
| Legacy hub airport | Needs correction |
At ATL, DFW, CLT, DEN, IAH, ORD and similar hubs, many passengers connect airside and are not re-screened.
So TSA can understate total passenger flow relative to seats, especially for network airlines.
Better way to use TSA
Do not simply replace airport passengers with TSA passengers and allocate by total seat share.
Use TSA as a near-real-time local-origin demand factor.
Better formula
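For US airports, treat TSA throughput as a proxy for local-origin departing passengers.
Then estimate total departing passengers by scaling that proxy with each airport's historical local-origin share.
Then map to carriers using originating exposure, not total seat share.
Where a carrier's originating exposure is its departing seats weighted by the local-origin share of its traffic at that airport.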
That local-origin-share adjustment is what separates a useful model from decorative nonsense.
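One possible reading of the local-origin-share adjustment, as a rough sketch rather than a definitive formula. The function names, the carrier labels and every number are assumptions for illustration, and re-screened passengers are ignored. In practice the local-origin shares would have to be calibrated from an O&D source.

```python
def estimated_departing_pax(tsa_screened, local_origin_share):
    """Scale screened (mostly originating) pax up to total departing pax."""
    if not 0 < local_origin_share <= 1:
        raise ValueError("local_origin_share must be in (0, 1]")
    return tsa_screened / local_origin_share


def originating_weights(carriers):
    """Weight each carrier by departing seats times its local-origin share."""
    raw = {name: c["dep_seats"] * c["local_share"] for name, c in carriers.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical hub: a network carrier with mostly connecting traffic
# and a ULCC whose passengers almost all originate locally.
carriers = {
    "hub_carrier": {"dep_seats": 8000, "local_share": 0.35},
    "ulcc": {"dep_seats": 2000, "local_share": 0.95},
}

weights = originating_weights(carriers)
tsa_screened = 4700
tsa_by_carrier = {name: tsa_screened * w for name, w in weights.items()}
total_departing = estimated_departing_pax(tsa_screened, local_origin_share=0.47)
print(weights, tsa_by_carrier, total_departing)
```

In this example the ULCC holds 20% of departing seats but roughly 40% of the originating exposure, so allocating TSA counts by raw seat share would give most of its screened passengers to the hub carrier.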
What I would build
1. Backtest layer: US first
Use the US because BTS T-100 gives you delayed truth.
| Component | Source |
|---|---|
| Actual carrier-route passengers | BTS T-100 |
| Actual seats / operated capacity | BTS T-100, OAG, FlightAware, FR24 |
| Scheduled seats | OAG |
| TSA live throughput | TSA airport/checkpoint throughput |
| Airline reported LF / RPM / ASM | Company traffic releases / filings |
| Price / yield context | DB1B, ARC/BSP, fare scrapes, Cirium/OAG fare tools if licensed |
Train the model on history, then use T-100 to evaluate whether your proxy predicts:
| Target | Usefulness |
|---|---|
| Airline LF YoY direction | High |
| Airline LF YoY magnitude | Medium |
| Monthly passenger growth | High |
| Reported RPM / RPK growth | Medium |
| Revenue surprise | Lower unless fare / yield data added |
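A minimal sketch of the simplest backtest metric for the first target, directional hit rate. The series below are made up; in a real backtest `proxy_yoy` would come from your proxy and `truth_yoy` from T-100 load factor, aligned by month.

```python
def yoy_change(series, lag=12):
    """Difference between each observation and the one `lag` periods earlier."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def direction_hit_rate(proxy_yoy, truth_yoy):
    """Share of periods where the proxy and the truth move in the same direction."""
    pairs = [(p, t) for p, t in zip(proxy_yoy, truth_yoy) if p != 0 and t != 0]
    if not pairs:
        raise ValueError("no comparable periods")
    hits = sum(1 for p, t in pairs if (p > 0) == (t > 0))
    return hits / len(pairs)


# Hypothetical YoY changes in percentage points for four months.
proxy_yoy = [0.5, -0.3, 0.2, -0.1]
truth_yoy = [0.8, -0.1, -0.4, -0.2]

hit_rate = direction_hit_rate(proxy_yoy, truth_yoy)
print(hit_rate)
```

A hit rate needs a baseline to mean anything, for example the hit rate of "same direction as last month", and magnitude targets need an error metric rather than a sign test.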
2. Europe layer: airport scrape + ACI + national stats
For Europe, the live problem is harder because there is no TSA equivalent.
Use:
| Country / group | Useful sources |
|---|---|
| Spain | Aena monthly airport statistics |
| UK | CAA airport stats, Heathrow, Gatwick, Manchester releases |
| France | Groupe ADP, DGAC where available |
| Germany | Fraport monthly traffic, ADV airport statistics |
| Netherlands | Schiphol monthly traffic |
| Italy | Assaeroporti / airport operator releases |
| Pan-Europe | ACI Europe, Eurocontrol flights, OAG actuals |
Eurostat is useful for validation but too lagged for trading.
Eurocontrol is useful for flight activity but gives flights, not passengers.
How to improve the allocation
Instead of allocating airport passengers by total seat share, use a hierarchy.
Level 0: naïve model
Airport pax allocated by seat share.
Useful only as a benchmark.
Level 1: domestic / international split
Allocate domestic airport pax to domestic seats and international pax to international seats.
This is a large improvement.
Level 2: route-region split
Split airport pax into buckets:
| Bucket |
|---|
| Domestic |
| Intra-Europe / intra-US |
| Transatlantic |
| Middle East |
| Asia |
| LatAm |
| Leisure sun routes |
| Long-haul premium routes |
Then allocate passengers by airline seat share inside each bucket.
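A minimal sketch of bucketed allocation, assuming the airport publishes passengers by bucket and the schedule data can be tagged with the same buckets. The bucket names, airlines and numbers are hypothetical.

```python
def allocate_by_bucket(airport_pax, seats):
    """Allocate each bucket's passengers by airline seat share within that bucket.

    airport_pax: {bucket: passengers}
    seats: {bucket: {airline: seats}}
    """
    out = {}
    for bucket, pax in airport_pax.items():
        bucket_seats = seats.get(bucket, {})
        total = sum(bucket_seats.values())
        if total == 0:
            continue  # no seats observed in this bucket; pax stay unallocated
        for airline, s in bucket_seats.items():
            out[airline] = out.get(airline, 0.0) + pax * s / total
    return out


airport_pax = {"domestic": 600.0, "international": 400.0}
seats = {
    "domestic": {"A": 100.0, "B": 900.0},
    "international": {"A": 400.0, "B": 100.0},
}

allocated = allocate_by_bucket(airport_pax, seats)
print(allocated)
```

Here airline A holds a third of all seats, so the naïve model would give it about 333 passengers; the bucketed version gives it 380 because its seats sit mostly in the international bucket.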
Level 3: origin vs connecting adjustment
For hubs, estimate local-origin share using:
| Source | Use |
|---|---|
| BTS DB1B | US O&D / local vs connecting calibration |
| MIDT / DDS / ARC / BSP | Better O&D shares, if available |
| Airline network structure | Approximate flow complexity |
| Minimum-connect-time graph / itinerary builder | Estimate plausible connecting flows |
| Historical T-100 vs airport pax | Infer airport-carrier correction factors |
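The last row can be sketched as a simple ratio, assuming you have past months where both T-100 carrier passengers at an airport and the seat-share allocation are available. The function name and numbers are illustrative, and a plain average is only one of several reasonable estimators.

```python
from statistics import mean


def carrier_correction_factor(history):
    """Average ratio of T-100 actual pax to seat-share-allocated pax.

    history: list of (t100_pax, allocated_pax) tuples, one per past month.
    """
    ratios = [actual / allocated for actual, allocated in history if allocated > 0]
    if not ratios:
        raise ValueError("no usable history")
    return mean(ratios)


# Hypothetical hub carrier whose real traffic runs above the naive allocation.
history = [(52000, 40000), (54000, 45000), (50000, 40000)]
factor = carrier_correction_factor(history)

naive_allocation_now = 42000
adjusted_now = naive_allocation_now * factor
print(factor, adjusted_now)
```

The factor only helps if it is stable over time, which is worth checking per airport-carrier pair before relying on it.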
Level 4: actual flown seats
Replace scheduled seats with operated seats.
This is critical during fuel-price disruption or cancellation waves.
Level 5: route-level model
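Best version: estimate passengers and operated seats for each route, take their ratio as the route-level load factor, then aggregate to airline.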
Key biases by airline type
| Airline type | Likely behavior of your method |
|---|---|
| Hub legacy airline | Understates or overstates depending on connecting share |
| ULCC / leisure-heavy airline | Works better because pax are more O&D-driven |
| Regional feeder | Messy due to operating vs marketing carrier mapping |
| Slot-constrained flag carrier | May miss pricing/yield weakness if LF holds |
| Long-haul network carrier | Too blunt unless split by region and route |
| Ryanair / Wizz / easyJet | Better than for legacies, but secondary airport coverage matters |
Most useful output for investing
I would not call this:
Estimated load factor
I would call it:
Capacity-adjusted airport demand exposure index
For each airline, track:
| Signal | Meaning |
|---|---|
| Airport demand growth YoY | Are its airports seeing pax growth? |
| Seat growth YoY | Is it adding too much capacity into that demand? |
| Demand minus capacity | Proxy for LF pressure |
| Operated vs scheduled seat delta | Cancellation / capacity discipline |
| Hub-origin adjustment | Reduces TSA distortion |
| Confidence score | Based on coverage and historical backtest accuracy |
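A minimal sketch of the "demand minus capacity" row, with hypothetical inputs. The function names are illustrative, and the passenger figure could be any of the demand proxies discussed above.

```python
def yoy_growth(current, prior):
    """Year-over-year growth as a fraction, e.g. 0.05 for +5%."""
    if prior <= 0:
        raise ValueError("prior must be positive")
    return current / prior - 1.0


def lf_pressure(pax_now, pax_prior, seats_now, seats_prior):
    """Demand growth minus capacity growth; negative values suggest LF pressure."""
    return yoy_growth(pax_now, pax_prior) - yoy_growth(seats_now, seats_prior)


# Hypothetical airline: exposure-weighted airport pax up 5%, seats up 10%.
pressure = lf_pressure(pax_now=105.0, pax_prior=100.0, seats_now=110.0, seats_prior=100.0)
print(round(pressure, 4))
```

The sign and the ranking against peers carry the signal here, not the level.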
Main alpha angle:
Find airlines where demand proxies are deteriorating faster than capacity cuts, or where demand is resilient but the market is pricing broad sector weakness.
That is cleaner than a fake point-estimate LF with two decimal places.
Verdict
Your idea is worth doing, but change the framing.
| Question | Answer |
|---|---|
| Is airport actual pax × OAG seat share useful? | Yes, as an exposure proxy |
| Is it an airline LF model? | Not really |
| Is TSA useful for live US nowcasting? | Yes, especially airport-level TSA data |
| Can TSA directly estimate total airport passengers? | Only with calibration |
| Best first build? | US backtest using T-100 truth, then apply lessons to Europe |
| Best final signal? | YoY LF pressure / demand-capacity mismatch index, not absolute LF |
Build order
1. US only first.
2. Use BTS T-100 as truth.
3. Build the naïve airport allocation model.
4. Add TSA airport/checkpoint throughput.
5. Add actual flown seats from FlightAware / FR24 / OAG status.
6. Add domestic/international and hub/origin adjustments.
7. Backtest against reported LF and T-100 LF.
8. Port to Europe using ACI, Eurostat, airport operator data and Eurocontrol.
Final framing
The investable version is not:
Delta LF is 84.3% this month.
The investable version is:
Delta’s operated capacity is up X%, its airport-origin demand proxy is up only Y%, hub-adjusted LF pressure is deteriorating versus peers, and the historical backtest says this usually leads reported LF by Z weeks/months.
That is much closer to alpha.