Bottom line

This is a good first step, but it is not a true airline load-factor estimate.

It is better described as an airport-demand exposure model.

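Your proportional allocation method, with a indexing airlines and i indexing airports:

$$\widehat{\text{Pax}}_{a,i} = \text{Pax}_i \times \frac{\text{Seats}_{a,i}}{\text{Seats}_i}$$

then:

$$\widehat{\text{LF}}_a = \frac{\sum_i \widehat{\text{Pax}}_{a,i}}{\sum_i \text{Seats}_{a,i}}$$

algebraically becomes:

$$\widehat{\text{LF}}_a = \sum_i \frac{\text{Seats}_{a,i}}{\sum_j \text{Seats}_{a,j}} \times \frac{\text{Pax}_i}{\text{Seats}_i}$$

So it mostly says: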

Is this airline exposed to airports where passenger demand is strong or weak relative to scheduled seats?

That is useful. But it does not recover airline-specific load factor inside each airport.
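
A minimal numeric sketch of that point: allocating airport passengers by seat share and then dividing by the airline's seats gives the same number as a seat-weighted average of airport pax-to-seat ratios. The airport codes, figures and function names are invented for illustration.

```python
# Illustrative figures only: airport totals and carrier seats are made up.
airport_pax = {"AAA": 800.0, "BBB": 450.0}      # actual airport passengers
airport_seats = {"AAA": 1000.0, "BBB": 500.0}   # all scheduled seats at the airport
carrier_seats = {"AAA": 300.0, "BBB": 100.0}    # one carrier's scheduled seats


def allocated_lf(carrier_seats, airport_pax, airport_seats):
    """Allocate airport pax by seat share, then divide by the carrier's seats."""
    allocated = sum(
        airport_pax[a] * carrier_seats[a] / airport_seats[a] for a in carrier_seats
    )
    return allocated / sum(carrier_seats.values())


def seat_weighted_airport_lf(carrier_seats, airport_pax, airport_seats):
    """Weight each airport's pax/seat ratio by the carrier's seat mix."""
    total = sum(carrier_seats.values())
    return sum(
        (carrier_seats[a] / total) * (airport_pax[a] / airport_seats[a])
        for a in carrier_seats
    )


lf_allocated = allocated_lf(carrier_seats, airport_pax, airport_seats)
lf_weighted = seat_weighted_airport_lf(carrier_seats, airport_pax, airport_seats)
print(lf_allocated, lf_weighted)  # the two routes to the estimate agree
```

The carrier's own passenger counts never enter either function, which is why the result is an exposure measure rather than a carrier load factor.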


Assessment of Step 1: airport actual pax vs OAG seats

What works

This can be useful for YoY trend nowcasting, especially if the question is:

Is airline X exposed to airports where actual demand is running ahead of, or behind, scheduled capacity?

That is investable if used as a relative signal, not as a precise load-factor estimate.

| Use case | Verdict |
|---|---|
| Detecting airport-level demand strength vs scheduled capacity | Good |
| Ranking airlines by exposure to strong/weak airports | Good |
| Estimating airline LF YoY direction | Maybe |
| Estimating absolute airline LF | Weak |
| Estimating carrier-specific LF at hubs | Dangerous |
| Replacing reported LF / RPM / ASK / RPK data | No |

For the US, BTS T-100 is the obvious delayed truth set. It includes carrier-level passengers, seats, departures, aircraft hours and load factor. Use it to backtest the proxy.

For Europe, Eurostat and airport operator data can help, but they are less timely and less consistent.


The big flaws

1. You erase the airline signal inside each airport

If Airline A has 30% of seats at Heathrow and you allocate 30% of Heathrow passengers to Airline A, you are assuming Airline A has airport-average load factor.

That misses:

| Distortion | Example |
|---|---|
| Hub carrier effects | BA at LHR, Delta at ATL, American at DFW, United at DEN |
| ULCC vs legacy LF differences | Ryanair / Wizz / easyJet vs Lufthansa / BA / Air France |
| Long-haul vs short-haul mix | Widebody long-haul behaves differently from domestic or intra-Europe |
| Business vs leisure mix | Corporate-heavy routes can have weaker pax but stronger fares |
| Regional feed | American Eagle, SkyWest, Air Dolomiti, CityJet, etc. |
| Slot-constrained airports | LHR, AMS, FRA, MUC, CDG |

So the output is not:

BA load factor at Heathrow

It is more like:

BA’s Heathrow-weighted exposure to Heathrow’s airport-level demand pressure

That distinction matters.
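
To make the distinction concrete, here is a small sketch with two hypothetical carriers at one airport whose true load factors differ. The carrier labels and figures are invented, and the sketch assumes for simplicity that airport passengers equal the sum of onboard passengers.

```python
# Hypothetical single-airport example: true onboard pax differ by carrier.
seats = {"hub_carrier": 700.0, "ulcc": 300.0}
true_pax = {"hub_carrier": 560.0, "ulcc": 282.0}  # true LFs: 0.80 and 0.94

# Simplifying assumption: airport throughput equals total onboard pax.
airport_pax = sum(true_pax.values())
airport_seats = sum(seats.values())

true_lf = {c: true_pax[c] / seats[c] for c in seats}

# Seat-share allocation hands every carrier the airport-average ratio.
allocated_lf_by_carrier = {
    c: (airport_pax * seats[c] / airport_seats) / seats[c] for c in seats
}

for c in seats:
    print(c, round(true_lf[c], 3), round(allocated_lf_by_carrier[c], 3))
```

Both carriers come out at the airport average, so the gap between them disappears from the estimate.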


2. Airport passenger definitions must match the seat denominator

You need to align passenger data with the OAG denominator.

| Passenger data type | Match with OAG denominator |
|---|---|
| Departing / enplaned passengers only | Departing seats only |
| Arriving + departing passengers | Arriving + departing seats |
| Terminal passengers | Usually arrivals + departures, sometimes excludes transit |
| TSA screened passengers | Mostly departing / originating screened passengers |
| Transfer passengers | Often treated differently by airport |

For load factor, the clean concept is:

$$\text{LF}_a = \frac{\text{onboard segment passengers}_a}{\text{operated seats}_a}$$

Airport passenger totals are often airport throughput, not onboard segment passengers by airline.


3. Scheduled seats are not operated seats

OAG schedules are useful, but for load factor you want operated seats.

You should adjust scheduled seats using:

| Adjustment | Dataset |
|---|---|
| Cancelled flights | FlightAware, Flightradar24, OAG flight status |
| Aircraft swaps | Cirium Fleets Analyzer, OAG actual equipment, ADS-B tail matching |
| Wet lease / substitution | OAG operating carrier, Cirium, flight status |
| Regional / operator mapping | BTS T-100 in US, OAG operating carrier, parent mapping |
| Charter / irregular ops | FlightAware, FR24, airport movement data |

If fuel prices trigger cancellations, the denominator is exactly where the error lives.

Using raw schedules would overstate seats and make LF look artificially weak.
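
A rough sketch of the scheduled-to-operated adjustment, assuming you already have per-flight status and equipment data from one of the sources above. The `Flight` record, its field names and the figures are hypothetical, not any vendor's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flight:
    scheduled_seats: int
    cancelled: bool = False
    operated_seats: Optional[int] = None  # set when an equipment swap is known


def operated_seat_total(flights):
    """Sum seats actually flown: drop cancellations, prefer swapped-in equipment."""
    total = 0
    for f in flights:
        if f.cancelled:
            continue
        total += f.operated_seats if f.operated_seats is not None else f.scheduled_seats
    return total


flights = [
    Flight(scheduled_seats=180),
    Flight(scheduled_seats=180, cancelled=True),
    Flight(scheduled_seats=180, operated_seats=150),  # downgauged aircraft
]
scheduled = sum(f.scheduled_seats for f in flights)
operated = operated_seat_total(flights)
pax = 290  # invented passenger count

# LF on the scheduled denominator vs LF on the operated denominator.
print(pax / scheduled, pax / operated)
```

With one cancellation and one downgauge, the scheduled denominator is much larger than the operated one, so the schedule-based ratio understates how full the flown aircraft were.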


Step 2: TSA throughput nowcast

This is more interesting, but only if used carefully.

TSA throughput is useful for near-real-time US demand direction.

But TSA throughput is not the same as airport passengers.

It measures screened passengers, which is closer to:

Originating departing passengers plus some re-screened passengers

It is not total onboard segment passengers.

This creates a major hub problem.

| Airport type | TSA usefulness |
|---|---|
| Origin-heavy leisure airport | Good |
| Large connecting hub | Distorted |
| International gateway | Mixed |
| ULCC-heavy airport | Better |
| Legacy hub airport | Needs correction |

At ATL, DFW, CLT, DEN, IAH, ORD and similar hubs, many passengers connect airside and are not re-screened.

So TSA can understate total passenger flow relative to seats, especially for network airlines.


Better way to use TSA

Do not simply replace airport passengers with TSA passengers and allocate by total seat share.

Use TSA as a near-real-time local-origin demand factor.

Better formula

For US airports:

Then estimate:

Then map to carriers using originating exposure, not total seat share.

Where:

That local-origin-share adjustment is what separates a useful model from decorative nonsense.
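
One possible reading of the local-origin adjustment, offered as a sketch under stated assumptions rather than a definitive formula. It assumes TSA throughput approximates local-origin departing passengers, that a local-origin share per airport and per carrier is available (for example from DB1B calibration), and that carriers are weighted by originating seats. All function names, field names and figures are hypothetical.

```python
def origin_demand_ratio(tsa_throughput, departing_seats, local_origin_share):
    """TSA screenings per departing seat expected to be filled by local-origin pax."""
    return tsa_throughput / (departing_seats * local_origin_share)


def carrier_origin_index(airports):
    """Weight each airport's ratio by the carrier's originating seats there."""
    weights = {
        code: a["carrier_departing_seats"] * a["carrier_local_share"]
        for code, a in airports.items()
    }
    total = sum(weights.values())
    return sum(
        weights[code] / total
        * origin_demand_ratio(
            a["tsa_throughput"], a["departing_seats"], a["local_origin_share"]
        )
        for code, a in airports.items()
    )


# Invented figures: one connecting hub and one origin-heavy leisure airport.
airports = {
    "HUB": {
        "tsa_throughput": 40000.0,
        "departing_seats": 100000.0,
        "local_origin_share": 0.5,
        "carrier_departing_seats": 70000.0,
        "carrier_local_share": 0.4,
    },
    "SUN": {
        "tsa_throughput": 18000.0,
        "departing_seats": 20000.0,
        "local_origin_share": 0.95,
        "carrier_departing_seats": 2000.0,
        "carrier_local_share": 0.95,
    },
}

naive_hub_ratio = airports["HUB"]["tsa_throughput"] / airports["HUB"]["departing_seats"]
index = carrier_origin_index(airports)
print(naive_hub_ratio, index)
```

In this toy case the unadjusted hub ratio looks far weaker than the origin-adjusted one, which is the distortion the adjustment is meant to remove.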


What I would build

1. Backtest layer: US first

Use the US because BTS T-100 gives you delayed truth.

| Component | Source |
|---|---|
| Actual carrier-route passengers | BTS T-100 |
| Actual seats / operated capacity | BTS T-100, OAG, FlightAware, FR24 |
| Scheduled seats | OAG |
| TSA live throughput | TSA airport/checkpoint throughput |
| Airline reported LF / RPM / ASM | Company traffic releases / filings |
| Price / yield context | DB1B, ARC/BSP, fare scrapes, Cirium/OAG fare tools if licensed |

Train the model historically:

Use T-100 to evaluate whether your proxy predicts:

| Target | Usefulness |
|---|---|
| Airline LF YoY direction | High |
| Airline LF YoY magnitude | Medium |
| Monthly passenger growth | High |
| Reported RPM / RPK growth | Medium |
| Revenue surprise | Lower unless fare / yield data added |
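
A minimal sketch of how the direction and magnitude targets could be scored against T-100, assuming you have aligned monthly series of proxy and truth YoY changes for one airline. The metric choices, function names and numbers are illustrative assumptions, not a prescribed backtest design.

```python
def direction_hit_rate(proxy_yoy, truth_yoy):
    """Share of periods where proxy and truth YoY changes have the same sign."""
    pairs = list(zip(proxy_yoy, truth_yoy))
    hits = sum(1 for p, t in pairs if (p > 0) == (t > 0))
    return hits / len(pairs)


def mean_abs_error(proxy_yoy, truth_yoy):
    """Average absolute gap between proxy and truth, in the same units."""
    pairs = list(zip(proxy_yoy, truth_yoy))
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


# Invented monthly LF changes in percentage points.
proxy_yoy = [1.2, -0.5, 0.8, -1.0, 0.3]
truth_yoy = [0.9, -0.2, -0.1, -1.4, 0.6]

hit_rate = direction_hit_rate(proxy_yoy, truth_yoy)
mae = mean_abs_error(proxy_yoy, truth_yoy)
print(hit_rate, mae)
```

Scoring direction and magnitude separately matches the table: a proxy can be right on sign most months while still being too noisy to trust as a point estimate.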

2. Europe layer: airport scrape + ACI + national stats

For Europe, the live problem is harder because there is no TSA equivalent.

Use:

| Country / group | Useful sources |
|---|---|
| Spain | Aena monthly airport statistics |
| UK | CAA airport stats, Heathrow, Gatwick, Manchester releases |
| France | Groupe ADP, DGAC where available |
| Germany | Fraport monthly traffic, ADV airport statistics |
| Netherlands | Schiphol monthly traffic |
| Italy | Assaeroporti / airport operator releases |
| Pan-Europe | ACI Europe, Eurocontrol flights, OAG actuals |

Eurostat is useful for validation but too lagged for trading.

Eurocontrol is useful for flight activity but gives flights, not passengers.


How to improve the allocation

Instead of allocating airport passengers by total seat share, use a hierarchy.


Level 0: naïve model

Airport pax allocated by seat share.

Useful only as a benchmark.


Level 1: domestic / international split

Allocate domestic airport pax to domestic seats and international pax to international seats.

This is a large improvement.


Level 2: route-region split

Split airport pax into buckets:

Bucket
Domestic
Intra-Europe / intra-US
Transatlantic
Middle East
Asia
LatAm
Leisure sun routes
Long-haul premium routes

Then allocate passengers by airline seat share inside each bucket.
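
A short sketch of bucket-level allocation, assuming the airport publishes (or you can estimate) passengers by bucket. The bucket names, carrier labels and figures are invented, and `allocate_by_bucket` is a hypothetical helper.

```python
def allocate_by_bucket(airport_pax_by_bucket, seats_by_bucket):
    """Split each bucket's airport pax across carriers by seat share in that bucket."""
    allocated = {}
    for bucket, pax in airport_pax_by_bucket.items():
        bucket_seats = seats_by_bucket[bucket]
        total = sum(bucket_seats.values())
        for carrier, seats in bucket_seats.items():
            allocated[carrier] = allocated.get(carrier, 0.0) + pax * seats / total
    return allocated


# Invented figures: carrier X flies both buckets, carrier Y is domestic only.
airport_pax_by_bucket = {"domestic": 600.0, "transatlantic": 270.0}
seats_by_bucket = {
    "domestic": {"X": 400.0, "Y": 400.0},
    "transatlantic": {"X": 300.0},
}

allocated = allocate_by_bucket(airport_pax_by_bucket, seats_by_bucket)

# Level 0 comparison: one pool of pax split by total seat share.
total_pax = sum(airport_pax_by_bucket.values())
total_seats = {"X": 700.0, "Y": 400.0}
naive = {c: total_pax * s / sum(total_seats.values()) for c, s in total_seats.items()}
print(allocated, naive)
```

Here the naïve split moves some of the fuller transatlantic traffic onto the domestic-only carrier, while the bucketed version keeps it with the carrier that actually flies those routes.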


Level 3: origin vs connecting adjustment

For hubs, estimate local-origin share using:

| Source | Use |
|---|---|
| BTS DB1B | US O&D / local vs connecting calibration |
| MIDT / DDS / ARC / BSP | Better O&D shares, if available |
| Airline network structure | Approximate flow complexity |
| Minimum-connect-time graph / itinerary builder | Estimate plausible connecting flows |
| Historical T-100 vs airport pax | Infer airport-carrier correction factors |

Level 4: actual flown seats

Replace scheduled seats with operated seats.

This is critical during fuel-price disruption or cancellation waves.


Level 5: route-level model

Best version:

Then aggregate to airline.
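
A sketch of the aggregation step only, assuming route-level passenger estimates and operated seats already exist. It shows two aggregations: a simple seat-based load factor and a distance-weighted one in the spirit of RPK / ASK. The route figures and function names are hypothetical.

```python
# Invented route-level estimates for one airline.
routes = [
    {"pax": 160.0, "seats": 180.0, "km": 500.0},   # short-haul
    {"pax": 240.0, "seats": 300.0, "km": 6000.0},  # long-haul
]


def seat_lf(routes):
    """Passengers over seats, ignoring stage length."""
    return sum(r["pax"] for r in routes) / sum(r["seats"] for r in routes)


def distance_weighted_lf(routes):
    """Passenger-km over seat-km, so long sectors carry more weight."""
    rpk = sum(r["pax"] * r["km"] for r in routes)
    ask = sum(r["seats"] * r["km"] for r in routes)
    return rpk / ask


print(seat_lf(routes), distance_weighted_lf(routes))
```

The two numbers differ whenever long-haul and short-haul routes fill differently, which matters when comparing the proxy with reported load factors that are typically distance-weighted.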


Key biases by airline type

| Airline type | Your method likely does |
|---|---|
| Hub legacy airline | Understates or overstates depending on connecting share |
| ULCC / leisure-heavy airline | Works better because pax are more O&D-driven |
| Regional feeder | Messy due to operating vs marketing carrier mapping |
| Slot-constrained flag carrier | May miss pricing/yield weakness if LF holds |
| Long-haul network carrier | Too blunt unless split by region and route |
| Ryanair / Wizz / easyJet | Better than for legacies, but secondary airport coverage matters |

Most useful output for investing

I would not call this:

Estimated load factor

I would call it:

Capacity-adjusted airport demand exposure index

For each airline, track:

| Signal | Meaning |
|---|---|
| Airport demand growth YoY | Are its airports seeing pax growth? |
| Seat growth YoY | Is it adding too much capacity into that demand? |
| Demand minus capacity | Proxy for LF pressure |
| Operated vs scheduled seat delta | Cancellation / capacity discipline |
| Hub-origin adjustment | Reduces TSA distortion |
| Confidence score | Based on coverage and historical backtest accuracy |
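
The "demand minus capacity" signal can be sketched in a few lines, assuming exposure-weighted passenger and seat totals for the current and prior-year periods. The function names and figures are illustrative only.

```python
def yoy(current, prior):
    """Year-over-year growth as a fraction, e.g. 0.05 for +5%."""
    return current / prior - 1.0


def lf_pressure(pax_now, pax_prior, seats_now, seats_prior):
    """Demand growth minus capacity growth; negative suggests LF headwind."""
    return yoy(pax_now, pax_prior) - yoy(seats_now, seats_prior)


# Invented example: demand up 5%, capacity up 10%.
pressure = lf_pressure(1050.0, 1000.0, 1100.0, 1000.0)
print(round(pressure, 4))
```

A negative value means capacity is growing faster than the demand proxy, which is the relative signal to compare across airlines rather than read as a load-factor level.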

Main alpha angle:

Find airlines where demand proxies are deteriorating faster than capacity cuts, or where demand is resilient but the market is pricing broad sector weakness.

That is cleaner than a fake point-estimate LF with two decimal places.


Verdict

Your idea is worth doing, but change the framing.

| Question | Answer |
|---|---|
| Is airport actual pax × OAG seat share useful? | Yes, as an exposure proxy |
| Is it an airline LF model? | Not really |
| Is TSA useful for live US nowcasting? | Yes, especially airport-level TSA data |
| Can TSA directly estimate total airport passengers? | Only with calibration |
| Best first build? | US backtest using T-100 truth, then apply lessons to Europe |
| Best final signal? | YoY LF pressure / demand-capacity mismatch index, not absolute LF |

Build order

  1. US only first.
  2. Use BTS T-100 as truth.
  3. Build naïve airport allocation model.
  4. Add TSA airport/checkpoint throughput.
  5. Add actual flown seats from FlightAware / FR24 / OAG status.
  6. Add domestic/international and hub/origin adjustments.
  7. Backtest vs reported LF / T-100 LF.
  8. Port to Europe using ACI, Eurostat, airport operator data and Eurocontrol.

Final framing

The investable version is not:

Delta LF is 84.3% this month.

The investable version is:

Delta’s operated capacity is up X%, its airport-origin demand proxy is up only Y%, hub-adjusted LF pressure is deteriorating versus peers, and the historical backtest says this usually leads reported LF by Z weeks/months.

That is much closer to alpha.