Airline Load Factor Nowcast Using Airport Actuals, OAG Schedules and TSA Data

Bottom line

This is a good first step, but it is not a true airline load-factor estimate.

It is better described as an airport-demand exposure model.

Your proportional allocation method:

Pax_{airline, airport, month} = ActualPax_{airport, month} \times \frac{ScheduledSeats_{airline, airport, month}}{ScheduledSeats_{airport, month}}

then:

LF_{airline} = \frac{\sum_{airport} Pax_{airline, airport}}{\sum_{airport} ScheduledSeats_{airline, airport}}

algebraically becomes:

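LF_{airline} = \sum_{airport} Weight_{airline, airport} \times AirportLF_{airport}

where:

Weight_{airline, airport} = \frac{ScheduledSeats_{airline, airport}}{\sum_{airport} ScheduledSeats_{airline, airport}}

AirportLF_{airport} = \frac{ActualPax_{airport}}{ScheduledSeats_{airport}}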

So it mostly says:

Is this airline exposed to airports where passenger demand is strong or weak relative to scheduled seats?

That is useful. But it does not recover airline-specific load factor inside each airport.
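
A minimal sketch of that identity, using made-up airports (AAA, BBB), airlines (X, Y) and figures purely for illustration. It computes the allocated load factor both ways and shows they coincide, which is why the output carries no airline-specific information within an airport.

```python
# Illustrative numbers only: airports, airlines and figures are invented.
airport_pax = {"AAA": 800_000, "BBB": 450_000}   # actual pax per airport
sched_seats = {                                   # scheduled seats by (airline, airport)
    ("X", "AAA"): 300_000, ("Y", "AAA"): 700_000,
    ("X", "BBB"): 400_000, ("Y", "BBB"): 200_000,
}

def airport_seats(airport):
    """Total scheduled seats at an airport across all airlines."""
    return sum(s for (_, a), s in sched_seats.items() if a == airport)

def allocated_lf(airline):
    """LF from allocating airport pax to the airline by seat share."""
    pax = sum(airport_pax[a] * s / airport_seats(a)
              for (i, a), s in sched_seats.items() if i == airline)
    seats = sum(s for (i, _), s in sched_seats.items() if i == airline)
    return pax / seats

def exposure_lf(airline):
    """The same quantity written as a seat-weighted average of airport LF."""
    own = {a: s for (i, a), s in sched_seats.items() if i == airline}
    total = sum(own.values())
    return sum((s / total) * (airport_pax[a] / airport_seats(a))
               for a, s in own.items())

print(round(allocated_lf("X"), 3), round(exposure_lf("X"), 3))  # both about 0.771
```

Airline X ends up between the two airport load factors (0.80 at AAA, 0.75 at BBB), weighted by where its seats are, whatever its own cabins actually look like.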

Assessment of Step 1: airport actual pax vs OAG seats

What works

This can be useful for YoY trend nowcasting, especially if the question is:

Is airline X exposed to airports where actual demand is running ahead of, or behind, scheduled capacity?

That is investable if used as a relative signal, not as a precise load-factor estimate.

Use case	Verdict
Detecting airport-level demand strength vs scheduled capacity	Good
Ranking airlines by exposure to strong/weak airports	Good
Estimating airline LF YoY direction	Maybe
Estimating absolute airline LF	Weak
Estimating carrier-specific LF at hubs	Dangerous
Replacing reported LF / RPM / ASK / RPK data	No

For the US, BTS T-100 is the obvious delayed truth set. It reports carrier-level passengers, seats and departures by segment, along with aircraft hours, so load factor can be computed directly. Use it to backtest the proxy.

For Europe, Eurostat and airport operator data can help, but timeliness and consistency are more annoying.

The big flaws

1. You erase the airline signal inside each airport

If Airline A has 30% of seats at Heathrow and you allocate 30% of Heathrow passengers to Airline A, you are assuming Airline A has airport-average load factor.

That misses:

Distortion	Example
Hub carrier effects	BA at LHR, Delta at ATL, American at DFW, United at DEN
ULCC vs legacy LF differences	Ryanair / Wizz / easyJet vs Lufthansa / BA / Air France
Long-haul vs short-haul mix	Widebody long-haul behaves differently from domestic or intra-Europe
Business vs leisure mix	Corporate-heavy routes can have weaker pax but stronger fares
Regional feed	American Eagle, SkyWest, Air Dolomiti, CityJet, etc.
Slot-constrained airports	LHR, AMS, FRA, MUC, CDG

So the output is not:

BA load factor at Heathrow

It is more like:

BA’s Heathrow-weighted exposure to Heathrow’s airport-level demand pressure

That distinction matters.

2. Airport passenger definitions must match the seat denominator

You need to align passenger data with the OAG denominator.

Passenger data type	Match with OAG denominator
Departing / enplaned passengers only	Departing seats only
Arriving + departing passengers	Arriving + departing seats
Terminal passengers	Usually arrivals + departures, sometimes excludes transit
TSA screened passengers	Mostly departing / originating screened passengers
Transfer passengers	Often treated differently by airport

For load factor, the clean concept is:

LF = \frac{Onboard segment passengers}{Operated seats}

Airport passenger totals are often airport throughput, not onboard segment passengers by airline.

3. Scheduled seats are not operated seats

OAG schedules are useful, but for load factor you want operated seats.

You should adjust scheduled seats using:

Adjustment	Dataset
Cancelled flights	FlightAware, Flightradar24, OAG flight status
Aircraft swaps	Cirium Fleets Analyzer, OAG actual equipment, ADS-B tail matching
Wet lease / substitution	OAG operating carrier, Cirium, flight status
Regional / operator mapping	BTS T-100 in US, OAG operating carrier, parent mapping
Charter / irregular ops	FlightAware, FR24, airport movement data

If fuel prices trigger cancellations, the denominator is exactly where the error lives.

Using raw schedules would overstate seats and make LF look artificially weak.
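
A small sketch of the denominator effect, with invented figures. The function name `operated_seats` and its two adjustment arguments are hypothetical; in practice the cancelled and swapped seat counts would come from the flight-status sources listed above.

```python
def operated_seats(scheduled_seats, cancelled_seats=0, swap_delta=0):
    """Scheduled seats minus seats on cancelled flights,
    plus the net seat change from equipment swaps (can be negative)."""
    return scheduled_seats - cancelled_seats + swap_delta

def load_factor(pax, seats):
    return pax / seats

# Illustrative month: 5% of scheduled seats cancelled, small net downgauge.
pax = 85_000
scheduled = 100_000
operated = operated_seats(scheduled, cancelled_seats=5_000, swap_delta=-1_000)

lf_raw = load_factor(pax, scheduled)  # raw schedule in the denominator
lf_adj = load_factor(pax, operated)   # operated seats in the denominator
print(round(lf_raw, 3), round(lf_adj, 3))  # 0.85 vs about 0.904
```

In this example a 6% seat overstatement moves the estimate by more than five points, which is larger than most of the YoY changes the model is trying to detect.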

Step 2: TSA throughput nowcast

This is more interesting, but only if used carefully.

TSA throughput is useful for near-real-time US demand direction.

But TSA throughput is not the same as airport passengers.

It measures screened passengers, which is closer to:

Originating departing passengers plus some re-screened passengers

It is not total onboard segment passengers.

This creates a major hub problem.

Airport type	TSA usefulness
Origin-heavy leisure airport	Good
Large connecting hub	Distorted
International gateway	Mixed
ULCC-heavy airport	Better
Legacy hub airport	Needs correction

At ATL, DFW, CLT, DEN, IAH, ORD and similar hubs, many passengers connect airside and are not re-screened.

So TSA can understate total passenger flow relative to seats, especially for network airlines.

Better way to use TSA

Do not simply replace airport passengers with TSA passengers and allocate by total seat share.

Use TSA as a near-real-time local-origin demand factor.

Better formula

For US airports:

TSAGrowth_{a, m} = \frac{TSA_{a, m}}{TSA_{a, m - 12}} - 1

Then estimate:

AdjustedAirportDemandGrowth_{a, m} = \alpha_{a} + \beta_{a} \times TSAGrowth_{a, m} + Holiday/CalendarControls

Then map to carriers using originating exposure, not total seat share.

CarrierDemandProxy_{i, m} = \sum_{a} OriginatingSeatWeight_{i, a, m} \times AdjustedAirportDemandGrowth_{a, m}

Where:

OriginatingSeatWeight_{i, a, m} = Seats_{i, a, m} \times EstimatedLocalOriginShare_{i, a}

That local-origin-share adjustment is what separates a useful model from decorative nonsense.
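
The formulas above can be sketched as follows. Everything here is illustrative: the airport labels, seat counts and local-origin shares are invented, `alpha` and `beta` are placeholder defaults rather than fitted values, the holiday/calendar controls are omitted, and the weights are normalised to sum to 1 (an assumption, since the formula leaves normalisation implicit).

```python
def yoy_growth(current, year_ago):
    """Year-over-year growth, e.g. TSA throughput vs the same month last year."""
    return current / year_ago - 1

def adjusted_airport_growth(tsa_growth, alpha=0.0, beta=1.0):
    """Linear calibration of TSA growth to airport demand growth.
    alpha/beta are placeholders; they would be fitted per airport."""
    return alpha + beta * tsa_growth

def carrier_demand_proxy(seats, local_share, airport_growth):
    """Average airport growth weighted by seats * local-origin share.
    Weights are normalised to sum to 1 (assumption)."""
    weights = {a: seats[a] * local_share[a] for a in seats}
    total = sum(weights.values())
    return sum(w / total * airport_growth[a] for a, w in weights.items())

# (this month, same month last year) screened passengers, invented figures
tsa = {"HUB": (1_050_000, 1_000_000), "LEISURE": (440_000, 400_000)}
growth = {a: adjusted_airport_growth(yoy_growth(cur, prev))
          for a, (cur, prev) in tsa.items()}

seats = {"HUB": 600_000, "LEISURE": 100_000}   # one carrier's seats by airport
local_share = {"HUB": 0.4, "LEISURE": 0.9}     # assumed local-origin shares
proxy = carrier_demand_proxy(seats, local_share, growth)
print(round(proxy, 4))  # about 0.0636
```

With plain seat-share weights the same carrier would show about 5.7% growth; weighting by originating seats lifts the leisure airport's influence because more of its seats are filled by passengers TSA actually screens there.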

What I would build

1. Backtest layer: US first

Use the US because BTS T-100 gives you delayed truth.

Component	Source
Actual carrier-route passengers	BTS T-100
Actual seats / operated capacity	BTS T-100, OAG, FlightAware, FR24
Scheduled seats	OAG
TSA live throughput	TSA airport/checkpoint throughput
Airline reported LF / RPM / ASM	Company traffic releases / filings
Price / yield context	DB1B, ARC/BSP, fare scrapes, Cirium/OAG fare tools if licensed

Train the model historically:

ActualLF_{carrier, month} \sim AirportDemandPressure + CapacityGrowth + RouteMix + HubExposure + CancellationRate + HolidayTiming

Use T-100 to evaluate whether your proxy predicts:

Target	Usefulness
Airline LF YoY direction	High
Airline LF YoY magnitude	Medium
Monthly passenger growth	High
Reported RPM / RPK growth	Medium
Revenue surprise	Lower unless fare / yield data added
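
One way to score the first two targets in that table, as a rough sketch. The two metrics (sign agreement and mean absolute error) are a choice made here for illustration, and the YoY series are invented; real inputs would be the proxy's YoY LF change and the T-100 YoY LF change per carrier-month.

```python
def direction_hit_rate(proxy_changes, actual_changes):
    """Share of months where proxy and actual YoY LF change share a sign.
    Months where either series is exactly zero are skipped."""
    pairs = [(p, a) for p, a in zip(proxy_changes, actual_changes)
             if p != 0 and a != 0]
    if not pairs:
        raise ValueError("no non-zero observations to score")
    hits = sum(1 for p, a in pairs if (p > 0) == (a > 0))
    return hits / len(pairs)

def mean_abs_error(proxy_changes, actual_changes):
    """Average absolute gap between proxy and actual YoY LF change."""
    gaps = [abs(p - a) for p, a in zip(proxy_changes, actual_changes)]
    return sum(gaps) / len(gaps)

# Invented YoY LF changes in percentage points, four months
proxy_yoy = [0.8, -0.5, 1.2, -0.3]
actual_yoy = [0.5, -0.9, -0.2, -0.1]

print(direction_hit_rate(proxy_yoy, actual_yoy))        # 0.75
print(round(mean_abs_error(proxy_yoy, actual_yoy), 3))  # 0.575
```

A proxy can score well on direction and poorly on magnitude, which is the pattern the table anticipates.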

2. Europe layer: airport scrape + ACI + national stats

For Europe, the live problem is harder because there is no TSA equivalent.

Use:

Country / group	Useful sources
Spain	Aena monthly airport statistics
UK	CAA airport stats, Heathrow, Gatwick, Manchester releases
France	Groupe ADP, DGAC where available
Germany	Fraport monthly traffic, ADV airport statistics
Netherlands	Schiphol monthly traffic
Italy	Assaeroporti / airport operator releases
Pan-Europe	ACI Europe, Eurocontrol flights, OAG actuals

Eurostat is useful for validation but too lagged for trading.

Eurocontrol is useful for flight activity but gives flights, not passengers.

How to improve the allocation

Instead of allocating airport passengers by total seat share, use a hierarchy.

Level 0: naïve model

Airport pax allocated by seat share.

Useful only as a benchmark.

Level 1: domestic / international split

Allocate domestic airport pax to domestic seats and international pax to international seats.

This is a large improvement.

Level 2: route-region split

Split airport pax into buckets:

Bucket
Domestic
Intra-Europe / intra-US
Transatlantic
Middle East
Asia
LatAm
Leisure sun routes
Long-haul premium routes

Then allocate passengers by airline seat share inside each bucket.
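
A sketch of bucketed allocation at a single airport, with invented figures. It uses only two buckets (the Level 1 domestic / international split), but the same function works for any bucket list, provided the airport publishes passengers by bucket and OAG seats can be tagged the same way.

```python
def allocate_by_bucket(airport_pax_by_bucket, seats_by_airline_bucket):
    """Allocate each bucket's airport pax by airline seat share inside that bucket."""
    bucket_seats = {}
    for (_, bucket), s in seats_by_airline_bucket.items():
        bucket_seats[bucket] = bucket_seats.get(bucket, 0) + s

    allocated = {}
    for (airline, bucket), s in seats_by_airline_bucket.items():
        share = s / bucket_seats[bucket]
        allocated[airline] = (allocated.get(airline, 0.0)
                              + airport_pax_by_bucket[bucket] * share)
    return allocated

# Invented airport: domestic flights run fuller than international ones
pax_by_bucket = {"domestic": 500_000, "international": 300_000}
seats = {
    ("X", "domestic"): 450_000, ("Y", "domestic"): 150_000,
    ("X", "international"): 50_000, ("Y", "international"): 350_000,
}
alloc = allocate_by_bucket(pax_by_bucket, seats)
print(alloc)  # X gets 412,500 pax, Y gets 387,500
```

Both airlines have 500,000 seats here, so the Level 0 model would give each 400,000 passengers. The bucket split moves 12,500 passengers toward the domestic-heavy airline.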

Level 3: origin vs connecting adjustment

For hubs, estimate local-origin share using:

Source	Use
BTS DB1B	US O&D / local vs connecting calibration
MIDT / DDS / ARC / BSP	Better O&D shares, if available
Airline network structure	Approximate flow complexity
Minimum-connect-time graph / itinerary builder	Estimate plausible connecting flows
Historical T-100 vs airport pax	Infer airport-carrier correction factors

Level 4: actual flown seats

Replace scheduled seats with operated seats.

This is critical during fuel-price disruption or cancellation waves.

Level 5: route-level model

Best version:

LF_{carrier, route, month} = HistoricalLF_{carrier, route, season} + DemandShock_{airport/region} - CapacityShock_{carrier, route} + Fare/PricingSignal + CancellationAdjustment

Then aggregate to airline.
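
The aggregation step can be sketched as a seat-weighted average of route-level estimates. Route labels and figures are invented, and weighting by operated seats is the assumption made here; a distance-weighted (RPM / ASM style) aggregate would need stage lengths as well.

```python
def airline_lf(route_lf, route_seats):
    """Seat-weighted average of route-level load factors."""
    total_seats = sum(route_seats.values())
    return sum(route_lf[r] * route_seats[r] for r in route_seats) / total_seats

# Invented route-level estimates for one carrier and month
route_lf = {"A-B": 0.90, "A-C": 0.80}
route_seats = {"A-B": 100_000, "A-C": 300_000}

print(round(airline_lf(route_lf, route_seats), 3))  # 0.825
```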

Key biases by airline type

Airline type	Your method likely does
Hub legacy airline	Understates or overstates depending on connecting share
ULCC / leisure-heavy airline	Works better because pax are more O&D-driven
Regional feeder	Messy due to operating vs marketing carrier mapping
Slot-constrained flag carrier	May miss pricing/yield weakness if LF holds
Long-haul network carrier	Too blunt unless split by region and route
Ryanair / Wizz / easyJet	Better than for legacies, but secondary airport coverage matters

Most useful output for investing

I would not call this:

Estimated load factor

I would call it:

Capacity-adjusted airport demand exposure index

For each airline, track:

Signal	Meaning
Airport demand growth YoY	Are its airports seeing pax growth?
Seat growth YoY	Is it adding too much capacity into that demand?
Demand minus capacity	Proxy for LF pressure
Operated vs scheduled seat delta	Cancellation / capacity discipline
Hub-origin adjustment	Reduces TSA distortion
Confidence score	Based on coverage and historical backtest accuracy
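
A minimal sketch of the "demand minus capacity" row, using invented growth rates for two hypothetical airlines. It treats LF pressure as the simple difference between demand-proxy growth and seat growth, which is one reasonable reading of the table rather than a calibrated model.

```python
def lf_pressure(demand_growth_yoy, seat_growth_yoy):
    """Positive when demand is growing faster than capacity."""
    return demand_growth_yoy - seat_growth_yoy

# (airport demand growth YoY, operated seat growth YoY), invented
signals = {"X": (0.03, 0.06), "Y": (0.02, -0.01)}

pressure = {airline: lf_pressure(d, s) for airline, (d, s) in signals.items()}
ranked = sorted(pressure, key=pressure.get)  # weakest LF pressure first
print(ranked)  # ['X', 'Y']
```

Airline X has the stronger demand growth but ranks worse because it is adding seats faster than its airports are adding passengers.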

Main alpha angle:

Find airlines where demand proxies are deteriorating faster than capacity cuts, or where demand is resilient but the market is pricing broad sector weakness.

That is cleaner than a fake point-estimate LF with two decimal places.

Verdict

Your idea is worth doing, but change the framing.

Question	Answer
Is airport actual pax × OAG seat share useful?	Yes, as an exposure proxy
Is it an airline LF model?	Not really
Is TSA useful for live US nowcasting?	Yes, especially airport-level TSA data
Can TSA directly estimate total airport passengers?	Only with calibration
Best first build?	US backtest using T-100 truth, then apply lessons to Europe
Best final signal?	YoY LF pressure / demand-capacity mismatch index, not absolute LF

Build order

1. US only first.
2. Use BTS T-100 as truth.
3. Build naïve airport allocation model.
4. Add TSA airport/checkpoint throughput.
5. Add actual flown seats from FlightAware / FR24 / OAG status.
6. Add domestic/international and hub/origin adjustments.
7. Backtest vs reported LF / T-100 LF.
8. Port to Europe using ACI, Eurostat, airport operator data and Eurocontrol.

Final framing

The investable version is not:

Delta LF is 84.3% this month.

The investable version is:

Delta’s operated capacity is up X%, its airport-origin demand proxy is up only Y%, hub-adjusted LF pressure is deteriorating versus peers, and the historical backtest says this usually leads reported LF by Z weeks/months.

That is much closer to alpha.
