Annual Average Daily Traffic (AADT)

Overview

The annual average daily traffic (AADT) table contains information about almost all major roads. For motorways and trunks, Replica also provides single-unit truck volume and combination truck volume [1]. Data is available for 2019, 2021, 2022, 2023, and 2024, and is updated annually. The data can be downloaded through the Replica application in CSV, GeoJSON, and Shapefile formats.

To produce AADT, Replica combines data about hundreds of millions of trips per week with ground truth network link volumes. The full methodology is described in the Methodology section below.

See what coverage we have in specific areas here. The coverage values in the table below are calculated over the entire OSM network, which is highly detailed and includes even service alleys. For each highway type, the denominator is all links of that type in the OSM network.
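Because each coverage figure is a ratio over every OSM link of a given highway type, it can be reproduced with a simple count. The sketch below assumes a hypothetical input of (highway, aadt) pairs in which a missing estimate is represented as None; `coverage_by_highway` is an illustrative name, not part of any Replica export or API.

```python
from collections import Counter


def coverage_by_highway(links):
    """Percent of links, per highway type, that carry an AADT value.

    `links` is an iterable of (highway, aadt) pairs; aadt is None when the
    link has no estimate. The denominator is every link of that type.
    """
    total = Counter()
    covered = Counter()
    for highway, aadt in links:
        total[highway] += 1
        if aadt is not None:
            covered[highway] += 1
    return {hw: 100.0 * covered[hw] / total[hw] for hw in total}


# Hypothetical toy network: one of four motorway links lacks an estimate.
sample = [("motorway", 5200), ("motorway", 4800), ("motorway", None),
          ("motorway", 6100), ("residential", None), ("residential", 310)]
print(coverage_by_highway(sample))
# {'motorway': 75.0, 'residential': 50.0}
```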

Highway	% AADT Coverage in 2024 Data	% Single-Unit and Combination Truck Coverage in 2024 Data
Motorway	84%	83%
Trunk	59%	58%
Primary	52%
Secondary	48%
Tertiary	40%
Residential	18%
Other	12%

Sample Download

Click here to download a sample of Replica's AADT table.

Schema

Field Name	Content Type	Sample Value	Description
id	String	972382137_0+	A unique identifier for the network link. The trailing + or - indicates directionality: + means the link runs in the same direction as the OSM way, and - means it runs in the opposite direction.
stable_edge_id	String	2171086143316490127	The matching stable edge id (mapping to Replica's road network) for this segment. This id will match the Places season for Fall of the selected year.
comp_stable_edge_id	String	13608319450172218181	If bidirectional is True for this segment, this field provides the matching stable edge id (mapping to Replica's road network) for the opposing direction of travel. This id will match the Places season for Fall of the selected year.
osm_id	String	155000229	OSM Way ID (version 2020-07-01 for the 2019 data; version 2021-09-01 for the 2021 data; version 2022-12-01 for the 2022 data)
bidirectional	Boolean	TRUE	True if the provided traffic volume is the sum for both directions of an undivided roadway. False if for only one direction.
street_name	String	Isabel Avenue	The common name of the network link if available. Matches the name assigned by OpenStreetMap.
highway	String	trunk	The classification of the link based on OpenStreetMap data.
length	Float	229.22	The distance (length) of the network link in meters.
heading	Integer	176	The heading of the network link.
compass_direction	String	NE	The compass direction of the network link.
aadt	Integer	494	The annual average daily traffic volume of the network link. This value is inclusive of single-unit and combination truck volumes.
aadt_single_unit	Integer	54	The annual average daily traffic volume of single-unit trucks on the network link. Populated for FRC 1 and 2 roads, null otherwise.
aadt_combination	Integer	44	The annual average daily traffic volume of combination trucks on the network link. Populated for FRC 1 and 2 roads, null otherwise.
geometry	Geography	LINESTRING(-97.398509 27.662232, -97.398556 27.662034, -97.398517 27.661988)	The geometry (linestring) for each network link.
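As an illustration of working with the schema, the sketch below reads rows with Python's csv module, splits the direction suffix off `id`, and derives non-truck volume from the fact that `aadt` already includes single-unit and combination trucks. It assumes that null truck volumes appear as empty cells in the CSV export and that a trailing - marks the direction opposite to the OSM way; the second sample row and the helper names (`parse_direction`, `to_int`, `summarize`) are hypothetical.

```python
import csv
import io


def parse_direction(link_id):
    """Split an id such as '972382137_0+' into ('972382137_0', '+')."""
    if not link_id or link_id[-1] not in "+-":
        raise ValueError(f"unexpected link id: {link_id!r}")
    return link_id[:-1], link_id[-1]


def to_int(value):
    """Empty cells (assumed encoding of null truck volumes) become None."""
    return int(value) if value else None


def summarize(rows):
    """Yield (base id, direction, total AADT, non-truck AADT) per row."""
    for row in rows:
        base, direction = parse_direction(row["id"])
        total = to_int(row["aadt"])
        su = to_int(row["aadt_single_unit"])
        combo = to_int(row["aadt_combination"])
        # aadt is inclusive of truck volumes, so non-truck traffic is the remainder.
        non_truck = total - su - combo if su is not None and combo is not None else None
        yield base, direction, total, non_truck


data = io.StringIO(
    "id,highway,aadt,aadt_single_unit,aadt_combination\n"
    "972382137_0+,trunk,494,54,44\n"
    "123456789_2-,residential,310,,\n"
)
for record in summarize(csv.DictReader(data)):
    print(record)
# ('972382137_0', '+', 494, 396)
# ('123456789_2', '-', 310, None)
```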

Quality Metrics

Validation Against Test Set

When producing AADT, we hold out a random 10% of the HPMS ground truth data as a "test set". This data is used neither to train the model's weights nor to tune its hyperparameters. Because the model has not "seen" this data before, it allows us to quantify model performance. The tables and charts below summarize the model's performance on the nationwide test data available for 2019 and 2021.

These summary statistics indicate that the model's predictions generalize well. Across roads of FRC 1-4, the median percent error for total AADT in 2021 is 12.2%. On FRC 1-2 roads, the median percent errors for single-unit and combination truck volumes are 17.7% and 22.1%, respectively. We also report the signed median percent error, which shows whether the model consistently over- or under-predicts. For total counts, this number is close to 0 for all road classes, indicating that the model's estimates show little systematic bias.
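The two statistics can be computed as follows. This is a minimal sketch that assumes percent error is defined as (predicted - observed) / observed, so a positive value means over-prediction; the exact formula is not spelled out here, and the counts are made up.

```python
import statistics


def error_summary(observed, predicted):
    """Return (median absolute percent error, signed median percent error).

    Percent error is assumed to be (predicted - observed) / observed, so a
    positive signed median means the model tends to over-predict.
    """
    pct = [100.0 * (p - o) / o for o, p in zip(observed, predicted)]
    median_abs = statistics.median([abs(e) for e in pct])
    signed_median = statistics.median(pct)
    return median_abs, signed_median


observed = [1000, 2000, 500, 4000]    # hypothetical ground-truth counts
predicted = [1100, 1900, 500, 3600]   # hypothetical model AADT
print(error_summary(observed, predicted))
# (7.5, -2.5)
```

In this toy example the absolute errors are sizable while the signed median is smaller in magnitude, because over- and under-predictions partly cancel. That is the distinction the two error columns in the tables below draw.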

2019

Observed Count (2019, x-axis) vs. Replica AADT (2019, y-axis) Nationwide

The black line is the line y = x. The R-squared value is 0.97.

Classification	FRC	Median Percent Error	Signed Median Percent Error

Total	1-4	12.9%	-0.9%
Total	1	9.1%	1.0%
Total	2	9.1%	0.0%
Total	3	12.3%	-0.8%
Total	4	17.9%	-4.2%

Single-Unit Truck	1-2	17.1%	1.4%
Single-Unit Truck	1	17.3%	2.2%
Single-Unit Truck	2	17.0%	0.3%

Combination Truck	1-2	19.9%	0.8%
Combination Truck	1	17.0%	1.0%
Combination Truck	2	22.0%	-0.1%

2021

Observed Count (2021, x-axis) vs. Replica AADT (2021, y-axis) Nationwide

The black line is the line y = x. The R-squared value is 0.98.

Classification	FRC	Median Percent Error	Signed Median Percent Error

Total	1-4	12.2%	-0.0%
Total	1	9.5%	1.5%
Total	2	9.8%	0.0%
Total	3	11.6%	-0.3%
Total	4	15.4%	-1.1%

Single-Unit Truck	1-2	17.7%	1.0%
Single-Unit Truck	1	18.0%	2.0%
Single-Unit Truck	2	17.1%	0.3%

Combination Truck	1-2	22.1%	0.5%
Combination Truck	1	19.3%	1.7%
Combination Truck	2	24.1%	0.8%

Benchmarking

Comparison of Replica 2022 AADT to 2022 Arizona ground truth

We obtained 2022 ground truth for a subset of links in Arizona from the Arizona DOT. This ground truth was not used in training or validation of the model.

This map plots the locations of the links used for comparison.

Observed Count (2022, x-axis) vs. Replica AADT (2022, y-axis) for Comparison Links in Arizona

The black line is the line y = x, and the red line is the line of best fit. The R-squared value is 0.98.

Methodology

Summary

The Replica AADT dataset is derived from 1) vehicle trajectories from connected vehicle and location-based services (LBS) data and 2) ground truth network link volumes. We use the ground truth network link volumes to infer the “penetration rate”: the probability of any particular trip appearing in the input dataset of connected vehicle and LBS data.
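One way to read the penetration rate is as an inverse-probability weight: if trip i appears in the panel with probability p_i, it stands in for 1/p_i real trips, and a link's volume estimate is the sum of those weights over the observed trips T_l crossing link l. The notation below is ours, not Replica's, and the numbers are illustrative only; the 0.3 simply echoes the panel's stated share of more than 30% of trips.

```latex
\begin{aligned}
\hat{V}_{\ell} &= \sum_{i \in T_{\ell}} w_i, \qquad w_i = \frac{1}{p_i} \ge 1 \\
|T_{\ell}| = 300,\; p_i = 0.3 \;&\Rightarrow\; \hat{V}_{\ell} = \frac{300}{0.3} = 1000
\end{aligned}
```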

We specify a model such that the penetration rate is defined on the basis of an Origin-Destination (OD) matrix; the entries in this OD matrix are then learned by comparing to ground truth network link volumes from prior years’ ground truth data, in a process analogous to Origin-Destination Matrix Estimation (ODME) problems. We do this for the total AADT, single-unit truck volumes and combination truck volumes, and also produce a time-dependent dataset—the average hourly volumes for each day of a typical week in the year. Read more about that dataset here.

The model's performance is assessed using an excluded test set of ground truth data, which the model has never seen. This comparison indicates that the model's predictions generalize well, especially on major roads, which have low Functional Road Class (FRC) numbers [1].

Detailed Methodology

There are two inputs to the Replica AADT model:

1. A large panel of pre-processed vehicle counts derived from connected vehicle data. We also use trajectories from connected trucks with a high-level vehicle classification (medium trucks and heavy trucks). Both panels contain information about hundreds of millions of trips per week and constitute more than 30% of all trips.
   1. Each trajectory consists of an origin and a destination location, the sequence of traversed network links between them (indexed by OSM segment IDs), and a start and end time for each network link as well as for the overall trip.
   2. High-level vehicle classification data (private auto, light or medium truck, or heavy truck) is also provided per trip.
2. A set of ground truth AADT values, derived from HPMS data and conflated to the same representation of the roadway network. These AADT values are broken out by vehicle classification: total count, single-unit truck, and combination truck. We use the most up-to-date HPMS data available: 2023. [2]
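The trajectory record described above maps naturally onto a small data structure. The sketch below is a hypothetical in-memory representation, not Replica's actual format: the class and field names, the (lat, lon) tuples, and the time unit (seconds since trip start) are assumptions, and the segment IDs in the example are made up. The `traverses` helper shows how a trip yields a 1 or 0 entry in the incidence matrix described further below.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class VehicleClass(Enum):
    PRIVATE_AUTO = "private auto"
    LIGHT_OR_MEDIUM_TRUCK = "light or medium truck"
    HEAVY_TRUCK = "heavy truck"


@dataclass
class LinkTraversal:
    osm_segment_id: str   # network link, indexed by OSM segment ID
    start_time: float     # entry time (assumed unit: seconds since trip start)
    end_time: float       # exit time


@dataclass
class Trip:
    origin: tuple[float, float]       # (lat, lon)
    destination: tuple[float, float]  # (lat, lon)
    start_time: float                 # overall trip start
    end_time: float                   # overall trip end
    vehicle_class: VehicleClass
    links: list[LinkTraversal] = field(default_factory=list)

    def traverses(self, segment_id: str) -> bool:
        """True if the trip crosses the link, i.e. a 1 in the incidence matrix."""
        return any(step.osm_segment_id == segment_id for step in self.links)


trip = Trip(
    origin=(27.70, -97.40),
    destination=(27.66, -97.39),
    start_time=0.0,
    end_time=95.0,
    vehicle_class=VehicleClass.HEAVY_TRUCK,
    links=[LinkTraversal("972382137_0", 0.0, 40.0),
           LinkTraversal("155000229_1", 40.0, 95.0)],
)
print(trip.traverses("972382137_0"), trip.traverses("999"))
```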

We use the ground truth data to infer the “penetration rate” of the trajectories panel. The penetration rate is the fraction of actual trips that are represented in the panel. In the initial iteration of the Replica AADT product, the penetration rate was defined as a function of the properties of the link. In this iteration, for overall AADT, the penetration rate is allowed to vary as a function of the H3 cells containing the network link. For truck AADT, the penetration rate is allowed to vary as a function of the H3 cells containing each trip’s origin and destination, as well as with vehicle classification.

This change allows the model to better capture real-world variation in the penetration rate. Not all vehicles are equally likely to be represented in the trajectories panel: for example, late-model cars with in-dash GPS systems are much more likely to be represented than older, non-connected cars. The travel patterns of the drivers of these vehicles will differ as well. Presumably, someone commuting from a wealthy suburb into the central business district is much more likely to do so in a car represented in our panel than someone commuting from a less wealthy part of the city out to an industrial or agricultural area. Both drivers may, however, traverse some of the same network links for part of their journey, such as the beltway around the city core. The new method allows the imputed penetration rate to differ for these two trips, so each trip’s contribution to the modeled AADT on every link it traverses, even the links the two trips share, can (and in principle will) differ.

We construct a sparse incidence matrix with dimensions [number of trips in the panel] x [number of ground truth network links]: each entry is 1 if the given trip traverses the given network link, and 0 otherwise. We then use a Lasso regression to find a vector of weights (scaling factors defined on a per-trip basis) such that, for every network link with a ground truth count, the sum of the weights of all observed trips crossing that link approximately equals the appropriately scaled ground truth count. The Lasso regression is shifted so that the loss function penalizes departures of each weight from 1, and the ones vector serves as a lower bound on the admissible solutions. The regularization is parameterized by a coefficient; larger values encourage solutions whose weights are more similar to one another.[3] We tune this coefficient on a validation set: a random 10% of the HPMS ground truth data that was not used for model training.
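The weight fit can be sketched as a non-negative Lasso on the offsets w - 1, which is one way to realize both the shift toward 1 and the lower bound of 1 described above. The sketch below uses plain coordinate descent with NumPy on a tiny dense incidence matrix; it is not Replica's solver, and `fit_trip_weights`, `alpha`, and `n_iter` are hypothetical names. At production scale a sparse matrix and an off-the-shelf solver would be more appropriate (for example, scikit-learn's `Lasso` with `positive=True` and `fit_intercept=False`, fitted on the offsets).

```python
import numpy as np


def fit_trip_weights(incidence, counts, alpha=0.1, n_iter=1000):
    """Minimize 0.5 * ||X w - counts||^2 + alpha * sum(w - 1) subject to w >= 1.

    incidence: (n_trips, n_links) 0/1 array, 1 if trip i traverses link j.
    counts:    (n_links,) scaled ground-truth counts.
    Solved by coordinate descent on v = w - 1 >= 0 (a non-negative Lasso).
    """
    X = incidence.T.astype(float)        # links x trips
    v = np.zeros(X.shape[1])             # offsets from the ones vector
    resid = counts - X @ (1.0 + v)       # counts minus modeled link volumes
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if col_sq[j] == 0:
                continue                 # trip crosses no ground-truth link; keep w = 1
            # Re-solve for v_j with the other offsets held fixed.
            rho = X[:, j] @ resid + col_sq[j] * v[j]
            new_vj = max(0.0, (rho - alpha) / col_sq[j])
            resid -= X[:, j] * (new_vj - v[j])
            v[j] = new_vj
    return 1.0 + v


# Toy panel: trip 0 crosses link 0, trip 1 crosses both links, trip 2 crosses link 1.
incidence = np.array([[1, 0],
                      [1, 1],
                      [0, 1]])
counts = np.array([5.0, 4.0])
weights = fit_trip_weights(incidence, counts, alpha=0.01)
print(np.round(weights, 3))                 # per-trip scaling factors
print(np.round(incidence.T @ weights, 3))   # modeled link volumes
```

With this small alpha the solution is roughly w = [1.99, 3.0, 1.0]: the L1 penalty prefers to explain both links mostly through trip 1, which crosses both, and leaves trip 2 at exactly 1. Larger alpha values pull all weights closer to 1 at the cost of fitting the counts less tightly.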

The weights determined in the above procedure are defined on a per-trip basis. In order to generalize the model to a time period other than the one on which it was trained, we do two things:

1. We build a penetration rate table from these weights, aggregated by averaging to the level 5 H3 cells containing each trip's origin and destination. This yields a scaling factor for each OD pair.
   1. If a trip observed in the prediction time period does not correspond to any OD pair for which a trip was observed in the training time period, a system of successive “fallback” scaling factors ensures that some factor is still applied to the trip.
2. As more connected vehicles enter the market, we expect the overall penetration rate to increase over time. On the other hand, contractual agreements sometimes expire and we lose access to data for certain segments of the connected vehicle market. We therefore control for the overall expansion or contraction of the panel between the training and prediction periods. For each state, we compare the aggregate predictions for FRC 1 roadways from the INRIX volume profiles product with our own to compute a single overall scaling factor by which all predictions are adjusted.
   1. This separates OD-level variation from the overall penetration rate. The guiding assumption is that the geographic variation learned in the training time period is meaningful in relative terms (this particular OD pair has a higher penetration rate than that particular OD pair) but may need minor adjustment before it is used in absolute terms.
   2. The adjustment compares the total AADT values from our model and INRIX's, then applies the same scaling factor to all vehicle classifications. This assumes that our model produces a reliable freight fraction.
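The two generalization steps can be sketched as follows, assuming the fitted per-trip weights are available as (origin cell, destination cell, weight) tuples. The fallback order used here (OD pair, then origin cell alone, then a global average) is a hypothetical stand-in for the unspecified successive fallback system; the cell ids are placeholder strings rather than real level-5 H3 indexes; and the state adjustment takes two FRC 1 totals as plain numbers rather than calling any INRIX product. All function names are illustrative.

```python
from collections import defaultdict
from statistics import mean


def build_od_table(trips):
    """Average per-trip weights by OD pair, by origin cell, and globally.

    trips: iterable of (origin_cell, dest_cell, weight).
    """
    by_od, by_origin, everything = defaultdict(list), defaultdict(list), []
    for o, d, w in trips:
        by_od[(o, d)].append(w)
        by_origin[o].append(w)
        everything.append(w)
    return ({k: mean(v) for k, v in by_od.items()},
            {k: mean(v) for k, v in by_origin.items()},
            mean(everything))


def scaling_factor(tables, o, d):
    """OD factor if seen in training, else origin-only, else global (assumed order)."""
    od, origin, global_mean = tables
    if (o, d) in od:
        return od[(o, d)]
    if o in origin:
        return origin[o]
    return global_mean


def state_adjustment(reference_frc1_total, model_frc1_total):
    """Single per-state factor from FRC 1 totals, applied to every vehicle class."""
    return reference_frc1_total / model_frc1_total


tables = build_od_table([("A", "B", 2.0), ("A", "B", 4.0),
                         ("A", "C", 5.0), ("D", "B", 1.0)])
print(scaling_factor(tables, "A", "B"))   # seen OD pair
print(scaling_factor(tables, "A", "Z"))   # falls back to origin A
print(scaling_factor(tables, "Q", "B"))   # falls back to global mean
print(state_adjustment(1_050_000, 1_000_000))
```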

Once this is done, inference using the model is straightforward: every trip in the prediction time period is assigned a scaling factor based on its vehicle classification and the H3 cells containing its origin and destination, together with an overall adjustment to control for the panel size. Aggregating at the daily level and averaging produces AADT estimates by vehicle classification for each network link, nationwide. Aggregating at the hourly level instead produces an estimate of typical volumes for each hour of each day of the week.
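The inference step can be sketched as a weighted count followed by a daily average. This assumes trips arrive as hypothetical (day, vehicle class, origin cell, destination cell, link ids) tuples and that the OD lookup is passed in as a callable; `estimate_aadt` and its parameters are illustrative names, not Replica's code.

```python
from collections import defaultdict


def estimate_aadt(observed_trips, factor_for, state_factor, n_days):
    """Expand observed trips into average daily link volumes per vehicle class.

    observed_trips: iterable of (day, vehicle_class, origin_cell, dest_cell, link_ids)
    factor_for:     callable(vehicle_class, origin_cell, dest_cell) -> OD scaling factor
    state_factor:   overall panel-size adjustment for the state
    n_days:         number of days in the prediction period
    """
    per_day = defaultdict(float)
    for day, vclass, o, d, link_ids in observed_trips:
        weight = factor_for(vclass, o, d) * state_factor
        # Each trip counts once per link it crosses, like the 0/1 incidence matrix.
        for link in set(link_ids):
            per_day[(day, link, vclass)] += weight

    # Average the daily totals; days with no observed trips on a link count as 0.
    totals = defaultdict(float)
    for (_day, link, vclass), volume in per_day.items():
        totals[(link, vclass)] += volume
    return {key: total / n_days for key, total in totals.items()}


trips = [
    (0, "auto", "A", "B", ["L1", "L2"]),
    (1, "auto", "A", "B", ["L1"]),
]
aadt = estimate_aadt(trips, lambda vclass, o, d: 2.0, state_factor=1.5, n_days=2)
print(aadt)
```

Keying the totals by (day of week, hour) instead of by day would give the hourly profile; that variant is omitted here for brevity.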

Appendix

[1] Functional Road Classes describe how major a road is. FRC 1: Highways and major interstates. FRC 2: Major artery. FRC 3: Major road. FRC 4 and higher: Neighborhood Streets. Single-Unit Trucks are defined as FHWA Vehicle Classes 5-7. Combination Trucks are defined as FHWA Vehicle Class 8 or greater.
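The truck definitions in note [1] translate into a small lookup. The category labels returned below are hypothetical and simply mirror the `aadt_single_unit` and `aadt_combination` columns.

```python
from typing import Optional


def truck_category(fhwa_class: int) -> Optional[str]:
    """Map an FHWA vehicle class to the truck groups defined in note [1].

    Classes 5-7 count as single-unit trucks and classes 8 and above as
    combination trucks; anything below 5 falls in neither truck column.
    """
    if fhwa_class < 1:
        raise ValueError(f"FHWA vehicle classes start at 1, got {fhwa_class}")
    if 5 <= fhwa_class <= 7:
        return "single_unit"
    if fhwa_class >= 8:
        return "combination"
    return None


print([truck_category(c) for c in (2, 5, 7, 8, 13)])
# [None, 'single_unit', 'single_unit', 'combination', 'combination']
```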

[2] States for which we only have 2019 HPMS ground truth data: Washington, Oregon, Idaho, Nevada, Utah, Arizona, New Mexico, South Dakota, Wisconsin, Mississippi, Tennessee, South Carolina, West Virginia, Virginia, Pennsylvania, Vermont, Rhode Island, Maine, Alaska. For all other states, there is HPMS ground truth data for both 2019 and 2021.

[3] A strongly regularized formulation would be more likely to give equal scaling factors to the two trips mentioned above, whereas a more weakly regularized formulation would attempt to satisfy the ground truth constraints however possible. In massively underdetermined systems like these, totally unregularized least squares will match the training data exactly but is less likely to generalize.