Replica's weekly economic model combines publicly available data and vendor data to produce three customer-facing tables: Weekly Spend (by home location), Weekly Spend (by merchant location), and Weekly Spend County-to-County Flows.
The publicly available data sources used are:
- US Census Advance Monthly Retail Trade Survey - released monthly, but on a 3-4 month lag to reality.
- US Census Quarterly E-Commerce Report - released monthly, but on a 3-4 month lag to reality.
- Bureau of Economic Analysis Consumer Spending by State- updated annually, around October with the previous year's data
- American Community Survey 5-Year Population Estimates
The vendor data sources used are:
- Merchant transaction data: All in-person credit and debit card transactions for a subset of merchants. This data includes 60-70 million transactions per day.
- Cardholder transaction data: All credit and debit card transactions for a subset of cards issued, including both in-person and e-commerce transactions. This data includes 8-10 million transactions per day.
- Visit counts: Foot traffic data for a large set of merchants/points of interest. This data includes approximately 3 million stores where spending activity occurs.
These data sources are then used for modeling and calibration. First we create synthetic ground truth, and estimate spend by home geography and spend per merchant. Then, we calibrate home spend and merchant spend. Read about these steps in more detail below.
There are 5 main steps in the weekly economic model. Below, a diagram shows each of these steps.
1. Synthetic ground truth generation
The publicly available Ground Truth sources we use are US Census bureau retail sales (both total and e-commerce) and Bureau of Economic Analysis (BEA) consumer spend tables. The census totals are provided at a nationwide-monthly cadence for total sales, and a quarterly cadence for e-commerce sales. However, they are typically 3-4 months behind the current date. On the other hand, BEA typically releases state-wide annual totals for a year during the following October.
To create up-to-date “synthetic” statewide monthly totals, we first forecast nationwide monthly totals using an auto-regressive time-series model. The model uses the latest 3 months of Census bureau data available, and use features for the forecasted months in previous years to account for seasonality. Then, we use the latest state-wide annual totals from the BEA to apportion the forecasted nationwide monthly spend by state.
2. Spend by home geography estimator
In the spend by home geography estimator, we use the cardholder transactions dataset that includes both in-person and e-commerce transactions on a subset of cards. Since we only receive a subset of cardholders across the USA at any given point in time (about 10-15M), we first estimate spend per person and then use population data to scale that to a total spend per tract. For example, suppose we have cardholder data for 30 people in a tract and the total population is of the tract is 3000. We would then scale spend in the tract by 100. With this approach, we assume that spending behavior is relatively consistent within each tract.
Then, we aggregate this estimator to the PUMA granularity, ensuring sufficient quality given relatively low sample size.
3. Merchant spend estimator
The merchant transactions dataset includes many more transactions (60-70M per date) but only includes a subset of merchants. On the other hand, we have visit count/foot traffic data from an extremely large (approximately comprehensive) set of merchants/POIs. To take advantage of this we use a semi-supervised regression model where we predict the spend per visit for each merchant/POI where the labelled data includes the merchants/POIs in our merchant transactions data, and the unlabelled data includes all merchants/POIs not in our transactions data but in our points of interest data. We then run a standard regression model where the features include population features for the geography of the merchant and merchant features such as name, major brands, e.t.c. This generates a spend per visit estimate for each merchant/POI. To calculate total spend, we multiply the spend per visit by number of visits from the visits source data.
4. Home spend calibration
In computing Spend by Home Geography, there is bias as cardholders included in the panel are not a random selection. To correct for this bias, we use the statewide monthly synthetic ground truth to compute scaling factors based on the previous 12 months of data. More concretely, we add up the last 12 months from the synthetic ground truth and the last 12 months from the home geography estimator and take a ratio to compute the scaling factor. The scaling factor depends on the industry but is typically in the range of 1-5.
5. Merchant spend calibration
Finally, to account for biases in the data, we again compute a scaling factor for each PUMA based on the in-person spend from the calibrated home spend estimator. Hence the total spend for each merchant/POI is scaled up to ensure that the in-person spend totals from the calibrated home spend data are met. This also helps account for non-credit card transactions.
These modeling and calibration steps produce two tables: (1) Spend by home location, including in-person and e-commerce transactions, by tract, and (2) In-person spend per merchant and home tract. These two tables are used to create the three customer-facing tables Spend by Home Location, Spend by Merchant Location, and Spend County-to-County Flows.
The graphs below compare Replica outputs to Census monthly retail sales data across four spending categories: 1) Grocery Stores, 2) Gas Stations, Parking, Taxis and Tolls, 3) Retail, 4) Restaurants and Bars.
Nationwide Spend at Grocery Stores
Nationwide Spend on Gas Stations, Parking, Taxis, Tolls
Nationwide Spend on Retail
Nationwide Spend at Restaurants and Bars
The graphs below compare Replica outputs to the Census quarterly e-commerce report across two spending categories: 1) Grocery Stores 2) Retail.
Nationwide E-commerce Spend on Grocery Stores
Nationwide E-commerce Spend on Retail
Detail about NAICS code granularity
Currently, Replica provides spend data for six industry categories:
- Grocery Stores
- Restaurants and Bars
- Gas Stations, Parking, Taxis and Tolls
- Airline, Hospitality, and Car Rental
- Entertainment and Recreation
However, the underlying modeling is performed for finer-grained categories more similar to NAICS-3 codes. The results are then aggregated to create the 6 industry categories above. The underlying modeling is performed for the following categories:
|447||GAS_STATIONS||Gas Stations, Parking, Taxis, and Tolls|
|481||AIRLINE||Airline, Hospitality, and Car rental|
|4853||TAXIS||Gas Stations, Parking, Taxis, and Tolls|
|5321||CAR RENTAL||Airline, Hospitality, and Car rental|
|71||ENTERTAINMENT_RECREATION||Entertainment and Recreation|
|721||HOSPITALITY||Airline, Hospitality, and Car rental|
|722||RESTAURANT_BARS||Restaurants and Bars|
|81293||PARKING||Gas Stations, Parking, Taxis, and Tolls|
Updated 5 months ago