Replica is a data platform for the built environment. By combining powerful data insights with an uncompromising approach to privacy, Replica provides a holistic view into the ways mobility, land use, and economic activity intersect. Our approach to delivering insights to our customers is rooted in using a composite of data sources to do advanced modeling and simulation of activity across time and space.
At Replica, we understand that data is only valuable when you can trust it to inform analysis and decision-making. To that end, this document outlines the data sources, data processing methods, and data outputs for Replica's mobility model, which is the basis of Replica's products. The goal is to help our customers evaluate the quality and accuracy of our models and assess data privacy implications.
Replica's seasonal trips tables and demographic and employment tables (Places) are created using high-fidelity activity-based travel models that simulate the movements of residents, visitors, and commercial vehicles in a given area. Replica produces models at the “megaregion” level, for a typical weekday and typical weekend day in a given season. Most megaregions cover between 10 and 50 million people and multiple states. Data outputs can be queried down to the network link level.
Replica's weekly O-D pairs and VMT tables (Trends) are produced by a nationwide activity-based model that is updated each week with near-real-time data on mobility, consumer spending, and land use. The weekly tables have census-tract-level fidelity, with mobility data that includes origins and destinations, trip mode, and residential vehicle miles traveled (VMT).
Replica Scenario is the first tool that allows anyone working with the built environment to easily forecast travel activities anywhere in the country. With Scenario, public agencies and their consultants can obtain high-quality, detailed data projections of future conditions based on expected changes to the population, land use, and transportation infrastructure.
Replica generates its data by running large-scale, computationally intensive simulations. These simulations allow us to deliver granular data outputs that match behavior in aggregate, but don’t surface the actual movements (or compromise the privacy) of any one individual.
Rather than simply cleansing, normalizing, and scaling individual data sources, Replica uses a composite of data sources to:
- Create a synthetic population that matches the characteristics of a given region
- Train a number of behavior models specific to that region
- Run simulations of those behavior models applied to the synthetic population in order to create a “replica” of transportation and economic patterns
- Calibrate the outputs of the model against observed “ground-truth” to improve quality
In our data outputs, origin-destination pairs are consistent with human activities. Population demographics are accurate and correlate with appropriate movement. Recurring activities are coherent over time and capture a pattern of life. Routing between locations is consistent with local road networks and transportation options, and the scale of population and number of trips is appropriate for a given geographic extent.
This document outlines our sources, methodology, and outputs, and details our uncompromising approach to protecting individual privacy.
Replica builds its simulations using a diverse set of third-party data from public and private-sector sources. These sources include five categories of data:
- Mobile location data: To create a representative sample of daily movement patterns within a place, Replica uses multiple types of mobile location data as inputs to our model: location-based services (LBS) data collected from personal mobile devices, vehicle in-dash GPS data, and point-of-interest aggregates. Previous versions of Replica’s model also included cellular network data as another source of mobile location data. Replica only acquires de-identified mobile location data.
- Consumer/resident data: Demographic data from public and private sources provides the basis for determining where people live and work, and the characteristics of the population, such as age, race, income, and employment status.
- Built environment data: Land use data (such as zoning regulations), building data (such as total square footage and use types), and transportation network data (such as road and transit networks) are used to determine where people live, work, and shop, and by what means it is possible to travel to each activity.
- Economic activity data: This category covers all transactions that take place at a point of sale, including credit card, debit card, and cash transactions. With this input, Replica depicts the level and types of spending that occurred at a particular time and place.
- Ground truth data: Ground truth data is used to calibrate and improve the overall accuracy of Replica outputs. The types of ground truth collected by Replica include auto and freight volumes, transit ridership, and bike and pedestrian counts. Ground truth is both acquired directly by Replica and provided by customers.
Each of Replica’s data processing pipelines leverages a composite of these diverse data sets. This process minimizes the risk of sampling bias that exists in any single source on its own. For example, a product that relies more heavily on data from personal mobile devices risks failing to adequately simulate the portions of the population that do not have mobile devices or those who opt out of device tracking technologies. Our composite approach also creates resiliency against data quality issues and protects against disruptions of individual data sources.
Replica’s process to generate its seasonal trips and demographics and employment tables (Replica Places) is best described in four steps:
Step 1: Create Synthetic Population
Every season, Replica generates a synthetic population for the entirety of the United States that is statistically equivalent to the actual population. Replica creates a synthetic population in order to overcome the limitations of census data, which is provided only at the aggregate level. Synthetic populations allow Replica to assign attributes to individuals and households while protecting privacy and preserving spatial fidelity.
The synthetic population is generated using census and consumer marketing data. Replica applies data science techniques to this data that allow for: 1) modeling the dependencies in socio-demographic parameters and structure of the households, and 2) generating individual households that match census information at the required level of aggregation, such as block groups or tracts.
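As an illustration of fitting household counts to census aggregates, the sketch below uses iterative proportional fitting (IPF), a technique commonly used in population synthesis. The source does not say which technique Replica uses, so IPF itself, the function name `ipf`, and the sample table and targets are all assumptions made for this example.

```python
def ipf(seed, row_targets, col_targets, iterations=200, tol=1e-9):
    """Scale a seed table until its row and column sums match the targets.

    Illustrative only: not a description of Replica's production method.
    """
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Row step: rescale each row to hit its target total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [value * factor for value in table[i]]
        # Column step: rescale each column to hit its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once rows also match.
        max_row_error = max(
            abs(sum(table[i]) - row_targets[i]) for i in range(len(row_targets))
        )
        if max_row_error < tol:
            break
    return table


# Hypothetical block group. Rows: household size (1, 2, 3+). Columns: income band (low, high).
seed = [[10.0, 5.0], [8.0, 12.0], [6.0, 9.0]]   # joint pattern from a sample (made up)
row_targets = [120.0, 200.0, 180.0]              # census totals by household size (made up)
col_targets = [230.0, 270.0]                     # census totals by income band (made up)
fitted = ipf(seed, row_targets, col_targets)
```

The fitted table keeps the dependency structure of the seed while matching both sets of aggregate totals. Individual households could then be drawn from it in proportion to each cell.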
Each synthetic household consists of people with an assigned set of attributes: age, sex, race, ethnicity, employment status, household income, vehicle ownership status, and resident or visitor status. Workplace locations for all employed individuals are assigned based on the combination of mobile location data aggregates, census, and census land use information. These assignments are static in each seasonal model, but can and do change across seasons.
To begin each specific Places deployment, the population relevant for the specific megaregion and season is extracted from the nationwide population.
Step 2: Create Mobility Model
Modern machine learning techniques are then used to develop travel personas. Personas are based on the composite of mobile location data for the megaregion and specific season. Personas are an extraction of behavioral patterns from individual devices that live in, work in, travel to, travel from, or pass through a specific region during the modeled season. Each persona is composed of three underlying behavioral-choice models: activity planning and sequencing (e.g., at home -> drive to work -> at work -> drive to shop -> at shop -> drive to home), destination location choice (i.e., the exact location people are traveling to and from), and travel mode (i.e., the chosen mode).
Replica’s mobile location data represents anywhere from 5% to 20% of a local population. Replica intentionally acquires only the data necessary to build statistically representative models, another tenet of balancing model fidelity with user privacy.
Step 3: Generate Activity
To simulate activity, the outputs from Step 1 and Step 2 are joined. Each synthetic household is assigned one or more personas using home and work locations as a primary input, enhanced with matching by available socio-demographic attributes and by the role of the person in a household. In effect, with travel behavior models assigned, each synthetic person can now make choices about when, where, and how to travel.
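The matching described above can be pictured with a small scoring sketch. It is a simplification under stated assumptions: it matches one persona per person rather than per household, and the class names, trait labels, and score weights are hypothetical and not taken from Replica's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Persona:
    persona_id: str
    home_zone: str
    work_zone: Optional[str]
    traits: frozenset  # hypothetical labels, e.g. {"employed", "age_25_44"}


@dataclass(frozen=True)
class SyntheticPerson:
    person_id: str
    home_zone: str
    work_zone: Optional[str]
    traits: frozenset


def match_score(person, persona):
    """Home and work location dominate; shared traits refine the match."""
    score = 0.0
    if person.home_zone == persona.home_zone:
        score += 2.0  # assumed weight
    if person.work_zone == persona.work_zone:
        score += 2.0  # assumed weight
    score += 0.5 * len(person.traits & persona.traits)
    return score


def assign_persona(person, personas):
    """Return the best-scoring persona for one synthetic person."""
    return max(personas, key=lambda persona: match_score(person, persona))


personas = [
    Persona("commuter", "tract_A", "tract_B", frozenset({"employed", "age_25_44"})),
    Persona("retiree", "tract_A", None, frozenset({"not_employed", "age_65_plus"})),
]
worker = SyntheticPerson("p1", "tract_A", "tract_B", frozenset({"employed", "age_25_44"}))
retired = SyntheticPerson("p2", "tract_A", None, frozenset({"not_employed"}))

worker_persona = assign_persona(worker, personas)
retired_persona = assign_persona(retired, personas)
```

Both people live in the same tract, so the work location and traits decide which persona each one receives.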
Replica uses three models to assign movements to the individuals in the synthetic population. The activity sequence model determines the activities of a person’s day, including recurring activities (e.g., travel to work, school drop off), and one-time activities (e.g., shopping, visiting a restaurant, social visit to a friend’s residence). The location choice model determines the specific location of each discretionary activity (e.g., what restaurant is chosen for lunch, where grocery shopping gets done), assigning a location at the point-of-interest level. The mode choice model determines how the trip will be made based on the state of the transportation network, accounting for available transit options and multiple driving routes.
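To make the mode choice step concrete, here is a minimal sketch that uses a multinomial logit, a standard form for discrete choice models. The source does not state the form of Replica's mode choice model, so the logit itself, the coefficients `beta_time` and `beta_cost`, and the travel times and costs are assumptions for illustration.

```python
import math
import random


def mode_probabilities(options, beta_time=-0.05, beta_cost=-0.2):
    """Multinomial logit: P(mode) is proportional to exp(utility of mode)."""
    utilities = {
        mode: beta_time * attrs["minutes"] + beta_cost * attrs["cost"]
        for mode, attrs in options.items()
    }
    # Subtract the maximum utility before exponentiating for numerical stability.
    max_utility = max(utilities.values())
    weights = {mode: math.exp(u - max_utility) for mode, u in utilities.items()}
    total = sum(weights.values())
    return {mode: weight / total for mode, weight in weights.items()}


def choose_mode(options, rng):
    """Sample one mode according to the logit probabilities."""
    probabilities = mode_probabilities(options)
    modes = list(probabilities)
    return rng.choices(modes, weights=[probabilities[m] for m in modes], k=1)[0]


# Hypothetical door-to-door times and out-of-pocket costs for one trip,
# as they might be derived from the current state of the network.
trip_options = {
    "private_auto": {"minutes": 20.0, "cost": 4.0},
    "public_transit": {"minutes": 35.0, "cost": 2.5},
    "walking": {"minutes": 70.0, "cost": 0.0},
}
probabilities = mode_probabilities(trip_options)
chosen = choose_mode(trip_options, random.Random(42))
```

Because the choice is sampled rather than always taking the best option, two otherwise identical synthetic people can make different choices, which is what produces realistic mode shares in aggregate.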
Movement is then simulated with an agent-based approach that accounts for congestion and other interactions between individual travel itineraries.
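The effect of congestion on individual itineraries can be sketched with a toy two-route example. It uses the Bureau of Public Roads (BPR) volume-delay function and loads agents one at a time onto whichever route is currently fastest. Both choices are assumptions for illustration; the source does not describe how Replica's simulation models congestion, and the route figures are made up.

```python
def bpr_travel_time(free_flow_minutes, volume, capacity, alpha=0.15, beta=4.0):
    """BPR volume-delay function: travel time grows with volume / capacity."""
    return free_flow_minutes * (1.0 + alpha * (volume / capacity) ** beta)


def load_agents(routes, n_agents):
    """Greedy incremental loading: each agent takes the currently fastest route.

    routes maps a route name to (free_flow_minutes, capacity).
    """
    volumes = {name: 0 for name in routes}

    def time_on(name):
        free_flow, capacity = routes[name]
        return bpr_travel_time(free_flow, volumes[name], capacity)

    for _ in range(n_agents):
        fastest = min(routes, key=time_on)
        volumes[fastest] += 1  # this agent's choice slows the next one down

    times = {name: time_on(name) for name in routes}
    return volumes, times


# Hypothetical routes between the same origin and destination.
routes = {"highway": (10.0, 100.0), "arterial": (15.0, 80.0)}
volumes, times = load_agents(routes, n_agents=300)
```

The highway is faster when empty, but as agents pile onto it the arterial becomes competitive, and the two travel times end up close to each other. This is the kind of interaction between itineraries that a zone-level average cannot capture.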
Step 4: Calibrate
After each individual simulation is run, the modeled outputs are compared to aggregate control group data (i.e., observed counts, or “ground truth”) for quality and reporting purposes. This calibration process involves solving a set of large-scale optimization problems with an objective function defined as “fit to observed ground truth.” We strike a careful balance to ensure that the calibration algorithms do not overfit the modeled outputs to the calibration data, because outliers and some level of noise are present in nearly every dataset.
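A toy version of such an optimization problem is sketched below. It rescales groups of modeled trips so that link volumes fit observed counts, with a ridge penalty that keeps the scaling weights near 1 so the fit does not chase noise. The objective form, the penalty, the gradient descent solver, and all numbers are assumptions for illustration and are far simpler than a production calibration.

```python
def relative_residuals(weights, link_group_trips, observed_counts):
    """Relative error of the weighted modeled volume on each counted link."""
    residuals = []
    for row, observed in zip(link_group_trips, observed_counts):
        modeled = sum(trips * weight for trips, weight in zip(row, weights))
        residuals.append(modeled / observed - 1.0)
    return residuals


def loss(weights, link_group_trips, observed_counts, ridge):
    """Fit to observed ground truth plus a penalty for moving weights away from 1."""
    fit = sum(r * r for r in relative_residuals(weights, link_group_trips, observed_counts))
    penalty = ridge * sum((weight - 1.0) ** 2 for weight in weights)
    return fit + penalty


def calibrate(link_group_trips, observed_counts, ridge=0.1, step=0.1, iterations=500):
    """Minimize the loss with plain gradient descent, starting from weights of 1."""
    weights = [1.0] * len(link_group_trips[0])
    for _ in range(iterations):
        residuals = relative_residuals(weights, link_group_trips, observed_counts)
        gradient = []
        for g, weight in enumerate(weights):
            fit_gradient = sum(
                2.0 * r * row[g] / observed
                for r, row, observed in zip(residuals, link_group_trips, observed_counts)
            )
            gradient.append(fit_gradient + 2.0 * ridge * (weight - 1.0))
        weights = [w - step * grad for w, grad in zip(weights, gradient)]
    return weights


# Hypothetical data: rows are counted links, columns are trip groups.
link_group_trips = [[100.0, 0.0], [50.0, 50.0], [0.0, 200.0]]
observed_counts = [120.0, 110.0, 180.0]

weights = calibrate(link_group_trips, observed_counts)
loss_before = loss([1.0, 1.0], link_group_trips, observed_counts, ridge=0.1)
loss_after = loss(weights, link_group_trips, observed_counts, ridge=0.1)
```

Raising `ridge` trades fit for stability: the weights stay closer to the uncalibrated model, which limits the influence of any single noisy count.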
To complete this iterative calibration process, Replica always holds out some of its own ground truth data from the initial mobility simulation. Replica can also incorporate ground truth provided by its customers to further improve quality.
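The idea of holding out ground truth can be sketched as follows: split the count stations into a calibration set and a hold-out set, then measure error only on stations the calibration never saw. The 20% share, the error metric, and the function and station names are hypothetical choices for this example, not Replica's documented procedure.

```python
import random


def split_holdout(station_ids, holdout_share=0.2, seed=0):
    """Return (calibration_ids, holdout_ids) from a reproducible shuffle."""
    ids = sorted(station_ids)
    random.Random(seed).shuffle(ids)
    n_holdout = max(1, round(len(ids) * holdout_share))
    return ids[n_holdout:], ids[:n_holdout]


def mean_abs_pct_error(modeled, observed, station_ids):
    """Average absolute percentage error over the given stations."""
    errors = [abs(modeled[s] - observed[s]) / observed[s] for s in station_ids]
    return sum(errors) / len(errors)


# Hypothetical observed counts and modeled volumes at ten count stations.
observed = {f"station_{i}": 1000.0 + 100.0 * i for i in range(10)}
modeled = {station: count * 1.05 for station, count in observed.items()}

calibration_ids, holdout_ids = split_holdout(observed)
holdout_error = mean_abs_pct_error(modeled, observed, holdout_ids)
```

An error measured on the hold-out stations is a fairer indication of model quality than an error measured on the same counts that were used to tune the model.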
As noted earlier, when a completed model is published, customers also have access to an associated quality report.
Each simulation results in a complete trip, population, and routing table for the given region. Each trip is assigned the following attributes:
- Origin and destination points
- Origin and destination points by land use category
- Trip distance
- Trip duration
- Start and end time
- Complete routing information for each trip (network links and transit routes)
- Trip mode (including private auto driver, private auto passenger, public transit, walking, biking, freight, and transportation network companies)
- Trip purpose (including home, work, errands, eat, social, shop, recreation, commercial, and school)
See full list of attributes available here.
Each trip is associated with a specific person in the simulation, for whom the following characteristics are available:
- Age
- Sex
- Race and ethnicity
- Primary language
- Employment status
- Industry of employment
- Home location
- Work location
- Individual and household income
- Work-from-home status
- Vehicle ownership status
- Resident or visitor status
See full list of attributes available here.
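Because every trip is linked to a person, trip attributes can be summarized by person characteristics. The sketch below shows the idea with a small subset of the attributes. The field names and records are hypothetical and do not reflect Replica's actual table schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: str
    age: int
    employment_status: str
    resident: bool  # False means visitor


@dataclass(frozen=True)
class Trip:
    trip_id: str
    person_id: str  # links the trip to a Person record
    mode: str
    purpose: str
    distance_miles: float


def mode_share(trips):
    """Share of trips by mode."""
    counts = Counter(trip.mode for trip in trips)
    total = sum(counts.values())
    return {mode: count / total for mode, count in counts.items()}


def trips_by_residents(trips, people):
    """Keep only trips made by people flagged as residents."""
    by_id = {person.person_id: person for person in people}
    return [trip for trip in trips if by_id[trip.person_id].resident]


# Made-up records for illustration.
people = [
    Person("p1", 34, "employed", True),
    Person("p2", 61, "employed", False),
]
trips = [
    Trip("t1", "p1", "private_auto_driver", "work", 8.2),
    Trip("t2", "p1", "walking", "eat", 0.4),
    Trip("t3", "p1", "private_auto_driver", "home", 8.2),
    Trip("t4", "p2", "public_transit", "shop", 3.1),
]

overall_share = mode_share(trips)
resident_share = mode_share(trips_by_residents(trips, people))
```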
Replica models to specific real-world locations and points of interest (e.g., a specific office building, the Starbucks at a certain address). Trips are modeled from individual parcel or building footprint to individual parcel or building footprint, rather than zone to zone. We update our nationwide catalog of points of interest and parcel data monthly, and we use the applicable set of locations for each simulation.