
Scenario Model Methodology Summary

Summary

Scenario is an extension of our core seasonal mobility model (Places) pipeline, updated to accept new input data before re-running the mobility model for specific forecasts. Scenario accepts changes to population, employment, and work-from-home rates, as well as road closures to model associated impacts on travel behavior and the transportation network.

To produce a Scenario, Replica uses the following process:

  • For population and employment changes, Replica works with customers to select a base year, and customers provide their projections for future population and employment at a specified geographic unit, such as census block groups or custom transportation analysis zones (TAZs). We then run the customer’s projections through our models to create a forecasted view of mobility patterns (i.e., a new Scenario).
  • For road closures, customers use our fully self-service Road Closure Scenario application to specify the roadway closure(s), the base year, and the geographic area(s) of analysis at the county level. The application runs the customer's changes through our model to create a forecasted view of the impacts of the road closure(s).

Our travel synthesis pipeline in Scenario uses the activity sequence model, location choice model, and mode choice model from our seasonal mobility model pipeline, with certain changes, including changes to traffic and transit assignment. For traffic assignment, we estimate the true physical capacity of road segments, replicating the traditional speed/flow approach (based on the Highway Capacity Manual) for every major road in the country that reaches capacity in our observed data. We then use these physical capacities to constrain the number of trips that can be routed along each road option. When a road is at capacity, subsequent travelers take an alternate route or adjust their departure time.

For transit assignment in Scenario, we model forecasted line-level totals based on modeled Base Year transit ridership, which has detailed trip information associating origin and destination geographies with specific transit services. We start with each census tract-level origin/destination pair and scale the number of Base Year trips on each transit line using a weighted average of the population and employment changes in the origin and destination tracts. We then sum the totals across all origin/destination pairs to get updated line-level ridership estimates. This allows our model to respond to local changes in population and employment forecasts at high granularity.

The outputs of Scenario include the following data:

  • New population, employment, and home-work distribution
  • Travel behaviors (origins, destinations, mode choice, trip distances, etc.)
  • Auto volumes and transit line ridership
  • Locations where travel demand exceeds road capacity

Extended Methodology

Base Year and Scenario Runs

Replica continuously develops modeling and pipeline improvements to add new functionality or improve data quality. When starting a new Scenario project, we re-run our model with the latest improvements for both the selected Base Year and the Forecast. This ensures that all of our outputs incorporate the latest modeling improvements and that the differences observed between the Base Year and Scenario runs are due only to the specified inputs. As a result, there may be some differences between a Replica seasonal mobility model season and the Base Year run for the same time period.

Note: Unless otherwise requested, Scenarios are run for a random 20% sample of the population (scaled up to 100% in data outputs) to reduce computation time and complexity. This sample is large enough for analysis at any geographic level, but it means that the raw population/trip table must be scaled up by a factor of 5 (1 / 0.20) for disaggregate use.
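Because each sampled record stands in for five people or trips, any count taken directly from the raw table needs to be multiplied by five. A minimal sketch, assuming counts have already been aggregated into a dictionary keyed by a hypothetical zone ID:

```python
SAMPLE_RATE = 0.20  # default Scenario sample share (20%)
EXPANSION_FACTOR = round(1 / SAMPLE_RATE)  # 5: each sampled record stands for five

def expand_counts(sampled_counts: dict[str, int]) -> dict[str, int]:
    """Scale counts from a 20% sample table up to full-population estimates."""
    return {zone: count * EXPANSION_FACTOR for zone, count in sampled_counts.items()}

# Illustrative raw trip counts per origin zone (hypothetical IDs and values).
raw_trips = {"zone_A": 120, "zone_B": 37}
print(expand_counts(raw_trips))  # {'zone_A': 600, 'zone_B': 185}
```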

Population Synthesis

Allocation of input data to Census geographies

Our modeling pipeline uses a combination of Census Block Groups (BGs) and Public Use Microdata Areas (PUMAs) for population synthesis, as these are the smallest statistically reliable geographies provided in the various census data products we use (e.g. Public Use Microdata Samples (PUMS), American Community Survey (ACS), Census Transportation Planning Products (CTPP), and Longitudinal Employer-Household Dynamics (LEHD)).

Scenario inputs, such as forecasted population or employment totals, can be provided at the level of a census geography (e.g. county, BG) or for custom geographies (e.g. Transportation Analysis Zones (TAZs)). To synthesize a disaggregate population from these input forecasts, we apportion totals from these input geographies to the BGs used in our seasonal mobility model pipeline using a weighted combination of population- and land area-based allocation.

  • For population-based allocation, we apportion input geography demographic totals according to each overlapping BG's share of the input geography's base-year residents and employees.
  • For area-based allocation, we apportion input geography demographic totals according to each overlapping BG's share of the input geography's land area (i.e., the area of their intersection).
  • We take a weighted average of the two apportionment methods, with a 9:1 weight on population-based allocation, on the assumption that existing population density patterns are more likely to be preserved than to spread evenly over geographic area.
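The 9:1 blend above can be sketched as follows. This is an illustration only: it assumes each overlap is summarized as (base-year residents plus employees of the input geography located in that BG, intersection land area), and the area-only fallback for zones with no base-year activity is our assumption rather than documented behavior.

```python
POP_WEIGHT = 0.9   # weight on population-based allocation (9:1, per the text)
AREA_WEIGHT = 0.1  # weight on land-area-based allocation

def apportion(total: float, overlaps: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Split one input geography's forecast total across the BGs that overlap it.

    overlaps maps BG ID -> (base-year residents + employees of the input
    geography located in that BG, intersection land area).
    """
    activity_sum = sum(activity for activity, _ in overlaps.values())
    area_sum = sum(area for _, area in overlaps.values())
    shares = {}
    for bg, (activity, area) in overlaps.items():
        area_share = area / area_sum
        if activity_sum > 0:
            pop_share = activity / activity_sum
            shares[bg] = POP_WEIGHT * pop_share + AREA_WEIGHT * area_share
        else:
            # Assumption: with no base-year activity, fall back to area alone.
            shares[bg] = area_share
    return {bg: total * share for bg, share in shares.items()}

# Hypothetical TAZ forecast of 1,000 people overlapping two block groups.
alloc = apportion(1000, {"bg1": (300, 2.0), "bg2": (100, 6.0)})
print({bg: round(v, 1) for bg, v in alloc.items()})  # {'bg1': 700.0, 'bg2': 300.0}
```

Both share sets sum to one, so the blended allocation always preserves the input geography's total.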

Marginals synthesis

Our core modeling pipeline trains Bayesian networks from PUMS household and person files. The Bayesian network approach lets us model a diverse set of persons and households whose attributes reflect both the conditional probability distributions specific to a given geographic area and the conditional dependencies between attributes (e.g. education, employment, and income). The system then allocates PUMS-based household profiles to census areas using a convex optimization method to best match a set of controlled attributes.
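The methodology does not name the convex solver used for the allocation step. As one illustrative convex formulation, the sketch below fits nonnegative household-profile counts to control totals by nonnegative least squares, solved with projected gradient descent. It covers only the allocation step (not the Bayesian networks) and omits integerization. The profiles, controls, and function name are all hypothetical.

```python
def allocate_households(contrib, targets, iters=5000):
    """Nonnegative least-squares fit of household-profile counts to controls.

    contrib[c][h] is how much one household of profile h adds to control c;
    targets[c] is the control total. Minimizes sum_c (A w - b)_c ** 2, w >= 0.
    """
    n_ctrl, n_prof = len(contrib), len(contrib[0])
    # 1 / L step size, with L = 2 * ||A||_F^2 bounding the gradient's Lipschitz constant.
    step = 1.0 / (2 * sum(a * a for row in contrib for a in row))
    w = [0.0] * n_prof
    for _ in range(iters):
        resid = [sum(contrib[c][h] * w[h] for h in range(n_prof)) - targets[c]
                 for c in range(n_ctrl)]
        grad = [2 * sum(contrib[c][h] * resid[c] for c in range(n_ctrl))
                for h in range(n_prof)]
        # Gradient step, then project back onto the feasible set w >= 0.
        w = [max(0.0, w[h] - step * grad[h]) for h in range(n_prof)]
    return w

# Hypothetical profiles: (A) 1-person low-income, (B) 3-person higher-income.
# Controls (rows): households, persons, low-income households.
contrib = [[1, 1],
           [1, 3],
           [1, 0]]
weights = allocate_households(contrib, targets=[150, 250, 100])
print([round(x, 3) for x in weights])  # [100.0, 50.0]
```

When the controls are mutually inconsistent, the same objective returns the closest nonnegative fit instead of an exact match. This mirrors the caveat that synthesized marginals may not match inputs exactly.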

If Scenario input data includes only total population and employment estimates, we do not modify the PUMS-based models, so existing demographic marginals will not change meaningfully between Base Year and Scenario runs. Specifically, when synthesizing the population for a given block group, we sample households according to the existing PUMA demographics until we reach the target population.
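A minimal sketch of "sample until the target is reached", assuming the existing PUMA demographics are represented as a hypothetical list of (household ID, household size, weight) records drawn with replacement:

```python
import random

def sample_households(pool, target_population, rng):
    """Draw households (with replacement) until the block group's target population is met.

    pool: list of (household_id, size, weight) tuples describing existing PUMA households.
    """
    if any(size <= 0 for _, size, _ in pool):
        raise ValueError("household sizes must be positive")
    weights = [weight for _, _, weight in pool]
    drawn, persons = [], 0
    while persons < target_population:
        hh_id, size, _ = rng.choices(pool, weights=weights, k=1)[0]
        drawn.append(hh_id)
        persons += size
    return drawn, persons  # persons may overshoot the target by less than one household

rng = random.Random(42)
pool = [("hh1", 1, 30.0), ("hh2", 2, 45.0), ("hh3", 4, 25.0)]  # hypothetical PUMS records
households, persons = sample_households(pool, target_population=500, rng=rng)
```

Because demographics are drawn from the unchanged PUMA distribution, the sampled block group inherits the Base Year's demographic mix at the new population level.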

When Scenario input data includes a set of demographic attributes, we update our marginals to match the input data. We then attempt to match those with the convex optimization method during population synthesis. The resulting demographic marginals should closely match the inputs, though the method cannot guarantee that they match exactly, particularly if there are combinations of demographic attributes that are very different from existing conditions (e.g. a substantial increase in young, high-income population when the existing residents are older and low-income).

Demographic attributes that are not included in input data will not be controlled. These attributes may still change in the output population, because demographic attributes are often correlated (e.g. income and vehicle ownership), but such changes should not be used for analysis purposes.

Note: To be used in our population synthesis, demographic attributes must be provided in the same grouping as Replica uses. Replica will provide an input data template and guidance to ensure your inputs meet our model requirements.

Home-work assignment

Our core population synthesis pipeline uses a combination of CTPP, LEHD, and mobile location data to assign our synthesized population to the proper home-work distribution based on their commute mode, industry of employment, and income group.

For Scenario, we generate modeled estimates of CTPP and LEHD totals based on population and employment inputs. Specifically, we:

  1. Update the Base Year’s home-work distribution (as modeled in our seasonal mobility model pipeline) to match the population and job totals provided in the input forecast, thus generating a scenario-specific home-work distribution. We do so using iterative proportional fitting (IPF).
  2. Update counts of CTPP commute flows to match the newly modeled home-work ODs.
  3. Scale LEHD job totals according to the input employment total, preserving industry distribution in the absence of specific industry forecasts.
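Steps 1 and 3 can be sketched as follows; step 2 (updating CTPP flows) is not shown. IPF alternately rescales the rows and columns of a seed matrix until both sets of marginals match. The sketch assumes the following:
  • The home-work distribution is held as a small dense zone-by-zone matrix.
  • Row targets are resident workers per home zone, and column targets are jobs per work zone.
  • Both sets of targets have already been reconciled to the same grand total.
The function names are hypothetical.

```python
def ipf(seed, row_targets, col_targets, max_iters=1000, tol=1e-9):
    """Iterative proportional fitting of a home (rows) x work (columns) OD matrix."""
    if abs(sum(row_targets) - sum(col_targets)) > tol * max(1.0, sum(row_targets)):
        raise ValueError("row and column targets must have the same grand total")
    m = [row[:] for row in seed]
    for _ in range(max_iters):
        for i, row in enumerate(m):  # match workers living in each home zone
            s = sum(row)
            if s > 0:
                m[i] = [v * row_targets[i] / s for v in row]
        for j, target in enumerate(col_targets):  # match jobs in each work zone
            s = sum(row[j] for row in m)
            if s > 0:
                for row in m:
                    row[j] *= target / s
        if max(abs(sum(row) - t) for row, t in zip(m, row_targets)) < tol:
            break
    return m

def scale_jobs(base_by_industry, new_total):
    """Scale LEHD-style job counts to a new total, keeping the industry mix."""
    factor = new_total / sum(base_by_industry.values())
    return {industry: jobs * factor for industry, jobs in base_by_industry.items()}

# Hypothetical Base Year flows between two zones, and Scenario totals.
base_od = [[50.0, 30.0],
           [20.0, 100.0]]
scenario_od = ipf(base_od, row_targets=[100.0, 150.0], col_targets=[90.0, 160.0])
jobs = scale_jobs({"retail": 400, "office": 600}, new_total=1250)  # {'retail': 500.0, 'office': 750.0}
```

IPF keeps the Base Year's internal flow structure (the odds ratios between cells) while matching the new totals. This is why it suits updating an existing distribution rather than building one from scratch.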

Once we have produced the Scenario home-work aggregates, we rely on our seasonal mobility model's population synthesis pipeline to generate a disaggregate population that conforms to these totals as closely as possible. Because many population attributes are interconnected, we may not be able to synthesize the exact totals provided in each dimension of the inputs; however, results should always be within a few percentage points of the population and employment totals.

Homes and workplaces are assigned using the land use from the Base Year. Specifically, homes are assigned to residential and mixed-use residential parcels, and workplaces are assigned to existing buildings with non-residential uses. No new work locations are created for Scenarios.

School enrollment

Our core population synthesis pipeline assigns school-aged residents to a school location based on each school’s enrollment and proximity to home. For Scenarios, we estimate new enrollment limits for schools based on population growth in the relevant geography (i.e. school district for grade school, metro area for universities), and apply the same enrollment-constrained and distance-based school allocation algorithm to assign students.
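The two steps above can be sketched as follows. The sketch assumes a greedy rule: students are processed in a fixed order, and each takes the nearest school that still has room. The actual assignment algorithm's ordering and tie-breaking are not documented here, and all names and values are hypothetical.

```python
import math

def scaled_capacity(base_enrollment, growth):
    """New enrollment limit per school: base enrollment x school-age population growth
    in that school's geography (district or metro area), rounded up."""
    return {school: math.ceil(n * growth[school]) for school, n in base_enrollment.items()}

def assign_students(students, schools, capacity):
    """Greedy, enrollment-constrained, distance-based assignment.

    students: student ID -> (x, y) home location; schools: school ID -> (x, y).
    Each student takes the nearest school that still has room.
    """
    remaining = dict(capacity)
    assignment = {}
    for student, home in students.items():
        assignment[student] = None  # stays None if every school is full
        for school in sorted(schools, key=lambda s: math.dist(home, schools[s])):
            if remaining[school] > 0:
                assignment[student] = school
                remaining[school] -= 1
                break
    return assignment

# Hypothetical: 50% growth in school-age population for both schools' districts.
cap = scaled_capacity({"north": 2, "south": 1}, {"north": 1.5, "south": 1.5})  # {'north': 3, 'south': 2}
schools = {"north": (0.0, 0.0), "south": (0.0, 10.0)}
students = {f"s{i}": (0.0, float(i)) for i in range(1, 6)}
print(assign_students(students, schools, cap))
```

Once the nearer school fills, later students spill over to the next-closest school. This is the kind of redistribution the enrollment constraint is meant to capture.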

Travel Synthesis

Activity choice model

Our core activity-based model uses a variety of data sources, including location-based services (LBS) data, to generate a large number of ‘Personas’, each of which represents a sequence of activities (e.g. Home-Work-Eat-Work-School-Home) along with the times and general locations where those activities occurred. These Personas are assigned to our synthetic population based on home and work location, as well as certain demographic attributes (e.g. employment status). When a synthetic resident cannot be matched to a suitable Persona, we assign them a “default Persona” that uses historically observed activity sequences to generate trips to their assigned home, work, and/or school, as well as to popular local destinations for discretionary activities, based on visit count data for Points-of-Interest (POIs) (see more below).
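Matching with a fallback can be sketched as follows. The sketch assumes a Persona matches only when home zone, work zone, and employment status are all equal. The default activity sequences are hypothetical stand-ins for the historically observed sequences described above. The data classes and function are illustrative, not Replica's API.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Persona:
    activities: tuple[str, ...]  # e.g. ("Home", "Work", "Eat", "Work", "Home")
    home_zone: str
    work_zone: Optional[str]
    employed: bool
    is_default: bool = False

@dataclass(frozen=True)
class Resident:
    home_zone: str
    work_zone: Optional[str]
    employed: bool

def assign_persona(resident, personas, rng):
    """Pick a Persona matching the resident's home, work, and employment status,
    falling back to a simple default sequence when no match exists."""
    key = (resident.home_zone, resident.work_zone, resident.employed)
    matches = [p for p in personas if (p.home_zone, p.work_zone, p.employed) == key]
    if matches:
        return rng.choice(matches)
    # Hypothetical default: anchor trips to home/work plus one discretionary stop.
    anchors = ("Home", "Work", "Shop", "Home") if resident.employed else ("Home", "Shop", "Home")
    return Persona(anchors, resident.home_zone, resident.work_zone, resident.employed,
                   is_default=True)

rng = random.Random(0)
library = [Persona(("Home", "Work", "Home"), "z1", "z2", True),
           Persona(("Home", "Work", "Eat", "Work", "Home"), "z1", "z2", True)]
matched = assign_persona(Resident("z1", "z2", True), library, rng)
fallback = assign_persona(Resident("z3", None, False), library, rng)
```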

We use the same Personas for Scenarios as for our seasonal mobility models, since these are the best representation of activity patterns. This means that we do not assume any changes in current activities. For example, if most people in a certain neighborhood start work at 7am, we assume that most new residents in that neighborhood will also start work at 7am. Activity patterns will still change at a regional level because of differential growth across areas (e.g. more growth expected in a different neighborhood whose residents today mostly start work at 9am). Travel timing may also be adjusted during traffic assignment based on congestion (see below).

Location choice model

While Personas specify a general area (e.g. a neighborhood) where a given activity occurred, our core activity-based model chooses the specific location for discretionary activities (e.g. eating, shopping, doing errands) using a dataset of Point-of-Interest (POI) visits. Our model searches for the nearest POIs in a given category, then randomly selects one, weighted by the popularity of each option. This allows us to determine specific (i.e. building-level) trip destinations accurately, and to capture changes in potential destinations (e.g. business openings or closures).
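The "nearest, then popularity-weighted" rule can be sketched as follows. The sketch assumes planar coordinates, a hypothetical candidate count `k`, and observed visit counts as weights. The POI records are illustrative.

```python
import math
import random
from typing import NamedTuple

class POI(NamedTuple):
    name: str
    category: str
    xy: tuple[float, float]
    visits: int  # observed visit count used as the popularity weight

def choose_poi(origin, pois, category, k, rng):
    """Take the k nearest POIs of a category, then sample one weighted by visits."""
    candidates = [p for p in pois if p.category == category]
    nearest = sorted(candidates, key=lambda p: math.dist(origin, p.xy))[:k]
    if not nearest:
        return None
    weights = [p.visits for p in nearest]
    if sum(weights) == 0:
        return rng.choice(nearest)  # no visit data: fall back to a uniform draw
    return rng.choices(nearest, weights=weights, k=1)[0]

pois = [POI("cafe_a", "eat", (1.0, 0.0), 10),
        POI("cafe_b", "eat", (2.0, 0.0), 90),
        POI("cafe_c", "eat", (50.0, 0.0), 1000),
        POI("grocer", "shop", (0.5, 0.0), 200)]
rng = random.Random(7)
picks = [choose_poi((0.0, 0.0), pois, "eat", k=2, rng=rng).name for _ in range(1000)]
```

Here `cafe_c` is the most popular option but is never chosen, because it falls outside the two nearest candidates. Proximity filters the choice set, and popularity only ranks within it.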

In Scenario, we use the latest POI visit data to weight location choice options. This means that growth in activities of a particular purpose is modeled as increased visits to existing businesses rather than as the creation of new ones. The impact of this should be small in most cases, as analysis zones are usually much larger than individual POIs. However, Scenario cannot currently model specific new discretionary activity destinations (e.g. a large shopping mall in an area that is currently undeveloped). These types of changes are on the roadmap for a future version of Scenario.

Routing / traffic and transit assignment

After selecting an activity and location, our activity-based model sends a request for each trip to an internal routing Application Programming Interface (API), which returns the possible travel options, including one or more driving, transit, biking, and walking routes. A route is selected based on travel time, cost, and various other factors (e.g. access to a vehicle).
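Selecting among the returned options can be illustrated with a simple generalized-cost rule. The sketch assumes infeasible options are dropped first (driving requires vehicle access), and then the option with the lowest time cost plus money cost is chosen. The value of time is a hypothetical parameter, and the actual factors and weights are not documented here. Many travel models use a probabilistic choice (e.g. logit) rather than a strict minimum.

```python
from typing import NamedTuple, Optional

class Option(NamedTuple):
    mode: str        # "drive", "transit", "bike", or "walk"
    minutes: float   # door-to-door travel time
    cost_usd: float  # out-of-pocket cost (fuel, parking, fare)

VALUE_OF_TIME_PER_MIN = 0.30  # hypothetical: $18 per hour

def select_option(options, has_vehicle) -> Optional[Option]:
    """Drop infeasible options, then choose the lowest generalized cost."""
    feasible = [o for o in options if o.mode != "drive" or has_vehicle]
    if not feasible:
        return None
    return min(feasible, key=lambda o: o.minutes * VALUE_OF_TIME_PER_MIN + o.cost_usd)

options = [Option("drive", 20, 4.00), Option("transit", 35, 2.50), Option("walk", 90, 0.00)]
print(select_option(options, has_vehicle=True).mode)   # drive
print(select_option(options, has_vehicle=False).mode)  # transit
```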

For Scenario, we make the following changes to our traffic and transit assignment:

Traffic

Our seasonal mobility models use data on traffic volumes (from partner agencies and in-vehicle location providers) to determine the number of trips that should be routed along each road segment. This approach cannot work for Scenario because we do not have future volume counts. Instead, we estimate the true physical capacity of road segments, replicating the Highway Capacity Manual speed/flow approach for every major road in the country that reaches capacity in our observed data. We then use these physical capacities to constrain the number of trips that can be routed along each road option. When a road is at capacity, subsequent travelers take an alternate route or adjust their departure time.
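Capacity-constrained routing can be sketched as follows. The sketch makes these simplifying assumptions, none of which comes from the methodology:
  • Trips are processed first-come, first-served.
  • Each trip loads every link on its route in the same hour.
  • Departure times are only shifted later.
  • Capacities are in vehicles per hour.
The sketch uses hypothetical link IDs and capacities. Real capacities would come from the Highway Capacity Manual-style estimates described above.

```python
from collections import defaultdict

def assign_with_capacity(trips, capacity, max_shift_hours=1):
    """Route trips in order, respecting hourly link capacities.

    trips: list of (trip_id, departure_hour, routes), with routes (lists of link IDs)
    in preference order. capacity: link ID -> vehicles per hour.
    Returns trip_id -> (route, hour), or None if no option fits.
    """
    load = defaultdict(int)  # (link, hour) -> vehicles assigned so far
    result = {}
    for trip_id, hour, routes in trips:
        result[trip_id] = None
        for shift in range(max_shift_hours + 1):  # try the planned hour, then later ones
            h = hour + shift
            route = next((r for r in routes
                          if all(load[(link, h)] < capacity[link] for link in r)), None)
            if route is not None:
                for link in route:
                    load[(link, h)] += 1
                result[trip_id] = (route, h)
                break
    return result

capacity = {"highway": 2, "arterial_1": 1, "arterial_2": 1}
routes = [["highway"], ["arterial_1", "arterial_2"]]
trips = [(f"t{i}", 8, routes) for i in range(1, 5)]
plan = assign_with_capacity(trips, capacity)
```

In this example the first two trips fill the highway and the third diverts to the arterial route. The fourth finds every option full at 8:00 and shifts to 9:00, illustrating both responses described in the text.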
With Scenario outputs, we also provide high-level estimates of travel demand exceeding capacity. These estimates indicate how many vehicles would have wanted to use a given roadway for their trip but could not, due to congestion or capacity constraints. To produce them, we compute the relative change in trip demand for each origin-destination (OD) pair between the Base Year and Scenario. We then attribute link volumes to OD flows using a non-capacity-constrained Scenario run to approximate the “unconstrained demand” on each link, and scale the final simulated volumes using these link attributions and the per-OD growth factors calculated earlier. Dividing this demand-scaled volume estimate by the physical capacity of each roadway gives an estimate of “excess demand” on each link.
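The excess-demand calculation can be sketched as follows, based on one reading of the description above. Each link's simulated (constrained) volume is multiplied by an attribution-weighted average of per-OD growth factors. The attribution shares come from the unconstrained run and sum to one per link. The result is then divided by capacity, so a ratio above 1.0 means demand exceeds capacity. All names and numbers are hypothetical.

```python
def excess_demand_ratio(base_od, scenario_od, attribution, simulated, capacity):
    """Demand-scaled volume divided by capacity for each link.

    base_od / scenario_od: OD pair -> trips; attribution: link -> {OD pair: share of
    that link's volume in the unconstrained run}; simulated: link -> constrained volume.
    """
    growth = {od: scenario_od[od] / base_od[od] for od in base_od}
    ratios = {}
    for link, shares in attribution.items():
        factor = sum(share * growth[od] for od, share in shares.items())
        ratios[link] = simulated[link] * factor / capacity[link]
    return ratios

ratios = excess_demand_ratio(
    base_od={("A", "B"): 100, ("C", "B"): 300},
    scenario_od={("A", "B"): 140, ("C", "B"): 300},
    attribution={"link_1": {("A", "B"): 0.5, ("C", "B"): 0.5}},
    simulated={"link_1": 1800},
    capacity={"link_1": 2000},
)
print(round(ratios["link_1"], 2))  # 1.08
```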

Transit

Our seasonal mobility models use public agency-provided estimates of ridership to appropriately weight preferences for transit options in routing. These estimates are provided at the line level where available, or apportioned by Replica from system-level totals to line-level counts. In Scenario, we model forecasted line-level totals based on the Base Year run, which has detailed trip information associating origin and destination geographies with specific transit services. We start with each census tract-level OD pair and scale the number of Base Year trips on each transit line using a weighted average of the population and employment changes in the origin and destination census tracts. We then sum the totals across all ODs to get updated line-level ridership estimates. This allows our model to respond to local changes in population and employment forecasts at high granularity. In the simulation stage of modeling, these estimates are used to weight transit preferences when each proposed trip receives a set of routing and mode options.
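The scale-then-sum procedure can be sketched as follows. The text does not give the population/employment weights, so the equal weights here are a hypothetical choice. Growth is expressed as Scenario-to-Base-Year ratios per tract, and the OD factor is assumed to be the simple mean of the origin and destination tract factors. Tract IDs, line names, and values are illustrative.

```python
from collections import defaultdict

W_POP, W_EMP = 0.5, 0.5  # hypothetical weights; the actual weighting is not specified

def tract_growth(pop_growth, emp_growth, tract):
    """Weighted average of Scenario/Base Year population and employment ratios."""
    return W_POP * pop_growth[tract] + W_EMP * emp_growth[tract]

def forecast_line_ridership(base_trips, pop_growth, emp_growth):
    """base_trips: (origin_tract, dest_tract, line) -> Base Year trips on that line."""
    totals = defaultdict(float)
    for (origin, dest, line), trips in base_trips.items():
        od_factor = (tract_growth(pop_growth, emp_growth, origin)
                     + tract_growth(pop_growth, emp_growth, dest)) / 2
        totals[line] += trips * od_factor  # sum scaled OD trips per line
    return dict(totals)

base_trips = {("T1", "T2", "Red"): 100, ("T2", "T1", "Red"): 40, ("T1", "T1", "Blue"): 50}
pop_growth = {"T1": 1.2, "T2": 1.0}  # Scenario / Base Year ratios (hypothetical)
emp_growth = {"T1": 1.0, "T2": 1.4}
lines = forecast_line_ridership(base_trips, pop_growth, emp_growth)
print({line: round(v, 1) for line, v in lines.items()})  # {'Red': 161.0, 'Blue': 55.0}
```

Because scaling happens per OD pair before summing, a line serving fast-growing tracts gains more riders than one serving stable tracts, even if regional totals grow uniformly.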

For transit agencies that are not core to the forecast region, or that cover significantly more area than the modeled geographies, we apply a default scaling factor to the Base Year ground-truth totals. These forecasted line- and agency-level totals are then used to weight transit options in the Scenario pipeline.

Freight

Our Scenario freight model takes as input a calibrated panel of observed freight trips in the study region and derives scaling coefficients with respect to the properties of the origins and destinations of the freight trips. Currently, the model considers only census tract-level population and employment numbers, as these were found to be the most predictive indicators for freight OD flows. Fractional changes in the census tract-level population and employment numbers are translated into new (or removed) freight trips, depending on the scaling coefficients learned from the Base Year data.
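The translation from tract-level changes to freight trips can be sketched as follows. The sketch assumes a linear response: the fractional change in an OD pair's freight trips is a coefficient-weighted sum of the fractional population and employment changes at the origin and destination tracts. The functional form and coefficient values are hypothetical, and fitting the coefficients from Base Year data is not shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FreightCoefficients:
    """Hypothetical coefficients: trip response per unit fractional change."""
    origin_pop: float
    origin_emp: float
    dest_pop: float
    dest_emp: float

def scale_freight_trips(base_trips, d_pop, d_emp, origin, dest, coef):
    """Translate fractional tract-level changes (e.g. 0.10 = +10%) into new OD freight trips.

    Assumes a linear response; trips can be added or removed but never go below zero.
    """
    change = (coef.origin_pop * d_pop[origin] + coef.origin_emp * d_emp[origin]
              + coef.dest_pop * d_pop[dest] + coef.dest_emp * d_emp[dest])
    return max(0.0, base_trips * (1 + change))

coef = FreightCoefficients(origin_pop=0.1, origin_emp=0.6, dest_pop=0.3, dest_emp=0.2)
d_pop = {"T1": 0.0, "T2": 0.0}
d_emp = {"T1": 0.10, "T2": 0.0}  # 10% employment growth at the origin tract
print(round(scale_freight_trips(200, d_pop, d_emp, "T1", "T2", coef), 1))  # 212.0
```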

Transportation Network Company (TNC) Travel

Our Scenario TNC model is based on the modeled and observed TNC trip counts in the Base Year. Our seasonal mobility models use city-provided TNC trips where possible in combination with a direct demand model to estimate TNC trips per census tract over the course of the day. In Scenario, we use the population and employment forecasts per census tract to scale expected TNC totals, preserving areas of high and low TNC ridership as observed in the Base Year. These modeled TNC totals are used to weight TNC preference in the Scenario pipeline.

Visitors and pass-through travel

Visitors and pass-through travel are modeled separately from our core population synthesis and activity modeling pipelines. In Scenarios, the numbers of visitors and pass-through trips are scaled uniformly by a customer-provided factor (e.g. 1.2x Base Year volumes). The models are otherwise unchanged.