BUS 410 · Round 7 · v3 / 2026

Two-Layer Credit-Desert Risk.

A forecast across 3,235 U.S. counties. Toggle the lens. Pick a horizon. Change the geography. Move the levers. Watch the map answer.

Controls: Model · Horizon · Geography · forecasting → 2027
n = 3,235 counties · 2009–2024 panel · 8 time-split tests, no leakage · 2027 forecast · trained on data through 2021 · predicts 2024
Legend: risk percentile, 0–100 · Diagnostic lens · 2027 forecast · in-state percentile
How to use this dashboard

Click any step to find it on the dashboard.

Five things to know before you click around.

Behind the map

How this thing was built, and why we forecast 2027 and 2030.

Round 7 splits credit-desert prediction into two layers: one that predicts as well as possible, one that only uses things you could change. The split is the project. Below: the architecture, the levers that mattered, the COVID break, and what we still can't see.

Methodology view · independent of map
00

Why 2027 and 2030, not next year.

Federal CRA and HMDA data arrives ~2 years late, making 1-year-ahead forecasts useless in practice; the models were trained at the two soonest actionable horizons instead.

An earlier version of this dashboard predicted one year ahead. That's useless in practice: federal CRA and HMDA reporting lags about two years. By the time we run a model on real reported data, last year is already over. Predicting 2025 from 2024 is a paper exercise.

So we retrained at two real horizons. The 2027 forecast uses data through 2021 to forecast 2024, the soonest the data and the model agree on. The 2030 scenario uses data through 2018 to forecast 2024 (same target year, longer reach, weaker but still informative). Both are evaluated walk-forward on out-of-sample folds (averaged across 8 time-split tests); both are calibrated isotonically per fold.

2027 forecast
2030 scenario
Reporting lag: ~2y CRA, ~1.5y HMDA
Test calibration: isotonic, per-fold
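
To make the protocol concrete, here is a minimal sketch of the walk-forward loop with per-fold isotonic calibration, run on a synthetic stand-in panel. Every name here (panel, FEATURES, desert, the fold years) is illustrative, not the production pipeline. One property worth knowing: isotonic calibration is monotone, so it reshapes probabilities without changing the ranking that AUC measures.

```python
# Walk-forward sketch: train strictly before year t, calibrate on year t,
# test on year t + GAP. Synthetic panel; all names are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import roc_auc_score

GAP = 3                                      # e.g. data through 2021 -> 2024
rng = np.random.default_rng(0)
n = 20000
panel = pd.DataFrame({"year": rng.integers(2009, 2025, n),
                      "f1": rng.normal(size=n),
                      "f2": rng.normal(size=n),
                      "f3": rng.normal(size=n)})
panel["desert"] = (panel.f1 + rng.normal(size=n) > 1.5).astype(int)
FEATURES = ["f1", "f2", "f3"]

aucs = []
for t in range(2014, 2022):                  # 8 time-split tests, no leakage
    train = panel[panel.year < t]            # strictly earlier years only
    cal = panel[panel.year == t]             # latest observed year: calibration
    test = panel[panel.year == t + GAP]      # out-of-sample forecast target
    model = GradientBoostingClassifier().fit(train[FEATURES], train.desert)
    iso = IsotonicRegression(out_of_bounds="clip").fit(
        model.predict_proba(cal[FEATURES])[:, 1], cal.desert)
    p = iso.predict(model.predict_proba(test[FEATURES])[:, 1])
    aucs.append(roc_auc_score(test.desert, p))
print(f"mean AUC over {len(aucs)} folds: {np.mean(aucs):.3f}")
```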
01

Two layers, one panel.

Model 1 uses every available signal for the best possible forecast; Model 2 uses only the 20 features a policymaker could build or fund, residualized against demographics.

Both models train on the same tract-year panel, 2009–2024, and are scored across the same 8 time-split tests. Model 1 gets all 39 features, including supply-side signals that are partly downstream of the outcome. Model 2 gets only features a policymaker could actually move: twenty levers, residualized against demographics so the model has to find lever signal independent of who lives there.

The Diagnostic lens gives the best forecast, but won’t tell you what to do. The Influenceable lens is a quieter forecast, but every signal in it is something you could fund or build.

at active horizon (2027 forecast)    Diagnostic    Influenceable
Features                             39            20
AUC (area under curve)
Average precision
Time-split tests                     8             8
Forecast year                        2024          2024
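
For readers who want the wiring concrete, here is a minimal sketch of the two lenses on synthetic data. The column names, group sizes, and desert label are illustrative stand-ins for the real panel; only the shape of the split matters.

```python
# Two lenses, one panel: same rows, same architecture, different columns.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

LEVERS = [f"lever_{i}_resid" for i in range(20)]   # policy-movable, residualized
SUPPLY = [f"supply_{i}" for i in range(12)]        # partly downstream of outcome
DEMOS = [f"demo_{i}" for i in range(7)]            # context, not a lever

rng = np.random.default_rng(0)
panel = pd.DataFrame(rng.normal(size=(1000, 39)), columns=LEVERS + SUPPLY + DEMOS)
panel["desert"] = (rng.random(1000) < 0.1).astype(int)

LENSES = {"Diagnostic": LEVERS + SUPPLY + DEMOS,   # all 39 signals
          "Influenceable": LEVERS}                 # the 20 levers only
models = {name: GradientBoostingClassifier().fit(panel[cols], panel["desert"])
          for name, cols in LENSES.items()}
```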
02

Importance is not impact.

A feature the model leans on heavily is not always a feature that, by itself, changes the final forecast.

We check each category by asking: if the model loses this category, how much worse does its ranking of high-risk places get? That tells us whether the category is carrying unique signal or whether other variables can mostly stand in for it.

The honest read: some lever categories carry stronger unique signal than others, and the active horizon matters. The slider readouts above show this plainly.

Category-dependence check: 2027 forecast
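
One hedged way to run this check is group permutation: shuffle every column in a category at once on held-out data and watch how far the AUC falls. A full ablation would retrain without the category; permutation is the cheap stand-in sketched below, with all names assumed.

```python
# Category-dependence via group permutation: break one whole category at a
# time and measure the AUC drop. Assumes a fitted binary classifier and a
# held-out DataFrame; groups maps category name -> list of column names.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def category_dependence(model, X_test: pd.DataFrame, y_test,
                        groups: dict, seed: int = 0) -> dict:
    """AUC drop per category; a larger drop means more unique signal."""
    rng = np.random.default_rng(seed)
    base = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    drops = {}
    for name, cols in groups.items():
        Xp = X_test.copy()
        for c in cols:                       # shuffle the whole category together
            Xp[c] = rng.permutation(Xp[c].to_numpy())
        drops[name] = base - roc_auc_score(y_test, model.predict_proba(Xp)[:, 1])
    return drops  # near-zero drop: other variables can stand in for the category
```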

02b

Top features at this horizon.

The chart shows how much the model weights each feature across 8 time-split tests; names ending in _resid had their demographic component mathematically removed before training.

How much the model leans on each feature, averaged across 8 time-split tests. This is different from the category-dependence check above. One asks “how much does the model use this feature when it’s present?” The other asks “how much does the model miss this category when it is gone?” A heavily used feature with low category dependence is a feature the model can route around.

Top 8 · 2027 forecast

A name ending in _resid means the feature was residualized against demographics. We regressed out the part of it that’s explained by who lives in the tract (for a feature like lender count, the part predictable from demographics has been mathematically removed), so the model has to find lever signal independent of population.
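
A minimal sketch of that residualization step, assuming plain linear regression of the raw lever on demographic covariates; the column names are illustrative.

```python
# Keep only the part of a lever that demographics cannot explain.
import numpy as np
from sklearn.linear_model import LinearRegression

def residualize(lever: np.ndarray, demographics: np.ndarray) -> np.ndarray:
    """Return lever minus its demographically explained component."""
    explained = LinearRegression().fit(demographics, lever).predict(demographics)
    return lever - explained                 # stored with a _resid suffix

# e.g. panel["lender_count_resid"] = residualize(
#          panel["lender_count"].to_numpy(), panel[DEMO_COLS].to_numpy())
```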

03

The COVID break.

Training the same architecture on pre- vs. post-COVID data shows distance-to-branch dominated before; lender concentration and MDI presence dominate after. The world changed, not the model.

We trained the same architecture on pre-COVID data and on post-COVID data and asked which features the model reached for. The answer changed. Distance to a bank branch dominated the pre-COVID model. Post-COVID, lender concentration and MDI presence move up the list.

This is not the model getting smarter. It’s the world changing under it. Branch closures during COVID rewired what scarcity even means at the tract level.

Regime split: 2027 forecast
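
A sketch of how the regime split might be run: same architecture, two training windows, compare which features carry weight. The 2020 cutoff, the names, and the use of impurity-based importances are assumptions for illustration.

```python
# Fit the same architecture on two regimes and compare top features.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

def top_features(df: pd.DataFrame, features: list, label: str, k: int = 8):
    """Highest-weight features for a model trained on this slice."""
    m = GradientBoostingClassifier().fit(df[features], df[label])
    order = m.feature_importances_.argsort()[::-1][:k]
    return [features[i] for i in order]

# pre  = top_features(panel[panel.year <  2020], FEATURES, "desert")
# post = top_features(panel[panel.year >= 2020], FEATURES, "desert")
# Expect distance-to-branch near the top pre-COVID; lender concentration
# and MDI presence rising post-COVID (per the chart above).
```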

04

What this map can’t see.

The unit is a census tract over multiple years, so anything finer (one branch closing, one new program) is invisible, and future shocks outside the training distribution aren't foreseeable.

The unit is the census tract; the horizon is multi-year. Anything operating below that resolution (the closure of one branch, a new microlender opening, a single program rule change) is invisible. The model also can’t see the future of the world it was trained on: another COVID-scale shock would land in the Influenceable lens’s blind spot first.

Use it like a forecast, not a verdict. The colors on the map say where the lens points, not where intervention will work.

Resolution: tract · year
Horizons: 2027, 2030
Calibration: isotonic, per-fold
Eval: 8 time-split tests, no leakage
Audience: BUS 410, Round 7