Alzheimer’s Disease Early Detection Simulation

Abbreviations
Abbreviation	Definition
AD	Alzheimer’s Disease
BBBM	Blood-Based Biomarkers
CSF	Cerebrospinal Fluid
CT	Computed Tomography
MCI	Mild Cognitive Impairment
MSLT	Multistate Life Table
PET	Positron Emission Tomography
DALYs	Disability-Adjusted Life Years
CSU	Client Services Unit
FHS	Future Health Scenarios
ACMR	All-Cause Mortality Rate
CSMR	Cause-Specific Mortality Rate
EMR	Excess Mortality Rate
CBR	Crude Birth Rate
YLD	Years Lived with Disability
YLL	Years of Life Lost

1.0 Overview 

This project leverages IHME’s simulation capabilities to quantify health and economic impacts associated with early detection and treatment of pre-clinical Alzheimer’s disease (AD). The simulation evaluates scenarios involving blood-based biomarker (BBBM) testing and a hypothetical intervention that slows disease progression.

Basic Goals:

Simulate the patient journey from identification through intervention and outcomes
Compare health and economic impacts across reference and alternative scenarios in 10 locations

Funding and collaboration:

We are designing this simulation in conjunction with IHME’s Client Services Unit (CSU) with a focus on health and economic impact. Our team will focus on simulating the health impacts of preclinical AD testing and the hypothetical intervention, and the Resource Tracking team will use our results to estimate the economic impacts. We will be using population forecasts from the Future Health Scenarios (FHS) team.

2.0 Modeling Aims and Objectives 

The primary goal is to simulate the impact of early detection and treatment strategies for Alzheimer’s disease using blood-based biomarkers and subsequent interventions. The simulation tracks simulants through health states from age ~30 to 125 years (or death), capturing progression through preclinical AD, mild cognitive impairment (MCI) due to AD, and three stages of dementia due to Alzheimer’s disease.

2.1 Scenarios 

Reference Scenario: Present-day conditions, including current cerebrospinal fluid (CSF), computed tomography (CT), and amyloid positron emission tomography (PET) diagnostic pathways after clinical disease develops, but with no BBBM uptake or disease-modifying therapies.
Alternative Scenario 1: Introduction of BBBM testing for at-risk preclinical populations (no intervention)
Alternative Scenario 2: BBBM testing plus hypothetical intervention that prevents, delays, or slows disease progression

2.2 General Modeling Strategy 

Based on literature and GBD, we conceive of Alzheimer’s disease (AD) as comprising a six-stage progression:

Susceptible → Preclinical AD → MCI due to AD → Mild AD → Moderate AD → Severe AD

The last three stages correspond to a portion of the three sequelae (mild, moderate, severe) of the GBD cause “Alzheimer’s disease and other dementias.” We will have to separate AD out from other dementias in the GBD data, and we will need non-GBD data sources to inform our modeling of preclinical AD and MCI due to AD. Furthermore, reality may be a bit more complicated than the simple one-directional progression depicted above, but the assumption of no recovery from any state might be sufficient for our purposes.

The basic plan for the design of the simulation is as follows:

Use forecasted population estimates
- We have data on ‘population’, ‘deaths’, ‘migration’, and ‘births’ from FHS that can inform the age structure in the population out to year 2100; we plan to use population and deaths forecasts, but not migration or births
- Based on GBD data, the incidence of AD within each age group is pretty stable over time, so we are not planning on using forecasted data for Alzheimer’s disease
Only simulate people who will eventually get AD (and other dementias (?))
- This drastically reduces population size and hence compute resources
- We will need to “work backwards” from GBD’s Alzheimer’s estimates and the population forecasts to determine how many people to add on each time step
- We will need to do some calculations outside the simulation to account for false positive tests and people who don’t progress from preclinical AD or MCI to dementia due to AD
On top of the population model, we will add an Alzheimer’s disease progression model, a testing and diagnosis model, and a treatment model, as detailed in the next section
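The "work backwards" step in the plan above can be sketched roughly as follows. This is only an illustration under assumptions that are not spelled out in this section: that the expected number of new simulants per time step is the incidence rate times the forecasted population times the step length, multiplied by a model scale (simulated population divided by the real population with AD). The function names, parameter names, and numbers are all hypothetical.

```python
def model_scale(initial_sim_population: int, real_ad_population: float) -> float:
    """Simulants per real person with AD (hypothetical helper)."""
    return initial_sim_population / real_ad_population


def expected_new_simulants(
    incidence_rate: float,  # assumed: new cases per person-year in the total population
    population: float,      # forecasted real population in the age/sex group
    step_days: float,       # simulation step size in days
    scale: float,           # model scale, e.g. from model_scale()
) -> float:
    """Expected number of simulants to add in one time step."""
    step_years = step_days / 365.25
    real_new_cases = incidence_rate * population * step_years
    return real_new_cases * scale


# Made-up numbers, for illustration only.
scale = model_scale(initial_sim_population=100_000, real_ad_population=5_000_000)
new_per_year = expected_new_simulants(
    incidence_rate=0.002, population=1_000_000, step_days=365.25, scale=scale
)
print(scale, new_per_year)
```

In a real run the expected value would be computed per age group, sex, and year, and the actual count would presumably be drawn stochastically rather than used directly.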

3.0 Simulation Components 

Simulation Components
Component	Purpose	Main Features	Dependencies
Population Model	Evolution of simulant demographics over time	Influx of incident cases of preclinical AD, aging of simulants, all-cause mortality	Forecasted population data, age-specific incidence rates of preclinical AD
Alzheimer’s Disease Model	Disease progression	Transition rates through 6 stages of AD, cause-specific mortality	Population model
Testing/Diagnosis Model	BBBM and existing testing pathways	Multi-modal testing, correlation between testing and disease progression	Disease model, population model
Treatment Model	Hypothetical disease-modifying therapy	Reduction in progression rate, adherence	Disease model, testing model
Multistate Life Table (MSLT) model	Testing and treatment among the susceptible population	Counts of BBBM tests, false positive BBBM tests, and treatments initiated (incorrectly) among the susceptible population	Forecasted population and mortality rates, incident preclinical AD cases, testing and treatment rates, specificity of BBBM testing
Economic Impact model	Cost-effectiveness analysis	Comprehensive cost modeling, ICER calculations	All other modules

4.0 Specifications 

4.1 Default Parameter Specifications 

Default Simulation Parameter Specifications
Parameter	Value	Note
Locations	Sweden, US, China, Japan, Brazil, UK, Germany, Spain, Israel, Taiwan	10 locations of interest
Simulation start date	2025-01-01
Simulation end date	2100-12-31	76-year simulation period
Observation start date	2025-01-01	No burn-in period
Cohort type	Open	Cohort consists of simulants who are in any of the 5 stages of Alzheimer’s disease
Sex	Males & Females
Age start (Initialization)	Age at which preclinical AD starts (currently set to 25 years to accommodate the youngest preclinical AD incident cases)	Age start is simulant-dependent
Age end (Initialization)	125 years	End of oldest age group
Age start (Observation)	Age at which preclinical AD starts (currently set to 25 years to accommodate the youngest preclinical AD incident cases)	All simulants are observed since all have AD or its precursors
Age end (Observation)	125 years or death
Initial population size per draw	100,000 simulants
Number of Draws	25 draws
Step size	182 days (~6 months)	Twice a year is sufficient to capture frequency of testing and disease progression. Model 1 used a step size of 182 days, resulting in 3 timesteps the first year, so we increased to 183 days in model 2 to guarantee exactly 2 timesteps per year for all 76 simulation years. In model 6.1, we switched back to 182 days but recorded “event time” in the observers instead of “current time.” This effectively makes the first observation 182 days after the start of the simulation, so the first “timestep” on Jan 1, 2025 doesn’t count, and all simulation years are again guaranteed to contain exactly 2 timesteps.
Randomness key columns	['entrance_time', 'age', 'sex']	There should be no need to modify the standard key columns
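The step-size reasoning in the table above can be checked with a short calendar calculation. The sketch below counts observations per calendar year for the start and end dates given in this table, under the assumption that an observation is stamped either with the current time of a step or with its event time (current time plus one step); the function name and the `record_event_time` flag are hypothetical and not part of the simulation code.

```python
from collections import Counter
from datetime import date, timedelta

START = date(2025, 1, 1)
END = date(2100, 12, 31)


def observations_per_year(step_days: int, record_event_time: bool = False) -> Counter:
    """Count observations per calendar year for a fixed step size."""
    step = timedelta(days=step_days)
    counts = Counter()
    current = START
    while True:
        # Stamp the observation with the event time or the current time.
        observed = current + step if record_event_time else current
        if observed > END:
            break
        counts[observed.year] += 1
        current += step
    return counts


current_182 = observations_per_year(182)
current_183 = observations_per_year(183)
event_182 = observations_per_year(182, record_event_time=True)

print(current_182[2025])                  # observations stamped in 2025
print(set(current_183.values()))          # distinct per-year counts
print(set(event_182.values()))            # distinct per-year counts
```

With 182-day steps stamped at the current time, 2025 contains steps on January 1, July 2, and December 31, which is the three-timestep problem described for model 1.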

4.2 Scenario Details 

Scenario details
Scenario	Columns with more details go here	Note
Baseline (Reference)
Testing scale-up (Alternative 1)
Treatment scale-up (Alternative 2)

4.3 Outputs and Observers 

Default stratifications for all observations:

Year
Sex
Age group

Additionally, all output should automatically be stratified by location, scenario, and input draw.

Outputs of simulation observers
Observation	Stratification modifications	Note
Number of new simulants each year		Either births or new Alzheimer’s cases, depending on population model
Deaths and YLLs (cause-specific)
YLDs (cause-specific)
Transition counts between Alzheimer’s cause states
Person-time in each Alzheimer’s cause state
CSF/PET-eligible simulant count	Test state: CSF test received, PET test received, no test received, (negative) BBBM test received	Observe only simulants eligible for CSF/PET tests and stratify by test states to get test counts. Simulants who are CSF/PET-eligible but whose test propensity value is >= (CSF testing rate + PET testing rate) will be in either the no test received stratification or BBBM test received stratification (depending whether or not they have received a negative BBBM test), since any CSF/PET eligible simulants with propensities < (CSF testing rate + PET testing rate) will be immediately given one of those tests.
BBBM test counts	Diagnosis provided (positive, negative). Treatment initiation decision (yes, no).	Diagnosis and treatment initiation are both stratified under test count because they both happen immediately on test.
BBBM newly test-eligible simulant count		Count of simulants who are newly eligible for BBBM testing, based on the BBBM eligibility requirements (list in step 1). Newly eligible simulants could be incident to pre-clinical, turning 60, or reaching 3 years since their last test. Will be used to check that simulation test counts per newly eligible simulant match the Lilly annual year-specific test rates.
Person-time eligible for BBBM testing	BBBM test result (positive, negative, not tested)
Person-time ever eligible for BBBM testing	Alzheimer’s cause state (BBBM-AD, MCI-AD, AD-dementia); BBBM test result (positive, negative, not tested)	A simulant contributes to this person-time if they have ever been eligible for BBBM testing. We will use this observer to calculate (person-time ever BBBM tested) / (person-time ever BBBM test-eligible) among simulants between 60-80 in the BBBM-AD disease state. The numerator is obtained from the BBBM test result stratification by summing the person-time for simulants with positive or negative BBBM test results, and the denominator is the person-time summed over all test result strata including not tested.
Treatment status transition counts	State transitioned to (Full treatment effect, Waning treatment effect, No treatment effect), treatment completion (completed, discontinued)	Treatment completion stratification for transitions to Full treatment effect state allows us to validate the 10% discontinuation rate. Note that the diagram states Full treatment effect LONG and Full treatment effect SHORT are both considered the same status (Full treatment effect), but are stratified by completion status.
Months on treatment	Number of months on treatment (integer between 1 and 9, inclusive), among simulants who get treated	Count the number of treated simulants in each (year, sex, age group, months on treatment) stratum. This will be used for cost estimates.
Treatment status person-time	Status (In treatment/ Waiting for treatment, Full treatment effect, Waning treatment effect, No treatment effect). Also stratify by treatment completion (completed, discontinued) from transition observer	Treatment completion stratification allows us to validate the different durations of the completed/discontinued Full and Waning treatment statuses
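As an illustration of how the "Person-time ever eligible for BBBM testing" observer could be post-processed, the sketch below computes (person-time ever BBBM tested) / (person-time ever BBBM test-eligible) from stratified rows. The row layout, the key names (`cause_state`, `age_start`, `test_result`, `person_time`), the stratum labels, and the choice to treat "between 60-80" as age groups starting in [60, 80) are assumptions made for this example, not the observer's actual output schema.

```python
def bbbm_coverage(rows, min_age=60, max_age=80, state="BBBM-AD"):
    """Fraction of ever-eligible person-time that is ever-tested person-time."""
    selected = [
        r for r in rows
        if r["cause_state"] == state and min_age <= r["age_start"] < max_age
    ]
    # Numerator: person-time with a positive or negative BBBM test result.
    tested = sum(
        r["person_time"] for r in selected
        if r["test_result"] in ("positive", "negative")
    )
    # Denominator: person-time over all test-result strata, including not tested.
    ever_eligible = sum(r["person_time"] for r in selected)
    return tested / ever_eligible if ever_eligible else float("nan")


# Made-up person-time values, for illustration only.
example_rows = [
    {"cause_state": "BBBM-AD", "age_start": 60, "test_result": "positive", "person_time": 45.0},
    {"cause_state": "BBBM-AD", "age_start": 60, "test_result": "negative", "person_time": 15.0},
    {"cause_state": "BBBM-AD", "age_start": 65, "test_result": "not_tested", "person_time": 40.0},
    {"cause_state": "MCI-AD", "age_start": 70, "test_result": "positive", "person_time": 99.0},
    {"cause_state": "BBBM-AD", "age_start": 85, "test_result": "not_tested", "person_time": 10.0},
]
coverage = bbbm_coverage(example_rows)
print(coverage)
```

Rows outside the selected disease state or age range are excluded from both the numerator and the denominator.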

5.0 Model Runs and Verification & Validation 

5.1 Model Runs 

Model run requests
Run	Description	Scenarios	Specification mods	Stratification mods	Observer mods
0.0	Speed test with fake data but full population and mock-ups of all components to test runtime	Custom scenario including three types of Alzheimer’s testing and a hypothetical treatment	Locations: United States (USA); Cohort: Open cohort simulating entire population (including susceptible simulants, not just simulants who will get AD) in all age groups; simulants enter at age = 0 using crude birth rate	Default	Use (mostly) standard VPH observers: Mortality and Disability observers; Disease observer for Alzheimer’s; Custom observer for Alzheimer’s testing (based on DiseaseObserver); CategoricalInterventionObserver for Alzheimer’s treatment
1.0	Simple SI model of AD using GBD data for AD and other dementias	Baseline	Locations: USA, China Cohort: Same population model as Model 0.0	Default	Default
2.0	Replace standard population components with custom Alzheimer’s population component to model only population with AD; use same simple SI model of AD as Model 1.0, but with initial prevalence of AD equal to 1	Baseline	Locations: USA, China Change step size from 182 days to 183 days	Default	Default
2.1	Replace old Alzheimer’s disease model with one where everyone is infected	Baseline	Locations: USA, China	Default	Default
2.2	Fix incidence to be based on full population instead of susceptible population in fertility	Baseline	Locations: USA, China	Default	Default
3.0	Replace population and mortality data with forecasts from IHME’s FHS team	Baseline	Locations: USA, China	Default	Default
3.1	Use draws from forecasted population structure data rather than mean value	Baseline	Locations: USA, China	Default	Default
4.0	Include BBBM-AD and MCI-AD states	Baseline	Locations: USA, China	Default	Default
4.1	Update MCI duration and MCI → AD transition rate to avoid negatives in older age groups	Baseline	Locations: All (Sweden, USA, China, Japan, Brazil, UK, Germany, Spain, Israel, Taiwan)	Default	Default
4.2	Switch BBBM → MCI hazard to Weibull distribution	Baseline	Locations: USA	Default	Default
4.3	Set population and AD-dementia incidence rates to zero on nonexistent older age groups instead of forward filling	Baseline	Locations: USA	Default	Default
4.4	Use total-population incidence rate of AD-dementia in calculation of BBBM-AD incidence (we had been incorrectly using susceptible-population incidence)	Baseline	Locations: USA	Default	Default
4.5	Don’t double round age when finding age group at midpoint of interval	Baseline	Locations: USA	Default	Default
5.0	Replace incidence and prevalence with AD proportion of GBD 2023 dementia envelope	Baseline	Locations: All (Sweden, USA, China, Japan, Brazil, UK, Germany, Spain, Israel, Taiwan)	Default	Default
6.0	Add testing (CSF/PET, BBBM) intervention	Baseline, Alternative Scenario 1	Locations: All (Sweden, USA, China, Japan, Brazil, UK, Germany, Spain, Israel, Taiwan)	Default	Add test counts and testing eligibility observers
6.1	Add person-time observers for BBBM testing	Baseline, Alternative Scenario 1	Locations: USA Record “event time” in observers instead of “current time,” effectively making the first timestep 6 months after the simulation start date instead of on the start date, and change the step size back to 182 days to guarantee 2 timesteps per year	Stratify BBBM testing observers by semester so that we have one row of observation for every time step	Observe person-time of simulants eligible for BBBM testing Observe person-time of simulants ever eligible for BBBM testing
7.0	Add treatment (full, waning) intervention	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: All (Sweden, USA, China, Japan, Brazil, UK, Germany, Spain, Israel, Taiwan)	Stratify all BBBM testing and treatment observations by semester	Add treatment status transition and person-time observers
7.1	Fix usage of propensities for testing, and stratify disease observers by treatment status	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA, Spain	Stratify disease state transitions and person-time by treatment status	Default
7.2	Fix bug with 80-year-olds entering “waiting for treatment” state, and fix bug in BBBM → MCI hazard rate	Baseline, Alternative Scenario 2	Locations: USA	Stratify disease state transitions and person-time by treatment status	Default
7.3	Add BBBM testing history	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA	Stratify disease state transitions and person-time by treatment status	Default
7.4	Bugfix: Use conditional prevalence to initialize AD dementia state instead of unconditional prevalence	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA	Stratify disease state transitions and person-time by treatment status	Default
7.5	Bugfix: Don’t initialize BBBM testing history in baseline scenario	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA	Stratify disease state transitions and person-time by treatment status	Default
7.6	Additional bug fixes for BBBM testing	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA Simulation end date: 2065-12-31	Stratify disease state transitions and person-time by treatment status	Default
8.0	Abie’s consistent rates model	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: Sweden, USA	Stratify disease state transitions and person-time by treatment	Default
8.1	Consistent rates model with AD prevalence bug fix from model 7.4	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: Sweden, USA	Stratify disease state transitions and person-time by treatment	Default
8.2	Model 8.1, but one draw with 500 seeds, for estimating population size for final run	Baseline, Alternative Scenario 2	Locations: USA Simulation end date: 2060-12-31 Number of draws: 1 Population size per draw: 10 million (500 seeds of 20,000 simulants each)	Stratify disease state transitions and person-time by treatment	Default
8.3	Model 8.1 + use EMR from DisMod	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: All Simulation end date: 2060-12-31 Number of draws: 5	Stratify disease state transitions and person-time by treatment	Default
8.4	Final runs for 10/31 deadline: Model 8.3 + merge in changes through model 7.6	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: All Population size per draw: 2 million (100 seeds of 20,000 simulants each)	Stratify disease state transitions and person-time by treatment	Default
8.5	Bugfix: Ensure all new simulants enter the sim in the BBBM state after time zero (previously, new simulants had been continuing to enter all 3 states according to the initial prevalences)	Baseline	Locations: USA, Sweden	Stratify disease state transitions and person-time by treatment	Default
8.6	Run model 8.5 for all three scenarios	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA, Sweden	Stratify disease state transitions and person-time by treatment	Default
8.7	New final runs, same as models 8.5/8.6, but with more random seeds	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: All Population size per draw: 2 million (100 seeds of 20,000 simulants each)	Stratify disease state transitions and person-time by treatment	Default
9.0	Split AD dementia into mild, moderate, severe	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: China Number of draws: 3 Population size per draw: 2 million (100 seeds of 20,000 simulants each)	Stratify disease state transitions and person-time by treatment	Default
10.0	Add mixed dementia cases that include AD (instead of modeling AD only, as we have been since model 5.0)	Baseline, Alternative Scenario 2	Locations: USA, China Population size per draw: 2 million (100 seeds of 20,000 simulants each). Note: We did not actually need a run this large; it just happened to be what was already on `main`	Stratify disease state transitions and person-time by treatment	Default
11.0	Updates to treatment model	Baseline, Alternative Scenario 2	Locations: USA, China, Brazil	Stratify disease state transitions and person-time by treatment	Add observer for months on treatment
11.1	Bugfix: change to new artifact, observer bugfix	Baseline, Alternative Scenario 2	Locations: USA, China, Brazil	Stratify disease state transitions and person-time by treatment	Add observer for months on treatment
12.0	Updates to testing model	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA, China, Brazil	Stratify disease state transitions and person-time by treatment	Default
12.1	Updates to testing constants: sensitivity and eligibility age	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: USA, China, Brazil	Stratify disease state transitions and person-time by treatment	Default
12.2	Final runs based on model 12.1, with minor bugfix to testing ramp-up	Baseline, Alternative Scenario 1, Alternative Scenario 2	Locations: All Population size per draw: 2 million (100 seeds of 20,000 simulants each)	Remove stratification of disease state transitions and person-time by treatment	Default
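Runs 2.2 and 4.4 both correct a mix-up between incidence per person-year in the total population and incidence per person-year in the susceptible population. The sketch below shows the usual relationship between the two, assuming that a fraction equal to the prevalence of the condition is not susceptible. The function names and numbers are hypothetical, and this is not the project's implementation.

```python
def total_population_rate(susceptible_rate: float, prevalence: float) -> float:
    """Convert a rate per susceptible person-year to a rate per total person-year."""
    return susceptible_rate * (1.0 - prevalence)


def susceptible_population_rate(total_rate: float, prevalence: float) -> float:
    """Convert a rate per total person-year to a rate per susceptible person-year."""
    if not 0.0 <= prevalence < 1.0:
        raise ValueError("prevalence must be in [0, 1)")
    return total_rate / (1.0 - prevalence)


# Made-up numbers, for illustration only.
rate_total = total_population_rate(susceptible_rate=0.01, prevalence=0.2)
rate_back = susceptible_population_rate(rate_total, prevalence=0.2)
print(rate_total, rate_back)
```

The two rates diverge most where prevalence is high, which is why the choice matters more in the oldest age groups.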

5.2 V & V Tracking 

V&V Tracking
Run	V&V plan	V&V summary	Link to notebook
0.0	Check runtime of simulation. No other V&V since data was fake.	~15 minutes to complete parallel runs of 100 jobs with 20K simulants each (2 million total simulants, equivalent to 20 draws with 100K simulants each)	None
1.0	Note: All these checks can be done separately for each age group and sex, but due to the large number of age groups, it may be more prudent to start by looking at aggregated results. Verify crude birth rate (CBR) against GBD. Verify ACMR against GBD. Validate Alzheimer’s CSMR against GBD. Verify Alzheimer’s incidence rate against GBD. Validate Alzheimer’s prevalence against GBD. Validate Alzheimer’s EMR against GBD. Validate Alzheimer’s YLLs and YLDs against GBD. Check whether overall population remains stable over time. Check whether Alzheimer’s prevalence remains stable over time. For comparison with model 2, calculate total “real world” Alzheimer’s population over time as \(p_\text{AD}(t) \cdot X_t / S\), where \(p_\text{AD}(t)\) is prevalence of AD at time \(t\), \(X_t\) is the simulated population at time \(t\), and \(S = X_{2025}\) / (real total population in 2025) is the model scale	Birth observer was missing, so we couldn’t verify CBR. Total population per draw was 200k instead of 100k, and there were 10 draws instead of 25. Timestep was 182 days, resulting in 3 timesteps in 2025, making population counts 1.5 times what they should be in 2025; we’ll change the timestep to 183 days for future models. Total population decreased monotonically during the 76 years of the sim from 200k to about 170k in USA and about 125k in China. Prevalence, incidence, EMR, CSMR, ACMR, and YLLs all validated to artifact values and remained stable over time. YLDs were above GBD values for both locations. We should look into disability weights to see if there is a bug.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/b84ad4c959ad6a0ef5957250c17ef36dba23b190/verification_and_validation/2025_08_12_model1_vv.ipynb
2.0	Note: All these checks can be done separately for each age group and sex, but it may be more prudent to start by looking at aggregated results. Verify the number of new simulants per year against the AD population model. Use interactive sim to verify initial population structure against the AD population model. Verify that all simulants in the model have AD (i.e., all recorded person-time is in the “AD” state, not the “susceptible” state). Verify that there are no transitions between AD states during the simulation (since it’s an SI model and all simulants should be in the I state the whole time). Verify ACMR against GBD. Validate Alzheimer’s CSMR against GBD. Validate Alzheimer’s EMR against GBD. Validate Alzheimer’s YLLs and YLDs against GBD. For comparison with model 1, calculate total “real world” Alzheimer’s population over time as \(X_t / S\), where \(X_t\) is the simulated population at time \(t\), and \(S = X_{2025}\) / (real population with AD in 2025) is the model scale (I’m not sure how closely we expect this to match model 1)	Some simulants are in the susceptible state, and some transition from susceptible to infected. This is incorrect. Because of this, incidence and prevalence have not been evaluated. ACMR, CSMR, EMR, and YLLs are all correct. The issue with YLDs is still present, as expected.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/28c884aa7628819fe5ee03248c9a488d5c7eb340/verification_and_validation/2025_08_12_model2_vv.ipynb
2.1	Note: All these checks can be done separately for each age group and sex, but it may be more prudent to start by looking at aggregated results. Verify the number of new simulants per year against the AD population model. Use interactive sim to verify initial population structure against the AD population model. Verify that all simulants in the model have AD (i.e., all recorded person-time is in the “AD” state, not the “susceptible” state). Verify that there are no transitions between AD states during the simulation (since it’s an SI model and all simulants should be in the I state the whole time). Verify ACMR against GBD. Validate Alzheimer’s CSMR against GBD. Validate Alzheimer’s EMR against GBD. Validate Alzheimer’s YLLs and YLDs against GBD. For comparison with model 1, calculate total “real world” Alzheimer’s population over time as \(X_t / S\), where \(X_t\) is the simulated population at time \(t\), and \(S = X_{2025}\) / (real population with AD in 2025) is the model scale (I’m not sure how closely we expect this to match model 1)	No simulants were susceptible or transitioned, as expected. EMR validated, but CSMR and ACMR did not, which was expected; see below for new mortality metrics to validate against. Similarly, YLLs and YLDs did not match, as expected; remove these moving forward. The number of new simulants entering the sim is correct in younger age groups but incorrect in later ages. This is thought to be an issue with the incidence used in the sim. Prevalence and real-world population have not been evaluated.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/232bab04fff9591b4fb4a543199ce50091087d95/verification_and_validation/2025_08_12_model2.1_vv.ipynb
2.2	Note: All these checks can be done separately for each age group and sex, but it may be more prudent to start by looking at aggregated results. Verify the number of new simulants per year against the AD population model. Verify the prevalent simulants per year against the AD population model. Verify that all simulants in the model have AD (i.e., all recorded person-time is in the “AD” state, not the “susceptible” state). Verify that there are no transitions between AD states during the simulation (since it’s an SI model and all simulants should be in the I state the whole time). Validate Alzheimer’s EMR against artifact. Validate overall mortality (ACMR - CSMR + EMR) vs artifact.	No simulants were susceptible or transitioned, as expected. EMR, total mortality rate, and new simulant incidence counts validated. Prevalence was correct on initialization, but total sim population and prevalence increase for about 10 years before stabilizing. This is thought to be due to misalignment of incidence and mortality in GBD data. We are moving to model 3, since population values change with forecasting in that sim.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/b042cdee74149371425c001cedb022e7f6b6a0c4/verification_and_validation/2025_08_14_model2.2_vv.ipynb
3.0	Note: All these checks can be done separately for each age group and sex, but it may be more prudent to start by looking at aggregated results. Everything from 2.0, except use FHS values for ACMR in the overall mortality (ACMR - CSMR + EMR) vs artifact comparison. Verify that (ACMR - CSMR + EMR) decreases slightly from 2025 to 2050 and then levels off. Since there are so many (age groups, years, locations, sex) combinations that might be tested, it will be good enough to confirm that new simulant counts and total mortality rates line up for 2030, 2060, and 2090, and for two locations.	The number of new simulants entering the sim matches the target number, which leads to prevalence counts higher than those estimated by GBD/FHS, but closer than in Model 2.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/32e7d3d44f540a9b9620b21b5a137f626631475c/verification_and_validation/2025_08_25b_model3.0_vv.ipynb
3.1	Same as 3.0 (notebook copied)	Results are consistent with 3.0 results	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/main/verification_and_validation/2025_09_05a_model3.1_vv.ipynb
4.0	All checks from 3.0, but instead of verifying all-cause mortality rate, use other-cause mortality rate, which is easier to compute; also confirm that there are person-years of BBBM-AD and MCI-AD for all age groups and years.	AD-dementia incidence counts in simulation exceed artifact values for younger ages. Zero incidence and prevalence of AD-dementia at oldest ages (due to bug with negative transition rates).	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/8f7f48009ee36b65763d8103cc4c4182b52908f1/verification_and_validation/2025_09_05a_model4.0_vv.ipynb
4.1	Same as 4.0, but also look at durations of BBBM-AD, MCI-AD to make sure they match expectation. Anticipate there to be more similarity between AD-dementia incidence counts in simulation and GBD/FHS.	AD-dementia incidence counts still too high in younger ages. AD-dementia incidence counts now extremely high in older ages, likely due to forward filling BBBM incidence data for nonexistent age groups above 95–100. Plot of BBBM → MCI transition rate looks very weird.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/290165c8190b2030db735f812cf2b0c02733ac30/verification_and_validation/2025_09_13a_model4.1_vv.ipynb
4.2	Same as 4.1	Not much positive change to the AD-dementia incidence (still off in young ages, and now further off in old ages). Plot of BBBM → MCI transition rate is somewhat improved.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/290165c8190b2030db735f812cf2b0c02733ac30/verification_and_validation/2025_09_15a_model4.2_vv.ipynb
4.3	Same as 4.2	Big improvement in AD-dementia incidence for older ages, still off for younger ages	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/290165c8190b2030db735f812cf2b0c02733ac30/verification_and_validation/2025_09_18b_model4.3_vv.ipynb
4.4	Same as 4.3	Some improvement in AD-dementia incidence for younger ages; we think that the duration we have used is off by a little since we did not include mortality in our duration estimate	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/290165c8190b2030db735f812cf2b0c02733ac30/verification_and_validation/2025_09_18c_model4.4_vv.ipynb
4.5	Same as 4.4, except add this check that we should have been doing previously: Compute prevalence of AD-dementia state alone (in addition to combined prevalence of all 3 disease states)	AD-dementia incidence looks identical to 4.4, so the double rounding was perhaps not a problem after all. Prevalence counts of all 3 states combined look pretty good. The prevalence count of the AD-dementia state alone is too low at the start of the sim, then becomes too high as time goes on.	https://github.com/ihmeuw/vivarium_research_alzheimers/blob/1fdfff314c3abb0088a919dd9cdfa7bb8766710b/verification_and_validation/2025_09_18d_model4.5_vv.ipynb
5.0	Same as 4.5, except add this check that we should have been doing previously: Check disability weights of MCI and AD-dementia by computing YLDs/person-time for each sex and age group	AD-dementia incidence is still close but a bit off, similar to model 4.5. Prevalence of AD-dementia still starts off too low and then becomes too high. Disability weights computed from the sim are virtually identical to those stored in the artifact.	Disease transition rates, mortality, incidence, prevalence Disability weights
6.0	Only eligible simulants are tested based on PET/CSF and BBBM testing requirements. Location-specific CSF vs PET testing rates (CSF tests / PET tests = CSF rate / PET rate). 90% sensitivity rate for BBBM tests (meaning 90% of simulants test positive, since they all have preclinical AD). Year-stratified CSF/PET test counts per CSF/PET-eligible person-year match location- and time-specific rates. Year-stratified BBBM test counts per newly eligible person match time-specific rates. CSF/PET tests initialized properly: no testing spike for first time step.	V&V summary in PR 14. CSF and PET testing rates in baseline scenario match artifact values. Baseline CSF and PET testing rates match between concept model and artifact. CSF and PET testing rates in BBBM testing scenario decrease as expected as BBBM tests scale up, but CSF tests always start decreasing before PET tests, which is likely due to the way the testing propensity was implemented (the desired behavior is that CSF and PET tests would be independent of each other and would start decreasing simultaneously). BBBM tests only occur from ages 60 - 79 as expected. BBBM test rate, calculated as (count of BBBM tests) / (count of newly eligible simulants), spikes in 2030 as expected and then increases until 2045 as expected, but it levels off between 0.1 and 0.2 in all countries, which is not close to our target value of 0.6. This is likely due to the lifetime propensities for testing, and we think we need to compute the test rate in a different way to validate the eventual testing coverage of 60%. BBBM test sensitivity is 90% as expected.	Testing
6.1	Compute BBBM test rate as (count of tests) / (eligible person-time) Compute fraction of simulants who have had BBBM tests as (person-time ever tested) / (person-time ever eligible)	V&V summary in PR 15 The means of CSF and PET testing rates in baseline still look good, but the uncertainty intervals look off (I didn’t check the uncertainty in model 6.0) Plots of (BBBM tests) / (eligible person-time) look similar to plots of (BBBM tests) / (counts of eligible simulants), so not very helpful Plots of (person-time ever tested) / (person-time ever eligible) look good when stratified by age group: For each age group above age 60, the coverage increases monotonically and levels off at 60% coverage, which is the target Age groups from 60 - 79 all have an immediate jump in 2030 as expected and follow identical patterns Age groups above age 80 follow a similar pattern but have a time lag and don’t show the initial jump – this is also expected Other checks look the same as in model 6.0	Testing
7.0	Positive BBBM tests result in treatment initiation rates that match the year/location specific rates from \(I\) in the treatment intervention data table 10% of transitions to Full treatment effect status are by simulants who discontinue treatment Full/Waning durations are accurate (use person-time ratios between states?) “In treatment/waiting for treatment” duration is accurate (use person-time ratios between states?) Interactive sim verification spot checking a simulant’s durations in treatment statuses as they move through BBBM test negative, Full treatment effect, Waning treatment effect, No treatment effect statuses (for both completed and discontinued treatments) Check hazard ratios for simulants who begin treatment and those who transition to No treatment effect	V&V summary in PR 19 Things that look good: Treatment coverage ramps up as expected 10% of simulants discontinue treatment as expected Ratio of person-time between disease states looks good, given that we can’t predict exactly what this will look like in the observers (the person-time for the “waning effect long” state looks a bit low, but this is likely because a significant number of people die before they reach the end of this state) Average treatment state durations for people entering or exiting each state look good (again, we can’t predict these exactly because of deaths) Averted deaths, YLLs, and YLDs all seem reasonable In the interactive sim, the hazard ratio (RR) for the treatment effect looks correct Treatment state durations look good in the interactive sim, but I didn’t run it long enough to check the “waning effect long” state Things that look wrong: I can’t check the hazard ratio in the sim outputs because we didn’t stratify disease state person-time or transitions by treatment status Simulants in the 80-84 age group are entering the “start treatment” state (aka “waiting for treatment”) when they shouldn’t be In the interactive sim, the MCI incidence probability looks like it’s about half as big as it should be (we determined that the hazard rate was getting multiplied by the time step of ~0.5 twice instead of once) In the interactive sim, some simulants have an RR of 1.0 on the last time step of the “waning treatment effect” state, whereas the docs specify that this should not happen until the “no effect” state on the next time step (this was determined to be an off-by-one error due to a misinterpretation of the docs, but we decided to leave it as is to err on the side of less effective treatment)	Treatment Interactive sim (hazard rates)
7.1	Same as model 7.0, but add: Check relative risk of treatment on BBBM → MCI transition in observer output now that we have the necessary stratifications Check that CSF and PET testing start decreasing at the same time when increasing BBBM testing, rather than CSF testing always decreasing first	The relative risk of treatment on BBBM → MCI transition looks about right for the “treatment effect long” state, but is strangely wiggly in the “treatment effect short” state — this may be an artifact of how the observers work CSF and PET tests now decrease simultaneously as BBBM testing ramps up, as desired	Testing Treatment
7.2	Same as model 7.1, but add: Check that the 80-84 year-old age group has no transitions into the “waiting for treatment” state Check BBBM → MCI hazard rate in observer output and interactive sim Re-run V&V from models 4 and 5 to check disease state incidence, prevalence, etc.	Transitions into the “waiting for treatment” state are now restricted to the age range 60-79 as desired The BBBM → MCI hazard rate and the RR of treatment on this hazard both look reasonable in the observer output; I didn’t check these in the interactive sim The results of running the model 5 V&V notebook seem reasonable, but it’s hard to tell exactly because the notebook was designed for the baseline scenario only and is probably adding together results from all three scenarios (I didn’t have time to fix this yet)	Disease transition rates, mortality, incidence, prevalence Treatment
7.3	Re-run testing notebook from model 7.1. Things should look similar, but testing should ramp up slightly slower	There were several problems with the BBBM test history: The baseline scenario had people entering with BBBM test history The sensitivity now appears to be 80% instead of 90% – it looks like negative tests from before the sim are getting counted by the observer, but we want to count just tests that occur during the sim Our original measures of the testing rate, (# tests) / (# newly eligible) and (# tests) / (eligible person-time), look closer to the mark of 60% Our graphs of (person-time ever tested) / (person-time ever eligible) look very different from model 7.1, and I’m not sure why	Testing
7.4	Check initial prevalence of all three AD states Re-check the other health metrics (incidence, prevalence, mortality, etc.) in the baseline scenario	Not checking directly since Abie’s run included these changes; check model 8.1 instead	N/A
7.5	Skip, check 7.6 instead	Not checking since fixes were incomplete; check model 7.6 instead	N/A
7.6	Re-run testing notebook from model 7.3 to see if things look more like they did in 7.1	Things look good now, more like they did in model 7.1: Baseline scenario no longer includes BBBM testing history Sensitivity is back to 90% as expected Plots of (person-time ever tested) / (person-time ever eligible) look very similar to how they did in model 7.1 There are a few minor differences with model 7.1 that I’m not sure how to explain: The BBBM testing rate measured as (# tests) / (# newly eligible) levels off just under 50%, compared to around 15% in model 7.1, and there’s a similar pattern for (# tests) / (eligible person-time) The number of newly eligible simulants remains flatter after the first couple years, and the number of new BBBM tests per year levels off at a higher value than in model 7.1 The fraction of newly eligible simulants with negative BBBM tests remains much smaller now, under 5% compared with leveling off at almost 15% in model 7.1	Testing
8.0	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.)	EMR appears to be correct AD dementia incidence by age no longer looks obviously shifted to the left like in model 5.0; it starts off a bit too high, then gets closer to its target as time goes on AD dementia prevalence starts off too low, then becomes a bit too high as the sim progresses The relative prevalence of the three disease states over time still looks weird: AD dementia prevalence starts off too low, then increases much more quickly than the other two states, which doesn’t seem right	Disease transition rates, mortality, incidence, prevalence (Sweden) Disease transition rates, mortality, incidence, prevalence (USA)
8.1	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) See how initial prevalence looks after bugfix from model 7.4	EMR appears to be correct AD dementia incidence now looks slightly underestimated, particularly in older ages AD dementia prevalence now matches its target at the beginning of the sim, but still becomes too high as the sim progresses The relative prevalence of the three disease states over time still looks weird: Even though AD dementia prevalence now starts off correctly, it still increases much more quickly than the other two states and becomes too high	Disease transition rates, mortality, incidence, prevalence (Sweden) Disease transition rates, mortality, incidence, prevalence (USA)
8.2	See how variance and uncertainty in averted deaths scale with population size, and use the results to choose a population size for final runs	See linked notebooks →	Disease transition rates, mortality, incidence, prevalence Treatment Abie’s notebook estimating variance when scaling up population
8.3	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Make sure EMR matches new EMR from DisMod Re-run testing and treatment V&V notebooks	V&V summary in PR 27 Testing results have the same bugs as model 7.3, as expected Treatment results look like those in model 7.2, which is good	Testing Treatment
8.4	No V&V (due to lack of time), just generate results tables Model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) should look like model 8.3 in baseline scenario Testing notebook should look like model 7.6 Treatment notebook should look similar to models 8.3 and 7.2	N/A	Results tables in PR 28
8.5	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Check whether prevalence of AD dementia still increases faster than the other two states, or if it looks more reasonable In the interactive sim, check that new simulants are only entering in the BBBM state	V&V summary in PR 31 The plot of AD dementia incidence by age is no longer shifted to the left like in model 5.0 and seems to be pretty closely matching its target, but it is now slightly low at the beginning of the sim rather than slightly high Prevalence of AD dementia by age is also a bit low at the start of the sim, but it now looks much closer to its target at later times rather than becoming too high Prevalence of the three states over time now looks much better: Namely, the relative proportions of the 3 states remains pretty constant as opposed to the AD dementia prevalence increasing much faster than preclinical and MCI I confirmed in the interactive sim that the new simulants added on each time step are always only in the BBBM state (at least for the four time steps I checked) I confirmed in the interactive sim that the displacement of PET and CSF tests by BBBM tests seems to be working as expected The interactive sim showed that we still have the same off-by-one error in the waning effect of treatment as in model 7.0, but we decided to leave this as is (and update the docs instead) to err on the side of making a conservative estimate	Disease transition rates, mortality, incidence, prevalence Interactive sim (hazard rates) Interactive sim (testing)
8.6	Re-run testing and treatment V&V notebooks	V&V summary in PR 31 The number of tests, treatments, and averted deaths all increased, by approximately a factor of 2	Testing Treatment
8.7	No V&V (model is identical to 8.5/8.6), just generate results tables	N/A	Results tables in PR 34
9.0	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Make sure incidence, prevalence, etc. still look good Check that the prevalence of the different dementia severities looks reasonable	V&V summary in PR 33 AD dementia incidence and prevalence are looking generally good, except they’re a bit too high in the younger ages, likely because Abie didn’t solve the differential equations for age < 45. AD dementia incidence and prevalence are a bit low at the beginning of the sim. The same problem appears in model 8.5/8.6/8.7, but it looks slightly worse here. The prevalence of all disease states (including severities) follows the expected pattern, and we can see slight changes in prevalence between the baseline and treatment scenarios These results are only for 3 draws, so they could be affected by parameter uncertainty.	Disease transition rates, mortality, incidence, prevalence
10.0	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Check that incidence and prevalence increase by an amount that corresponds to the proportions in the .csv file from the dementia modelers; I predict approximately a 69% increase in the all-ages prevalence	V&V checks still passing for mortality, comparison to artifact Increased prevalence and incidence by expected factor	Disease transition rates, mortality, incidence, prevalence Treatment
11.0	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Check that the new observer for months on treatment works as expected and the months on treatment is uniformly distributed between 1 and 8 Check that changes in treatment ramp and efficacy are implemented	Incidence, prevalence, mortality all look as expected. The artifact used was incorrect and needs to be updated to the Model 10.0 artifact data. The observer for months to discontinuation has a bug that needs to be fixed. Efficacy looks as expected. It is not yet known whether the treatment ramp-up changes are correct, because the changes to testing have not been implemented yet	Disease transition rates, mortality, incidence, prevalence Treatment
11.1	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Confirm updates to the new observer for months on treatment work as expected	Some of the incidence and prevalence plots look slightly worse than before, but still in acceptable range Months of treatment observer seems to be working as expected Months to discontinuation has the correct distribution (90% complete all 9 months, 1.25% complete \(k\) months for \(1\le k\le 8\))	Disease transition rates, mortality, incidence, prevalence Treatment
12.0	NOT running V&V on Model 12.0, will instead run on Model 12.1	N/A	N/A
12.1	Re-run model 5.0 V&V notebook (incidence, prevalence, mortality, etc.) Check that testing and treatment only start at age 65 Check that ramp up for testing and treatment is aligned to new input values Check that the new sensitivity value (50%) is implemented correctly Check that simulants are being retested every 3-5 years instead of every 3 years	Model 5.0 V&V looks the same as in model 11.1, except for small differences in the testing and treatment scenarios Testing and treatment correctly start at age 65 Ramp-ups look mostly good, but there was a small bug in the dates for the testing ramp-up Test sensitivity is now 50% as expected Things look good in Abie’s interactive sim notebooks	Disease transition rates, mortality, incidence, prevalence Testing Treatment AI-assisted interactive sim notebooks
12.2	No V&V (due to difficulty in running notebooks on batched results; model is the same as 12.1 except for minor bugfix), just generate results tables	N/A	Results tables and plots in PR 45
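
The coverage check introduced in model 6.1, (person-time ever tested) / (person-time ever eligible) stratified by year and age group, can be sketched as follows. This is a minimal illustration only: the function name, the row layout, and the column names `person_time_ever_tested` and `person_time_ever_eligible` are hypothetical and are not taken from the actual observer output, and the numbers are made up.

```python
from collections import defaultdict


def bbbm_coverage(rows):
    """Return {(year, age_group): ever-tested / ever-eligible person-time}.

    `rows` is an iterable of dicts with hypothetical keys; the real
    observer output may use different names and stratifications.
    """
    tested = defaultdict(float)
    eligible = defaultdict(float)
    for row in rows:
        key = (row["year"], row["age_group"])
        tested[key] += row["person_time_ever_tested"]
        eligible[key] += row["person_time_ever_eligible"]
    # Skip strata with no eligible person-time to avoid dividing by zero.
    return {k: tested[k] / eligible[k] for k in eligible if eligible[k] > 0}


# Made-up rows for illustration (e.g. two seeds for the same stratum).
example_rows = [
    {"year": 2045, "age_group": "65_to_69",
     "person_time_ever_eligible": 1000.0, "person_time_ever_tested": 600.0},
    {"year": 2045, "age_group": "65_to_69",
     "person_time_ever_eligible": 500.0, "person_time_ever_tested": 300.0},
    {"year": 2030, "age_group": "65_to_69",
     "person_time_ever_eligible": 800.0, "person_time_ever_tested": 80.0},
]
coverage = bbbm_coverage(example_rows)
```

Summing numerators and denominators before dividing, rather than averaging per-row ratios, keeps strata with more person-time from being underweighted.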

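The months-on-treatment distribution verified in model 11.1 (90% complete all 9 months, 1.25% complete \(k\) months for \(1\le k\le 8\)) can be written down and compared against observed counts as in the sketch below. The function names and the example counts are hypothetical; this is not the project's V&V notebook code, only an illustration of the check.

```python
from fractions import Fraction


def months_on_treatment_pmf(full_course=9, completion=Fraction(9, 10)):
    """Expected distribution of months on treatment.

    Completers receive the full course; discontinuers are spread
    uniformly over months 1 .. full_course - 1.
    """
    early_months = range(1, full_course)
    p_each = (1 - completion) / len(early_months)  # 1/80 = 1.25% by default
    pmf = {k: p_each for k in early_months}
    pmf[full_course] = completion
    return pmf


def max_abs_deviation(observed_counts, pmf):
    """Largest gap between observed proportions and the expected pmf.

    `observed_counts` maps months on treatment to integer counts.
    """
    total = sum(observed_counts.values())
    return max(
        abs(Fraction(observed_counts.get(k, 0), total) - p)
        for k, p in pmf.items()
    )


pmf = months_on_treatment_pmf()
# Made-up counts that match the target distribution exactly.
ideal_counts = {k: 125 for k in range(1, 9)}
ideal_counts[9] = 9000
deviation = max_abs_deviation(ideal_counts, pmf)
```

With real simulation output the deviation will not be exactly zero, so an acceptable tolerance would need to be chosen based on the number of simulants treated.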
Outstanding model verification and validation issues
Issue	Explanation	Action plan	Timeline
YLD rates do not match in model 1	Thought to be due to incorrect disability weight aggregation	Will be updated when we add severity levels, recheck then	Model 9
Total simulation population increasing in model 3	Thought to be due to GBD mismatch in mortality and incidence	Review again after we reduce to AD only, and when we add in mixed dementias	Models 5 and 8
AD-dementia incidence counts are still a bit off in model 4	AD-incidence by age appears shifted to the left by about 2.5 years, making it too high in younger ages and too low in older ages. We think this is due to our average durations being too long because they don’t account for mortality. Also, AD incidence counts in 2025 are too high, likely because of our initialization strategy for the durations in BBBM-AD at time 0.	Update durations of BBBM-AD and MCI-AD to account for mortality during those stages Try using an exponential distribution instead of a uniform distribution when initializing durations Jira ticket: SSCI-2411	After model 8 or model 9