Low Birth Weight and Short Gestation: GBD 2017
Risk overview
Todo
Describe this risk
GBD 2017 modelling strategy
The meaning of the “low birth weight” and “short gestation” in GBD have subtle definitional differences compared to other usages of “low birth weight” and “short gestation” in literature. The term “low birth weight” has historically been used to refer to birth weight (BW) less than 2500 grams. However, because the goal of the GBD risk factors analysis is to quantify the entirety of attributable burden due to each risk factor, the GBD definition of “low birth weight” therefore refers to all birth weight below the Theoretical Minimum Risk Exposure Level (TMREL) for birth weight. Likewise, new-borns have been typically been classified into gestational age (GA) categories of “extremely preterm” (<28 weeks of gestation), “very preterm” (28-<32 weeks of gestation), and “moderate to late preterm” (32-<37 weeks of gestation). “Short gestation” in GBD refers to all gestational ages below the gestational age TMREL.
Exposures and relative risks for the GBD Low birth weight and short gestation risk factors are divided into joint 500-gram birth weight and 2-week gestational age combinations. The lowest risk overall 500- gram/2-week bin is the overall TMREL. The univariate TMRELs vary with GA and BW. The lowest risk GA varies by BW category and the lowest risk BWs vary with GA category. The latter are used to quantify univariate attributable risk. Under this framework, all attributable burden under the joint TMREL is referred to jointly as burden of LBWSG. All attributable burden to BWs under the TMREL for each GA category are, on aggregate, “low birth weight” and all attributable burden to GAs under the TMREL for each BW category are, on aggregate, “short gestation.” Each combination of 500-grams and 2-wks is associated with a relative risk for mortality by neonatal period (early and late neonatal) and by the causes, and relative to the joint TMREL.
Note
PAF-of-1 with LBWSG:
The cause Neonatal preterm birth complications is 100% attributable to this risk.
Risk exposure in GBD 2017
How is the exposure estimated in GBD 2017
To model the joint distribution of exposure of low birth weight and short gestation for each location, year, and sex estimated in GBD 2017, three types of information are used:
Distribution of gestational age for each location, year, and sex
Distribution of birth weight for each location, year, and sex
Copula family and parameters, specifying correlation between gestational age and birth weight distributions
Exposure modelling strategy in GBD 2017
GBD 2017 creates a joint distribution of birth weight and gestation age to create the low birth weight short gestation risk factor. It takes birth weight and gestational age microdata from 11 locations and uses ensemble model methods standard to GBD risk factors, to first create separate distributions of birth weight and gestational age for every location-sex-year. Then to model the joint distribution of gestational age and birth weight from separate distributions, the Spearman correlation for each country where joint microdata was available was pooled across all years of data available. This ranged from 0.25-0.49. Pooling across all countries in the dataset, the overall Spearman correlation was 0.38. Copula modelling was used to model joint distributions between the birth weight and gestational age marginal distributions. The joint distribution is then divided into 500g by 2wk bins. Birth prevalence was then calculated for each 500g by 2wk bin.
Note
The risk appendix’s description of “2-week age bins” is not totally accurate because:
There are two 1-week age bins (36-37 weeks, and 37-38 weeks).
There are two categories where the age range is 0-24 weeks (all the “extremely extreme” preterm births are grouped together). See image of LBWSG categories below
Risk effects in GBD 2017
Relative risks estimate in GBD 2017
The available data for deriving relative risk was only for all-cause mortality. For each location, the risk of all-cause mortality at the early neonatal period and late neonatal period at joint birth weight and gestational age combinations was calculated. In all datasets except for the United States, sex-specific data were combined to maximise sample size. The United States analyses were sex-specific. Relative risks were then calculated for each 500g and 2wk combination.
TMREL in GBD 2017
For each of the country-derived relative risk surfaces, the 500 g and 2-week gestational age joint bin with the lowest risk was identified. This bin differed within each country dataset. To identify the universal 500 g and 2-week gestational age category that would serve as the universal TMREL, all bins that were identified as the TMREL was chosen. This is cat55 (40-42ga, 3500-400g) and cat56 (40-42ga, 4000-4500g)
Note
the TMREL categories listed in GBD 2017 risk appendix are wrong.
Causes that are affected by LBWSG
The available data for deriving relative risk was only for all-cause mortality. The exception was the USA linked infant birth-death cohort data, which contained 3-digit ICD causes of death, but also had nearly 30% of deaths coded to causes that are ill-defined, or intermediate, in the GBD cause classification system. GBD 2017 analysed the relative risk of all-cause mortality across all available sources and selected outcomes based on criteria of biologic plausibility. Some causes, most notably congenital birth defects, haemoglobinopathies, malaria, and HIV/AIDS, were excluded based on the criteria that reverse causality could not be excluded. The final list of outcomes included in calculating the attributable burden for LBWSG are in the table below.
Cause id |
Cause (outcomes) |
|---|---|
302 |
diarrheal diseases |
322 |
lower respiratory tract infections |
328 |
upper respiratory tract infections |
329 |
otitis media |
333 |
pneumococcal meningitis |
334 |
H influenzae type B meningitis |
335 |
meningococcal meningitis |
336 |
other meningitis |
337 |
encephalitis |
381 |
neonatal preterm birth complications |
382 |
neonatal encephalopathy due to birth asphyxia and trauma |
383 |
neonatal sepsis and other neonatal infections |
384 |
hemolytic disease and other neonatal jaundice |
385 |
other neonatal disorders |
686 |
sudden infant death syndrome |
Todo
discuss in detail the PAF of 1 causes.
Restrictions
LBWSG risk effect on all-cause moratality only applies to the early neonatal and late neonatal age groups.
Restriction type |
Value |
Notes |
|---|---|---|
Male only |
False |
|
Female only |
False |
|
Age group |
early neonatal (0-6 days) late neonatal (7-28 days) |
id 2 id 3 |
Vivarium modelling strategy
Risk exposure in Vivarium
In GBD 2017, LBWSG exposure is modeled as an ordered polytomous distribution specifying the prevalence of births in each 500g-2week birthweight-ga bin/category. We first convert this discrete exposure distribution into a continuous joint exposure distribution of birthweight and gestational age by assuming a uniform distribution of birthweights and gestational ages within each bin/category. In this way, each simulant can be assigned a continuously distributed birthweight and gestational age, which can then be easily mapped back to the appropriate risk category in GBD. Python code for achieving these transformations can be found in Abie’s notebook in the Vivarium Data Analysis repo.
Note
This strategy is likely biasing towards overestimating extreme birthweights or gestational ages. For example, in the 0-500g category, most babies are probably pretty close to 500g, not equally likely to be <1 gram versus 499-500 grams.
Risk effects in Vivarium
The relative risk of each LBWSG category in GBD is for all-cause mortality in the early and late neonatal period. However, GBD identifies only a subset of causes (not all causes) that are affected by LBWSG, listed in the table above. Therefore, despite the RR’s being measured for all-cause mortality, we are interested in applying the PAF and relative risks only to the cause-specific mortality rates of the causes that GBD considers to be affected by LBWSG.
To do this, we first decompose the all-cause mortality rate (ACMR) as the sum of:
mortality from causes that are affected by LBWSG and modelled in the sim (green)
mortality from causes that are affected by LBWSG but not modelled in the sim (blue)
mortality from causes that are unaffected by LBWSG and modelled in the sim (salmon)
mortality from causes that are unaffected by LBWSG but not modelled in the sim (pink)
Our strategy will be to apply the relative risks and PAF only to the green and blue causes, i.e. those GBD says are affected by LBWSG. The rest of this section describes the details of how to do this. See the Assumptions and Limitations section for a discussion of the strengths and limitations of this approach, and a comparison with other possible strategies.
Cause categories
An example of the above color-coded cause breakdown from the large-scale-food fortification concept model concept model diagram is shown below:
Cause |
Causes by risk factors |
||||
|---|---|---|---|---|---|
Group |
ID |
LBWSG |
vitamin A |
iron |
folic acid |
Modelled causes affected by LBWSG |
302 |
diarrheal diseases |
diarrheal diseases |
||
322 |
lower respiratory tract infection |
lower respiratory tract infection |
|||
Un- modelled causes affected by LBWSG |
328 |
upper respiratory tract infections |
|||
329 |
otitis media |
||||
333 |
pneumococcal meningitis |
||||
334 |
H influenzae type B meningitis |
||||
335 |
meningococcal meningitis |
||||
336 |
other meningitis |
||||
337 |
encephalitis |
||||
381 |
neonatal preterm birth complications |
||||
382 |
neonatal encephalopathy |
||||
383 |
neonatal sepsis and oth er neonatal infections |
||||
384 |
hemolytic disease and other neonatal jaundice |
||||
385 |
other neonatal disorders |
||||
686 |
sudden infant death syndrome |
||||
Modelled causes unaffected by LBWSG |
341 |
measles |
|||
389 |
vitamin A |
||||
390 |
dietary iron deficiency |
||||
642 |
neural tube defects |
||||
Un- modelled causes unaffected by LBWSG |
— |
causes not in our model |
|||
Note
To pull CSMRs for the blue causes, use measure_id for death and metric_id for rate
Individual mortality hazard and all-cause mortality rate
At any time \(t\) in a Vivarium simulation, each individual \(i\) has an instantaneous mortality rate (i.e. mortality hazard) \(\text{mr}(i) = \text{mr}_t(i)\) that dictates how likely they are to die in the next instant. The mortality hazard is dependent on which cause states the individual is in at time \(t\). Our goal is to define the individual mortality hazard \(\text{mr}(i)\) so that the LBWSG relative risks for mortality are applied only to the causes that GBD considers to be affected by LBWSG (green and blue), while preserving the requirement that the expected value (denoted by \(E\)) of the mortality hazard equals the all-cause mortality rate for the individual’s location, year, age group, and sex:
(In actuality, this equation may only hold approximately when following our approach; see note below.) All-cause mortality is the sum of all the cause-specific mortality rates (CSMRs):
Likewise, we will decompose the individual mortality hazard \(\text{mr}(i)\) as a sum of individual-level cause-specific mortality hazards, defined according to the green/blue/salmon/pink breakdown (i.e. modelled vs. unmodelled causes and affected vs. unaffected causes).
Note
To minimize the amount of data we need to pull from GBD, we can solve for the sum of mortality rates from unmodelled causes unaffected by LBWSG (pink) in terms of the all-cause mortality rate and the CSMRs of the green, blue, and salmon causes:
This equation can be substituted into (2) and (3) below to eliminate the pink causes from the computation of the mortality hazard and background mortality rate for an individual simulant.
Note
Throughout this section, we will use the following notational convention for quantities related to an individual simulant \(i\):
Abbreviations in all-capital letters, such as ACMR or CSMR above, and EMR and BGMR below, denote quantities that depend only on an individual’s demographic group in GBD (location, year, age group, sex), but not on other modeled quantities of the individual in our simulation. We consider these variables “constant” for a fixed demographic group, and we suppress their explicit dependence on the individual \(i\) to reduce notational clutter.
Abbreviations in all-lower-case letters, such as \(\text{mr}\) above, or \(\text{cat}\), \(\text{state}\), \(\text{csmr}\), and \(\text{bgmr}\) below, denote quantities that depend on an individual’s current state in the simulation. We cannot treat these quantities as “constant” in the sense above.
Defining the individual mortality hazard
We now describe our strategy for defining the individual mortality hazard \(\text{mr}(i)\), taking an individual’s LBWSG category into account. For the modelled causes (green and salmon) we will use the excess morality rates (EMRs) instead of the CSMR. The EMR is cause-state dependent while the CSMR is the average EMR over all cause states (including the “without condition” state). For example, the excess mortality rates for a two-state cause (with condition / without condition) would be:
mortality rate due to cause if the person does NOT have the condition: EMR=0
mortality rate due to cause if the person HAS the condition: EMR of the condition (with EMR > CSMR)
We will need the following variables (see the note below for information about the RR’s and PAF):
Note that since \(\text{state}_c(i)\) implicitly depends on the time \(t\), the individual mortality hazard will also depend on time.
Important
While relative risks (RR’s) in GBD are usually specific to a risk-cause pair, the relative risks of LBWSG are for all-cause mortality, and therefore the RR’s are the same for all causes affected by LBWSG. As noted above, although these RR’s were computed for all-cause mortality, we will only be applying them to causes GBD says are affected by LBWSG (green and blue).
Correspondingly, the population attributable fraction (PAF) is the same for any of the LBWSG-affected causes (green and blue), except for neonatal preterm birth, which has a PAF of 1. The PAF should be pulled at the most-detailed-cause level, or else computed explicitly from the LBWSG risks and exposures. Its value in India, for example, is approximately 0.94 (see LBWSG PAF notebook), which roughly matches the most-detailed-level PAF in GBD for any of the LBWSG-affected causes except for preterm birth (differences are probably due to rounding errors). Note that although the PAF for preterm birth is 1, we will nevertheless apply the same PAF (e.g. ~0.94 in India) to preterm birth as to all the other affected causes.
Using the above variables, we will define the following individual mortality rates below:
For each cause \(c\), define the conditional cause-specific mortality hazard for individual \(i\) to be
The descriptor “conditional” here means that the above individual csmr’s can be interpreted as the expected cause-level CSMR’s conditioned (i.e. stratified) on all the individual cause states observed in the simulation (note that we can only observe cause states for modelled causes). In other words, \(\text{csmr}_c(i)\) is the conditional expectation of individual \(i\)’s cause-specific mortality hazard, given whether \(c\) is one of the causes we are modeling, and if so, given which of \(c\)’s cause states the individual is in.
Now we additionally stratify/condition the csmr’s by the individual’s LBWSG category. Define the LBWSG-stratified cause-specific mortality hazard of \(c\) for individual \(i\) to be
As described above, we are applying the PAF and relative risks only to the causes GBD considers affected by LBWSG. For the affected causes, we first compute the risk-deleted mortality rate by multiplying the individual csmr by \((1-\text{PAF})\), then multiply by the relative risk for the individual’s LBWSG category to get the cause-specific mortality hazard corresponding to that risk category.
The individual’s total mortality hazard, stratified by all modeled cause states and LBWSG risk categories, is then
because
When implementing (2), recall that \(\sum_{c\in\text{pink}} \text{CSMR}_c\) can be computed using (1).
Todo
Show that \(E[\text{mr}_t(i)] \approx \text{ACMR}\), with equality if \(\text{state}_c(i)\) is independent of \(\text{cat}(i)\) at time \(t\).
Question: Are these independent in general or not? It seems like since we are applying the relative risks to the with-condiiton states, these states will become less likely to be observed with higher risk LBWSG categories as time goes on. Instead of 1-PAF, is there some other quantity we should be multiplying the EMR by to get the right answer? E.g. since we are applying it to a subgroup of the entire population, should it be something like the “attributable fraction among cases” instead of the population attributable fraction?
Todo
add more description of the all-causes PAF and most-detailed-cause PAF and the logical reasoning for using one over the other.
add the problems we ran in and how we ended up trouble-shooting and came to the conclusion to use the most-detailed-cause PAF
discuss the implications of including preterm birth in the causes to which we are applying the PAF and relative risks, and why we decided to do it this way (note that this is inherently inconsistent since preterm birth is PAF-of-1 with LBWSG, but this approach seems reasonably consistent with what the GBD modelers did, which itself is inconsistent).
we can also discuss the other equations that thought up but did not end up using.
this way the discussion in the assumptions and limitations section will have more context (perhaps most of the above things should go in that section).
Assigning a cause of death
First we describe how cause of death is assigned in Vivarium’s standard Mortality component, then we describe how to modify the procedure if LBWSG is included in the model.
Cause of death without LBWSG
In standard Vivarium models not including LBWSG, an individual’s mortality hazard is defined to be
where \(\text{BGMR}\) is the background mortality rate for the simulation, i.e. the mortality rate for simulant \(i\)’s location/year/age/sex due to all unmodelled causes:
We also refer to BGMR as the cause-deleted mortality rate, since it is the mortality rate obtained by removing all the modelled causes.
If simulant \(i\) dies, the cause of death is assigned randomly, either to one of the modelled causes, or else to the category other_causes if the death was due to a cause we are not explicitly modeling. The random assignment is made by sampling from the following probability distribution:
and
Note that this does in fact define a probability distribution since
This probability distribution can be derived by observing that each individual cause-specific mortality hazard is the probability density that i dies of cause c in the next small time interval \(\Delta t\).
Todo
Make the above statement more precise and write out the equations to show that the probability distribution gives the right thing.
Note
The assignment of a cause of death should be independent of the decision of whether the simulant died. That is, a new random number should be generated to sample from the above probability distribution for cause of death, independent of the random number compared with the mortality hazard to determine whether the simulant dies.
Cause of death with LBWSG included
We follow essentially the same strategy as above to assign a cause of death when LBWSG is included, but we take into account the different individual cause-spceific mortality hazards depending on the individual’s LBWSG category.
First define individual \(i\)’s background mortality rate to be
Recall that \(\sum_{c\in\text{pink}} \text{CSMR}_c\) can be computed using (1).
Now define the cause-of-death probability distribution by
and
To assign a cause of death when LBWSG is included, randomly sample a cause (or other_causes) from the above probability distribution, independent of other random choices.
Todo
Update the above equations and prose with more descriptive variable names in addition to colors. See comments in PR 239
Assumptions and Limitations
Apply relative risks only to causes affected by LBWSG in GBD
Strengths
o This approach is consistent with GBD methodology and avoids artificially decreasing the mortality rate for individual causes that are not affected by improvements in LBWSG (due to reverse causality or other concerns).
Limitations
o The risk appendix of GBD 2017 says that the data available to compute the relative risks (RR) for the risk exposure LBWSG are for the outcome of all-cause mortality. GBD then evaluated the relative risk of all-cause mortality across all available sources. Based on criteria of biologic plausibility, a list of causes for which GBD believes LBWSG impacts mortality through were selected. Some causes, most notably congenital birth defects, haemoglobinopathies, malaria, and HIV/AIDS, were excluded based on the criteria that reverse causality could not be excluded. GBD assumed that the relative risks for all-cause mortality rates by LBWSG category applied equally to mortality rates from each of these blue causes only and did not apply to any other GBD causes in order to calculate the population attributable burden due to LBWSG; in other words, the conservatively ignored the potential impact of LBWSG on mortality due to causes that did not meet their causal criteria. We are choosing to apply the RRs only to this list of LBWSG-affected causes. We believe this is consistent with GBD’s approach but may not fully reflect what the RRs capture.
o Because we are applying the same all-cause mortality RR to all affected causes, we are not able to evaluate the impact of LBWSG on cause-specific mortality accurately.
Bias
Notably, it is uncertain if this approach will cause an exaggeration or underestimation of the impact of LBWSG on mortality in the neonatal age groups in our models compared with real-life because it requires an evaluation of the relative risks of mortality by LBWSG exposure category stratified by affected and unaffected causes and these data are not readily available to us.
o One source of bias could be from not including the reverse-causality causes: suppose we have a nutritional supplement that impacts LBWSG. This supplement was tested in an RCT in western Kenya where malaria is prevalent. Suppose there is some causal link in both directions between birthweight and malaria. For example, malaria during pregnancy can cause low birth weight babies due to the accumulation of parasites in the placentas of pregnant women. She can also pass on the malaria to the baby before or during childbirth. A low birth weight baby may also be more susceptible to diseases including malaria. So if a baby is low birth weight and has malaria, we do not know 100% whether this was ‘congenital malaria’ acquired from the mother before or during delivery and the mother’s malaria caused its low birth weight, or whether the baby was born low birth weight malaria-free but had higher likelihood of acquiring malaria from an infectious mosquito bite. Without a well designed study, it is hard to know. Hence GBD did not include malaria in the list of LBWSG-affected causes. If we improve birthweight in this population due to the supplement, we also decrease incidence of malaria in the latter case (the low birth weight baby born malaria free, but then acquired it because it was low birth weight), and decrease mortality from malaria. However, this effect through malaria will not be captured in our model, so our modelled effect on neonatal mortality might be less than the empirial effect of this supplement on neonatal mortality.
o GBD assumes that the RR’s for CSMR for each LBWSG-affected-causes (green and blue) are the same as the overall RR for ACMR (RR_acmr). This won’t matter for the blue causes that we aren’t modeling explicitly, but for the green causes that we are modeling, it could throw off our results depending on whether the RR’s for that cause (RR_csmr) is larger or smaller than the overall RR for all causes (RR_acmr).
o Another source of bias could be from not applying the RRs to the causes they are intended for. Following from the limitation mentioned above, we are applying the RRs in an inconsistent manner with that they represent: they represent a ratio of ACMRs (let’s call it \(RR_{acmr}\)), but we are using them as a ratio of all-“affected (blue and green) cause”-mortality-rates (let’s call this \(RR_{aacmr}\)). We do not know whether the \(RR_{acmr}\) is larger or smaller than the \(RR_{aacmr}\).
If the \(RR_{acmr}\) < \(RR_{aacmr}\), we are underestimating deaths.If the \(RR_{acmr}\) > \(RR_{aacmr}\) then we are over-estimating deaths.This can be illusted by the following equations:
LWB=low birth weight babiesNBW=normal birth weight babies (or TMREL category)\(RR_{acmr}\) = \(\frac{\text{(LBW\_deaths\_affected + LBW\_deaths\_unaffected)/LBW\_births}}{\text{(NBW\_deaths\_affected + NBW\_deaths\_unaffected)/NBW\_births}}\)
= \(\frac{\text{(LBW\_deaths\_affected + LBW\_deaths\_unaffected)}}{\text{(NBW\_deaths\_affected + NBW\_deaths\_unaffected)}} \times \frac{\text{NBW\_births}}{\text{LBW\_births}}\)
\(RR_{aacmr}\) = \(\frac{\text{LBW\_deaths\_affected/LBW\_births}}{\text{NBW\_deaths\_affected/NBW\_births}}\)
= \(\frac{\text{LBW\_deaths\_affected}}{\text{NBW\_deaths\_affected}} \times \frac{\text{NBW\_births}}{\text{LBW\_births}}\)
Since we do not know the ratio of the number of \(\text{LBW\_deaths\_unaffected}\) to the number of \(\text{NBW\_deaths\_unaffected}\), we do not know the direction of bias. We would need to analyse the stratified microdata.
Todo
check to see (LBW_deaths_unaffected / NBW_deaths_unaffected) ?<? (LBW_deaths_affected / NBW_deaths_affected) or the reverse inequality?
if this above inequality is true, then it implies RR_acmr < RR_aacmr (the math checks out)
at first glance, the above inequality seems more likely than the reverse, BUT the unaffected causes include reverse causality causes which can complicate things.
thus, we should dig into a bit more later
Risk Exposure Model Diagram
Data Description Tables
Validation Criteria
Our baseline scenario should compare with GBD artifact data with regards to:
LBWSG exposure categories (note: consider a proxy for this so that we don’t need to observe person time in each category, perhaps mean BW or mean RR or birth prevalence?)
All-cause mortality rates in the early neonatal and late neonatal categories
Pay special attention to the green causes (affected, modelled), as it’s possible that CSMR’s will not exactly match for these, throwing off the ACMR.
According to the math, the CSMRs for the blue and pink causes should validate, so it would be a good idea to explicitly compare “deaths due to other causes” in our model to the sum of CSMRs in these groups.