..
  Section title decorators for this document:

  ==============
  Document Title
  ==============
  Section Level 1
  ---------------
  Section Level 2
  +++++++++++++++
  Section Level 3
  ~~~~~~~~~~~~~~~
  Section Level 4
  ^^^^^^^^^^^^^^^
  Section Level 5
  '''''''''''''''

  The depth of each section level is determined by the order in which each
  decorator is encountered below. If you need an even deeper section level, just
  choose a new decorator symbol from the list here:
  https://docutils.sourceforge.io/docs/ref/rst/restructuredtext.html#sections
  And then add it to the list of decorators above.

.. _vivarium_best_practices_statistical_distributions:

=========================================================
Statistics Distributions of Uncertainty
=========================================================

.. contents::
   :local:
   :depth: 1

.. todo::

  Intro

Common Statistical Distributions
--------------------------------

.. todo::

  - Uniform
  - Normal/Truncated normal
  - Lognormal
  - Gamma
  - Beta
  - Ensemble
  - Etc.

.. _lognormal_distribution_section:

Log-normal distribution
+++++++++++++++++++++++

A random variable :math:`X` has a `log-normal distribution`_ with parameters
:math:`\mu` and :math:`\sigma`, denoted :math:`X\sim \mathrm{Lognorm}(\mu,
\sigma^2)`, if and only if its logarithm :math:`Y=\log(X)` has a normal
distribution with mean :math:`\mu` and variance :math:`\sigma^2`. Equivalently,
a random variable :math:`Y` satisfies :math:`Y\sim \mathcal{N}(\mu, \sigma^2)`
if and only if its exponential :math:`X = \exp(Y)` satisfies :math:`X \sim
\mathrm{Lognorm}(\mu, \sigma^2)`.

.. _log-normal distribution: https://en.wikipedia.org/wiki/Log-normal_distribution

.. _lognormal_parameter_uncertainty_section:

Defining a log-normal distribution to simulate parameter uncertainty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A lognormal distribution may be an appropriate model of uncertainty for any
positive, unbounded quantity :math:`X`. Rates and relative risks are common
model parameters for which a lognormal distribution may be appropriate. If the
literature reports a value :math:`v` for :math:`X` and an asymmetric 95%
confidence interval :math:`(a,b)` such that :math:`v` is close to the geometric
mean of the endpoints (that is, :math:`v \approx \sqrt{ab}`, which is to the
*left* of the interval's midpoint), this is a good indication that the
uncertainty in :math:`X` can be modeled by a lognormal distribution with
geometric mean :math:`v` and central 95% interval :math:`(a,b)`.

A lognormal distribution with parameters :math:`\mu` and :math:`\sigma` has
geometric mean :math:`e^\mu`, which is also equal to the median of the
distribution. Below is Python code to construct a `scipy.stats lognormal
distribution`_ with a specified geometric mean (median) :math:`e^\mu = v =`
``median`` and approximate central 95% interval :math:`(a,b)` (where :math:`a =`
``lower`` and :math:`b =` ``upper``). The interval :math:`(a,b)` will be exactly
the central 95% interval of the distribution if and only if :math:`v =
\sqrt{ab}`.

.. _scipy.stats lognormal distribution: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html

.. _lognorm_from_median_lower_upper_code_block:

.. code-block:: Python

  import numpy as np
  from scipy import stats

  def lognorm_from_median_lower_upper(median, lower, upper, quantile_ranks=(0.025,0.975)):
    """Returns a frozen lognormal distribution with the specified median, such that
    the values (lower, upper) are approximately equal to the quantiles with ranks
    (quantile_ranks[0], quantile_ranks[1]). More precisely, if q0 and q1 are
    the quantiles of the returned distribution with ranks quantile_ranks[0]
    and quantile_ranks[1], respectively, then q1/q0 = upper/lower. If the
    quantile ranks are symmetric about 0.5, lower and upper will coincide with
    q0 and q1 precisely when median^2 = lower*upper.
    """
    # Let Y ~ Norm(mu, sigma^2) and X = exp(Y), where mu = log(median)
    # so X ~ Lognorm(s=sigma, scale=exp(mu)) in scipy's notation.
    # We will determine sigma from the two specified quantiles lower and upper.

    # mean (and median) of the normal random variable Y = log(X)
    mu = np.log(median)
    # quantiles of the standard normal distribution corresponding to quantile_ranks
    stdnorm_quantiles = stats.norm.ppf(quantile_ranks)
    # quantiles of Y = log(X) corresponding to the quantiles (lower, upper) for X
    norm_quantiles = np.log([lower, upper])
    # standard deviation of Y = log(X) computed from the above quantiles for Y
    # and the corresponding standard normal quantiles
    sigma = (norm_quantiles[1] - norm_quantiles[0]) / (stdnorm_quantiles[1] - stdnorm_quantiles[0])
    # Frozen lognormal distribution for X = exp(Y)
    # (s=sigma is the shape parameter; the scale parameter is exp(mu), which equals the median)
    return stats.lognorm(s=sigma, scale=median)

Note that since the log-normal distribution has only two parameters :math:`\mu`
and :math:`\sigma`, using three parameters :math:`v=` ``median``, :math:`a=`
``lower``, and :math:`b=` ``upper`` to specify the distribution results in an
overdetermined system if the specified median :math:`v` is not exactly equal to
the geometric mean of :math:`a` and :math:`b`. In ths case, the above code
actually determines the distribution's two parameters from the median :math:`v`
and the ratio :math:`b/a`, and the returned distribution's central 95% interval
will only be approximately equal to :math:`(a,b)`.

If the confidence interval :math:`(a,b)` was in fact generated from a log-normal
distribution with the reported value :math:`v` being an estimate of the
geometric mean :math:`e^\mu` of :math:`X`, then any discrepancy between
:math:`v` and :math:`\sqrt{ab}` would be due to rounding errors. In this case,
the above function incorporates all available data and generally results in a
better fit than using only one of the two estimated quantiles :math:`a` or
:math:`b`. See Nathaniel's `notebook investigating different lognormal
distributions`_ for the :ref:`CIFF acute malnutrition project <2019_concept_model_vivarium_ciff_sam>`.

.. _notebook investigating different lognormal distributions: https://github.com/ihmeuw/vivarium_research_ciff_sam/blob/main/wasting_transitions/uncertainty/2021_09_03c_lognormal_distributions_for_k_sam.ipynb

.. todo::

  Investigate more fully what the above algorithm does when there is no
  lognormal distribution matching the three parameters ``median``, ``lower`` and
  ``upper`` for the specified quantile ranks.

  As noted in the docstring, here's a brief description of what the algorithm
  does. Let :math:`v=` ``median``, :math:`a=` ``lower``, and :math:`b=`
  ``upper``, and let :math:`(p_0, p_1) =` ``(quantile_ranks[0],
  quantile_ranks[1])`` be the specified quantile ranks. Then the returned
  lognormal distribution has median :math:`v` and quantiles :math:`a'` and
  :math:`b'` of ranks :math:`p_0` and :math:`p_1` such that :math:`b'/a' = b/a`.

Common Model Parameters and Their Possible Appropriate Uncertainty Distributions
--------------------------------------------------------------------------------

.. todo::

  - Relative risk
  - Mean difference
  - Proportion
  - Cost estimate
  - Etc.

Other Considerations
--------------------

.. todo::

  - How to handle very asymmetric confidence intervals
  - How to handle uncertainty in data source(s) rather than statistical uncertainty from a single high quality data source?
    - Ex: combining multiple estimates from published papers with their own statistical uncertainty
  - How to handle uncertaity when extrapolating a subnataional estimate to a national estimate?
  - How to handle uncertainty distribution in the case of joint distributions