Overview of Engineering Files
This document is designed to provide an overview of the files found in the engineering repository and some examples of when you might need to use each file. This is NOT exhaustive as some files are less relevant for research team members, and the files included change between models.
This page is organized in sections by the different folders found in engineering repositories and then the pages within folders, as relevant.
Todo
Add information on ownership of documents to this file. What is RT responsibility vs engineering.
Project Folder
In the initial project folder, there are some files for information storage and documentation that are helpful to know about. These are not used in the actual model, but you might need to edit them in order to document your work.
CHANGELOG
The CHANGELOG.rst file is a way to track what has been run and what
updates took place with each model run. This information should match the
model runs seen in Vivarium Research documentation. When you run a model,
be sure to update this.
README
The README.rst file is found in most GitHub repositories, not just
on our team! It provides information on cloning and running the code stored
in the repo, and how to set up your environment. This is a helpful resource
to review when you first start on a project.
Additionally, you might have to update and edit the README.rst file
while doing archiving, but more details on this can be found on
the archiving page.
Components
The components folder contains the files that define each
modeling “component”. This includes things like disease models,
risk factors, pregnancies, the health care system,
interventions, and observers.
The files included will be specific to your project rather than generalizable. Additionally, these files will mostly be maintained and written by the engineering team. For those reasons, we won’t elaborate on these files here.
Constants
The constants folder contains input information for the model. This includes
things like data values, locations, file paths, draw counts, column
names, and outputs needed.
There are a few reasons you might use files from this folder: changing data inputs, adjusting information for the artifact, or validating file paths. The files below are not all-inclusive, but represent most of what you should be aware of.
data_keys
The data_keys.py file dictates which keys are run to make the artifact. Any
keys not included in the MAKE_ARTIFACTS_KEY_GROUPS list at the bottom of the
file will be ignored when making artifacts. More information on how and when to
edit this can be found on the artifact building page.
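As a rough illustration of the idea above, the end of a data_keys.py file might look something like the sketch below. Only the MAKE_ARTIFACTS_KEY_GROUPS name comes from this page. The key classes, the key strings, and the keys_to_build helper are hypothetical and will differ in your project.

```python
# Hypothetical sketch of the bottom of a data_keys.py file.
# Class names, key strings, and keys_to_build are illustrative assumptions.
from typing import NamedTuple


class PopulationKeys(NamedTuple):
    STRUCTURE: str = "population.structure"
    DEMOGRAPHY: str = "population.demographic_dimensions"


class AnemiaKeys(NamedTuple):
    PREVALENCE: str = "cause.anemia.prevalence"


POPULATION = PopulationKeys()
ANEMIA = AnemiaKeys()

# Only the groups listed here are considered when making the artifact.
MAKE_ARTIFACTS_KEY_GROUPS = [
    POPULATION,
    # ANEMIA,  # not listed, so its keys would be ignored
]


def keys_to_build(groups):
    """Flatten the listed key groups into the individual keys to build."""
    return [key for group in groups for key in group]
```

In this sketch, adding ANEMIA back to the list is all it would take for its keys to be picked up again.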
metadata
The metadata.py file stores metadata for the model, for example the
locations the model can be run for, age groups, draw counts, and index columns.
For the most part, research team members will not be expected to edit this
information. You may need to change the locations for artifact building, although
this can also be done from the command line. More information
can be found on the artifact building page.
paths
The paths.py file contains the file paths for all input data. When editing input data (especially
RT generated data), check to see if you need to update the file path or file name here.
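A minimal sketch of what this can look like is shown below. The directory, the file name, and the missing_paths helper are all hypothetical and are only meant to show the kind of check you might do after replacing an input file.

```python
# Hypothetical sketch of entries in a paths.py file.
# The base directory and file name below are placeholders, not real locations.
from pathlib import Path

BASE_DIR = Path("/path/to/project/data")
ARTIFACT_ROOT = BASE_DIR / "artifacts"

# Example of an RT generated input file (hypothetical name).
HEMOGLOBIN_CSV = BASE_DIR / "rt_inputs" / "hemoglobin_2024.csv"


def missing_paths(paths):
    """Return the paths that do not exist on disk."""
    return [path for path in paths if not Path(path).exists()]
```

If you save a new version of an input file under a different name, the matching constant is the one place that would need to change, and a helper like missing_paths can confirm that the new path resolves.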
Data
The data folder contains tools and information for loading and creating the input
data for the simulation. This includes loader functions for gathering
GBD or RT generated data into the artifact format. You will primarily
interact with this folder when doing artifact generation.
Here, we only cover the loader file, as this should be the primary one we work with. However, there might be other files like utilities or extra_gbd which contain supporting functions. If needed, trace functions back to these other files.
loader
The loader.py file loads all of the data for the simulation, formats it, and
saves it to the artifact. At the top you will see a list of data keys that correspond
to information in the artifact. At the end of each data key is the name of the function
that is used to generate that data. Many of these functions are defined in the
remainder of the file, though notably not all.
If you need to format data into the artifact, or adjust how information is pulled and saved, start by looking in this file. As mentioned above, some of the functions are stored elsewhere, so don’t be surprised if you need to look in another file.
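The pairing of data keys with loader functions described above can be sketched roughly as follows. This is an assumption about the general shape, not a copy of any real loader.py. The function names, the key strings, and the returned values are hypothetical placeholders.

```python
# Hypothetical sketch of how a loader.py file pairs data keys with functions.
# Real loaders pull and format GBD or RT generated data instead of returning
# the placeholder dictionaries used here.


def load_population_structure(key, location):
    """Placeholder for pulling and formatting population data."""
    return {"key": key, "location": location, "source": "GBD"}


def load_prevalence(key, location):
    """Placeholder for pulling and formatting prevalence data."""
    return {"key": key, "location": location, "source": "RT"}


# Each data key ends with the function used to generate its data.
LOADERS = {
    "population.structure": load_population_structure,
    "cause.anemia.prevalence": load_prevalence,
}


def get_data(key, location):
    """Look up the loader function for a data key and call it."""
    if key not in LOADERS:
        raise ValueError(f"No loader function found for data key: {key}")
    return LOADERS[key](key, location)
```

When tracing how a piece of artifact data is built, the mapping is the place to start: find the data key, then follow the function name next to it.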
Model Specifications
The model_specifications folder contains information on
running the model with simulate or psimulate. You
will need to look through and adjust these files whenever you want
to run the simulation.
These files are .yaml files. There is general information on YAML basics here.
Information included here:
Which components (including observers) will be included in a model run (e.g., maybe you wish to run a model with interventions “turned off”)
Population size, seed count, and draw count
The time the simulation runs for and time step size
Any stratifications for observers
model_spec
The model_spec.yaml file contains the majority of the information
on what to include in a given model run. This includes things like what
components to include, the population size, what draw and artifact to use,
and stratifications for observers.
Some of this information is only used if you run a single model run (1 draw, seed, location, and scenario) rather than many model runs. More information on this can be found on the running simulations page.
Engineering notes can be found on this model specs file page.
scenarios
The scenario.yaml file, usually within the branches folder, is used to determine what runs
are needed. It is usually quite a short file and only includes things like draw
count, seed count, and interventions to include. It is important to check that this
matches the needed run size for V&V runs.
Engineering notes can be found on this branches file page.
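One quick way to sanity check the run size is to multiply the values out. The sketch below assumes one run is launched for every combination of draw, seed, and scenario. The count_runs function and the scenario names are hypothetical, so confirm the behavior for your project on the running simulations page.

```python
# Hypothetical helper for estimating how many runs a branches file implies,
# assuming one run per combination of draw, seed, and scenario.


def count_runs(draw_count, seed_count, scenarios):
    """Return the number of runs for the given draws, seeds, and scenarios."""
    return draw_count * seed_count * len(scenarios)


# Example: 10 draws, 5 seeds, and two scenarios.
total_runs = count_runs(10, 5, ["baseline", "intervention"])
print(total_runs)
```

Comparing this number against the run size requested for a V&V run can catch a mismatched draw or seed count before the jobs are submitted.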
Tools
The tools folder contains tools that work in the background of the simulation. Generally,
you won’t need to edit anything in this folder. However, there is some helpful information
here.
cli
The cli.py file contains some information on the commands for running simulations,
making results, and making the artifact. However, this information is
documented elsewhere in a clearer format, or you can get it from the command line,
for example by running psimulate --help.