A RIO checkpoint contains LP and event data (along with metadata) captured at the end of a ROSS simulation. These checkpoints can be used to restart and continue the simulation from where it left off.

Partitions

RIO takes advantage of the existing ROSS mapping API. Mainly, each KP in a ROSS PE becomes one partition in the RIO checkpoint. Be sure to set the correct g_tw_nkp value when initializing the simulation. As long as the lp-to-kp is valid, RIO can easily create many partitions from a single MPI rank.

Files

To facilitate multi-simulation checkpoint assembly (an advanced feature of RIO), there are many discrete files that make up a single RIO checkpoint. One file contains a human-readable description of the checkpoint while the remainder of the files contain binary data. RIO checkpoints are not guaranteed to work across different file/operating systems.

Human-Readable Description

The human-readable checkpoint description file has the extension .txt. This file includes version numbers (Git hashes), date and time of checkpoint creation, CMake build settings, and any command-line/run-time settings.

This file is not parsed by RIO when loading a checkpoint. Its purpose is to facilitate data preservation and allow a user to restart the checkpointed simulation.

Partition Metadata

There is only one partition metadata file, with the extension .rio-md. There are six pieces of metadata information captured for each partition in the checkpoint:

  • Partition number (note: the numbering scheme is neither unique nor linear)
  • Partition file: which data file contains the partition data
  • Partition offset: where the partition data begins within the data file
  • Partition size: size of the partition data within the data file
  • LP count: number of LPs in the partition
  • Event count: number of events in partition

LP Size Metadata File

There is only one LP metadata file. It has the extension .rio-lp. Since LPs may be of variable sizes, the size metadata for each LP in the system must be captured.

LP and Event Data Files

There may be multiple data files. This number can be configured through the --io-files=n command line option and the second argument to the io_store_checkpoint function.

Each data file has the extension .rio-data-n indicating the data file number.