GVT Hook - Running custom code at GVT computation time

Sometimes we want to switch our model while it is running, pause the simulation to check what it is doing, or checkpoint the whole thing. All of that is possible with the GVT hook.

A GVT hook is a function that runs after GVT is computed. It can run after every GVT operation or at specific points in the simulation. In other words, we can write a procedure that is not just the processing of an event by an LP. Because it runs at GVT, the function executes on all PEs/cores at the same time, and we can even run MPI operations within the hook.

So, the hook allows you to pause the main loop started by tw_run() at a consistent point, at once, on all PEs your simulation is running on.

NOTE: If you intend to check or edit the state of any LP or event, make sure you run the function tw_scheduler_rollback_and_cancel_events_pe(pe) right at the top of your GVT hook.

GVT Hook function

The GVT hook has the following signature:

void (*) (tw_pe * pe, bool past_end_time);

A simple example can be found in the phold example/variant phold-gvt-hook.main.c:

void gvt_hook(tw_pe * pe, bool past_end_time) {
    tw_stime gvt = pe->GVT_sig.recv_ts;

    if (g_tw_mynode == 0) {
        printf("Current GVT time %f\n", gvt);
    }
}

This very simple function just prints the GVT at which it was called.

In order to connect the function to ROSS, set the variable g_tw_gvt_hook to your function. This is often done in the main function, and has to be done before tw_run.
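
To make the ordering requirement concrete, here is a minimal sketch that compiles without ROSS. Everything prefixed with `mock_` is a stand-in invented for this illustration (it is not ROSS code): `mock_gvt_hook` plays the role of g_tw_gvt_hook and `mock_run` plays the role of tw_run. It only shows that a hook registered before the loop starts gets called after each pretend GVT round, and that nothing is called when no hook is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins so the sketch compiles without ROSS. None of these are ROSS
 * definitions; they only mimic the shape described above. */
typedef struct { double gvt; } mock_pe;
typedef void (*mock_hook_f)(mock_pe *pe, bool past_end_time);

static mock_hook_f mock_gvt_hook = NULL; /* plays the role of g_tw_gvt_hook */
static int hook_calls = 0;

/* Same shape as the hook signature: a PE pointer and a flag. */
void count_hook(mock_pe *pe, bool past_end_time) {
    (void) past_end_time; /* unused in this sketch */
    hook_calls++;
    printf("hook called at mock GVT %f\n", pe->gvt);
}

/* Plays the role of tw_run(): after each pretend GVT round it calls the
 * hook, but only if one was registered before the loop started. */
int mock_run(int gvt_rounds) {
    assert(gvt_rounds >= 0);
    mock_pe pe = { 0.0 };
    hook_calls = 0;
    for (int round = 0; round < gvt_rounds; round++) {
        pe.gvt += 1.0;
        if (mock_gvt_hook != NULL) {
            mock_gvt_hook(&pe, false);
        }
    }
    return hook_calls;
}
```

In a real model the equivalent step is the single assignment of your function to g_tw_gvt_hook in main, placed before the call to tw_run.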

A more sophisticated hook would traverse the LPs in the simulation, or the event queue. Warning: the way the event queue and the LP list are traversed can change at any point in time. The API is stable, but we recommend sticking to the skeleton we provide below. Any other changes to the event queue or the LP states might lead to invalid states.

void process_events(tw_pe * pe) {
    int events_processed = 0; // Total events processed from queue
    int events_enqueued = 0;  // Events put back in queue
    int events_deleted = 0;   // Events deleted
    tw_event * dequed_events = NULL; // Linked list of non-deleted events, to be placed back in the queue

    // ===== Traversing events =====
    tw_event * next_event = tw_pq_dequeue(pe->pq);

    // If there aren't any events left to process, then this PE has nothing to do
    if (next_event == NULL) {
        return;
    }

    // Traversing all events stored in the queue
    while (next_event) {
        events_processed++;
        assert(next_event->prev == NULL);
        // No event still in the queue may precede the current GVT
        assert(tw_event_sig_compare_ptr(&next_event->sig, &pe->GVT_sig) >= 0);

        if (next_event->event_id && next_event->state.remote) {
            tw_hash_remove(pe->hash_t, next_event, next_event->send_pe);
        }

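        // PROCESS EVENT HERE and decide whether to keep it
        // ...
        bool const should_event_be_deleted = false; // placeholder: set this from your own processing above

        if (should_event_be_deleted) {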
            tw_event_free(pe, next_event);
            events_deleted++;
        } else {
            next_event->prev = dequed_events;
            dequed_events = next_event;
        }

        next_event = tw_pq_dequeue(pe->pq);
    }

    // Reinjecting non-deleted events into simulation
    while (dequed_events) {
        tw_event * const prev_event = dequed_events;
        dequed_events = dequed_events->prev;
        prev_event->prev = NULL;
        tw_pq_enqueue(pe->pq, prev_event);

        if (prev_event->event_id && prev_event->state.remote) {
            tw_hash_insert(pe->hash_t, prev_event, prev_event->send_pe);
        }

        events_enqueued++;
    }
}

void process_lps(tw_pe * pe) {
    for (tw_lpid local_lpid = 0; local_lpid < g_tw_nlp; local_lpid++) {
        tw_lp * const lp = g_tw_lp[local_lpid];
        assert(local_lpid == lp->id);

        // ROSS expects these variables to be set appropriately before new
        // events can be scheduled. These assignments might be invalidated
        // by future versions of ROSS.
        lp->kp->last_sig = pe->GVT_sig;
        pe->cur_event = pe->abort_event;
        pe->cur_event->caused_by_me = NULL;
        pe->cur_event->sig = pe->GVT_sig;

        // PROCESS lp, we can even schedule new events now
        // ...
    }
}

void gvt_hook(tw_pe * pe, bool past_end_time) {
    tw_scheduler_rollback_and_cancel_events_pe(pe);
    process_events(pe);
    process_lps(pe);
}

The example above is adapted from a complex GVT hook in CODES, found in network-surrogate.c.
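
The core of process_events is a drain, filter and reinject pattern: dequeue everything, chain the survivors through their prev pointer, then enqueue them again. The sketch below isolates that pattern with toy types so it can be run without ROSS. `toy_event`, `toy_queue` and `drop_events_after` are hypothetical names made up for this illustration; the toy queue is a plain array rather than a priority queue, and the hash-table bookkeeping for remote events is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Toy event: NOT tw_event. Only `prev` mirrors the field the skeleton reuses. */
typedef struct toy_event {
    double ts;
    struct toy_event *prev;
} toy_event;

/* Toy queue: a fixed array used as a stack. Ordering does not matter for
 * the pattern shown here. */
typedef struct {
    toy_event *slots[16];
    size_t len;
} toy_queue;

void toy_enqueue(toy_queue *q, toy_event *ev) {
    assert(q->len < 16);
    q->slots[q->len++] = ev;
}

toy_event *toy_dequeue(toy_queue *q) {
    return q->len > 0 ? q->slots[--q->len] : NULL;
}

/* Drains the queue, drops events with ts > cutoff, and puts the rest back.
 * Returns the number of dropped events. */
size_t drop_events_after(toy_queue *q, double cutoff) {
    size_t dropped = 0;
    toy_event *kept = NULL; /* stack of survivors, linked through prev */

    for (toy_event *ev = toy_dequeue(q); ev != NULL; ev = toy_dequeue(q)) {
        assert(ev->prev == NULL);
        if (ev->ts > cutoff) {
            dropped++;       /* the skeleton above calls tw_event_free here */
        } else {
            ev->prev = kept;
            kept = ev;
        }
    }

    while (kept != NULL) {
        toy_event *ev = kept;
        kept = kept->prev;
        ev->prev = NULL;     /* clear the link before re-queueing */
        toy_enqueue(q, ev);
    }
    return dropped;
}
```

Reusing prev as the link means no extra memory is needed while the queue is empty, which is presumably why the skeleton asserts that prev is NULL on every dequeued event and resets it before enqueueing.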

Hook triggers

In order to trigger the GVT hook, you can use one of three strategies:

Every N GVT operations (tw_trigger_gvt_hook_every)
At a specific timestamp/point in virtual time (tw_trigger_gvt_hook_at)
By the model, when processing an event (tw_trigger_gvt_hook_now)

NOTE: Some triggers are more expensive than others. From least to most expensive: disabled -> every N GVTs -> timestamp -> when the model triggers.

Every `N` GVTs

When running in either parallel optimistic or conservative modes, you can simply trigger the GVT hook every N GVT operations:

int main() {
    // ...
    tw_trigger_gvt_hook_every(500);
    // ...
    tw_run();
    // ...
}

Because PEs drift relative to one another, GVT operations do NOT happen deterministically. Thus, GVT hook calls will generally land on different timestamps from one run to the next! This means that two runs of the same model under identical conditions, both using a GVT hook, might not produce the same results.

At a specific timestamp

If you want to run the GVT hook at a particular GVT timestamp/point in virtual time, you can make use of tw_trigger_gvt_hook_at. Notice that because it can only hold ONE timestamp at a time, you have to call it again within your hook if you want the hook triggered again in the future.

For example:

void gvt_hook(tw_pe * pe, bool past_end_time) {
    tw_stime gvt = pe->GVT_sig.recv_ts;

    if (g_tw_mynode == 0) {
        printf("Current GVT time %f\n", gvt);
    }

    // Re-arm the trigger: initial value is 2.0, then 4, 8, 16, ...
    static tw_stime trigger_at = 2.0;
    tw_trigger_gvt_hook_at(trigger_at);
    trigger_at *= 2;
}

int main() {
    // ...
    tw_trigger_gvt_hook_at(1.0);
    // ...
}

Because we know precisely when we want to stop, triggering the GVT hook this way is always deterministic, and it works in most execution modes (sequential and parallel).
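
To see which timestamps the re-arming pattern above ends up requesting, here is a small stand-alone sketch. All `toy_` names are invented for this illustration and do not model how ROSS decides when to compute GVT; the sketch only assumes that a requested timestamp fires once and stays disarmed until the hook asks again.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins only: these mimic the re-arming pattern, not ROSS internals. */
static double armed_at = -1.0;    /* pending trigger; negative means none */
static double next_trigger = 2.0; /* same role as the static in the hook */

void toy_trigger_hook_at(double ts) { armed_at = ts; }

/* Mirrors the hook above: ask for the next trigger, then double it. */
void toy_hook(void) {
    toy_trigger_hook_at(next_trigger);
    next_trigger *= 2.0;
}

/* Advances a pretend GVT in fixed steps and fires the hook once GVT reaches
 * the armed timestamp. Stores each timestamp that fired in `fired` and
 * returns how many there were. */
size_t toy_run(double end_time, double step, double *fired, size_t cap) {
    size_t n = 0;
    armed_at = -1.0;
    next_trigger = 2.0;
    toy_trigger_hook_at(1.0); /* first request, as in main above */

    for (double gvt = 0.0; gvt <= end_time; gvt += step) {
        if (armed_at >= 0.0 && gvt >= armed_at) {
            assert(n < cap);
            fired[n++] = armed_at;
            armed_at = -1.0;  /* one-shot: disarmed until the hook re-arms */
            toy_hook();
        }
    }
    return n;
}
```

With an end time of 20 and a step of 0.5, the recorded timestamps are 1, 2, 4, 8 and 16; the request for 32 is made but never reached.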

When the model asks for it

If you want an LP to trigger the GVT hook after it processes its current event, then you can use tw_trigger_gvt_hook_now.

Because we are asking for the GVT hook to be triggered AT event processing time, we need to roll back this call too! Also, we need to tell ROSS in main that we intend to use this mode (tw_trigger_gvt_hook_when_model_calls).

Here’s an example where we trigger the GVT hook on a special event (which we simulate as rolling a zero on a 1001-sided die):

void event_handler(struct your_lp_type *s, tw_bf *bf, your_msg_type *m, tw_lp *lp) {
  // ...
  // trigger GVT hook around every 1000 events
  bf->c0 = 0;
  int const random_occurence = tw_rand_integer(lp->rng, 0, 1000);
  if (lp->gid == 0 && random_occurence == 0) {
    bf->c0 = 1;
    tw_trigger_gvt_hook_now(lp);
  }
}

void event_handler_rc(struct your_lp_type *s, tw_bf *bf, your_msg_type *m, tw_lp *lp) {
  if (bf->c0) {
    tw_trigger_gvt_hook_now_rev(lp);
  }
  tw_rand_reverse_unif(lp->rng);
  // ...
}

int main() {
    // ...
    tw_trigger_gvt_hook_when_model_calls();
    // ...
    tw_run();
    // ...
}
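
The handlers above follow the usual forward/reverse discipline: the forward handler records in the bitfield whether it made the request, and the reverse handler undoes things in the opposite order. The sketch below checks that property on toy state. None of it is ROSS code: `toy_lp`, `toy_bf` and `toy_draw` are hypothetical stand-ins, with a counter in place of the RNG stream and another counter in place of whatever bookkeeping a trigger request changes.

```c
#include <assert.h>

/* Toy state: NOT ROSS types. `rng_draws` stands in for the RNG stream
 * position, `hook_requests` for the effect of a trigger request. */
typedef struct { unsigned rng_draws; int hook_requests; } toy_lp;
typedef struct { unsigned c0 : 1; } toy_bf;

/* Deterministic stand-in for a random draw: depends only on the position. */
unsigned toy_draw(toy_lp *lp) {
    unsigned value = (lp->rng_draws * 7u + 3u) % 5u;
    lp->rng_draws++;
    return value;
}

void toy_forward(toy_lp *lp, toy_bf *bf) {
    bf->c0 = 0;
    if (toy_draw(lp) == 0) {
        bf->c0 = 1;           /* remember that the request was made */
        lp->hook_requests++;
    }
}

void toy_reverse(toy_lp *lp, toy_bf *bf) {
    if (bf->c0) {
        lp->hook_requests--;  /* undo the request first ... */
    }
    assert(lp->rng_draws > 0);
    lp->rng_draws--;          /* ... then rewind the draw */
}
```

Running any number of forward calls and then reversing them in last-in, first-out order should leave the toy LP exactly where it started. A reverse handler that skips either step would break this round trip.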

If a model is deterministic (i.e., two different runs with identical inputs produce the same result), then it will always trigger the GVT hook at the same timestamps. If your model is not deterministic when running in optimistic mode, you can check for bugs in your reverse handlers by using the sequential rollback check.
