Running the Simulator

Quick Help

Now that you have built ROSS (and possibly learned a bit about building models), change to the ross-build/models/phold directory. You should see several files including a phold binary. Running the binary with the --help flag should give the following output:

odin:phold$ ./phold --help
usage: phold [options] [-- [args]]

PHOLD Model:
  --remote=ts           desired remote event rate (default 0.25)
  --nlp=n               number of LPs per processor (default 8)
  --mean=ts             exponential distribution mean for timestamps (default 1.00)
  --mult=ts             multiplier for event memory allocation (default 1.40)
  --lookahead=ts        lookahead for events (default 1.00)
  --start-events=n      number of initial messages per LP (default 1)
  --stagger=n           Set to 1 to stagger event uniformly across 0 to end time. (default 0)
  --memory=n            additional memory buffers (default 100)
  --run=str             user supplied run name (default undefined)

ROSS MPI Kernel:
  --read-buffer=n       network read buffer size in # of events (default 16)
  --send-buffer=n       network send buffer size in # of events (default 1024)

ROSS Kernel:
  --synch=n             Sychronization Protocol: SEQUENTIAL=1, CONSERVATIVE=2, OPTIMISTIC=3, OPTIMISTIC_DEBUG=4, OPTIMISTIC_REALTIME=5 (default 0)
  --nkp=n               number of kernel processes (KPs) per pe (default 16)
  --end=ts              simulation end timestamp (default 100000.00)
  --batch=n             messages per scheduler block (default 16)
  --extramem=n          Number of extra events allocated per PE. (default 0)
  --buddy-size=n        delta encoding buddy system allocation (2^X) (default 0)
  --lz4-knob=n          LZ4 acceleration factor (higher = faster) (default 17)
  --cons-lookahead=ts   Set g_tw_lookahead (default 0.01)
  --max-opt-lookahead=n Optimistic simulation: maximum lookahead allowed in virtual clock time (default 18446744073709551615)
  --avl-size=n          AVL Tree contains 2^avl-size nodes (default 18)

ROSS MPI GVT:
  --gvt-interval=n      GVT Interval: Iterations through scheduling loop (synch=1,2,3,4), or ms between GVTs (synch=5) (default 16)
  --report-interval=ts  percent of runtime to print GVT (default 0.01)

ROSS Timing:
  --clock-rate=ts       CPU Clock Rate (default 1000000000.00)

ROSS Instrumentation:
  --engine-stats=n      Collect sim engine level stats; 0 don't collect, 1 GVT-sampling, 2 RT sampling, 3 VT sampling, 4 All sampling modes (default 0)
  --model-stats=n       Collect model level stats (requires model-level implementation); 0 don't collect, 1 GVT-sampling, 2 RT sampling, 3 VT sampling, 4 all sampling modes (default 0)
  --num-gvt=n           number of GVT computations between GVT-based sampling points (default 10)
  --rt-interval=n       real time sampling interval in ms (default 1000)
  --vt-interval=ts      Virtual time sampling interval (default 1000000.00)
  --vt-samp-end=ts      End time for virtual time sampling (if different from g_tw_ts_end) (default 0.00)
  --pe-data=n           Turn on/off collection of sim engine data at PE level (default 1)
  --kp-data=n           Turn on/off collection of sim engine data at KP level (default 0)
  --lp-data=n           Turn on/off collection of sim engine data at LP level (default 0)
  --event-trace=n       collect detailed data on all events for specified LPs; 0, no trace, 1 full trace, 2 only events causing rollbacks, 3 only committed events (default 0)
  --stats-prefix=str    prefix for filename(s) for stats output (default )
  --stats-path=str      path to directory to save instrumentation output (default )
  --buffer-size=n       size of buffer in bytes for stats collection (default 8000000)
  --buffer-free=n       percentage of free space left in buffer before writing out at GVT (default 15)
  --disable-output=n    used for perturbation analysis; buffer never dumped to file when 1 (default 0)

Specialized ROSS LPs:
  --sample-count=n      Number of samples to allocate in memory (default 65536)
  --help                show this message

Example

There are quite a few options available to us. Often, only one or two is necessary depending on your platform. In this case, for example, only one option is required: synch. The synch option tells ROSS which synchronization protocol to use. When run with the sequential protocol, I see the following:

odin:phold$ ./phold --synch=1
./phold --sync=1 

Fri May  3 15:23:43 2019

ROSS Version: 7.0.1

tw_net_start: Found world size to be 1 
========================================
PHOLD Model Configuration..............
   Lookahead..............1.000000
   Start-events...........1
   stagger................0
   Mean...................0.000000
   Mult...................1.400000
   Memory.................100
   Remote.................0.250000
========================================


ROSS Core Configuration: 
    Total PEs                                                    1
    Total KPs                                          [Nodes (1) x KPs (16)] 16
    Total LPs                                                    8
    Simulation End Time                                  100000.00
    LP-to-PE Mapping                                        linear


ROSS Event Memory Allocation:
    Model events                                               112
    Network events                                              16
    Total events                                               127

*** START SEQUENTIAL SIMULATION ***

GVT #0: simulation 1% complete, max event queue size 8 (GVT = 1001.0000).
AVL tree size: 0
...(some output removed for brevity)
GVT #0: simulation 99% complete, max event queue size 8 (GVT = 99001.0000).
AVL tree size: 0
*** END SIMULATION ***


    : Running Time = 0.3076 seconds

TW Library Statistics:
    Total Events Processed                                  799992
    Events Aborted (part of RBs)                                 0
    Events Rolled Back                                           0
    Event Ties Detected in PE Queues                        699993
    Efficiency                                              100.00 %
    Total Remote (shared mem) Events Processed                   0
    Percent Remote Events                                     0.00 %
    Total Remote (network) Events Processed                      0
    Percent Remote Events                                     0.00 %

    Total Roll Backs                                             0
    Primary Roll Backs                                           0
    Secondary Roll Backs                                         0
    Fossil Collect Attempts                                      0
    Total GVT Computations                                       0

    Net Events Processed                                    799992
    Event Rate (events/sec)                              2600585.1
    Total Events Scheduled Past End Time                         8

TW Memory Statistics:
    Events Allocated                                           128
    Memory Allocated                                            77
    Memory Wasted                                              434

TW Data Structure sizes in bytes (sizeof):
    PE struct                                                  624
    KP struct                                                  144
    LP struct                                                  136
    LP Model struct                                              8
    LP RNGs                                                     80
    Total LP                                                   224
    Event struct                                               152
    Event struct with Model                                    160

TW Clock Cycle Statistics (MAX values in secs at 1.0000 GHz):
    Initialization                                          0.0144
    Priority Queue (enq/deq)                                0.1135
    AVL Tree (insert/delete)                                0.0000
    LZ4 (de)compression                                     0.0000
    Buddy system                                            0.0000
    Event Processing                                        0.0000
    Event Cancel                                            0.0000
    Event Abort                                             0.0000

    GVT                                                     0.0000
    Fossil Collect                                          0.0000
    Primary Rollbacks                                       0.0000
    Network Read                                            0.0000
    Other Network                                           0.0000
    Instrumentation (computation)                           0.0000
    Instrumentation (write)                                 0.0000
    Total Time (Note: Using Running Time above for Speedup)      0.7998

TW GVT Statistics: MPI AllReduce
    GVT Interval                                                16
    GVT Real Time Interval (cycles)                    0
    GVT Real Time Interval (sec)                        0.00000000
    Batch Size                                                  16

    Forced GVT                                                   0
    Total GVT Computations                                       0
    Total All Reduce Calls                                       0
    Average Reduction / GVT                                   -nan

Options

ROSS Kernel Options

`--synch={1,2,3,4,5}`

The synch option can take the following values:

1 (sequential): Instructs ROSS to use a sequential engine, i.e. the simulation will consist of only one processor (PE).
2 (conservative): Instructs ROSS to use a conservative engine, i.e. all LPs will have the same notion of virtual time.
3 (optimistic): Instructs ROSS to use an optimistic engine, i.e. LPs can have varying notions of virtual time, though rollbacks will be possible and a reverse event handler will be required in this case.
4 (optimistic debug): Executes ROSS in a serial simulation until it runs out of memory. Then it rolls every message back. This can be used to test rollback functions.
5 (optimistic real time): Similar to optimistic, but GVT is triggered based on the amount of time elapsed rather than number of events processed. See this post for details.

`--nkp=n`

nkp specifies the number of Kernel Processes (KPs) we have per Processing Element (PE). LPs are mapped onto KPs which aggregate the events processed by those LPs.

`--end=ts`

end specifies the final timestamp to process. Discrete event simulations (typically) schedule a new event from within the event handler. Therefore, in the general case, it will never stop unless the simulator notices that it has simulated past the point of interest.

`--batch=n`

batch is the number of events to be processed in an iteration of the main scheduling loop. This can control how often you poll the network for events (i.e., the smaller the number, the more often you poll the network).

`--extramem=n`

extramem specifies the amount of additional memory to allocate. This helps during simulations as we never dynamically allocate memory at runtime; we just perform a GVT and reclaim event memory. If you see a high number in the “Forced GVT” output above, try increasing this value. A good starting value for n would be 2 * batch * GVT.