Getting Started with SPARC-V9 MAI

The Simics distribution includes a number of pre-configured setups that use the MAI to simulate out-of-order processors. They are all examples of how to use the API in different ways. The scripts reside in the same directory as the ordinary Simics scripts for a specific target, and their names contain ooo (for out-of-order). A short description of each script, and of how it can be further configured, follows:

simics/targets/sunfire/bagle-ooo-common.simics
simics/targets/sunfire/donut-ooo-common.simics
simics/targets/serengeti/sarek-ooo-common.simics
These scripts use the ooo-micro-arch module (see the source code in simics/src/extensions/ooo-micro-arch) to demonstrate how MAI works. The model can fetch, execute, and commit a configurable number of instructions per cycle. No branch speculation is performed: when an unresolved branch is encountered, fetching stalls until the outcome of the branch is determined.
If an exception occurs, the instruction tree is drained and all speculative instructions beyond the one that caused the exception are discarded.
The pipeline is modeled with a combined fetch/decode stage, an execute stage, and a commit stage.
Each processor gets an object of the ooo_micro_arch class attached to it that handles the simulation. These objects have some attributes that can be changed to alter the behavior of the model:
- <ooo_micro_arch>.fetches_per_cycles controls the number of instructions that can be fetched and decoded per cycle. Default is 1.
- <ooo_micro_arch>.execute_per_cycles controls the number of instructions that can be executed per cycle. The instructions can be dependent on each other. Default is 1.
- <ooo_micro_arch>.commits_per_cycles controls how many instructions can be committed per cycle. Default is 1.
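
To get a feel for how the three width attributes and the stall-on-branch rule interact, the following is a minimal toy model in plain Python. It is not the ooo-micro-arch source and does not use the Simics API; the function name, the parameter names, and the "op"/"branch" instruction encoding are assumptions made for illustration. It also assumes that a branch is resolved when it executes and that fetching may resume in that same cycle.

```python
from collections import deque


def simulate(program, fetches_per_cycle=1, executes_per_cycle=1,
             commits_per_cycle=1):
    """Return the number of cycles needed to commit every instruction.

    `program` is a list of strings; "branch" marks a branch, anything
    else is an ordinary instruction (hypothetical encoding).
    """
    pending = deque(program)   # not yet fetched
    fetched = deque()          # fetched/decoded, waiting to execute
    executed = deque()         # executed, waiting to commit
    unresolved_branch = False  # True while fetch is stalled on a branch
    committed = 0
    cycles = 0

    while committed < len(program):
        cycles += 1

        # Stages run back to front so that an instruction spends at
        # least one cycle in each stage.
        for _ in range(min(commits_per_cycle, len(executed))):
            executed.popleft()
            committed += 1

        for _ in range(min(executes_per_cycle, len(fetched))):
            insn = fetched.popleft()
            if insn == "branch":
                unresolved_branch = False  # outcome is now known
            executed.append(insn)

        for _ in range(fetches_per_cycle):
            if unresolved_branch or not pending:
                break
            insn = pending.popleft()
            fetched.append(insn)
            if insn == "branch":
                unresolved_branch = True   # stall until it executes

    return cycles


if __name__ == "__main__":
    print(simulate(["op"] * 4))                             # width 1
    print(simulate(["op"] * 4, 4, 4, 4))                    # width 4
    print(simulate(["branch", "op", "op", "op"], 4, 4, 4))  # with a stall
```

In this sketch a leading branch costs one extra cycle at width 4, because the three instructions behind it cannot be fetched until the branch has executed.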

simics/targets/sunfire/bagle-ma-common.simics
simics/targets/sunfire/donut-ma-common.simics
simics/targets/serengeti/sarek-ma-common.simics
These scripts use the sample-micro-arch module (see simics/src/extensions/sample-micro-arch). The processors modeled can fetch/decode, execute, and commit a configurable number of instructions per cycle.
The model has a simple branch predictor that uses a hash table (a Branch Target Buffer) to look up the target address from the address of the branch instruction. This allows for branch speculation. The hash table is updated for every successfully committed branch.
Besides speculating on the target address, the model also speculates on the fall-through path for every branch. Two possible execution paths are thus created for every branch, which turns the instruction tree into a binary tree. The number of instructions executed and fetched per cycle is actually counted per branch of the instruction tree.
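
The predictor and the two-path speculation described above can be sketched as follows. This is a toy illustration in plain Python, not the sample-micro-arch implementation: the class name, the method names, and the use of a dictionary as the hash table are assumptions. The fall-through address is passed in by the caller so that the sketch does not have to model SPARC delay slots.

```python
class BranchTargetBuffer:
    """Toy BTB: maps a branch instruction address to its last target."""

    def __init__(self):
        self._targets = {}

    def predict(self, branch_pc):
        """Return the predicted target, or None if the branch is unknown."""
        return self._targets.get(branch_pc)

    def update_on_commit(self, branch_pc, actual_target):
        """Record the real target once the branch has committed."""
        self._targets[branch_pc] = actual_target


def speculative_paths(btb, branch_pc, fall_through_pc):
    """Return the addresses to fetch from after a branch.

    The fall-through path is always followed; the predicted target is
    added when the BTB knows the branch, giving up to two children per
    branch in the instruction tree.
    """
    paths = [fall_through_pc]
    target = btb.predict(branch_pc)
    if target is not None and target != fall_through_pc:
        paths.append(target)
    return paths


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    print(speculative_paths(btb, 0x1000, 0x1008))  # only fall-through
    btb.update_on_commit(0x1000, 0x2000)
    print(speculative_paths(btb, 0x1000, 0x1008))  # fall-through and target
```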
There is a compile-time switch called VALUE_PREDICTION that can be defined to enable value prediction for loads. It works like a small cache that maps logical addresses to values. When a load is issued, the cache is looked up first to quickly obtain a value that later instructions can use. When the load finishes, the speculated value is checked against the real value. If they do not match, the later instructions must be squashed.
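
The predict-then-verify cycle of load value prediction can be illustrated with a small Python sketch. It is an assumption-laden toy, not the code guarded by VALUE_PREDICTION: the class and method names are hypothetical, and an unbounded dictionary stands in for the small cache.

```python
class LoadValuePredictor:
    """Toy value predictor: maps a logical address to the last loaded value."""

    def __init__(self):
        self._values = {}

    def predict(self, address):
        """Return a speculative value for a load, or None on a miss."""
        return self._values.get(address)

    def resolve(self, address, predicted, actual):
        """Check a finished load against its prediction.

        Stores the real value for future predictions and returns True
        when instructions that consumed the prediction must be squashed.
        """
        self._values[address] = actual
        return predicted is not None and predicted != actual


if __name__ == "__main__":
    vp = LoadValuePredictor()
    guess = vp.predict(0x8000)             # miss: nothing to speculate on
    print(vp.resolve(0x8000, guess, 42))   # no squash needed
    guess = vp.predict(0x8000)             # hit: speculate with 42
    print(vp.resolve(0x8000, guess, 43))   # value changed: squash
```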
Each processor gets an object of the sample_micro_arch class attached to it that handles the simulation. The class implements the following attributes:
- <sample_micro_arch>.fetches_per_cycles controls the number of instructions that can be fetched and decoded per cycle. Default is 4.
- <sample_micro_arch>.execute_per_cycles controls the number of instructions that can be executed per cycle. The instructions can be dependent on each other. Default is 4.
- <sample_micro_arch>.retires_per_cycles controls how many stores can be retired to memory per cycle. Default is 4.
- <sample_micro_arch>.commits_per_cycles controls how many instructions can be committed per cycle. Default is 4.
- <sample_micro_arch>.out_of_order_retire controls whether the retire phase can be performed out of order. If non-zero, out-of-order retire is allowed. Default is 0.

The following attributes in each CPU object can also be used to further configure the models:

<processor>.reorder_buffer_size controls the total number of instructions that fit in the instruction tree.

<processor>.auto_speculate_cwp. If set to non-zero, the CWP register is automatically speculated. That is, if a save instruction is encountered in the instruction stream, the value of the CWP register is automatically incremented (modulo the number of windows) for the following instructions, and if a restore or a return instruction is found, CWP is automatically decremented.
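
The effect of automatic CWP speculation on a short instruction sequence can be sketched as below. This is a toy illustration in plain Python and not Simics code: the function name and the mnemonic-list input are assumptions, and the default of 8 register windows is an arbitrary example value.

```python
def speculate_cwp(cwp, mnemonics, nwindows=8):
    """Return the CWP value seen by each instruction in `mnemonics`.

    A save increments CWP for the instructions that follow it, a
    restore or return decrements it, both modulo `nwindows`.
    """
    seen = []
    for mnemonic in mnemonics:
        seen.append(cwp)  # window in effect when this instruction runs
        if mnemonic == "save":
            cwp = (cwp + 1) % nwindows
        elif mnemonic in ("restore", "return"):
            cwp = (cwp - 1) % nwindows  # Python's % keeps this non-negative
    return seen


if __name__ == "__main__":
    print(speculate_cwp(0, ["save", "add", "restore", "restore", "nop"]))
```

The final nop in the example sees the wrap-around case: decrementing from window 0 yields the highest-numbered window.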
