Instruction Tree

3.2 Instruction Tree

Simics builds a tree of active instructions to keep track of dependence information. As instructions enter the out-of-order window, they are added as leaves to the tree with the previous instruction in program order as their parent. Instructions can only be committed from the window when they are at the root of the tree. Branches occur at speculation points. Currently two forms of speculation are supported, control speculation and data speculation.

Control speculation occurs when the user model generates a fetch address that has not yet been produced by an earlier correct instruction . The instruction being fetched will be regarded as speculative but not necessarily incorrect . As soon as the previous instruction executes non-speculatively the address will be compared against the speculative address. If there is a match the instruction will be considered correct and no longer speculative.

Data speculation occurs when the user supplies an input value to an instruction that has not yet been generated by an earlier correct instruction. As with fetches the instruction will be speculative until the correct value arrives. If the value does not match, the instruction becomes incorrect and need to be rewinded and re-executed or squashed.

If the control or data speculation generates an address or value that has already been produced by an earlier correct instruction the comparison occurs immediately. Again, for matches the instruction will be correct and for mismatches the instruction will be speculative.

To keep track of speculation, Simics labels each instruction as speculative or correct. Speculative does not imply incorrect, unless a correct branch exists. This structure allows multiple speculative instruction paths to be active. In the general case, to know if an instruction is correct or incorrect, all previous instructions need to be executed and non-speculative.

While Simics and the underlying MAI support an unlimited number of speculative execution paths, the current way it is used can add restrictions. The MAI is limited only by the aggressiveness of the user model. The model can inspect the state of a branch, deciding when incorrect instruction should be squashed.

The maximum size of the instruction tree is set by an attribute called reorder_buffer_size in each CPU.

3.2.1 Tracking Instruction Execution

To keep track of the execution, each instruction can be queried on its current phase, its current status and a flag indicating whether the instruction is speculative or not. The phase describes where the instruction is in an idealized pipeline. While this might suggest an enormous number of possible states for each instruction, not all combinations are legal or meaningful.

The six phases of an instruction are:

Init: A place for the instruction has been reserved in the out-of-order window. At this point, it is possible to speculatively set the fetch address.
Fetch: The instruction has been fetched from memory. If Simics was configured to do so, an instruction fetch transaction has been sent to the memory hierarchy and it has finished stalling. The instruction opcode is available.
Decode: The instruction has been decoded: its type and its input and output registers are known. It is possible to query and set the values of the input registers. If not all the input values are known the instruction cannot be executed. No memory transaction is performed during this phase so it can not stall.
Execute: The instruction has been executed: all output values have been produced. Loads have been completed; stores have been issued and are waiting in the LSQ. The instruction may still be speculative.
Retire: To enter this phase, the instruction has to be non-speculative. Stores are sent to memory from the LSQ.
Commit: To enter this phase, an instruction must be root of the tree. The instruction results are committed to the architectural register state.

An instruction is considered non-speculative when all the input values have been validated against the previous non-speculative instructions that produced them. The fetch address (PC) is included in the input values (control speculation).

The Retire and Commit phases could be considered as part of a global commit phase. The reason they are separated is to allow users to retire stores out-of-order. Simics requires however that commit to the architectural state be done in-order (when instructions are root of the tree).

The status of an instruction is one of:

Waiting: Some input is missing and no speculation is done.
Ready: All needed input have been produced or speculated.
Stalling: Executing a memory transaction. This can happen during the Fetch, Execute and Retire phases.
Faulting: The instruction has triggered an exception.
Trap (x86 only): The instruction has triggered a trap.
Interrupt (x86 only): An immediate interrupt must be taken before executing this instruction.

3.2.2 Instruction Tree Example

Here follows an example of an instruction tree using SPARC assembly:

  0 <0xf000e7b8> stx %g4, [%o2 + -16]           R 
  1 <0xf000e7bc> cmp %g4, 0                     R
  2 <0xf000e7c0> bne,pt %xcc, 0xf000e7ac        E
      3 <0xf000e7c4> add %o2, 16, %o2               E
      4 <0xf000e7ac> ldx [%o2 + 8], %g4             D (stalling)
      5 <0xf000e7b0> stx %g4, [%o2 + -8]            D (waiting for %g4) S
      6 <0xf000e7b4> ldx [%o2 + 0], %g4             D (stalling) 
      7 <0xf000e7b8> stx %g4, [%o2 + -16]           D (waiting for %g4) S
      8 <0xf000e7bc> cmp %g4, 0                     D (waiting for %g4) S
      9 <0xf000e7c0> bne,pt %xcc, 0xf000e7ac        D (waiting for %cc) S
  3 <0xf000e7c4> add %o2, 16, %o2               E S
  4 <0xf000e7c8> ld [%i4 + 0], %g4              D (stalling)
  5 <0xf000e7cc> add %g4, 1, %g4                D (waiting for %g4) S
  6 <0xf000e7d0> st %g4, [%i4 + 0]              D (waiting for %g4) S
  7 <0xf000e7d4> jmpl [%i7 + 8], %g0            E S
      8 <0xf000e7d8> restore %g0, %g0, %g0          E S
      9 <0xf0009c18> sethi %hi(0x4000), %i2         E S
     10 <0xf0009c1c> bne 0xf00096a8                 E S
         11 <0xf0009c20> nop                            E S
         12 <0xf00096a8> mov %i2, %o0                   E S
     11 <0xf0009c20> mov %i2, %o0                   E S
     12 <0xf0009c24> srl %o0, 0, %l0                E S
     13 <0xf0009c28> sethi %hi(0xf04b2400), %o0     E S
     14 <0xf0009c2c> add %o0, 896, %i0              E S
     15 <0xf0009c30> srl %i3, 0, %o1                E S
     16 <0xf0009c34> mov %i0, %o0                   E S
  8 <0xf000e7d8> restore %g0, %g0, %g0          E S
  9 <0xf000e7dc> bcs,a,pt %xcc, 0xf000e814      E S
 10 <0xf000e7e4> ldx [%o2 + 0], %o0             D (ready) S
 11 <0xf000e7e8> cmp %o0, 0                     F
 12 <0xf000e7ec> be,a,pt %xcc, 0xf000e814       F
 13 <0xf000e7f0> ld [%i4 + 0], %g4              F

The tree shows instructions in several different phases. Instructions 2, 7, and 10 are branch instructions that each have 2 predicted targets (jump target and fall through). This has created 4 possible paths through the tree. F stands for Fetched, D for Decoded, E for Executed, and R for retired. Committed instructions have been removed from the tree. S means that the instruction is marked as speculative.

Previous - Up - Next