5.1 Load-Store Queue

The load-store queue (LSQ) built into Simics is based on the instruction tree. Store transactions are kept in the tree so that the state of the store queue can be inspected in each speculative execution path.

Inserting a store transaction into the queue takes no time, and the LSQ has unlimited size. It is up to the user module to add the delays and restrictions that limit the LSQ.

The LSQ enforces program-order consistency. The following rules apply:

Loads from memory: A load transaction is allowed to execute only if all previous stores have been executed (i.e., they have been inserted in the LSQ). If an instruction that may perform a store is higher up in the execution path and still in the decode phase, the load is blocked until those stores have been executed.
If the load is not blocked, it is matched against the speculative stores in the LSQ along the corresponding execution path. Matching data is retrieved from the LSQ. If some of the requested data is not found there, the load is sent to memory, the result is merged with the LSQ data, and the merged value is returned to the processor.
Loads from devices: A load transaction that accesses a device is only allowed to execute as root of the tree (which makes it non-speculative).
Stores to memory: When the instruction enters the execute phase, stores are inserted in the LSQ.
When the instruction enters the retire phase, its stores are sent to memory (in program order if the instruction has performed several). The LSQ searches for potentially conflicting stores and forces stores that overlap each other to execute in order.
Stores to devices: A store transaction that accesses a device is only allowed to execute as root of the tree (which makes it non-speculative).
Atomic instructions: An atomic instruction is blocked if it may conflict, within a certain granularity, with an earlier load in the tree that has not been executed or an earlier store in the tree that has not been retired. This is because an atomic instruction locks a RAM region once it has reached the execute phase (see section 5.2), so a deadlock may occur if it executes before an earlier conflicting instruction: the earlier instruction waits for the lock to be released, while the atomic instruction waits for the earlier instruction to complete according to the LSQ rules above. To avoid this, atomic instructions must wait until there are no conflicts. Because of the locking mechanism, the load part of an atomic instruction is always sent to memory, even if all of its data can be found in the LSQ.
The granularity is set by an attribute in the processor object, <processor>.lock_granularity. If it is set to zero, atomic instructions are never blocked and it is up to the processor model to avoid deadlocks. If it is non-zero, it should match the granularity configured for the ram objects. The default granularity is 8 bytes, which is also the default granularity of the ram object.
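The load rule for memory can be illustrated with a small C sketch of store-to-load forwarding with byte merging. This is not the Simics implementation or API: the `store_entry_t` type and the `lsq_load` function are hypothetical, the speculative stores on the load's execution path are assumed to be available as an array ordered oldest first, the load is assumed to have already passed the blocking check, and a flat byte array stands in for memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry for one speculative store on the load's execution path
   (not a Simics type). This sketch allows at most 8 bytes per store. */
typedef struct {
    uint64_t addr;     /* first byte written */
    unsigned size;     /* number of bytes written */
    uint8_t  data[8];  /* the bytes written, lowest address first */
} store_entry_t;

/* Hypothetical helper: resolve a load of `size` bytes at `addr` (size <= 8).
   `stores` holds the LSQ entries for this path, oldest first; `mem` is a flat
   byte array standing in for the memory hierarchy.
   Fills `out` and returns 1 if any byte had to come from memory, 0 if the
   LSQ alone supplied every byte. */
static int lsq_load(const store_entry_t *stores, size_t n,
                    const uint8_t *mem, uint64_t addr, unsigned size,
                    uint8_t *out)
{
    uint8_t covered[8] = {0};
    assert(size <= 8);

    /* Visit stores oldest to youngest, so the youngest store to a byte wins. */
    for (size_t k = 0; k < n; k++) {
        const store_entry_t *s = &stores[k];
        for (unsigned i = 0; i < size; i++) {
            uint64_t a = addr + i;
            if (a >= s->addr && a < s->addr + s->size) {
                out[i] = s->data[a - s->addr];
                covered[i] = 1;
            }
        }
    }

    /* Merge: bytes not found in the LSQ are read from memory, while bytes
       forwarded from the LSQ keep their values. */
    int went_to_memory = 0;
    for (unsigned i = 0; i < size; i++) {
        if (!covered[i]) {
            out[i] = mem[addr + i];
            went_to_memory = 1;
        }
    }
    return went_to_memory;
}
```

For example, after a 4-byte store at 0x10 followed by a 2-byte store at 0x12, a 4-byte load at 0x10 is served entirely from the LSQ (bytes 1, 2, 9, 9 for the data used in the test), while a 4-byte load at 0x12 takes its first two bytes from the LSQ and its last two from memory.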
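The blocking rule for atomic instructions reduces to a granule-overlap test. The sketch below is hypothetical and is not the Simics implementation: `earlier_op_t` describes an earlier memory instruction in the tree, `atomic_blocked` checks an atomic access against all of them, and the granularity (the value that would come from `<processor>.lock_granularity`) is assumed to be either zero or a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of an earlier memory instruction in the tree. */
typedef struct {
    uint64_t addr;
    unsigned size;
    bool is_store;
    bool executed;  /* meaningful for loads */
    bool retired;   /* meaningful for stores */
} earlier_op_t;

/* True if [a, a+asz) and [b, b+bsz) touch at least one common granule of
   `gran` bytes. Assumes gran is a power of two and both sizes are non-zero. */
static bool same_granule(uint64_t a, unsigned asz,
                         uint64_t b, unsigned bsz, uint64_t gran)
{
    assert(gran != 0 && (gran & (gran - 1)) == 0);
    assert(asz > 0 && bsz > 0);
    uint64_t mask = ~(gran - 1);
    uint64_t a_lo = a & mask, a_hi = (a + asz - 1) & mask;
    uint64_t b_lo = b & mask, b_hi = (b + bsz - 1) & mask;
    return a_lo <= b_hi && b_lo <= a_hi;
}

/* Hypothetical check: must an atomic access of `size` bytes at `addr` wait?
   It waits if an earlier unexecuted load or unretired store shares a granule. */
static bool atomic_blocked(const earlier_op_t *ops, size_t n,
                           uint64_t addr, unsigned size,
                           uint64_t lock_granularity)
{
    if (lock_granularity == 0)
        return false;  /* never blocked; the processor model avoids deadlock */
    for (size_t i = 0; i < n; i++) {
        const earlier_op_t *op = &ops[i];
        bool pending = op->is_store ? !op->retired : !op->executed;
        if (pending && same_granule(addr, size, op->addr, op->size,
                                    lock_granularity))
            return true;
    }
    return false;
}
```

With the default 8-byte granularity, an atomic access at 0x104 is blocked by an unretired store at 0x100 because both fall in the granule starting at 0x100, while an access at 0x108 is not. Comparing granules instead of exact bytes reflects that the lock covers a whole region, so accesses that only share a granule can still deadlock.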

Some memory transactions bypass the LSQ:

Instruction fetches are always sent directly to memory. If you want your model to handle self-modifying code (instead of letting the software synchronize this itself), you will need to reorder instruction fetches when they arrive at the memory hierarchy.

Control (cache lock, flush, ...) and prefetch transactions bypass the LSQ as well.

Some special x86 transactions may bypass the LSQ, but they are usually associated with synchronizing instructions, so this does not matter in practice. The model limitations are listed in section 4.6.

5.1.1 Disabling the Load-Store Queue

The internal LSQ can be disabled by setting the attribute lsq_enabled to 0 in each CPU object. When the LSQ is disabled, all memory transactions are sent to memory during the execute phase. The retire phase thus becomes superfluous, and moving instructions through it has no effect.

Note that when the LSQ is turned off, Simics no longer maintains program-order consistency. It is up to the timing model to ensure that memory operations do not complete in the wrong order. It is therefore strongly recommended that the LSQ never be disabled unless you are completely sure of what you are doing.

The main reason for disabling the LSQ is to simulate a memory system whose consistency model is more relaxed than what the internal LSQ allows.
