How are C++ atomic operations implemented internally?

Why can't the load part of an atomic RMW instruction pass an earlier store to an unrelated location in the TSO (x86) memory consistency model? - memory, CPU, atomic, CPU architecture, instructions

The x86 architecture is known not to implement a sequentially consistent memory model because it uses write buffers, so store->load reordering can take place (later loads can complete while earlier stores are still sitting in the write buffer, waiting to be committed to the L1 cache).

In A Primer on Memory Consistency and Coherence we can read the following about Read-Modify-Write (RMW) operations in the Total Store Order (TSO) memory consistency model (which is said to be very similar to x86):

... we consider the RMW as a load immediately followed by a store. The load part of the RMW cannot pass earlier loads due to TSO's ordering rules. It might at first appear that the load part of the RMW could pass earlier stores in the write buffer, but this is not legal. If the load part of the RMW passes an earlier store, then the store part of the RMW would also have to pass the earlier store because the RMW is an atomic pair. But because stores are not allowed to pass each other in TSO, the load part of the RMW cannot pass an earlier store either.

OK, an atomic operation must be atomic, i.e. the location that the RMW accesses cannot be accessed by any other thread or core during the RMW operation. But what if the earlier store that the load part of the atomic operation passes is unrelated to the location that the RMW accesses? Consider two consecutive instructions: an ordinary store to one location, followed by an atomic RMW on a different location.
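A minimal C++ sketch of such an instruction pair is given here for illustration. The names `plain_location`, `counter` and `store_then_rmw` are hypothetical and do not come from the question, and the `alignas(64)` assumes 64-byte cache lines so that the two variables land in different lines.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical variables. alignas(64) assumes 64-byte cache lines, so the
// two locations are unrelated and live in different lines.
alignas(64) std::atomic<std::int32_t> plain_location{0};
alignas(64) std::atomic<std::int32_t> counter{0};

// Performs an ordinary store followed by an atomic RMW on another location.
// Returns the value observed by the load part of the RMW.
std::int32_t store_then_rmw(std::int32_t value) {
    // Ordinary store: on x86 a relaxed atomic store is typically a plain mov,
    // which first goes into the write buffer.
    plain_location.store(value, std::memory_order_relaxed);

    // Atomic RMW on an unrelated location: on x86 this is typically compiled
    // to a lock-prefixed instruction such as lock xadd.
    return counter.fetch_add(1, std::memory_order_relaxed);
}
```

Calling `store_then_rmw(7)` once on fresh state stores 7 to `plain_location` and returns the old value of `counter`, which is 0. The question is whether the load inside `fetch_add` may become globally visible before the store to `plain_location`.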

The first store is added to the write buffer and waits for its turn. Meanwhile, the atomic operation loads the value from another location (even in a different cache line), passing the first store, and adds its own store to the write buffer after the first one. In the global memory order we would see the following sequence:

load (part of the atomic) -> store (ordinary) -> store (part of the atomic)

Yes, this may not be the best solution from a performance point of view, since the cache line for the atomic operation has to be held in read-write state until all preceding stores in the write buffer are committed. But performance considerations aside, would it violate the TSO memory consistency model if the load part of an RMW operation were allowed to pass earlier stores to unrelated locations?

Answers:

Answer 1 (score: 4)

You could ask the same question about any store + load pair to different addresses: the load may be executed internally before the older store because of out-of-order execution. On x86 this would be allowed, because:

Loads may be reordered with older stores to different locations, but not with older stores to the same location

(Source: Intel 64 Architecture Memory Ordering White Paper)

In your example, however, the lock prefix would prevent this, because (from the same set of rules):

Locked instructions have a total order

This means that the lock enforces a memory barrier, like an mfence (and in fact some compilers use a locked operation as a fence). This will normally make the CPU hold back the load until the store buffer has drained, forcing the earlier store to become visible first.
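The fence-like effect of a locked RMW can be sketched with the classic store-buffering litmus test. This is only an illustration, and the helper name `count_both_zero` is hypothetical. Each thread writes its own flag with a sequentially consistent `exchange` (on x86 typically an `xchg`, which is implicitly locked) and then loads the other thread's flag. Because all operations are `memory_order_seq_cst`, the C++ memory model forbids the outcome where both threads read 0.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering litmus test `iterations`
// times and counts how often both threads read 0 from the other's flag.
int count_both_zero(int iterations) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            // RMW acting as store + full barrier (typically xchg on x86).
            x.exchange(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.exchange(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });

        t1.join();
        t2.join();

        // With seq_cst operations this outcome is forbidden.
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

If the `exchange` calls were replaced by relaxed plain stores, the load in each thread could pass the older store still sitting in the store buffer, and the "both zero" outcome would become possible on x86.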