cpu-cache – Row Coding

What is meant by data cache and instruction cache?

November 28, 2023 by Tarik

Instruction fetches can be done in chunks, on the assumption that much of the time you are going to run through many instructions in a row. So instruction fetches can be more efficient: there is likely a handful or more clocks of overhead per transaction, then the delay for the memory to have the data … Read more
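As a rough illustration of why fetching in chunks helps, the sketch below spreads a fixed per-transaction cost over the number of instructions fetched in one burst. Every number in it (6 clocks of overhead, 20 clocks of memory delay, 1 clock per transferred word, one instruction per word) is a made-up assumption for the arithmetic, not a figure for any real CPU or memory system.

```python
def clocks_per_instruction(burst_len, overhead_clocks=6,
                           memory_delay_clocks=20, clocks_per_word=1):
    """Average fetch cost per instruction when `burst_len` instructions
    are fetched in one memory transaction (all parameters hypothetical)."""
    # The overhead and the memory delay are paid once per transaction;
    # only the transfer time grows with the burst length.
    total = overhead_clocks + memory_delay_clocks + burst_len * clocks_per_word
    return total / burst_len


for burst_len in (1, 4, 8, 16):
    print(burst_len, clocks_per_instruction(burst_len))
```

Under these assumed numbers the per-instruction cost drops from 27 clocks for a single-word fetch to 4.25 clocks for an 8-word burst, provided the fetched instructions are actually executed.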

Can I force cache coherency on a multicore x86 CPU?

November 24, 2023 by Tarik

volatile only forces your code to re-read the value; it cannot control where the value is read from. If the value was recently read by your code, then it will probably be in cache, in which case volatile will force it to be re-read from cache, NOT from memory. There are not a lot of … Read more

Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

November 20, 2023 by Tarik

L1 is very tightly coupled to the CPU core and is accessed on every memory access (very frequently). Thus, it needs to return the data really fast (usually within a few clock cycles). Latency and throughput (bandwidth) are both performance-critical for the L1 data cache (e.g. four-cycle latency, and supporting two reads and one write by … Read more

How can I do a CPU cache flush in x86 Windows?

July 16, 2023 by Tarik

Fortunately, there is more than one way to explicitly flush the caches. The instruction “wbinvd” writes back modified cache content and marks the caches empty. It executes a bus cycle to make external caches flush their data. Unfortunately, it is a privileged instruction. But if it is possible to run the test program under something … Read more

What’s the difference between conflict miss and capacity miss

July 14, 2023 by Tarik

The important distinction here is between cache misses caused by the size of your data set and cache misses caused by the way your cache and data alignment are organized. Let’s assume you have a 32 KB direct-mapped cache, and consider the following two cases: You repeatedly iterate over a 128 KB array. There’s no way … Read more
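The distinction can be tried out with a toy model. The sketch below simulates a direct-mapped cache of 32 KiB; the 64-byte line size, the helper names, and the two-address "ping-pong" trace are assumptions added for illustration (the ping-pong trace is not necessarily the second case the answer goes on to describe).

```python
LINE_SIZE = 64             # bytes per cache line (assumed)
CACHE_SIZE = 32 * 1024     # direct-mapped cache size from the text
NUM_LINES = CACHE_SIZE // LINE_SIZE


def count_misses(addresses):
    """Count misses for a trace of byte addresses in a direct-mapped cache."""
    tags = [None] * NUM_LINES          # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // LINE_SIZE
        index = block % NUM_LINES      # the only line this block may occupy
        tag = block // NUM_LINES
        if tags[index] != tag:
            misses += 1
            tags[index] = tag          # evict whatever was stored there
    return misses


def sweep(array_bytes, passes):
    """Touch one byte per cache line across the array, `passes` times."""
    return list(range(0, array_bytes, LINE_SIZE)) * passes


def ping_pong(stride_bytes, repeats):
    """Alternate between address 0 and address `stride_bytes`."""
    return [0, stride_bytes] * repeats


# Capacity: the 128 KiB working set is four times the cache size.
capacity_misses = count_misses(sweep(128 * 1024, passes=4))
# Conflict: only two lines are in use, but both map to the same index.
conflict_misses = count_misses(ping_pong(CACHE_SIZE, repeats=100))
# Same two-line working set, placed so the lines do not collide.
no_conflict_misses = count_misses(ping_pong(LINE_SIZE, repeats=100))
print(capacity_misses, conflict_misses, no_conflict_misses)
```

In this model every access of the 128 KiB sweep misses because the data cannot fit, while the ping-pong trace misses on every access even though it uses only two lines; moving the second address by one line removes all but the two initial misses.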

Where is the L1 memory cache of Intel x86 processors documented?

July 14, 2023 by Tarik

It is near impossible to find specs on Intel caches. When I was teaching a class on caches last year, I asked friends inside Intel (in the compiler group) and they couldn’t find specs. But wait!!! Jed, bless his soul, tells us that on Linux systems, you can squeeze lots of information out of the … Read more

Why is linear read-shuffled write not faster than shuffled read-linear write?

July 12, 2023 by Tarik

This is a complex problem closely related to the architectural features of modern processors, and your intuition that random reads are slower than random writes because the CPU has to wait for the read data does not hold (most of the time). There are several reasons for that, which I will detail. Modern processors are very efficient … Read more
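For readers unfamiliar with the two access patterns named in the question, here is a minimal sketch of both. The function names and the sample permutation are hypothetical, and Python lists only show the order of accesses; they say nothing about real cache behaviour or timing.

```python
def linear_read_shuffled_write(src, perm):
    """Read src in order, write each element to a permuted position (scatter)."""
    dst = [None] * len(src)
    for i in range(len(src)):
        dst[perm[i]] = src[i]
    return dst


def shuffled_read_linear_write(src, perm):
    """Read src at permuted positions, write dst in order (gather)."""
    dst = [None] * len(src)
    for i in range(len(src)):
        dst[i] = src[perm[i]]
    return dst


src = [10, 20, 30, 40]
perm = [2, 0, 3, 1]    # hypothetical permutation of the indices 0..3
print(linear_read_shuffled_write(src, perm))
print(shuffled_read_linear_write(src, perm))
```

Both loops perform one sequential and one random access per element; the only difference is whether the random access is the load or the store.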

Do current x86 architectures support non-temporal loads (from “normal” memory)?

July 5, 2023 by Tarik

To answer specifically the headline question: Yes, recent mainstream Intel CPUs support non-temporal loads on normal memory – but only “indirectly” via non-temporal prefetch instructions, rather than directly using non-temporal load instructions like movntdqa. This is in contrast to non-temporal stores where you can just use the corresponding non-temporal store instructions directly. The basic … Read more

Are CPU registers and CPU cache different? [closed]

June 12, 2023 by Tarik

Yes. A CPU register is a small amount of data storage that facilitates CPU operations. The CPU cache is a high-speed volatile memory, larger than the register file, that helps the processor reduce the number of accesses to main memory.

How are cache memories shared in multicore Intel CPUs?

June 9, 2023 by Tarik

In a multiprocessor system or a multicore processor (Intel Quad Core, Core 2 Duo, etc.), does each CPU core/processor have its own cache memory (data and program cache)? Yes. It varies by the exact chip model, but the most common design is for each CPU core to have its own private L1 data and instruction … Read more
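On a Linux machine, one way to see which caches are private and which are shared is to read the per-CPU cache entries that the kernel exposes under sysfs. The sketch below is an illustration under that assumption: it only works where `/sys/devices/system/cpu/cpu0/cache` exists, and the `describe_caches` helper is a hypothetical name, not part of any library.

```python
from pathlib import Path

FIELDS = ("level", "type", "size", "shared_cpu_list")


def read_field(index_dir, name):
    """Return the stripped contents of one sysfs attribute, or '?' if absent."""
    attribute = index_dir / name
    return attribute.read_text().strip() if attribute.is_file() else "?"


def describe_caches(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """List the caches visible to one CPU as dictionaries of sysfs attributes.

    Returns an empty list when the directory does not exist
    (for example on a non-Linux system).
    """
    caches = []
    for index_dir in sorted(Path(cpu_dir, "cache").glob("index*")):
        caches.append({name: read_field(index_dir, name) for name in FIELDS})
    return caches


for cache in describe_caches():
    print("L{level} {type}: {size}, shared with CPUs {shared_cpu_list}".format(**cache))
```

A cache whose `shared_cpu_list` names only one core (or only the hardware threads of one core) is private to it; a list covering several cores indicates a shared cache.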