C++ cache-aware programming

According to “What Every Programmer Should Know About Memory” by Ulrich Drepper, you can do the following on Linux: Once we have a formula for the memory requirement, we can compare it with the cache size. As mentioned before, the cache might be shared with multiple other cores. Currently {There definitely will sometime soon be … Read more
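
As a concrete starting point, here is a minimal sketch (assuming glibc on Linux, where sysconf() exposes the _SC_LEVEL*_CACHE_* constants) of querying the cache sizes that such a working-set formula would be compared against:

    // Sketch: query cache geometry at runtime on Linux/glibc via sysconf().
    // The _SC_LEVEL*_CACHE_* constants are a glibc extension; they may
    // return 0 or -1 if the value is unknown on a given system.
    #include <unistd.h>
    #include <cstdio>

    int main() {
        long l1d_size = sysconf(_SC_LEVEL1_DCACHE_SIZE);
        long l1d_line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        long l2_size  = sysconf(_SC_LEVEL2_CACHE_SIZE);
        long l3_size  = sysconf(_SC_LEVEL3_CACHE_SIZE);

        std::printf("L1d: %ld bytes (line %ld)\n", l1d_size, l1d_line);
        std::printf("L2 : %ld bytes\n", l2_size);
        std::printf("L3 : %ld bytes (often shared between cores)\n", l3_size);

        // A per-core working set that fits in L1d (or at least L2) is the
        // target described above; remember that L3 is typically shared, so
        // divide it by the number of active cores.
        return 0;
    }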

Why does the speed of memcpy() drop dramatically every 4KB?

Memory is usually organized in 4 KB pages (although there’s also support for larger sizes). The virtual address space your program sees may be contiguous, but that’s not necessarily the case in physical memory. The OS, which maintains a mapping of virtual to physical addresses (in the page map), would usually try to keep the physical … Read more
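
A hedged micro-benchmark sketch of the effect being asked about (an assumed example, not the original answer's code): copying between a source and destination that are exactly 4 KiB apart versus slightly more than 4 KiB apart. Buffer sizes and repetition counts are arbitrary illustration choices; on many x86 CPUs the 4 KiB case can be measurably slower because addresses whose low 12 bits match alias in the cache and in store-to-load forwarding.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    // Time repeated memcpy() calls between non-overlapping regions.
    static double time_copy(char* dst, const char* src, std::size_t n) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 10000; ++rep)
            std::memcpy(dst, src, n);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        const std::size_t n = 2048;            // bytes copied per call
        std::vector<char> buf(1 << 20, 1);     // one large backing buffer
        char* base = buf.data();

        double apart_4096 = time_copy(base + 4096, base, n);       // exactly 4 KiB apart
        double apart_4160 = time_copy(base + 4096 + 64, base, n);  // not a 4 KiB multiple

        std::printf("src/dst 4096 bytes apart: %.2f ms\n", apart_4096);
        std::printf("src/dst 4160 bytes apart: %.2f ms\n", apart_4160);
        std::printf("check byte: %d\n", (int)buf[5000]);  // keep the stores observable
        return 0;
    }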

Simplest tool to measure C program cache hit/miss and CPU time in Linux?

Use perf:

    perf stat ./yourapp

See the kernel wiki perf tutorial for details. This uses the hardware performance counters of your CPU, so the overhead is very small. Example from the wiki:

    perf stat -B dd if=/dev/zero of=/dev/null count=1000000

    Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

        5,099 cache-misses    #    0.005 M/sec  (scaled from 66.58%)

… Read more
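
To have something non-trivial to measure, the sketch below (an assumed example, not from the original answer) walks a large array with a 64-byte stride so almost every access touches a new cache line; a suggested perf invocation is in the comment.

    // Sketch: a deliberately cache-unfriendly strided traversal to profile.
    // Build and measure with something like:
    //   g++ -O2 stride.cpp -o stride
    //   perf stat -e cycles,instructions,cache-references,cache-misses ./stride
    // (cache-references/cache-misses are generic perf events; exact counter
    // support varies by CPU.)
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 24;         // 16 Mi ints, about 64 MiB
        std::vector<int> v(n, 1);

        long long sum = 0;
        const std::size_t stride = 16;         // 64-byte steps: one cache line per access
        for (std::size_t start = 0; start < stride; ++start)
            for (std::size_t i = start; i < n; i += stride)
                sum += v[i];

        std::printf("sum = %lld\n", sum);
        return 0;
    }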

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

Can someone explain, in an easy-to-understand way, the concept of a cache miss and its probable opposite (a cache hit)? A cache miss, generally, is when something is looked up in the cache and is not found – the cache did not contain the item being looked up. A cache hit is when you look … Read more
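
An illustrative sketch of the difference in practice (assumed example, not from the original answer): summing the same matrix row-by-row, where consecutive accesses hit the line that was just fetched, and column-by-column, where nearly every access misses and has to fetch a new line.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 4096;                  // 4096 x 4096 ints, about 64 MiB
        std::vector<int> m(n * n, 1);
        long long sum = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t r = 0; r < n; ++r)          // row-major: consecutive addresses,
            for (std::size_t c = 0; c < n; ++c)      // mostly cache hits
                sum += m[r * n + c];
        auto t1 = std::chrono::steady_clock::now();
        for (std::size_t c = 0; c < n; ++c)          // column-major: each access lands on
            for (std::size_t r = 0; r < n; ++r)      // a different cache line, mostly misses
                sum += m[r * n + c];
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::printf("row-major   : %.1f ms\n", ms(t1 - t0).count());
        std::printf("column-major: %.1f ms\n", ms(t2 - t1).count());
        std::printf("(sum = %lld)\n", sum);
        return 0;
    }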

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

The intent of these constants is indeed to get the cache-line size. The best place to read about the rationale for them is the proposal itself: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html I’ll quote a snippet of the rationale here for ease of reading: […] the granularity of memory that does not interfere (to the first-order) [is] commonly referred to as … Read more
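
A small usage sketch of the two constants (C++17, header <new>; note that some standard libraries only define them in recent releases, e.g. libstdc++ since GCC 12):

    #include <atomic>
    #include <new>

    struct Counters {
        // Destructive interference: keep each hot counter on its own cache
        // line so two threads incrementing different counters do not
        // ping-pong the same line (false sharing).
        alignas(std::hardware_destructive_interference_size) std::atomic<long> a{0};
        alignas(std::hardware_destructive_interference_size) std::atomic<long> b{0};
    };

    struct Node {
        // Constructive interference: keep fields that are always accessed
        // together small enough to share one line.
        int key;
        int value;
    };
    static_assert(sizeof(Node) <= std::hardware_constructive_interference_size,
                  "Node should fit in a single cache line");

    Counters g_counters;   // a and b can now be updated from different threads cheaply

    int main() {
        g_counters.a.fetch_add(1, std::memory_order_relaxed);
        g_counters.b.fetch_add(1, std::memory_order_relaxed);
        return 0;
    }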

Line size of L1 and L2 caches

Cache-line size is (typically) 64 bytes. Moreover, take a look at this very interesting article about processor caches: Gallery of Processor Cache Effects. You will find the following chapters:

- Memory accesses and performance
- Impact of cache lines
- L1 and L2 cache sizes
- Instruction-level parallelism
- Cache associativity
- False cache line sharing
- Hardware complexities
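
If you would rather ask the running system than assume 64 bytes, here is a minimal Linux-only sketch that reads the line size from sysfs (the index0 entry is usually the L1 data cache, but the layout is not guaranteed across systems):

    #include <fstream>
    #include <iostream>

    int main() {
        // coherency_line_size is reported in bytes by the Linux cache sysfs interface.
        std::ifstream f("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
        unsigned line_size = 0;
        if (f >> line_size)
            std::cout << "L1 cache line size: " << line_size << " bytes\n";
        else
            std::cout << "Could not read cache line size from sysfs\n";
        return 0;
    }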

How does one write code that best utilizes the CPU cache to improve performance?

The cache is there to reduce the number of times the CPU would stall waiting for a memory request to be fulfilled (avoiding the memory latency) and, as a second effect, possibly to reduce the overall amount of data that needs to be transferred (preserving memory bandwidth). Techniques for avoiding suffering from memory fetch latency … Read more
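
One widely used technique in this family is cache blocking (loop tiling). A hedged sketch, using a blocked matrix transpose with an assumed tile size of 64 elements (tune BLOCK to the actual cache on your machine):

    #include <cstddef>
    #include <vector>

    constexpr std::size_t BLOCK = 64;   // tile edge, an illustrative choice

    // Transpose in BLOCK x BLOCK tiles so both the rows being read and the
    // columns being written stay resident in cache while a tile is processed.
    void transpose_blocked(const std::vector<double>& src, std::vector<double>& dst,
                           std::size_t n) {
        for (std::size_t bi = 0; bi < n; bi += BLOCK)
            for (std::size_t bj = 0; bj < n; bj += BLOCK)
                for (std::size_t i = bi; i < bi + BLOCK && i < n; ++i)
                    for (std::size_t j = bj; j < bj + BLOCK && j < n; ++j)
                        dst[j * n + i] = src[i * n + j];
    }

    int main() {
        const std::size_t n = 2048;
        std::vector<double> src(n * n, 1.0), dst(n * n, 0.0);
        transpose_blocked(src, dst, n);
        return 0;
    }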