C++ cache-aware programming

According to “What Every Programmer Should Know About Memory” by Ulrich Drepper, you can do the following on Linux: Once we have a formula for the memory requirement, we can compare it with the cache size. As mentioned before, the cache might be shared with multiple other cores. Currently {There definitely will sometime soon be … Read more
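
As a concrete starting point, here is a minimal sketch (assuming glibc on Linux, where sysconf() exposes the _SC_LEVEL*_CACHE_* constants) of querying the cache sizes that such a working-set formula would be compared against:

    // Sketch: query cache geometry at runtime on Linux/glibc via sysconf().
    // The _SC_LEVEL*_CACHE_* constants are a glibc extension; they may
    // return 0 or -1 if the value is unknown on a given system.
    #include <unistd.h>
    #include <cstdio>

    int main() {
        long l1d_size = sysconf(_SC_LEVEL1_DCACHE_SIZE);
        long l1d_line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        long l2_size  = sysconf(_SC_LEVEL2_CACHE_SIZE);
        long l3_size  = sysconf(_SC_LEVEL3_CACHE_SIZE);

        std::printf("L1d: %ld bytes (line %ld)\n", l1d_size, l1d_line);
        std::printf("L2 : %ld bytes\n", l2_size);
        std::printf("L3 : %ld bytes (often shared between cores)\n", l3_size);

        // A per-core working set that fits in L1d (or at least L2) is the
        // target described above; remember that L3 is typically shared, so
        // divide it by the number of active cores.
        return 0;
    }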

Why does the speed of memcpy() drop dramatically every 4KB?

Memory is usually organized in 4 KB pages (although there’s also support for larger sizes). The virtual address space your program sees may be contiguous, but that’s not necessarily the case in physical memory. The OS, which maintains a mapping of virtual to physical addresses (in the page map), would usually try to keep the physical … Read more
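
A hedged micro-benchmark sketch of the effect being asked about (an assumed example, not the original answer's code): copying between a source and destination that are exactly 4 KiB apart versus slightly more than 4 KiB apart. Buffer sizes and repetition counts are arbitrary illustration choices; on many x86 CPUs the 4 KiB case can be measurably slower because addresses whose low 12 bits match alias in the cache and in store-to-load forwarding.

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    // Time repeated memcpy() calls between non-overlapping regions.
    static double time_copy(char* dst, const char* src, std::size_t n) {
        auto t0 = std::chrono::steady_clock::now();
        for (int rep = 0; rep < 10000; ++rep)
            std::memcpy(dst, src, n);
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main() {
        const std::size_t n = 2048;            // bytes copied per call
        std::vector<char> buf(1 << 20, 1);     // one large backing buffer
        char* base = buf.data();

        double apart_4096 = time_copy(base + 4096, base, n);       // exactly 4 KiB apart
        double apart_4160 = time_copy(base + 4096 + 64, base, n);  // not a 4 KiB multiple

        std::printf("src/dst 4096 bytes apart: %.2f ms\n", apart_4096);
        std::printf("src/dst 4160 bytes apart: %.2f ms\n", apart_4160);
        std::printf("check byte: %d\n", (int)buf[5000]);  // keep the stores observable
        return 0;
    }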

Simplest tool to measure C program cache hit/miss and CPU time in Linux?

Use perf:

    perf stat ./yourapp

See the kernel wiki perf tutorial for details. This uses the hardware performance counters of your CPU, so the overhead is very small. Example from the wiki:

    perf stat -B dd if=/dev/zero of=/dev/null count=1000000

    Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000':

        5,099 cache-misses    #    0.005 M/sec  (scaled from 66.58%)

… Read more
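
To have something non-trivial to measure, the sketch below (an assumed example, not from the original answer) walks a large array with a 64-byte stride so almost every access touches a new cache line; a suggested perf invocation is in the comment.

    // Sketch: a deliberately cache-unfriendly strided traversal to profile.
    // Build and measure with something like:
    //   g++ -O2 stride.cpp -o stride
    //   perf stat -e cycles,instructions,cache-references,cache-misses ./stride
    // (cache-references/cache-misses are generic perf events; exact counter
    // support varies by CPU.)
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 24;         // 16 Mi ints, about 64 MiB
        std::vector<int> v(n, 1);

        long long sum = 0;
        const std::size_t stride = 16;         // 64-byte steps: one cache line per access
        for (std::size_t start = 0; start < stride; ++start)
            for (std::size_t i = start; i < n; i += stride)
                sum += v[i];

        std::printf("sum = %lld\n", sum);
        return 0;
    }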

What is a cache hit and a cache miss? Why would context-switching cause cache miss?

Can someone explain, in an easy-to-understand way, the concept of a cache miss and its probable opposite (a cache hit)? A cache miss, generally, is when something is looked up in the cache and is not found – the cache did not contain the item being looked up. A cache hit is when you look … Read more
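
An illustrative sketch of the difference in practice (assumed example, not from the original answer): summing the same matrix row-by-row, where consecutive accesses hit the line that was just fetched, and column-by-column, where nearly every access misses and has to fetch a new line.

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main() {
        const std::size_t n = 4096;                  // 4096 x 4096 ints, about 64 MiB
        std::vector<int> m(n * n, 1);
        long long sum = 0;

        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t r = 0; r < n; ++r)          // row-major: consecutive addresses,
            for (std::size_t c = 0; c < n; ++c)      // mostly cache hits
                sum += m[r * n + c];
        auto t1 = std::chrono::steady_clock::now();
        for (std::size_t c = 0; c < n; ++c)          // column-major: each access lands on
            for (std::size_t r = 0; r < n; ++r)      // a different cache line, mostly misses
                sum += m[r * n + c];
        auto t2 = std::chrono::steady_clock::now();

        using ms = std::chrono::duration<double, std::milli>;
        std::printf("row-major   : %.1f ms\n", ms(t1 - t0).count());
        std::printf("column-major: %.1f ms\n", ms(t2 - t1).count());
        std::printf("(sum = %lld)\n", sum);
        return 0;
    }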

Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size

The intent of these constants is indeed to get the cache-line size. The best place to read about the rationale for them is the proposal itself: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0154r1.html I’ll quote a snippet of the rationale here for ease of reading: […] the granularity of memory that does not interfere (to the first-order) [is] commonly referred to as … Read more
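
A small usage sketch of the two constants (C++17, header <new>; note that some standard libraries only define them in recent releases, e.g. libstdc++ since GCC 12):

    #include <atomic>
    #include <new>

    struct Counters {
        // Destructive interference: keep each hot counter on its own cache
        // line so two threads incrementing different counters do not
        // ping-pong the same line (false sharing).
        alignas(std::hardware_destructive_interference_size) std::atomic<long> a{0};
        alignas(std::hardware_destructive_interference_size) std::atomic<long> b{0};
    };

    struct Node {
        // Constructive interference: keep fields that are always accessed
        // together small enough to share one line.
        int key;
        int value;
    };
    static_assert(sizeof(Node) <= std::hardware_constructive_interference_size,
                  "Node should fit in a single cache line");

    Counters g_counters;   // a and b can now be updated from different threads cheaply

    int main() {
        g_counters.a.fetch_add(1, std::memory_order_relaxed);
        g_counters.b.fetch_add(1, std::memory_order_relaxed);
        return 0;
    }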

Line size of L1 and L2 caches

Cache-line size is (typically) 64 bytes. Moreover, take a look at this very interesting article about processor caches: Gallery of Processor Cache Effects. You will find the following chapters:

- Memory accesses and performance
- Impact of cache lines
- L1 and L2 cache sizes
- Instruction-level parallelism
- Cache associativity
- False cache line sharing
- Hardware complexities
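
If you would rather ask the running system than assume 64 bytes, here is a minimal Linux-only sketch that reads the line size from sysfs (the index0 entry is usually the L1 data cache, but the layout is not guaranteed across systems):

    #include <fstream>
    #include <iostream>

    int main() {
        // coherency_line_size is reported in bytes by the Linux cache sysfs interface.
        std::ifstream f("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size");
        unsigned line_size = 0;
        if (f >> line_size)
            std::cout << "L1 cache line size: " << line_size << " bytes\n";
        else
            std::cout << "Could not read cache line size from sysfs\n";
        return 0;
    }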

How does one write code that best utilizes the CPU cache to improve performance?

The cache is there to reduce the number of times the CPU would stall waiting for a memory request to be fulfilled (avoiding the memory latency) and, as a second effect, possibly to reduce the overall amount of data that needs to be transferred (preserving memory bandwidth). Techniques for avoiding suffering from memory fetch latency … Read more
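
One widely used technique in this family is cache blocking (loop tiling). A hedged sketch, using a blocked matrix transpose with an assumed tile size of 64 elements (tune BLOCK to the actual cache on your machine):

    #include <cstddef>
    #include <vector>

    constexpr std::size_t BLOCK = 64;   // tile edge, an illustrative choice

    // Transpose in BLOCK x BLOCK tiles so both the rows being read and the
    // columns being written stay resident in cache while a tile is processed.
    void transpose_blocked(const std::vector<double>& src, std::vector<double>& dst,
                           std::size_t n) {
        for (std::size_t bi = 0; bi < n; bi += BLOCK)
            for (std::size_t bj = 0; bj < n; bj += BLOCK)
                for (std::size_t i = bi; i < bi + BLOCK && i < n; ++i)
                    for (std::size_t j = bj; j < bj + BLOCK && j < n; ++j)
                        dst[j * n + i] = src[i * n + j];
    }

    int main() {
        const std::size_t n = 2048;
        std::vector<double> src(n * n, 1.0), dst(n * n, 0.0);
        transpose_blocked(src, dst, n);
        return 0;
    }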