instructions – Row Coding

Difference between movq and movabsq in x86-64

November 23, 2023 by Tarik

Unless your 64-bit value can be encoded as a 32-bit-sign-extended immediate, you have to move it to a register first and then store. (Or do two separate 32-bit stores, or other worse workaround to get the bytes where you want them.) In NASM / Intel syntax, mov r64, 0x… picks a MOV encoding based on … Read more
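The "32-bit-sign-extended immediate" condition in the excerpt above can be checked mechanically. The following is a minimal C sketch, not taken from the original answer; the helper name `fits_simm32` is hypothetical. It tests whether sign-extending the low 32 bits of a 64-bit constant reproduces the whole value, which is the condition under which a single store of an immediate can work.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the post): returns true when the 64-bit
 * value v, viewed as two's complement, lies in [-2^31, 2^31 - 1], i.e.
 * when it equals the sign-extension of its own low 32 bits.
 *
 * Adding 2^31 maps that signed range onto [0, 2^32 - 1]; unsigned
 * arithmetic wraps modulo 2^64, so the check is well defined in C. */
bool fits_simm32(uint64_t v)
{
    return v + UINT64_C(0x80000000) <= UINT64_C(0xFFFFFFFF);
}
```

For example, `0xFFFFFFFF80000000` passes (it is the sign-extension of `0x80000000`), while `0x0000000080000000` does not, so the latter would have to go through a register first.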

C code loop performance

August 26, 2023 by Tarik

I noticed in the comments that: The loop takes 5 cycles to execute. It’s “supposed” to take 4 cycles (since there are 4 adds and 4 multiplies). However, your assembly shows 5 SSE movssl instructions. According to Agner Fog’s tables, all floating-point SSE move instructions are at least 1 inst/cycle reciprocal throughput for Nehalem. Since you … Read more

How does x86 pause instruction work in spinlock and can it be used in other scenarios?

July 16, 2023 by Tarik

PAUSE notifies the CPU that this is a spinlock wait loop, so memory and cache accesses may be optimized. See also pause instruction in x86 for some more details about avoiding the memory-order mis-speculation when leaving the spin-loop. PAUSE may actually stop the CPU for some time to save power. Older CPUs decode it as REP … Read more
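To make the spin-wait pattern concrete, here is a minimal test-and-test-and-set spinlock sketch in C11. It is an illustration under stated assumptions, not code from the original answer: the type `spinlock`, the functions `spin_trylock`, `spin_lock`, and `spin_unlock`, and the macro `cpu_relax` are all hypothetical names. On x86 the macro expands to the `_mm_pause()` intrinsic, which emits PAUSE; on other targets it is assumed to be a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no spin hint elsewhere */
#endif

/* Hypothetical lock type for illustration only. */
typedef struct { atomic_bool locked; } spinlock;

/* Try once to take the lock; returns true on success. */
static bool spin_trylock(spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(spinlock *l)
{
    while (!spin_trylock(l)) {
        /* Spin read-only until the lock looks free, hinting the CPU
         * with PAUSE on every iteration, then retry the exchange. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static void spin_unlock(spinlock *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The inner loop only reads the flag, so the waiting core is not repeatedly requesting the cache line for writing; PAUSE sits in exactly that read-only loop.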

C code loop performance [continued]

February 11, 2023 by Tarik

Try using EMON profiling in VTune, or an equivalent tool such as oprofile. Links: VTune for Linux (you can search for the Windows version); oprofile. EMON (Event Monitoring) profiling works like a time-based tool, but it can tell you which performance event is causing the problem. You should still start out with a time-based profile … Read more

What does the endbr64 instruction actually do?

February 11, 2023 by Tarik

It stands for “End Branch 64 bit”, or more precisely, Terminate Indirect Branch in 64 bit. Here is the operation:

IF EndbranchEnabled(CPL) & EFER.LMA = 1 & CS.L = 1
  IF CPL = 3
  THEN
    IA32_U_CET.TRACKER = IDLE
    IA32_U_CET.SUPPRESS = 0
  ELSE
    IA32_S_CET.TRACKER = IDLE
    IA32_S_CET.SUPPRESS = 0
  FI
FI;

The instruction is otherwise … Read more

Are there any smart cases of runtime code modification?

November 7, 2022 by Tarik

There are many valid cases for code modification. Generating code at run time can be useful in several ways. Some virtual machines use JIT compilation to improve performance. Generating specialized functions on the fly has long been common in computer graphics; see e.g. Rob Pike, Bart Locanthi, and John Reiser, Hardware Software Tradeoffs for Bitmap Graphics … Read more

`testl` eax against eax?

November 2, 2022 by Tarik

Why do ARM chips have an instruction with Javascript in the name (FJCVTZS)?

October 15, 2022 by Tarik

It is because JS uses double precision for its numbers, but bitwise operations work on integers, and converting between the two is nontrivial, so a dedicated instruction that converts a JS double into an integer makes this easier. This ARM link explains it very well: https://community.arm.com/processors/b/blog/posts/armv8-a-architecture-2016-additions In order to add more information regarding fuz’s comment, the … Read more
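To show why the conversion is nontrivial in software, here is a hedged C sketch of the double-to-int32 conversion that JavaScript bitwise operators rely on (truncate toward zero, then wrap modulo 2^32). It is not from the original answer: the function name `js_to_int32` is hypothetical, and the code assumes IEEE-754 binary64 doubles. It works directly on the bit pattern, so it needs no math library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical software version of the JS double -> int32 conversion.
 * Assumes `double` is IEEE-754 binary64. */
int32_t js_to_int32(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);                 /* read the raw encoding */

    int exp = (int)((bits >> 52) & 0x7FF) - 1023;   /* unbiased exponent */
    if (exp < 0 || exp == 1024)
        return 0;             /* |d| < 1, or NaN / infinity */
    if (exp - 52 >= 32)
        return 0;             /* value is a multiple of 2^32 */

    /* 53-bit significand with the implicit leading 1 restored. */
    uint64_t mant = (bits & UINT64_C(0xFFFFFFFFFFFFF)) | (UINT64_C(1) << 52);

    /* Integer part of |d|; only its low 32 bits matter, so bits lost
     * off the top by the left shift are harmless. */
    uint64_t mag = (exp <= 52) ? mant >> (52 - exp) : mant << (exp - 52);

    uint32_t low = (uint32_t)mag;
    if (bits >> 63)
        low = 0u - low;       /* negate modulo 2^32 for negative inputs */

    /* Reinterpret as signed without relying on implementation-defined
     * narrowing conversions. */
    int64_t s = (int64_t)low;
    if (s >= INT64_C(2147483648))
        s -= INT64_C(4294967296);
    return (int32_t)s;
}
```

A plain C cast such as `(int32_t)d` is undefined for out-of-range values, which is why the wrap-around has to be spelled out here; a single instruction with these semantics removes that whole sequence.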