I've been told, and have read in Intel's manuals, that it is possible to write new instructions to memory and still have the CPU execute the old ones, because the instruction prefetch queue has already fetched the stale bytes. I have been unsuccessful in observing this behavior. My methodology is as follows.
The Intel Software Developer's Manual states in section 11.6:
A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated. This check is based on the physical address of the instruction. In addition, the P6 family and Pentium processors check whether a write to a code segment may modify an instruction that has been prefetched for execution. If the write affects a prefetched instruction, the prefetch queue is invalidated. This latter check is based on the linear address of the instruction.
So, it looks like if I hope to execute stale instructions, I need to have two different linear addresses refer to the same physical page. So, I memory map a file to two different addresses.

    int fd = open("code_area", O_RDWR | O_CREAT, S_IRWXU | S_IRWXG | S_IRWXO);
    assert(fd >= 0);
    write(fd, zeros, 0x1000);
    uint8_t *a1 = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_FILE | MAP_SHARED, fd, 0);
    uint8_t *a2 = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
                       MAP_FILE | MAP_SHARED, fd, 0);
    assert(a1 != a2);

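As a sanity check on the aliasing itself, here is a minimal sketch that confirms a byte written through one mapping is visible through the other. The function name `mappings_alias` and the scratch file path are my own placeholders, not part of the code above, and I left out `PROT_EXEC` because this check only needs data access.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if two distinct mappings of the same file page alias each other,
 * 0 if they do not, and -1 if the setup itself failed.
 * Hypothetical helper for illustration only. */
int mappings_alias(void)
{
    char path[] = "/tmp/alias_check_XXXXXX";  /* placeholder scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                             /* file lives until fd and maps go away */
    if (ftruncate(fd, 0x1000) != 0) {         /* one zero-filled page */
        close(fd);
        return -1;
    }

    uint8_t *a1 = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    uint8_t *a2 = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a1 == MAP_FAILED || a2 == MAP_FAILED) {
        if (a1 != MAP_FAILED) munmap(a1, 0x1000);
        if (a2 != MAP_FAILED) munmap(a2, 0x1000);
        return -1;
    }

    int ok = (a1 != a2);                      /* two different linear addresses */
    a2[0] = 0xAB;                             /* write through the second mapping */
    ok = ok && (a1[0] == 0xAB);               /* read it back through the first */

    munmap(a1, 0x1000);
    munmap(a2, 0x1000);
    return ok;
}
```

If this returns 1, the two pointers really do refer to the same page, so any difference in behavior would have to come from instruction fetch rather than from the mappings.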
I have an assembly function that takes a single argument, a pointer to the instruction I want to change.

    fun:
        push %rbp
        mov %rsp, %rbp
        xorq %rax, %rax           # Return value 0

        # A far jump simulated with a far return
        # Push the current code segment %cs, then the address we want to far jump to
        xorq %rsi, %rsi
        mov %cs, %rsi
        pushq %rsi
        leaq copy(%rip), %r15
        pushq %r15
        lretq

    copy:
        # Overwrite the two nops below with `inc %eax'. We will notice the change if the
        # return value is 1, not zero. The passed in pointer at %rdi points to the same physical
        # memory location of fun_ins, but the linear addresses will be different.
        movw $0xc0ff, (%rdi)

    fun_ins:
        nop                       # Two NOPs give enough space for the inc %eax (opcode FF C0)
        nop
        pop %rbp
        ret
    fun_end:
        nop

In C, I copy the code to the memory-mapped file. I invoke the function from linear address a1, but I pass a pointer into a2 as the target of the code modification.

    #define DIFF(a, b) ((long)(b) - (long)(a))
    long sz = DIFF(fun, fun_end);
    memcpy(a1, fun, sz);
    /* Same offset as fun_ins, but reached through the second mapping */
    void *tochange = a2 + DIFF(fun, fun_ins);
    int val = ((int (*)(void*))a1)(tochange);

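The address translation I am relying on (take the offset of `fun_ins` inside the copied code and apply it to the second mapping) can be sketched as a small helper. The name `alias_of` is made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Translate a pointer that lies inside one region to the same byte offset
 * inside another region. Hypothetical helper for illustration only. */
static inline void *alias_of(const void *p, const void *from_base, void *to_base)
{
    /* p must point into the region that starts at from_base */
    ptrdiff_t off = (const uint8_t *)p - (const uint8_t *)from_base;
    return (uint8_t *)to_base + off;
}
```

With the names from my code, the patch target would then be `alias_of(a1 + DIFF(fun, fun_ins), a1, a2)`, i.e. the location of the two NOPs as seen through `a2`.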
If the CPU picked up the modified code, val==1. Otherwise, if the stale instructions were executed (two nops), val==0.
I've run this on a 1.7GHz Intel Core i5 (2011 MacBook Air) and an Intel(R) Xeon(R) CPU X3460 @ 2.80GHz. Every time, however, I see val==1, indicating that the CPU always notices the new instruction.
Does anyone have experience with the behavior I want to observe? Is my reasoning correct? I'm a little confused that the manual mentions only P6 family and Pentium processors and says nothing about my Core i5. Perhaps something else is going on that causes the CPU to flush its instruction prefetch queue? Any insight would be very helpful!