Spectre type 2: Branch Target Injection

[background]

At the last meeting, I said that I hadn't yet understood the Spectre variant that uses Branch Target Injection (CVE-2017-5715). I now do.

Recall that Spectre and Meltdown use a side-channel for sneaking information past barriers. Speculatively executed instructions are supposed to have no observable effect when the prediction turns out to be wrong. The side-channel is the CPU's memory cache: its contents can be affected by mispredicted speculation.

In the case of Meltdown, speculation is allowed to reference forbidden memory, an artifact of Intel's implementation. It can be papered over by annoyingly expensive page table changes at the kernel / userland boundary. The expense can be reduced by an OS exploiting the PCID feature (in Intel processors since Westmere; Haswell added the INVPCID instruction), but that is made intricate by the PCID field being too narrow. This mitigation can be done once-and-for-all in an OS.

Spectre comes in two forms: mispredicting a simple conditional branch (one that either branches to a manifest location or falls through) and mispredicting an indirect branch.

To mitigate the first form, each dangerous conditional branch needs to have code added by the programmer (or compiler). This code is expensive, so it isn't a great idea to add it everywhere (which could be done automatically).

The fix for a dangerous conditional branch is easy: add an LFENCE instruction. Unfortunately, that slows down processing quite a lot. WebKit is using "index masking" instead, for speed.
<https://webkit.org/blog/8048/what-spectre-and-meltdown-mean-for-webkit/>

[foreground]

Now we get to Branch Target Injection, the second form of Spectre.

The indirect branch case is much trickier. An indirect branch is one where the target is not manifest in the instruction. Instead, it is somehow computed. Think:

- a call through a function pointer variable
- a method call in an object-oriented language (a call through a function pointer, at least in the general case)
- a return from a function
- a case statement

Fast processors nowadays predict where such a branch will lead. The heuristics used can be outsmarted by carefully crafted code, and the processor led to speculate ANYWHERE in the address space. This is awesomely scary. You cannot add protective code at the target because there is no single target.

This has similarities to the exploitation method called "Return Oriented Programming": the attacker just has to find a useful code fragment somewhere in your codebase and aim the branch target prediction towards it.

Google researchers have devised a trick to prevent indirect branch misprediction from doing a bad guy's bidding. They constructed a "retpoline" that essentially ties up misprediction in a harmless bit of code. See the "Construction (x86)" section of
<https://support.google.com/faqs/answer/7625886>

The cost is an ugly piece of code and no useful speculation.
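For the conditional-branch case mentioned in the background, here is a minimal C sketch of the general idea behind "index masking". It is not WebKit's code: the names `table`, `table_mask` and `read_masked` are made up for illustration, and the table size is assumed to be a power of two. The mask is read through a `volatile` variable only so that an optimizer cannot fold it away after the bounds check; real implementations typically do the masking in generated code or with dedicated helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table; the size must be a power of two for masking to work. */
#define TABLE_SIZE 16u
static_assert((TABLE_SIZE & (TABLE_SIZE - 1u)) == 0,
              "TABLE_SIZE must be a power of two");

static const uint8_t table[TABLE_SIZE] = {
    10, 11, 12, 13, 14, 15, 16, 17,
    18, 19, 20, 21, 22, 23, 24, 25
};

/* volatile: keeps the compiler from deciding the AND is redundant
 * once it has seen the bounds check below. */
static volatile size_t table_mask = TABLE_SIZE - 1u;

/* Returns table[i], or 0 when i is out of range. */
uint8_t read_masked(size_t i)
{
    if (i >= TABLE_SIZE)
        return 0;
    /* Architecturally the check above is enough. The mask matters only
     * when the branch is mispredicted: the speculative load is then
     * still confined to the table instead of reaching arbitrary memory. */
    return table[i & table_mask];
}
```

The point of the design is that the AND is a data dependency rather than a branch, so there is nothing for the predictor to get wrong, and it costs far less than an LFENCE on every access.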

On 13/01/18 12:21 PM, D. Hugh Redelmeier via talk wrote:
| Now we get to Branch Target Injection, the second form of Spectre.
| [...]

Thanks, Hugh!

Oracle and Fujitsu (who actually make the chips) still haven't said which machine types will suffer from speculation attacks, but they did recently implement a hardware cache change in the M7 and M8 series (the conventional, glow-in-the-dark 5 GHz chipsets that speculate wildly), which they market as "Silicon Secured Memory".

It reads as if they've been having trouble with "invalid [memory references], stale memory reference and buffer overflows", and have added microcode to cause SEGVs before the data arrives if you try to fetch a cache line that isn't the same "version" as your process. Version sounds like a short value used like a pid, but don't quote me on that: the papers are written by marketers, not engineers (;-))

See
https://blogs.oracle.com/partnertech/sas-and-oracle-sparc-m7-silicon-secured...

--dave
--
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain

| From: David Collier-Brown via talk <talk@gtalug.org>

| Oracle and Fujitsu (who actually make the chips) still hasn't said which
| machine types will suffer from speculation attacks, but did implement a
| hardware cache change recently in the M7 and M8 series (the conventional,
| glow-in-the-dark 5 GHz chipsets that speculate wildly) which they market as
| "Silicon Secured Memory".

The date on this document (2016 Feb 11) suggests that it isn't about Spectre-like or Meltdown-like attacks.

| It reads as if they've been having trouble with "invalid [memory references],
| stale memory reference and buffer overflows", and have added microcode to
| cause SEGVs before the data arrives if you try to fetch a cache line that
| isn't the same "version" as your process. Version sounds like a short value
| used like a pid, but don't quote me on that: the papers are written by
| marketers, not engineers (;-))
|
| See
| https://blogs.oracle.com/partnertech/sas-and-oracle-sparc-m7-silicon-secured...

Interesting. This looks like it catches bugs where a program uses an old pointer that points to memory that has since been freed and possibly reallocated.

- the metadata is only 4 bits, so there could be cases that are not caught (not too likely unless the error is engineered by an opponent)

- the metadata is per cache line. I'm guessing that means the malloc(3) and free(3) routines must paint the memory with this metadata, with one operation per cache-line-sized chunk of RAM.

An additional feature would be that if a pointer were used to attempt to reference the next object in memory, it would likely fail due to a metadata clash.

On Linux there is a library called Electric Fence (efence) that will catch roughly the same errors. When you link it in, each malloc allocates a whole new page of memory. Free unmaps that page. So references through dangling pointers become segfaults. Bonus: it places the object at the end of the page so that overruns cause segfaults. (Fine print: there is a global runtime option to place the objects at the start of pages so that underruns are caught instead.)

In theory, efence is quite expensive: each object takes a page of memory. Core dumps get quite large. But in many real-world programs, the memory isn't a problem. I used this extensively many years ago when memories were a lot smaller and addresses were only 32 bits.

I would expect that the SPARC feature could be used in production code whereas few would use efence that way.

With a sensible high-level language, these checks should not be important. But for C and C++ they are useful. When trying to debug a failure, it is really nice to know automatically whether or not it is due to this kind of error. It is also nice to know that this kind of error won't be silent.
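To make the efence trick concrete, here is a rough C sketch of the page-per-object idea. It is not efence's actual code or API: `guarded_alloc` and `guarded_free` are hypothetical names, the sketch handles only objects up to one page, and it ignores alignment. It assumes a POSIX system with mmap(2) and mprotect(2), and it adds an inaccessible guard page after each object so that an overrun faults immediately.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: one data page plus one guard page per object.
 * The object is placed so that it ends exactly where the guard page begins. */
void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size == 0 || size > page)
        return NULL;                      /* sketch: at most one page */

    uint8_t *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Second page becomes the guard: any access to it raises a fault. */
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(base, 2 * page);
        return NULL;
    }
    return base + page - size;            /* note: not aligned in general */
}

/* Unmaps both pages, so a later use of the dangling pointer faults too. */
void guarded_free(void *p)
{
    if (p == NULL)
        return;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert((page & (page - 1)) == 0);     /* rounding below needs a power of two */
    uintptr_t start = (uintptr_t)p & ~(uintptr_t)(page - 1);
    munmap((void *)start, 2 * page);
}
```

With this layout, writing one byte past the end of a 100-byte object touches the guard page and the process is killed on the spot, instead of silently corrupting a neighbouring object. The price is the one described above: every object, however small, costs whole pages.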

On January 13, 2018 3:35:01 PM EST, "D. Hugh Redelmeier via talk" <talk@gtalug.org> wrote:
| [...]

Speaking up, so to speak:

Current buzz indicates that Skylake and newer CPUs survive these LFENCE kludges, hopefully with performance hits of around 5%. Ad hoc, it looks like more than that on my new setup after this morning's dnfdragora updates.

I notice that extended Berkeley Packet Filter (eBPF) is now in kernel 4.x.
http://www.brendangregg.com/ebpf.html
https://github.com/iovisor/bcc/blob/master/INSTALL.md#fedora---binary

Spartan to date, but Brendan Gregg has several enthusiastic videos out there.

Linux Superpowers with eBPF
https://youtu.be/bj3qdEDbCD4

--
Russell

participants (3)
- D. Hugh Redelmeier
- David Collier-Brown
- Russell