
On Sat, May 21, 2016 at 01:33:50PM -0400, D. Hugh Redelmeier wrote:
<https://software.intel.com/en-us/articles/google-vp9-optimization>
Intel describing how they improved the performance of the VP9 decoder for Silvermont, a recent Atom core.
The meat is several not-really-obvious changes to the code to overcome limitations of the instruction decoder. The optimizations seem particular to Silvermont, but the article says: "Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains."
These optimizations did nothing for Core processors as far as I can tell. I don't know if they affect any AMD processors.
A RISC processor would not have a complex instruction decoder, so this kind of hacking would not apply. I will admit that there are "hazards" in RISC processors that are worth paying attention to when selecting and ordering instructions, but these tend to be clearer.
Another thing in the paper:
The overall results were outstanding. The team improved user-level performance by up to 16 percent (6.2 frames per second) in 64-bit mode and by about 12 percent (1.65 frames per second) in 32-bit mode. This testing included evaluation of 32-bit and 64-bit GCC and Intel® compilers, and concluded that the Intel compilers delivered the best optimizations by far for Intel® Atom™ processors. When you multiply this improvement by millions of viewers and thousands of videos, it is significant. The WebM team at Google also recognized this performance gain as extremely significant. Frank Gilligan, a Google engineering manager, responded to the team’s success: “Awesome. It looks good. I can’t wait to try everything out.” Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains.
Consider 64-bit. If 16% improvement is 6.2 f/s, then the remaining 84% would be 32.55 f/s. Not great, but OK.
For 32-bit, 12% is 1.65 f/s; the remaining 88% would be 12 f/s. Totally useless, I think.
Quite interesting how different these two are.
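For anyone who wants to check the back-of-envelope numbers, here is a small Python sketch. It is only an illustration: the function name and the two "readings" are my own labels, not anything from the Intel article. The figures above treat the gain as a share of the optimized frame rate; the other plausible reading is that the gain is a share of the original frame rate, which gives somewhat higher numbers. Since the article says "up to" and "about", either result is rough.

```python
def implied_rates(gain_fps, gain_pct):
    """Return (before, after) frame rates for two readings of 'improved by gain_pct'.

    Hypothetical helper for illustration; not taken from the article.
    """
    frac = gain_pct / 100.0

    # Reading A: the gain is gain_pct of the *optimized* rate
    # (this is the reading used in the figures quoted above).
    after_a = gain_fps / frac
    before_a = after_a - gain_fps

    # Reading B: the gain is gain_pct of the *original* rate.
    before_b = gain_fps / frac
    after_b = before_b + gain_fps

    return {
        "share_of_new": (before_a, after_a),
        "share_of_old": (before_b, after_b),
    }


if __name__ == "__main__":
    for label, fps, pct in (("64-bit", 6.2, 16), ("32-bit", 1.65, 12)):
        for reading, (before, after) in implied_rates(fps, pct).items():
            print(f"{label} {reading}: {before:.2f} -> {after:.2f} f/s")
```

Under either reading the 32-bit build ends up well under 20 f/s, so the conclusion does not change much.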
64-bit has twice the registers, which for a lot of code is a huge difference. That is the biggest improvement AMD made to x86. Scrapping x87 is probably number 2.

-- 
Len Sorensen