example of why RISC was a good idea

<https://software.intel.com/en-us/articles/google-vp9-optimization>

Intel describing how they improved the performance of the VP9 decoder for Silvermont, a recent Atom core.

The meat is several not-really-obvious changes to the code to overcome limitations of the instruction decoder. The optimizations seem particular to Silvermont, but the article says: "Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains."

These optimizations did nothing for Core processors as far as I can tell. I don't know if it affects any AMD processors.

A RISC processor would not have a complex instruction decoder, so this kind of hacking would not apply. I will admit that there are "hazards" in RISC processors that are worth paying attention to when selecting and ordering instructions, but these tend to be clearer.

Another thing in the paper:

  "The overall results were outstanding. The team improved user-level performance by up to 16 percent (6.2 frames per second) in 64-bit mode and by about 12 percent (1.65 frames per second) in 32-bit mode. This testing included evaluation of 32-bit and 64-bit GCC and Intel® compilers, and concluded that the Intel compilers delivered the best optimizations by far for Intel® Atom™ processors. When you multiply this improvement by millions of viewers and thousands of videos, it is significant. The WebM team at Google also recognized this performance gain as extremely significant. Frank Gilligan, a Google engineering manager, responded to the team's success: 'Awesome. It looks good. I can't wait to try everything out.' Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains."

Consider 64-bit. If 16% improvement is 6.2 f/s, then the remaining 84% would be 32.55 f/s. Not great, but OK.

For 32-bit, 12% is 1.65 f/s; the remaining 88% would be 12 f/s. Totally useless, I think.

Quite interesting how different these two are.
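A quick back-of-envelope sketch of that arithmetic (following the reading above, which treats the quoted percentage as a share of the optimized rate; the figures come from the article, the code is only illustrative):

    /* Recompute the before/after frame rates implied by the article's numbers. */
    #include <stdio.h>

    static void rates(const char *mode, double gain_fraction, double gain_fps)
    {
        double after  = gain_fps / gain_fraction;   /* rate with the optimizations */
        double before = after - gain_fps;           /* rate without them */
        printf("%s: before ~%.2f f/s, after ~%.2f f/s\n", mode, before, after);
    }

    int main(void)
    {
        rates("64-bit", 0.16, 6.2);    /* ~32.55 -> ~38.75 f/s */
        rates("32-bit", 0.12, 1.65);   /* ~12.10 -> ~13.75 f/s */
        return 0;
    }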

On 05/21/2016 01:33 PM, D. Hugh Redelmeier wrote:
A RISC processor would not have a complex instruction decoder so this kind of hacking would not apply. I will admit that there are "hazards" in RISC processors that are worth paying attention to when selecting and ordering instructions but these tend to be clearer.
Many years ago, I used to maintain Data General Eclipse systems. The CPU used microcode to control AMD bit slice processors and associated logic. The microcode instructions were over 100 bits wide. Now *THAT'S* RISC. ;-) BTW, those CPUs had an option called Writable Control Store (WCS) where one could create custom instructions.

| From: James Knott <james.knott@rogers.com>
| Many years ago, I used to maintain Data General Eclipse systems. The
| CPU used microcode to control AMD bit slice processors and associated
| logic. The microcode instructions were over 100 bits wide. Now
| *THAT'S* RISC. ;-)

Technically, that was called (horizontal) microcode.

With WCS, a customer could sweat bullets and perhaps get an important performance improvement. It wasn't easy. Perhaps that is similar to the way GPUs can be used very effectively for some computations.

My opinions:

Microcode made sense when circuits were significantly faster than core memory and there was no cache: several microcode instructions could be "covered" by the time it took to fetch a word from core.

Microcode can still make sense, but only for infrequent things or for powerful microcode where one micro-instruction does just about all the work of one macro-instruction. Even with these considerations, it tends to make the pipeline longer and thus the cost of branches higher.

The big thing about RISC was that it got rid of microcode. At just the right time -- when caches and semiconductor memory were coming onstream. Of course UNIX was required because it was the only popular portable OS.

The idea of leaving (static) scheduling to the compiler instead of (dynamic) scheduling in the hardware is important but not quite right. Many things are not known until the actual operations are done. For example, is a memory fetch going to hit the cache or not? I think that this is what killed the Itanium project. I think that both kinds of scheduling are needed.

CISC losses: the Instruction Fetch Unit and the Instruction Decoder are complex and potential bottlenecks (they add to pipeline stages). CISC instruction sets live *way* past their best-before date.

RISC losses: instructions are usually less dense. More memory is consumed. More cache (and perhaps memory) bandwidth is consumed too. Instruction sets are not allowed to change as quickly as the underlying hardware, so the instruction set is not as transparent as it should be.

x86 almost vanquished RISC. No RISC workstations remain. On servers, RISC has retreated a lot. SPARC and Power don't seem to be growing. But from out in left field, ARM seems to be eating x86's lunch. Atom, x86's champion, has been cancelled (at least as a brand).

On 05/21/2016 04:01 PM, D. Hugh Redelmeier wrote:
| From: James Knott <james.knott@rogers.com>
| Many years ago, I used to maintain Data General Eclipse systems. The | CPU used microcode to control AMD bit slice processors and associated | logic. The microcode instructions were over 100 bits wide. Now | *THAT'S* RISC. ;-)
Technically, that was called (horizontal) microcode.
Geac did the same thing. Several years later, when I was with ISG, we developed a 128bit processor that we jokingly called a VRISC processor because it had something like 6 instructions. We were using the processor in a graphics display system.
With WCS, a customer could sweat bullets and perhaps get an important performance improvement. It wasn't easy. Perhaps that is similar to the way GPUs can be used very effectively for some computations.
My opinions:
Microcode made sense when circuits were significantly faster than core memory and there was no cache: several microcode instructions could be "covered" by the time it took to fetch a word from core.
The Geac system was originally designed with core memory, where the access times were in the range of microseconds; the clock speed of the microcode in the CPU was about 4 MHz, built using 4-bit bit-slice ALUs and a lot of random logic.
Microcode can still make sense but only for infrequent things or for powerful microcode where one micro-instruction does just about all the work of one macro-instruction. Even with these considerations, it tends to make the pipeline longer and thus the cost of branches higher.
Microcode also helped with reusing gates. For example, coding a multiply instruction as a loop of adds and shifts. Nowadays most processors have ripple multipliers.
The big thing about RISC was that it got rid of microcode. At just the right time -- when caches and semiconductor memory were coming onstream. Of course UNIX was required because it was the only popular portable OS.
RISC also benefited from increased transistor density.
The idea of leaving (static) scheduling to the compiler instead of (dynamic) scheduling in the hardware is important but not quite right. Many things are not known until the actual operations are done. For example, is a memory fetch going to hit the cache or not? I think that this is what killed the Itanium project. I think that both kinds of scheduling are needed.
CISC losses: the Instruction Fetch Unit and the Instruction Decoder are complex and potential bottlenecks (they add to pipeline stages). CISC instruction sets live *way* past their best-before date.
RISC losses: instructions are usually less dense. More memory is consumed. More cache (and perhaps memory) bandwidth is consumed too. Instruction sets are not allowed to change as quickly as the underlying hardware so the instruction set is not as transparent as it should be.
x86 almost vanquished RISC. No RISC workstations remain. On servers, RISC has retreated a lot. SPARC and Power don't seem to be growing. But from out in left field, ARM seems to be eating x86's lunch. Atom, x86's champion, has been cancelled (at least as a brand).
The x86, although popular, is not the best example of a CISC design; a better one is the National Semiconductor NS32000, which I believe was the first production 32-bit microprocessor. The current 64-bit x86 is just the last of a long set of patches from the 8086. I believe the last original CPU design from Intel was the iAPX 432.

Intel had plans to dead-end the x86 in favour of the Itanium as the step up to 64-bit, but AMD scuttled those plans by designing a 64-bit instruction set addition.

A number of RISC processors still live on, mostly in embedded applications: MIPS, ARM, Power (IBM).

It was a shame to see the end of the Alpha; it was a nice processor and opened the door to NUMA interprocessor interconnects that just came into the Intel world.
-- Alvin Starr

| From: Alvin Starr <alvin@netvel.net>
| Geac did the same thing.
| Several years later when I was with ISG we developed a 128bit processor
| that we jokingly called a VRISC processor because it had something like
| 6 instructions.
| We were using the processor in a graphics display system.

My impression is that GEAC was in a really special place in the market. It was highly vertical -- hardware through applications deployment. It may have helped vendor lock-in to have their own hardware.

I don't actually know what their hardware advantage was, if any. Perhaps they understood transaction processing better than designers of minis or micros.

| The Geac system was originally designed with core memory where the
| access times were in the range of micro-seconds and the clock speed of
| the microcode in the CPU was about 4Mhz built using 4bit bit-slice ALU's
| and a lot of random logic.

If you were interested in hardware in those days, that was an attractive approach. But if you were really interested in supporting credit unions and libraries, I don't see that this was a good use of your energy.

(I did know Gus German before GEAC. Interesting guy. One of the original four undergrads who wrote the WATFOR compiler (for the IBM 7040/44).)

| Microcode also helped with reusing gates.
| For example coding a multiply instruction as a loop of adds and shifts.
| Nowadays most processors have ripple multipliers.

Some of the original RISC machines had a "multiply step" instruction. You just wrote a sequence of them. The idea was that each instruction took exactly one cycle and a multiply didn't fit that.

| RISC also benefited from increased transistor density.

Every processor benefited from that. I guess RISC processors had more regularity and that made the design cycle take less engineering, which in turn could improve time-to-market. But Intel had so many engineers that that advantage disappeared.

But there was a point where all of a RISC CPU would fit on a single die but a comparable CISC CPU would not. This made a big difference. But we passed that roughly when the i386 came out. Of course Intel's process advantage helped a bit.

| The x86 although popular is not the best example of a CISC design.
| The National Semiconductor NS32000 which I believe was the first
| production 32bit microprocessor.

It sure wasn't the first if you count actually really working silicon. I have some scars to prove it.

| The current x86 64bit is just the last of a long set of patches from the
| 8086.

Yes, but the amazing thing is that the i386 and AMD patches were actually quite elegant. If I remember correctly, Gordon Bell said roughly that you could update an architecture once, but after that things get to be a mess. I'm impressed that the AMD architecture is OK. I'm not counting all the little hacks (that I don't even know) like MMX, AVX, ...

| I believe the last original CPU design from intel was the iAPX 432.

i860? i960 (with Siemens)? Itanium (with HP)?

| Intel had plans to dead end the x86 in favour of the Itanium as the step
| up to 64-bit but AMD scuttled those plans by designing a 64-bit
| instruction set addition.

Yeah. It kind of pulled an Intel. The AMD architecture filled a growing gap in x86's capability and was good enough.

One of Intel's motivations for Itanium seemed to be to own the architecture. It really was unhappy with having given a processor license to AMD.

| A number of RISC processors still live on, mostly in embedded applications:
| MIPS.

I didn't mention MIPS. Mostly because they seem to be shrinking. Most MIPS processors that I see are in routers and newer models seem to be going to ARM.

| It was a shame to see the end of the Alpha; it was a nice processor and
| opened the door to NUMA interprocessor interconnects that just came into
| the Intel world.

If I remember correctly, some Alpha folks went to AMD, some went to Sun, and some went to Intel.

Intel doesn't have all the good ideas even if it had all the processor sales.

AMD's first generation of 64-bit processors was clearly superior to Intel's processors, up until the Core 2 came out. AMD still lost to Intel in the market. It's sad to see AMD's stale products now.

I think that ARM's good idea was to stay out of Intel's field of view until they grew strong. Intel had an ARM license (transferred from DEC) from the StrongARM work. They decided to stop using it after producing some chips focused on networking etc. ("XScale"). They sold it to Marvell.

There were and are a lot of architectures in the embedded space but ARM seems to be the one that scaled. It could be the wisdom of ARM Inc. but I don't know that.

On 05/22/2016 12:11 AM, D. Hugh Redelmeier wrote:
| From: Alvin Starr <alvin@netvel.net>
| Geac did the same thing. | Several years later when I was with ISG we developed a 128bit processor | that we jokingly called a VRISC processor because it had something like | 6 instructions. | We were using the processor in a graphics display system.
My impression is that GEAC was in a really special place in the market. It was highly vertical -- hardware through applications deployment. It may have helped vendor lock-in to have their own hardware.
I don't actually know what their hardware advantage was, if any. Perhaps they understood transaction processing better than designers of minis or micros.
The Geac history that I was told goes like this: Geac used HP minicomputers. Mike Sweet designed a disk controller that was actually faster and smarter than the HP mini, so they took that hardware and just built their own computers. The 8000 was a 4-processor system, but the processors each served specific functions (Disk, Tape, Comms, CPU).

The thing was, to support small banks you needed a system that was close to IBM mainframe performance. So you could be an IBM VAR or ?????

This was in the 1980 time frame, and at that time Geac had a system that could run a credit union and was about the size of an IBM communications concentrator.

Geac had a lot of good technology but got caught up in a second-system design and kind of got Osborned.
| The Geac system was originally designed with core memory where the | access times were in the range of micro-seconds and the clock speed of | the microcode in the CPU was about 4Mhz built using 4bit bit-slice ALU's | and a lot of random logic.
If you were interested in hardware in those days, that was an attractive approach. But if you were really interested in supporting credit unions and libraries, I don't see that this was a good use of your energy.
(I did know Gus German before GEAC. Interesting guy. One of the original four undergrads who wrote the WATFOR compiler (for the IBM 7040/44).)
| Microcode also helped with reusing gates. | For example coding a multiply instruction as a loop of adds and shifts. | now days most processors have ripple multipliers.
Some of the original RISC machines had a "multiply step" instruction. You just wrote a sequence of them. The idea was that each instruction took exactly one cycle and a multiply didn't fit that.
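As an illustration, here is a rough C sketch of the shift-and-add idea that such a multiply-step sequence implements (a generic sketch only, not the exact semantics of any particular machine's instruction):

    /* Shift-and-add multiplication: each loop iteration is roughly what one
     * "multiply step" instruction did -- inspect one bit of the multiplier,
     * conditionally add the multiplicand, then shift.
     */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t shift_add_multiply(uint32_t multiplicand, uint32_t multiplier)
    {
        uint32_t product = 0;
        for (int step = 0; step < 32; step++) { /* one step per multiplier bit */
            if (multiplier & 1)                 /* low bit set? */
                product += multiplicand;        /* add the weighted multiplicand */
            multiplicand <<= 1;                 /* next bit carries twice the weight */
            multiplier >>= 1;                   /* expose the next bit */
        }
        return product;
    }

    int main(void)
    {
        printf("%u\n", (unsigned)shift_add_multiply(1234, 5678)); /* prints 7006652 */
        return 0;
    }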
| RISC also benefited from increased transistor density.
Every processor benefited from that. I guess RISC processors had more regularity and that made the design cycle take less engineering which in turn could improve time-to-market. But Intel had so many engineers that that advantage disappeared.
But there was a point where all of a RISC CPU would fit on a single die but a comparable CISC CPU would not. This made a big difference. But we passed that roughly when the i386 came out. Of course Intel's process advantage helped a bit.
| The x86 although popular is not the best example of a CISC design. | The National Semiconductor NS32000 which I believe was the first | production 32bit microprocessor.
It sure wasn't the first if you count actually really working silicon. I have some scars to prove it.
While still at Geac I was part of the group evaluating future processors. Intel was trying to sell us the 286 and hinting at the upcoming 432. Motorola had the 68000 and a segmented memory manager co-processor. NS had the 16000 (at the time) and plans for a paged virtual memory manager and a floating point co-processor.

The group liked the NS design because it was clean and consistent. All the instructions had the same addressing modes.

I still have an NS32000 Multibus system in my basement. It's running a variant of V7 (kind of a BSD 3.x) that we got from Bill Jolitz.
| The current x86 64bit is just the last of a long set of patches from the | 8086.
Yes, but the amazing thing is that the i386 and AMD patches were actually quite elegant. If I remember correctly, Gordon Bell said roughly that you could update an architecture once, but after that things get to be a mess. I'm impressed that the AMD architecture is OK. I'm not counting all the little hacks (that I don't even know) like MMX, AVX, ...
| I believe the last original CPU design from intel was the iAPX 432.
i860? i960 (with Siemens)? Itanium (with HP)?
I wonder how much Intel and the other partners brought to the table in the design of the 960 and Itanium. My impression was that HP developed the base design as an outflow of their PA-RISC work.

I have never been a great fan of Intel. To me it just seemed that they did little original work and leveraged their size and the work of others. But that is just my personal impression.
| Intel had plans to dead end the x86 in favour of the Itanium as the step | up to 64-bit but AMD scuttled those plans by designing a 64-bit | instruction set addition.
Yeah. It kind of pulled an Intel. The AMD architecture filled a growing gap in x86's capability and was good enough.
One of Intel's motivations for Itanium seemed to be to own the architecture. It really was unhappy with having given a processor license to AMD.
| A number of Risc processors still live on mostly in embedded applications.. | MIPS.
I didn't mention MIPS. Mostly because they seem to be shrinking. Most MIPS processors that I see are in routers and newer models seem to be going to ARM.
| It was a shame to see the end of the Alpha; it was a nice processor and | opened the door to NUMA interprocessor interconnects that just came into | the Intel world.
If I remember correctly, some Alpha folks went to AMD, some went to Sun, and some went to Intel.
The Alpha group was working closely with AMD near the end, and the HyperTransport technology got spun off into a separate group. I liked the Alpha also, and we had a number of them early on running Digital Unix and Linux.

An interesting read is http://www.hypertransport.org/docs/news/Digitimes-Processor-War-01-27-06.pdf
Intel doesn't have all the good ideas even if it had all the processor sales.
AMD's first generation of 64-bit processors was clearly superior to Intel's processors, up until the Core 2 came out. AMD still lost to Intel in the market. It's sad to see AMD's stale products now.
I think that ARM's good idea was to stay out of Intel's field of view until they grew strong. Intel had an ARM license (transferred from DEC) from the StrongARM work. They decided to stop using it after producing some chips focused on networking etc. ("XScale"). They sold it to Marvell.
There were and are a lot of architectures in the embedded space but ARM seems to be the one that scaled. It could be the wisdom of ARM Inc. but I don't know that.
On the other hand, there are now a number of interesting processor designs out there. Some are based on FPGAs, like the OpenCores project, and others, like the Parallella and Tilera, are more custom. So hopefully new designs will not have to come from the big guys.
-- Alvin Starr

On 05/22/2016 12:11 AM, D. Hugh Redelmeier wrote:
If I remember correctly, Gordon Bell said roughly that you could update an architecture once, but after that things get to be a mess.
You might want to read "The Soul of a New Machine" for an interesting story. It was about Data General's attempt to build a 32-bit computer. This would have been the 3rd generation, after the Nova and Eclipse lines. I read this book back in the days when I was maintaining Nova and Eclipse computers, so I had a bit more insight than most.
One of Intel's motivations for Itanium seemed to be to own the architecture. It really was unhappy with having given a processor license to AMD.
One problem with that is some customers, particularly military, were allergic to single source equipment. As I recall, Intel had to second source their products because of this. As a result, AMD, Siemens & NEC, IIRC, all made compatible "Intel" products under licence.

On Sun, May 22, 2016 at 12:11:13AM -0400, D. Hugh Redelmeier wrote:
| From: Alvin Starr <alvin@netvel.net> | I believe the last original CPU design from intel was the iAPX 432.
i860? i960 (with Siemens)? Itanium (with HP)?
I think it depends if you consider those designs "original".
Yeah. It kind of pulled an Intel. The AMD architecture filled a growing gap in x86's capability and was good enough.
Well unlike intel they did actually clean up a bit while extending the size.
One of Intel's motivations for Itanium seemed to be to own the architecture. It really was unhappy with having given a processor license to AMD.
Well if they hadn't done that, then IBM would almost certainly not have used them in the PC. So instead they would probably have gone with the M68k. Too bad for us intel did license it.
I didn't mention MIPS. Mostly because they seem to be shrinking. Most MIPS processors that I see are in routers and newer models seem to be going to ARM.
I guess you haven't seen the Cavium Octeon chips then.
If I remember correctly, some Alpha folks went to AMD, some went to Sun, and some went to Intel.
Intel doesn't have all the good ideas even if it had all the processor sales.
We can thank the Alpha team for hyperthreading, QPI (and hypertransport), and a number of other goodies.
AMD's first generation of 64-bit processors was clearly superior to Intel's processors, up until the Core 2 came out. AMD still lost to Intel in the market. It's sad to see AMD's stale products now.
The netburst architecture was a stupid move by intel. They assumed they could increase clock rates to 10GHz+ and designed an architecture that required doing that.
I think that ARM's good idea was to stay out of Intel's field of view until they grew strong. Intel had an ARM license (transferred from DEC) from the StrongARM work. They decided to stop using it after producing some chips focused on networking etc. ("XScale"). They sold it to Marvell.
Actually they still have it. They sold some of it to Marvell. Intel sold the PXA line (application processors) but kept the IXP and IOP lines (network and I/O processors). Of course these days Intel also owns Altera, which is putting ARM cores in FPGAs.
There were and are a lot of architectures in the embedded space but ARM seems to be the one that scaled. It could be the wisdom of ARM Inc. but I don't know that.
Or just a bit of dumb luck. They did seem to realize that making computers and chips was the wrong market to be in and that selling designs and licenses was a much better business plan. The initial design was as far as I know just a case of seeing a CPU design house, reading an IBM white paper on RISC, and deciding "Well we can do that too" and then doing it. -- Len Sorensen

On Sat, May 21, 2016 at 08:59:55PM -0400, Alvin Starr wrote:
Microcode also helped with reusing gates. For example coding a multiply instruction as a loop of adds and shifts. now days most processors have ripple multipliers.
Sure speeds up multiplies though.
The x86 although popular is not the best example of a CISC design. The National Semiconductor NS32000 which I believe was the first production 32bit microprocessor. The current x86 64bit is just the last of a long set of patches from the 8086.
I would change that to 4004.
I believe the last original CPU design from intel was the iAPX 432.
Maybe. And even though it flopped they still insisted on trying such a design in the Itanium again. And again it flopped and didn't work. When will intel learn that compile time scheduling is NEVER going to happen in general purpose use?
Intel had plans to dead end the x86 in favour of the Itanium as the step up to 64-bit but AMD scuttled those plans by designing a 64-bit instruction set addition.
The Itanium being an awful design probably did most of the damage.
A number of Risc processors still live on mostly in embedded applications.. MIPS. ARM. Power(IBM)
Well IBM in the server and HPC market, Freescale (well NXP now) in the embedded market. Well AppliedMicro does a bit of powerpc still too.
It was a shame to see the end of the Alpha; it was a nice processor and opened the door to NUMA interprocessor interconnects that just came into the Intel world.
Unfortunately a case of horrible management and being too worried about hurting sales of your former product, even though your competitors didn't mind hurting it at all. -- Len Sorensen

On Sat, May 21, 2016 at 04:01:03PM -0400, D. Hugh Redelmeier wrote:
x86 almost vanquished RISC. No RISC workstations remain. On servers, RISC has retreated a lot. SPARC and Power don't seem to be growing. But from out in left field, ARM seems to be eating x86's lunch. Atom, x86's champion, has been cancelled (at least as a brand).
You guys are talking about all the stuff that failed... mainly because they priced themselves right out of the market. RIP. -- William

On 05/22/2016 11:27 AM, William Park wrote:
On Sat, May 21, 2016 at 04:01:03PM -0400, D. Hugh Redelmeier wrote:
x86 almost vanquished RISC. No RISC workstations remain. On servers, RISC has retreated a lot. SPARC and Power don't seem to be growing. But from out in left field, ARM seems to be eating x86's lunch. Atom, x86's champion, has been cancelled (at least as a brand).
You guys are talking about all the stuff that failed... mainly because they priced themselves right out of the market. RIP.
Few of these technologies failed because of price. One thing that will put several nails in the coffin of a processor is not having a windows port. Another is having a proprietary OS.
-- Alvin Starr

On Sun, May 22, 2016 at 12:02:08PM -0400, Alvin Starr wrote:
Few of these technologies failed because of price. One thing that will put several nails in the coffin of a processor is not having a windows port. Another is having a proprietary OS.
Power, MIPS and Alpha all ran Windows NT 4. Itanium ran XP and Server. Didn't seem to do any of them any good. ARM did not run windows until Windows CE and Windows RT. -- Len Sorensen

On Sun, May 22, 2016 at 12:37:10PM -0400, James Knott wrote:
On 05/22/2016 12:32 PM, Lennart Sorensen wrote:
Power, MIPS and Alpha all ran Windows NT 4. Itanium ran XP and Server. Didn't seem to do any of them any good.
There was also a Power version of OS/2.
I thought that was never finished and released. Not that running OS/2 ever did any CPU any good (or bad it would seem). -- Len Sorensen

On 05/22/2016 12:43 PM, Lennart Sorensen wrote:
On Sun, May 22, 2016 at 12:37:10PM -0400, James Knott wrote:
On 05/22/2016 12:32 PM, Lennart Sorensen wrote:
Power, MIPS and Alpha all ran Windows NT 4. Itanium ran XP and Server. Didn't seem to do any of them any good.
There was also a Power version of OS/2.
I thought that was never finished and released.
When I was at IBM, in the late 90s, there were some Power systems running OS/2. I don't know if it was ever sold.
Not that running OS/2 ever did any CPU any good (or bad it would seem).

On 05/22/2016 12:43 PM, Lennart Sorensen wrote:
Not that running OS/2 ever did any CPU any good (or bad it would seem).
Actually, OS/2 lasted for quite a while in the financial sector because it was far more stable than Windows. It ran the ATMs for many years and was also used in point-of-sale systems (the servers, not POS terminals).

Another thing it had with Warp 3 and later was excellent networking support. It included many things that MS charged extra for with NT. I seem to recall that with NT you had to pay for the server version according to how many users you were planning on having. This didn't happen with Warp Server. I even recall a demo where you could, for example, share Novell NetWare servers with someone who wasn't running a NetWare client. This was done by mounting the Novell file share on the Warp Server and sharing it from there.

Back then, even basic telnet and ftp servers were built into OS/2, but were extra cost with NT. When I was at IBM, I set up an ftp server on my own computer, where I could find files when helping someone with their computer.

| From: James Knott <james.knott@rogers.com>
|
| Actually, OS/2 lasted for quite a while in the financial sector because
| it was far more stable than Windows.

I remember hearing that Imperial Oil used it a lot (from someone having to deal with migrating away).

Technical qualities might not have mattered. Banks and big oil companies computerized early and they seemed to be infected early with the idea that IBM was always the safe choice.

They also bought into Token Ring for a similar reason. Yet the open Ethernet standard and marketplace won. It was just hard for central planners to handle.

I was annoyed at OS/2 because it was announced long before it was reasonable to use it. UNIX was ready and able to do the job but the earth was scorched by these promises. Of course it looked safer to wait for IBM and Microsoft (the old reliable AND the young turks).

This is an echo of the lawsuit that CDC brought against IBM in the 1960s. CDC had supercomputers but sales were blocked by IBM promises that were put off and then never actually delivered (hear of an actual IBM/360 model 60 or 70? The model 90 was poor and soon replaced. Before that, Stretch (7030) was a failure too.).

It is a bit like the Itanium. Most of the RISC processor workstation vendors folded (or were folded) when the Itanium loomed. In actual delivery it was a bit of a squib: late and slow. This bought time for the x86 to get good enough so that it won.

On 05/22/2016 05:09 PM, D. Hugh Redelmeier wrote:
They also bought into Token Ring for a similar reason. Yet the open Ethernet standard and marketplace won. It was just hard for central planners to handle.
Token ring had advantages over coax or hub Ethernet networks with collisions. Switches eliminated those advantages. BTW, I recall trying and failing to get Linux working on a microchannel PS/2 & token ring. I was able to get it going on a ThinkPad with token ring card, but it required a minor change to a config file. I was using Mandrake at the time, IIRC.

On 05/22/2016 12:32 PM, Lennart Sorensen wrote:
On Sun, May 22, 2016 at 12:02:08PM -0400, Alvin Starr wrote:
Few of these technologies failed because of price. One thing that will put several nails in the coffin of a processor is not having a windows port. Another is having a proprietary OS. Power, MIPS and Alpha all ran Windows NT 4. Itanium ran XP and Server. Didn't seem to do any of them any good.
ARM did not run windows until Windows CE and Windows RT.
The Alpha was very early on with Windows; the DEC and MS guys were working closely (I had dealings with both at the time), but MS dropped the Alpha, and there has been some suggestion that that decision was externally influenced. Up to the point of being dropped, the Alpha looked like it was going to make a go of it. They had a couple of external manufacturers of the chipset and they were tightly involved with AMD with HyperTransport.

I believe that the Itanium died because it could never quite deliver on the performance that it promised. That, and the fact that AMD came up with x86_64, which would run 32-bit Windows just fine and would work with the existing software.

I can't comment much on MIPS or Power.

ARM got a big shot in the arm (pun intended) when it got Windows ports, because there are a lot of people using embedded Windows and then ARM became open to them as a platform. But I think the biggest thing for ARM has been cell phones and tablets, most of which do not run Windows but Linux or some RTOS. Intel and AMD have been rushing to fill in that gap with smaller processors.

I did not say that Windows would guarantee success, just that it helps in a significant way.

-- Alvin Starr

On Sat, May 21, 2016 at 04:01:03PM -0400, D. Hugh Redelmeier wrote:
Technically, that was called (horizontal) microcode.
With WCS, a customer could sweat bullets and perhaps get an important performance improvement. It wasn't easy. Perhaps that is similar to the way GPUs can be used very effectively for some computations.
My opinions:
Microcode made sense when circuits were significantly faster than core memory and there was no cache: several microcode instructions could be "covered" by the time it took to fetch a word from core.
Microcode can still make sense but only for infrequent things or for powerful microcode where one micro-instruction does just about all the work of one macro-instruction. Even with these considerations, it tends to make the pipeline longer and thus the cost of branches higher.
The big thing about RISC was that it got rid of microcode. At just the right time -- when caches and semiconductor memory were coming onstream. Of course UNIX was required because it was the only popular portable OS.
The idea of leaving (static) scheduling to the compiler instead of (dynamic) scheduling in the hardware is important but not quite right. Many things are not known until the actual operations are done. For example, is a memory fetch going to hit the cache or not? I think that this is what killed the Itanium project. I think that both kinds of scheduling are needed.
CISC losses: the Instruction Fetch Unit and the Instruction Decoder are complex and potential bottlenecks (they add to pipeline stages). CISC instruction sets live *way* past their best-before date.
RISC losses: instructions are usually less dense. More memory is consumed. More cache (and perhaps memory) bandwidth is consumed too. Instruction sets are not allowed to change as quickly as the underlying hardware so the instruction set is not as transparent as it should be.
x86 almost vanquished RISC. No RISC workstations remain. On servers, RISC has retreated a lot. SPARC and Power don't seem to be growing. But from out in left field, ARM seems to be eating x86's lunch. Atom, x86's champion, has been cancelled (at least as a brand).
But x86 internally has been RISC since the Pentium Pro with an instruction decoder to convert the x86 instructions to the internal format. Now I suspect it might not be pure RISC, but then again neither is ARM or probably anyone else these days. -- Len Sorensen

On Sat, May 21, 2016 at 02:48:50PM -0400, James Knott wrote:
Many years ago, I used to maintain Data General Eclipse systems. The CPU used microcode to control AMD bit slice processors and associated logic. The microcode instructions were over 100 bits wide. Now *THAT'S* RISC. ;-)
BTW, those CPUs had an option called Writable Control Store (WCS) where one could create custom instructions.
That sounds more like the opposite of RISC. Much more like VAX or mainframes used to be as far as I know. Maybe even VLIW, although probably not. Now being able to define new instructions using low level RISC features might make some sense, although how much the savings would be in execution time or binary size I don't know. I have a hard time imagining much gain there. -- Len Sorensen

On 05/22/2016 12:03 PM, Lennart Sorensen wrote:
On Sat, May 21, 2016 at 02:48:50PM -0400, James Knott wrote:
Many years ago, I used to maintain Data General Eclipse systems. The CPU used microcode to control AMD bit slice processors and associated logic. The microcode instructions were over 100 bits wide. Now *THAT'S* RISC. ;-)
BTW, those CPUs had an option called Writable Control Store (WCS) where one could create custom instructions.
That sounds more like the opposite of RISC. Much more like VAX or mainframes used to be as far as I know. Maybe even VLIW, although probably not.
I also used to work on VAX 11/780 systems back then. With the VAX, the microcode was loaded from an 8" floppy at boot. There were occasional updates for it. I suppose one could also write custom instructions for the VAX (BTW, I'm not an anti-VAXer <g>). I said "RISC" because the core of the CPU was bit-slice processors, which were very simple devices providing basic arithmetic & logic functions; the microcode controlled them, along with some glue logic.
Now being able to define new instructions using low level RISC features might make some sense, although how much the savings would be in execution time or binary size I don't know. I have a hard time imagining much gain there.
I was thinking it might be used for specific areas. For example, many years ago, computers were built either for business, running COBOL, or for science & engineering, running FORTRAN. Back then the technology was so primitive that what we now call a general-purpose computer was not practical. So there may be some instructions that could be added for better performance in certain applications.

Incidentally, a few years ago, I read a book about IBM's early computers and the design decisions made for business vs science/engineering computers. Back then business computers worked with some form of decimal digits, but S&E used floating point.
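As a small aside illustrating why decimal representation mattered for business work (my own toy example, not from that book): binary floating point cannot represent most decimal fractions exactly, which shows up as soon as you add up money.

    /* Add one cent a hundred times, once in binary floating point and once
     * in whole cents, and compare the results.
     */
    #include <stdio.h>

    int main(void)
    {
        double dollars = 0.0;
        for (int i = 0; i < 100; i++)
            dollars += 0.01;             /* $0.01 in binary floating point */

        long cents = 0;
        for (int i = 0; i < 100; i++)
            cents += 1;                  /* the same sum kept in exact cents */

        printf("binary float : %.17f dollars\n", dollars); /* typically not exactly 1.0 */
        printf("exact cents  : %ld cents\n", cents);       /* exactly 100 */
        return 0;
    }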

On 05/22/2016 12:03 PM, Lennart Sorensen wrote:
On Sat, May 21, 2016 at 02:48:50PM -0400, James Knott wrote:
Many years ago, I used to maintain Data General Eclipse systems. The CPU used microcode to control AMD bit slice processors and associated logic. The microcode instructions were over 100 bits wide. Now *THAT'S* RISC. ;-)
BTW, those CPUs had an option called Writable Control Store (WCS) where one could create custom instructions.
That sounds more like the opposite of RISC. Much more like VAX or mainframes used to be as far as I know. Maybe even VLIW, although probably not.
Now being able to define new instructions using low level RISC features might make some sense, although how much the savings would be in execution time or binary size I don't know. I have a hard time imagining much gain there.
The real laugh is that RISC processors most often have more instructions than CISC processors. Can you consider the JVM a CISC machine?
-- Alvin Starr

On Sat, May 21, 2016 at 01:33:50PM -0400, D. Hugh Redelmeier wrote:
<https://software.intel.com/en-us/articles/google-vp9-optimization>
Intel describing how they improved the performance of the VP9 decoder for Silvermont, a recent Atom core.
The meat is several not-really-obvious changes to the code to overcome limitations of the instruction decoder. The optimizations seem particular to Silvermont but the article says: Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains.
These optimizations did nothing for Core processors as far as I can tell. I don't know if it affects any AMD processors.
A RISC processor would not have a complex instruction decoder so this kind of hacking would not apply. I will admit that there are "hazards" in RISC processors that are worth paying attention to when selecting and ordering instructions but these tend to be clearer.
Another thing in the paper:
The overall results were outstanding. The team improved user-level performance by up to 16 percent (6.2 frames per second) in 64-bit mode and by about 12 percent (1.65 frames per second) in 32-bit mode. This testing included evaluation of 32-bit and 64-bit GCC and Intel® compilers, and concluded that the Intel compilers delivered the best optimizations by far for Intel® Atom™ processors. When you multiply this improvement by millions of viewers and thousands of videos, it is significant. The WebM team at Google also recognized this performance gain as extremely significant. Frank Gilligan, a Google engineering manager, responded to the team’s success: “Awesome. It looks good. I can’t wait to try everything out.” Testing against the future Intel Atom platforms, codenamed Goldmont and Tremont, the VP9 optimizations delivered additional gains.
Consider 64-bit. If 16% improvement is 6.2 f/s, then the remaining 84% would be 32.55 f/s. Not great, but OK.
For 32-bit, 12% is 1.65 f/s; the remaining 88% would be 12 f/s. Totally useless, I think.
Quite interesting how different these two are.
64-bit mode has twice the registers, which for a lot of code is a huge difference. That is the biggest improvement AMD made to x86. Scrapping x87 is probably number 2. -- Len Sorensen
participants (5)
- Alvin Starr
- D. Hugh Redelmeier
- James Knott
- Lennart Sorensen
- William Park