
[I speak as if I'm an expert, but I'm not. Beware.]

| From: Alvin Starr via talk <talk@gtalug.org>
| To: talk@gtalug.org
|
| A number of groups have tried to develop extremely parallel processors but all
| seem to have gained little traction.
|
| There was the XPU 128, the Epiphany (http://www.adapteva.com/) and more
| recently the Xeon Phi and AMD Epyc.

GPUs are extremely parallel by CPU standards. And they certainly are getting traction. This shows that you may have to grow up in a niche before you can expand into a big competitive market.

- GPUs were useful as GPUs and evolved through many generations
- only then did folks try to use them for more-general-purpose computing

Downside: GPUs didn't have a bunch of things that we take for granted in CPUs. Those are gradually being added.

Another example: ARM is just now (more than 30 years on) targeting datacentres. Interestingly, big iron has previously mostly been replaced by co-ordinated hordes of x86 micros.

| At one point I remember reading an article about Sun developing an asynchronous
| CPU which would be interesting.

Yeah, and many tried Gallium Arsenide too. That didn't work out, probably due to the mature expertise in CMOS. I guess you could say it was also due to energy efficiency being more important than speed (as things get faster, they get hotter, and even power-sipping CMOS reached the limit of cooling).

The techniques for designing and debugging asynchronous circuits are not as well-developed as those for synchronous designs. That being said, clock distribution in a modern CPU is apparently a large problem.

| All these processors run into the same set of problems.
| 1) x86 silicon is amazingly cheap.

Actually, this points to an opening.

Historically, Intel has been a node or so ahead of all other silicon fabs. This meant that their processors were a year or two ahead of everyone else on the curve of Moore's Law. That meant that even when RISC was ahead of x86, the advantage was precarious.
Eventually, the vendors threw in the towel: lots of risk with large engineering costs and relatively low payoffs. Some went to the promise of Itanium (SGI, HP (who had eaten Apollo) and Compaq (who had eaten DEC)). Power motors on but has shrunk (losing game machines, automotive (I think), desktops and laptops (Apple), and workstations). SPARC is barely walking dead except for contractual obligations.

But now, with Moore's Law fading for a few years, maybe smaller efficiency gains start to count. RISC might be worth reviving. But the number of transistors on a die means that the saving from making a processor core smaller doesn't count for a lot. Unless you multiply it by a considerable constant: many processors on the same die.

The Sun T series looked very interesting to me when it came out. It looked to me as if the market didn't take note. Perhaps too many had already written Sun off -- at least to the extent of using their hardware for new purposes. Also Sun's cost structure for marketing and sales was probably a big drag.

| 2) supporting multiple CPUs causes more software support for each
| new CPU architecture.

That hurt RISC, but the vendors knew that they were limited to organizations that used UNIX and could recompile all their applications. The vendors did try to broaden this but, among others, Microsoft really screwed them. Microsoft promised ports to pretty much all RISCs but failed to deliver with credible support on any.

Even AMD's 64-bit architecture was screwed by Microsoft. Reasonable 64-bit Windows was promised to AMD for when they shipped (i.e. before Intel shipped) but 64-bit Windows didn't show up within the useful lifetime of the first AMD 64-bit chips.

| 3) very little software is capable of truly taking advantage of many
| parallel threads without really funky compilers and software design tools.

A lot of software, by cycles consumed, can use parallelism.
The Sun T series was likely very useful for running Web front-ends, something that is embarrassingly parallel. Data-mining folks seem to have used map/reduce and the like to allow parallel processing. GPUs grew up working on problems that are naturally parallel.

What isn't easy to do in parallel is a program written in our normal programming languages: C / C++ / Java / FORTRAN. Each has had parallelism bolted on in a way that is not natural to use.

| 4) having designed a fancy CPU most companies try very hard to keep their
| proprietary knowledge all within their own control where the x86 instruction
| set must be just about open source nowadays.

No. There are very few licenses to produce x86 processors: Intel, AMD, IBM, and a very few others that were inherited from dead companies. For example, I think Cyrix (remember them?) counted on using IBM's license through using IBM's fab (IBM no longer has a fab). I don't remember how NCR and VIA got licenses. AMD's license is the clearest and Intel tried to revoke it -- what a fight!

RISC-V looks interesting.

| 5) getting motherboard manufacturers to take a chance on a new CPU is not
| an easy thing.

It's not clear whether this matters much. It matters for workstations but that isn't really a contested space any longer. Even though you and I care.

| Even people with deep pockets like DEC with their Alpha CPU and IBM with their
| Power CPUs have not been able to make a significant inroad into the commodity
| server world.

In retrospect, we all know what they should have done. But would that have worked? Similar example: Nokia and BlackBerry were in similar holes and tried different ways out but neither worked.

Power was widely adopted (see above).

The Alpha was elegant. DEC tried to build big expensive systems. This disappointed many TLUGers (as we were then known) because that's not what we'd dream of buying.
Their engineering choices were the opposite of: push out a million cheap systems to drive forward on the learning curves. HP was one of the sources of the Itanium design, and so when they got Compaq, which had gotten DEC, it was natural to switch to Itanium.

(Several TLUGers had Alpha systems. The cheapest were pathetically worse than PCs of the time (DEC crippled them so as not to compete with their more expensive boxes). The larger ones were acquired after they were obsolescent. Lennart may still have some.)

Itanium died for different reasons.

- Apparently too ambitious about what compilers could do (static scheduling). I'd quibble with this.
- Intel never manufactured Itanium on the latest node, so it always lost some speed compared with x86. Why did they do this? I think that it was that Innovator's Dilemma stuff. The x86 fight with AMD was existential and Itanium wasn't as important to them.
- Customers took a wait-and-see attitude. As did Microsoft.

| Mips has had some luck with low to mid range systems for routers and storage
| systems but their server business is long gone with the death of SGI.

No, SGI switched horses: Itanium and, later, x86. MIPS just seemed lucky to fall into the controller business, but that seems lost now, replaced by ARM.

| Sun/Oracle has had some luck with the Sparc but not all that much outside
| their own use and I am just speculating but I would bet that Sun/Oracle sells
| more x86 systems than Sparc systems.

Fun fact: some older Scientific Atlanta / Cisco set-top boxes for cable use SPARC. Some Xerox copiers did too.

| ARM seems to be having some luck but I believe that luck is because of their
| popularity in the small computer systems world sliding into supporting larger
| systems and not by being designed for servers from the get go.

Right. Since power matters so much in the datacentre, lots of companies are trying to build suitable ARM systems. Progress is surprisingly slow. AMD is even one of these ARM-for-datacentre companies.
| I am a bit of a processor geek and have put lots of effort in the past into
| elegant processors that just seem to go nowhere.
| I would love to see some technologies other than the current von Neumann
| somewhat parallel SMP but I have a sad feeling that that will be a long time
| coming.

Interesting hopefuls include:

- GPUs
- FPGAs on motherboards (e.g. Intel can fit (Xilinx?) FPGAs in a processor socket of a multi-socket server motherboard)
- neural net accelerators
- The Mill (dark horse) <https://en.wikipedia.org/wiki/Mill_architecture>
- quantum computers
- wafer-scale integration

| With the latest screw-up from Intel and the huge exploit surface that is the
| Intel ME someone may be able to get some traction by coming up with a
| processor that is designed and verified for security.

You only have to be as secure as "best practices" within your industry. Otherwise Windows would have died a generation ago.

There are security-verified processors for the military. Expensive and obsolete by our standards.

Not enough customers are willing to pay even the first price for security: simplicity. That's before we even get to the inconvenience issues.

Security does not come naturally. Today's Globe and Mail reported:
<https://www.theglobeandmail.com/news/world/fitness-devices-can-provide-locations-of-soldiers/article37764423/>
<https://www.theverge.com/2018/1/28/16942626/strava-fitness-tracker-heat-map-military-base-internet-of-things-geolocation>