From: Michael J. Mahon on 7 Jun 2006 14:18
Jorge ChB wrote:
> mdj <mdj.mdj(a)gmail.com> wrote:
>>Now that extremely cheap machines can manipulate high definition AV
>>content in better than realtime, we're running out of reasons to make
>>faster machines.
>>This is a good thing, in many ways.
> Wow wait !
> What are you saying ?
> A good thing in what way ?
> More MIPS + less Watts + smaller.
> That's what hardware designers are after and will always be after.
> That's been the key to now-possible before-unthinkable everyday things
> like mobiles, ipods, psps, palms, dvb tv, ABS, EFI, portables, google
> earth, WIFI, etc etc. A microprocessor everywhere.
> And that's the key to many unthinkable wonderful new inventions that
> have to come yet and won't be possible unless the hardware keeps
> evolving == (More MIPS + less Watts + smaller)
Absolutely right. But the game is about to get much harder,
in the sense that silicon feature sizes are already approaching
the point where silicon looks like swiss cheese, making further
improvements by simple scaling quite difficult.
The game is getting harder and *much* more expensive, so progress is
slower (note the lack of 2x speed increases for the last few years)
and the number of different players is decreasing.
The big "open door" opportunity is multiprocessor parallelism, but
we have invested so little in learning to apply parallelism that it
remains esoteric. (But AppleCrate makes it easy to experiment with! ;-)
The popular "thread" model, in which all of memory is conceptually
shared by all threads, is a disaster for real multiprocessors, since
they will *always* have latency and bandwidth issues to move data
between them, and a "single, coherent memory image" is both slow
and wasteful.
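To make the model concrete, here is a minimal sketch of the shared-memory "thread" style being described (the class name and counts are illustrative, not from the thread): every thread sees one coherent memory image, so every synchronized update must be made visible to all other processors, which is exactly the coherence traffic that real multiprocessors pay for.

```java
// Minimal sketch of the "thread" model: all threads share one
// coherent memory image. Convenient to program, but each
// synchronized access forces coherence traffic between processors.
public class SharedCounter {
    private long count = 0;

    // Every update must become visible to all other threads --
    // the "single, coherent memory image" cost described above.
    public synchronized void increment() { count++; }
    public synchronized long get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        SharedCounter c = new SharedCounter();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 100_000; n++) c.increment();
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        System.out.println(c.get()); // 400000: correct, but every
                                     // increment paid for coherence
    }
}
```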
> The only lack "of reasons to make faster machines" I can think of comes
> from the fact that the software is evolving at a so *much* slower pace
> (than hardware)...
> Voice recognition ?
> "Artificial Intelligence" ?
> User (human) interface ?
> etc... :-(
Yes, there is much to be done--and very slow progress. What is
needed is breakthrough *algorithmic* work, not *tools* work.
Parallel computing for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/
"The wastebasket is our most important design
tool--and it is seriously underused."
From: mdj on 7 Jun 2006 21:37
Jorge ChB wrote:
> Wow wait !
> What are you saying ?
> A good thing in what way ?
> More MIPS + less Watts + smaller.
In general, more MIPS = more watts. In the past we focussed on more
MIPS, more or less regardless of watts because we needed the MIPS more.
Now, MIPS per watt is critical. Mobility demands it, as does server
room space/heat issues, as does embedded systems. The work we want to
do is achievable with current levels of processing power, but not
current levels of power consumption.
Considering the current state of software development, it's a good
thing. We need to spend some time focussing our efforts on exploiting
other means of performance enhancement, and focus more on software
quality. Many of the foreseeable tasks for faster machines require many
orders of magnitude more processing power than we currently have, so
the focus will have to be on software improvement. I say about time!
From: mdj on 7 Jun 2006 22:56
Michael J. Mahon wrote:
> The big "open door" opportunity is multiprocessor parallelism, but
> we have invested so little in learning to apply parallelism that it
> remains esoteric. (But AppleCrate makes it easy to experiment with! ;-)
Parallelism is the big door, but I think the approaches that need to be
explored cover a wider gamut than multiprocessor parallelism, which as
you point out has considerable latency issues.
> The popular "thread" model, in which all of memory is conceptually
> shared by all threads, is a disaster for real multiprocessors, since
> they will *always* have latency and bandwidth issues to move data
> between them, and a "single, coherent memory image" is both slow
> and wasteful.
It is however an extremely efficient form of multiprocessing for
applications with modest horizontal scaling potential.
There are essentially 3 basic models for parallelism that must be
considered:
Multithread - in which one processor core can execute multiple threads
Uniform Memory Multiprocessor - in which many processor cores share
the same physical memory subsystem. Note that this is further divided
into multiple cores in the same package, plus other cores in different
packages, which have very different latency properties.
Non Uniform Memory Multiprocessor - In this case the latency can vary
wildly depending on the system configuration.
Modern multiprocessor servers employ all three approaches, both on the
same system board, plus via high speed interconnects that join multiple
system boards together. OSes must weigh the 'distance' to another CPU
when considering a potential execution unit for a process.
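As a small sketch of how software meets these models in practice (class and method names are mine, not from the thread): the runtime reports how many hardware execution units it sees, and a pool sized to that count is the usual first step in exploiting them, regardless of whether they are hardware threads, cores, or separate packages.

```java
import java.util.concurrent.*;
import java.util.*;

public class PoolSizing {
    // Split the range 0..n-1 across `units` workers and sum the slices.
    static long parallelSum(long n, int units) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(units);
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < units; i++) {
            final long start = i;
            parts.add(pool.submit(() -> {
                long sum = 0;
                // Stride by the worker count so slices never overlap.
                for (long k = start; k < n; k += units) sum += k;
                return sum;
            }));
        }
        long total = 0;
        for (Future<Long> f : parts) total += f.get();
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        // The runtime reports hardware execution units; on a chip
        // like the T1 this would count hardware threads, not dies.
        int units = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(1_000_000, units)); // 499999500000
    }
}
```

Note that this sketch treats all units as equal; on a NUMA system the OS's placement decisions, not the application, account for the 'distance' between them.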
What's slow and wasteful depends a great deal on the task at hand.
Multithreading used to be just as expensive as multiprocessing. But
consider a current generation CPU designed for low power, high
concurrency, the UltraSPARC T1.
These units have execution cores capable of running 4 concurrent threads.
In the highest end configuration, there are 8 of these execution cores
per physical processor. The cores have a 3.2GB/s interconnect. Each
physical processor has 4 independent memory controllers, so you have
non-uniform memory access on the one die.
Peak power consumption for this part is 79W at 1GHz. Considering you
can in theory run 32 threads simultaneously, that's pretty impressive.
How well you can exploit it depends on your application. An 'old
school' web server for instance, can only get 8 way parallelism on this
chip. A new-school web server written in Java can get 32-way, assuming
that at any given time there are at least 32 concurrent requests for
the same dynamic page, or 32 static requests.
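A rough sketch of the new-school shape (the class name, the simulated `handle` method, and the request count are illustrative): one process, a pool of 32 worker threads, each request dispatched to whichever thread is free, so a T1-class part can keep all 32 hardware threads busy.

```java
import java.util.concurrent.*;

public class ThreadedServer {
    // Stand-in for generating a dynamic page for one request.
    static String handle(int requestId) {
        return "page-" + requestId;
    }

    public static void main(String[] args) throws Exception {
        // 32 worker threads -- one per hardware thread on the T1 --
        // versus the 8 processes an old-school forking server gets.
        ExecutorService workers = Executors.newFixedThreadPool(32);
        CompletionService<String> done =
            new ExecutorCompletionService<>(workers);
        for (int i = 0; i < 32; i++) {
            final int id = i;
            done.submit(() -> handle(id));
        }
        // Responses complete in whatever order the threads finish.
        for (int i = 0; i < 32; i++) System.out.println(done.take().get());
        workers.shutdown();
    }
}
```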
It's getting to the stage where the power consumed by driving I/O over
a pin on an IC package is significant, so expect to see systems like
this grow in popularity.
Interestingly, you can download a VHDL description of this part from
Sun, and synthesise it on one of the higher end FPGAs. Oh how I wish I
had
access to hardware like that!
A top of the range Sun server uses parts that have 4 execution threads
per core, four cores per board, each with its own memory
controller+memory, and up to 18 boards per system (coupled together by
a 9GB/s crossbar switch). Exploiting all the resources in this system
and doing it efficiently is *hard*, as it employs every different style
of parallelism I mentioned before within the same 'machine'.
And I haven't even considered computing clusters!
The way it's panning out is that real multiprocessors are a disaster
for parallelism. The problem is that essentially any task that can be
parallelised needs to process the same data that it does in serial
form. Because of this, you can utilise the same buses, I/O subsystems,
and take advantage of 'nearness' to allow some pretty incredible IPC
rates.
Multithreading approaches are very important on these systems. In fact,
multithreading is important even on systems with single execution
units. The gap between I/O throughput and processing throughput means
you get a certain degree of 'parallelism' even though you can only run
one thread at a time. Free performance improvement if you employ
parallel design techniques.
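The "free parallelism on one execution unit" point can be sketched like this (class name and the simulated latency are mine): while one thread blocks on I/O, the CPU runs the compute thread, so throughput improves even with a single core.

```java
import java.util.concurrent.*;

public class OverlapIO {
    // Reader thread feeds values through a queue while the caller
    // computes; while the reader blocks on "I/O", the single CPU
    // is free to run the compute side.
    static long sumOfSquares(int n) throws InterruptedException {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(16);
        Thread reader = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    Thread.sleep(1);  // stand-in for disk/network latency
                    q.put(i);
                }
                q.put(-1);            // end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        long sum = 0;
        for (int v; (v = q.take()) != -1; ) sum += (long) v * v;
        reader.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumOfSquares(10)); // 1 + 4 + ... + 100 = 385
    }
}
```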
Of course, there are certain heavily compute-bound applications where
the degree of IPC is very low, and massive parallelism is possible
regardless of the interconnect used, as IPC constitutes a relatively
small part of the workload. For the rest of the cases though where lots
of data is being consumed, systems that allow low-overhead IPC through
multithreading are the way to go.
From: mdj on 7 Jun 2006 23:36
Michael J. Mahon wrote:
> And most organizations give maintenance tasks to new programmers, as
> a kind of hazing, I think!
> But not supporting the code that you produced, at least for its first
> year in the field, deprives a team of the *real* learning experience,
> in which you discover which of your grand ideas worked and which didn't.
> And it also serves as a test of whether the code is *actually*
> maintainable, as opposed to theoretically maintainable.
> I see doing at least "early" maintenance as a kind of accountability.
Fully agree. All of the applicable data that needs to be fed into the
improvement process comes directly from supporting the code. Severing
the connection between the development group and this process is most
unwise, as it's these people that need to come up with ways of
improving this process. That's where you get real efficiency.
Of course the problem is that if your development team is spending its
time doing support, it doesn't have the time to develop new versions of
code. This is where the process I outlined previously comes in.
Once the support issues around the product stabilise, you bring in
short-term resources to handle that support, free up the development
team, and start the process again.
> As we enter the era of 10 million transistor FPGAs, system compilers,
> and "turnarounds" measured in seconds--in short, as the constraints on
> hardware design are eased--I expect to see many of the same problems
> that have afflicted software shift into the "hardware" realm.
> Discipline is hard-won. Discipline can only coexist with ease and
> convenience *after* it has been formed through hard experience, since
> ease puts greater demands on discipline.
> Tools can give the appearance of discipline by restricting expression,
> but to a truly disciplined mind, tools are merely secondary.
> I think of "strict" tools as "discipline for the undisciplined", but
> so much of system design is outside the realm of any formal tools, that
> there is no substitute for design discipline. A terrible fate awaits
> those who think that there is.
It's also pre-managed complexity, for those that have neither the time
nor the resources to manage it themselves. Tools don't necessarily have
to be strict, just work, and provide access to complex functionality
that's already proven. This is where tool and language evolution is
key. This complexity goes up all the time, while human discipline
evolves slowly and has real, known limits. In order to build more
complex systems, more complex toolsets that encapsulate that complexity
must be employed.
> My second software tools phase was strict typing and enforced structure.
> My mantra was, "If you think you need a macro, then something is missing
> from the language." Experienced programmers chafed at the "training
> wheels" the language forced upon them. Some of them filled their code
> with unstructured "workarounds", perhaps a sign of their resentment at
> the strictures of the programming environment. (Unstructured code can
> be written in any language.)
Been there too. Time has proven it doesn't work well, and that test
driven development techniques provide more safety, and allow more
flexible forms of expression in the process, easing chafing.
> My third software tools phase was "the only thing that matters is
> the team". I strove for a small team of 98th percentile people, who
> implicitly understood the need for and benefits of discipline, and
> who had learned this by experience. Tools are useful, but secondary.
> If a tool is really needed, it will be written. (Structured code can
> be written in any language.)
> Although I don't consider any of the three approaches ideal, there
> is no doubt that the third worked the best, both in terms of team
> esprit and in terms of product quality (function & reliability).
> Don't count too much on tools--it's the people that make the real
> difference.
The problem is such teams are very hard to build, and keep. And often
the 98th percentile people are already consumed by the very companies
that produce the technology you're trying to leverage. It's really up
to, say, the 90th percentile group to manage the complexity for the
rest, and provide it in more accessible forms, through tools that
support higher abstractions, allowing more to be done.
It's certainly not ideal either, and it shifts a lot of 'waste' onto
the machines. But this is the only place you can feasibly put it,
because the machines are cheap and get bigger all the time. The humans
on the other hand....
> > This is the principal reason for evolving languages and tools. Improved
> > languages allow ideas to be expressed more concisely, support
> > encapsulation mechanisms that allow complex modules to be reused, thus
> > allowing complexity to be more effectively managed. Sure it's
> > idealistic to expect new tools solve all the problems, they don't. They
> > do however mitigate some of the old issues and allow some progress to
> > be made.
> For balance, I have to point out that they also permit *needless*
> complexity to be more effectively managed. When "Hello, World!"
> executes 8 megabytes of code, you know something has gone sour.
> (And, yes, I do include *all* the code executed, not just the code
> in the "Hello, World!" module.)
Sure. Of course, it bears pointing out that you're referring to 8MB of
code that Hello World won't execute, but will carry around as a payload
anyway. Runtime systems are getting larger, that's true, but they also
only have to be loaded once, thanks to copy-on-write memory, and much
of the initialisation work can be cached and shared amongst running
applications. Over time the issues brought about by this approach are
being mitigated, and besides, it's good fun work finding ways to tune
them.
It's not ideal, but what's the alternative? If you don't follow this
road, a cap is placed on the possible solutions you can build. The
overhead introduced by high-level abstraction systems is a very
interesting field of research, and one in which great inroads have
already been made.
I can see a time when massive parallel computing clusters 'churn'
through algorithms, fitting them to particular machines or problem
domains. Programming done by the human will be not much more than
assembling from vast libraries of prevalidated solutions. It doesn't
take much thinking into the future to imagine a time when software
complexity is so high that this is the only feasible solution to
building more complex systems.
I think we're more or less on the right track, but breaking the ties to
legacy implementations that simply cannot be scaled in this way is one
of the biggest hurdles to moving further towards solving the current
issues in software design.
From: Michael on 8 Jun 2006 01:31
> Paul Schlyter wrote:
> > What "Java portability" ????
> > Java is not a portable language. Java is less portable than both
> > FORTRAN, C or C++, which runs on several platforms. Java runs on one
> > single platform only: the Java platform.
> Sorry, this isn't true. The Java language specification is quite
> deliberately devoid of any language construct that would bind it, or any
> program written in it to a specific architecture. The key concepts that
> are missing here are pointers,
It has pointers, just not accessible by the user (the null reference,
for instance, is a pointer showing through).
> and more specifically, the ability to
> perform arbitrary arithmetic on pointer types. Additionally, the
> language specification defines EXACTLY the size and precision of each
> data type. C and C++ on the other hand, not only allow arbitrary
> pointer arithmetic, but also only define in the standard, the minimum
> size requirements of each data type.
You say that, as if it was a bad thing.
The problem is one of size vs speed, and ease of serialization, which
is why C99 added int#_t, int_fast#_t, int_least#_t
While most code doesn't need to know the bit size of types, you still
need to know the min sizes, so you don't have to worry about underflow
/ overflow. The size issue comes up when serializing. The language
mandating features even when the hardware doesn't support them, say
doubles on DSPs or the PS2, is one of the reasons Java is so slow
there.
See: "How Java's Floating-Point Hurts Everyone Everywhere"
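For contrast with the C99 `int#_t` approach, here is a sketch of the Java side of the trade-off (class and method names are mine): because the language fixes every type's size exactly, serialized output is identical on every platform, whether or not the hardware likes it.

```java
import java.io.*;

public class FixedSizes {
    // Java defines type sizes exactly: an int is always 32 bits,
    // a double always 64-bit IEEE 754 -- even on hardware with no
    // native support. Serialized form is therefore fixed too.
    static byte[] encode(int i, double d) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(i);     // always exactly 4 bytes, big-endian
        out.writeDouble(d);  // always exactly 8 bytes
        out.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(0x01020304, 1.5);
        System.out.println(bytes.length); // 12 on every platform
    }
}
```

This is exactly the property C leaves to the implementation, and exactly the mandate that hurts on hardware without, say, native doubles.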
Maybe you have a different experience on "portability" you can comment
on, compared to Carmack's (May 2006) one with Java and cell-phones?
It turns out that I'm a lot less fond of Java for
resource-constrained work. I remember all the little gripes I had with
the Java language, like no unsigned bytes, and the consequences of
strong typing, like no memset, and the inability to read resources into
anything but a char array, but the frustrating issues are details down
close to the hardware.
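Two of those gripes have standard, if clumsy, workarounds (the class name is mine): `Arrays.fill` stands in for memset, and the signed-byte problem is handled by masking every read.

```java
import java.util.Arrays;

public class ByteGripes {
    public static void main(String[] args) {
        // No memset: Arrays.fill is the closest equivalent.
        byte[] buf = new byte[8];
        Arrays.fill(buf, (byte) 0xFF);

        // No unsigned bytes: byte is signed, so 0xFF reads back as
        // -1 and must be masked on every use to recover 0..255.
        int unsigned = buf[0] & 0xFF;
        System.out.println(buf[0] + " " + unsigned); // -1 255
    }
}
```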
The biggest problem is that Java is really slow. On a pure cpu / memory
/ display / communications level, most modern cell phones should be
considerably better gaming platforms than a Game Boy Advance. With
Java, on most phones you are left with about the CPU power of an
original 4.77 MHz IBM PC, and lousy control over everything.
I spent a fair amount of time looking at java byte code disassembly
while optimizing my little rendering engine. This is interesting fun
like any other optimization problem, but it alternates with a bleak
knowledge that even the most inspired java code is going to be a
fraction the performance of pedestrian native C code.
Even compiled to completely native code, Java semantic requirements
like range checking on every array access hobble it. One of the phones
(Motorola i730) has an option that does some load time compiling to
improve performance, which does help a lot, but you have no idea what
it is doing, and innocuous code changes can cause the compilable
heuristic to fail.
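The range-checking cost Carmack mentions is also why a well-known loop idiom matters (the class and method names are mine, and the exact heuristics vary by VM): when the loop bound is `a.length` itself, a JIT can typically prove the accesses in range and hoist or drop the per-access check, whereas an arbitrary bound may force a check on every iteration.

```java
public class BoundsChecks {
    // An arbitrary bound: the JIT may not be able to prove a[i]
    // is in range, so each access can carry a range check.
    static long sumTo(int[] a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += a[i];
        return s;
    }

    // Looping to a.length is the canonical shape a JIT recognizes,
    // letting it eliminate the per-access check. Same result,
    // different generated code.
    static long sumAll(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) s += a[i];
        return s;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        System.out.println(sumAll(a) + " " + sumTo(a, a.length)); // 10 10
    }
}
```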
Write-once-run-anywhere. Ha. Hahahahaha. We are only testing on four
platforms right now, and not a single pair has the exact same quirks.
All the commercial games are tweaked and compiled individually for each
(often 100+) platform. Portability is not a justification for the awful