From: Eugene Miya on
In article <fvqksi$6pd$1(a)gemini.csx.cam.ac.uk>,
Nick Maclaren <nmm1(a)cus.cam.ac.uk> wrote:
>I suggest that you try to get a factor of 2 from the X Windowing
>System Toolkit - the exercise would stretch your mind.

Right.
You handle the Maclaren challenge.
Might be a good idea to factor out semiconductor device improvement effects.
But it's your challlenge.

--
From: Eugene Miya on
>> Knuth's challenge.
>> If either of you even thinks you can get significant parallelism from TeX,
>> show me, and I will forward to DEK. I can assure you that he would be
>> amused. I watched as about a 100 people at Stanford back from that
>> challenge.

In article <vkf8f5-eub.ln1(a)annette.mikron.de>,
Bernd Paysan <bernd.paysan(a)gmx.de> wrote:
>I would certainly have to completely rewrite TeX (rebuild from scratch).
>I'm talking about the task TeX has to do, not the implementation.

Well email me when you are done. I only pass thru the ng occasionally.
I have a real job, I don't have time to read or post with any frequency.


> The implementation is serialized beyond repair.

Are you saying the TeX is broken?

I wonder how big a 2^n pennys check he would write you?

The exception handling might be tough.

>However, there are different ways to achieve parallelism:
>
>For a mere factor of 8, a simple filter design would do (using multiple
>cores): First filter takes TeX source and expands macros. Next filter takes
>expanded TeX and builds paragraphs out of it. Paragraphs then can be
>wrapped in parallel (TeX gives each paragraph several tries and chooses the
>best one). In effect, you can wrap all paragraphs of a single document at
>once. Next is to combine paragraphs to pages, this needs to serialize again
>(but for a factor 8, one paragraph at a time is completely sufficient).

You'd need 8 balanced stages of pipelining or possibly more stages of
unbalanced pipelining to be a virtual factor of 8, and no weird cases like the
sample spirialing sentences.

>Ok, then let's assume we want to use VLIW or other vector techniques to get
>more fine-grain parallelism out of TeX. This requires different data
>structures. For example kerning: Optimally, we build a matrix of kerning
>for character A followed by character B (wastes space, because most of the
>entries are 0). If our text also is placed in an array (instead of a linked
>list), we can see that adding kerning is a 100% parallel task - there's no
>interdependency, the limiting factor is just how many accesses we can make
>to that kerning matrix (random access, but the kerning matrix fits easily
>into L1 cache).

So let's say we get the Multiflow running again. You can run both cases
(list control and rewritten array experiment).

>For breaking paragraphs at high speed (and thus high parallelism), we first
>fold words (or syllables after hyphenating) into single object (reduce
>data) - high parallelism, O(log(N)) sequential limit. Then we compute fixed
>and glue width for the sequence of all words in the paragraph, this is a
>vector operation with a sequential sum part, i.e. we have a O(log(N))
>sequential limit in this operation. Then we bisect the paragraphs to find
>possible line breaks - this is a sequential algorithm, but since we have an
>array, we don't need to walk through the list while we do the bisection
>steps. Finally, calculate "badness", and select the best.

Daze us.

--