|
Prev: The Mark Thorson FAQ: Read all about this LIAR, BULLY, THUG, CYBERSTALKER and HARASSER
Next: The Mark Robert Thorson FAQ: Read all about this LIAR, BULLY, THUG, CYBERSTALKER and HARASSER
From: Eugene Miya on 7 May 2008 15:44 In article <fvqksi$6pd$1(a)gemini.csx.cam.ac.uk>, Nick Maclaren <nmm1(a)cus.cam.ac.uk> wrote: >I suggest that you try to get a factor of 2 from the X Windowing >System Toolkit - the exercise would stretch your mind. Right. You handle the Maclaren challenge. Might be a good idea to factor out semiconductor device improvement effects. But it's your challlenge. --
From: Eugene Miya on 7 May 2008 15:59
>> Knuth's challenge. >> If either of you even thinks you can get significant parallelism from TeX, >> show me, and I will forward to DEK. I can assure you that he would be >> amused. I watched as about a 100 people at Stanford back from that >> challenge. In article <vkf8f5-eub.ln1(a)annette.mikron.de>, Bernd Paysan <bernd.paysan(a)gmx.de> wrote: >I would certainly have to completely rewrite TeX (rebuild from scratch). >I'm talking about the task TeX has to do, not the implementation. Well email me when you are done. I only pass thru the ng occasionally. I have a real job, I don't have time to read or post with any frequency. > The implementation is serialized beyond repair. Are you saying the TeX is broken? I wonder how big a 2^n pennys check he would write you? The exception handling might be tough. >However, there are different ways to achieve parallelism: > >For a mere factor of 8, a simple filter design would do (using multiple >cores): First filter takes TeX source and expands macros. Next filter takes >expanded TeX and builds paragraphs out of it. Paragraphs then can be >wrapped in parallel (TeX gives each paragraph several tries and chooses the >best one). In effect, you can wrap all paragraphs of a single document at >once. Next is to combine paragraphs to pages, this needs to serialize again >(but for a factor 8, one paragraph at a time is completely sufficient). You'd need 8 balanced stages of pipelining or possibly more stages of unbalanced pipelining to be a virtual factor of 8, and no weird cases like the sample spirialing sentences. >Ok, then let's assume we want to use VLIW or other vector techniques to get >more fine-grain parallelism out of TeX. This requires different data >structures. For example kerning: Optimally, we build a matrix of kerning >for character A followed by character B (wastes space, because most of the >entries are 0). If our text also is placed in an array (instead of a linked >list), we can see that adding kerning is a 100% parallel task - there's no >interdependency, the limiting factor is just how many accesses we can make >to that kerning matrix (random access, but the kerning matrix fits easily >into L1 cache). So let's say we get the Multiflow running again. You can run both cases (list control and rewritten array experiment). >For breaking paragraphs at high speed (and thus high parallelism), we first >fold words (or syllables after hyphenating) into single object (reduce >data) - high parallelism, O(log(N)) sequential limit. Then we compute fixed >and glue width for the sequence of all words in the paragraph, this is a >vector operation with a sequential sum part, i.e. we have a O(log(N)) >sequential limit in this operation. Then we bisect the paragraphs to find >possible line breaks - this is a sequential algorithm, but since we have an >array, we don't need to walk through the list while we do the bisection >steps. Finally, calculate "badness", and select the best. Daze us. -- |