From: Andy Glew on
On 7/9/2010 8:09 PM, Robert Myers wrote:
> nmm1(a)cam.ac.uk wrote:
>>
>> Yes. There is a lot of good mathematical work on this, in the area
>> of real (i.e. Kolmogorov) complexity theory. Most of it can be
>> summarised for practical use as "if you have a genuinely random
>> finite state machine, you are up the creek without a paddle, so
>> consider what empirical, semi-mathematical heuristics you can use."
>>
> The theory and practice of identifying patterns in semi-random time
> series must be very well developed, if only because of those
> front-running supercomputers humming away on Wall Street. Is there a
> related subfield (Andy apparently has developed one for his own use)
> where the available resources are extremely constrained? How much of
> this is genuinely new ground and how much might already exist in some
> other context?

Haven't developed. Want to develop.

Trevor Mudge and his students have

Analysis of Branch Prediction via Data Compression (1996)
http://cardit.et.tudelft.nl/~heco/lit/chen96.ps.gz
by I-Cheng K. Chen, John T. Coffey, Trevor N. Mudge
in Proceedings of the 7th International Conference on Architectural
Support for Programming Languages and Operating Systems

Limits to Branch Prediction (1996)
by Trevor N. Mudge, I-Cheng K. Chen, John T. Coffey
in Proceedings of the Seventh International Conference on Architectural
Support for Programming Languages and Operating Systems (ASPLOS VII)


However, although the papers' titles look promising, I do not think they
have truly hit the limits.
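
For concreteness, here is a minimal gshare-style two-level predictor in
C - the kind of history-based sequence predictor whose limits those
papers try to characterize. It is a generic textbook scheme with
arbitrarily chosen table and history sizes, not the compression-based
model the papers analyze:

#include <stdint.h>

/* Global branch-history register XORed with the branch address indexes
 * a table of 2-bit saturating counters.  Sizes are arbitrary. */
#define HIST_BITS  12
#define TABLE_SIZE (1u << HIST_BITS)

static uint8_t  counters[TABLE_SIZE]; /* 2-bit counters, 0..3    */
static uint32_t ghist;                /* global history register */

static int predict(uint32_t pc)
{
    uint32_t idx = (pc ^ ghist) & (TABLE_SIZE - 1);
    return counters[idx] >= 2;        /* taken if counter is in upper half */
}

static void update(uint32_t pc, int taken)
{
    uint32_t idx = (pc ^ ghist) & (TABLE_SIZE - 1);
    if (taken  && counters[idx] < 3) counters[idx]++;
    if (!taken && counters[idx] > 0) counters[idx]--;
    ghist = ((ghist << 1) | (uint32_t)taken) & (TABLE_SIZE - 1);
}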

---

Kolmogorov theory also looks promising, but I have not found as much as
I would like on the analysis of semi-random sequences, as opposed to
truly random ones.
From: nmm1 on
In article <dwRZn.6659$KT3.5929(a)newsfe13.iad>,
Robert Myers <rbmyersusa(a)gmail.com> wrote:
>>
>> Yes. There is a lot of good mathematical work on this, in the area
>> of real (i.e. Kolmogorov) complexity theory. Most of it can be
>> summarised for practical use as "if you have a genuinely random
>> finite state machine, you are up the creek without a paddle, so
>> consider what empirical, semi-mathematical heuristics you can use."
>>
>The theory and practice of identifying patterns in semi-random time
>series must be very well developed, if only because of those
>front-running supercomputers humming away on Wall Street. Is there a
>related subfield (Andy apparently has developed one for his own use)
>where the available resources are extremely constrained? How much of
>this is genuinely new ground and how much might already exist in some
>other context?
>
>The empirical, semi-mathematical heuristics must have themselves come
>from a learning predictor somewhere, even if it is the brain of a
>computer architect.

No, no, not at all. Firstly, the whole point of such things IS that
there is no theory - if anything works, fine, and trial-and-error is
the only way to find out. And, worse, the very approaches to try
are selected by guesswork.

Secondly, all that those people who gamble with our money need is a
small margin (say, 5%) - and that would be of no use in this context.


Regards,
Nick Maclaren.
From: Terje Mathisen on
George Neuner wrote:
> On Fri, 09 Jul 2010 22:03:31 +0200, Terje Mathisen<"terje.mathisen at
> tmsw.no"> wrote:
>
>> MitchAlsup wrote:
>>> But beyond this, the predictors are really swimming upstream against
>>> the current of random placement of semi-random-sized objects allocated
>>> on a heap. Especially when subject to garbage collection.
>>
>> I think this is a key issue:
>>
>> Garbage collectors seem to me to be _the_ proper spot to implement data
>> location optimizations:
>>
>> Since most garbage-collectable languages allow the collector to have
>> full knowledge of all pointer data items, it should also be capable of
>> detecting left/right pointers in a tree structure, right?
>>
>> If so, one possible approach would be to try to minimize the total
>> distance covered by said pointers, measured as the offset between the
>> address of the pointer itself and the object it points to.
>
> Coalescing tree data structures during GC has been studied quite a
> bit. There seems to be consensus that, where possible, the best thing
> to do is to maintain the address order of object allocations because
> that tends to be a good predictor of object lifetimes. But
> maintaining object address ordering is relatively complex due to the
> need to sort addresses, so where it is infeasible, regrouping
> according to depth-first traversal has been shown to yield slightly
> better locality than breadth-first (at least for the test cases that
> have been studied).

[big snip]

> Not to mention that GC itself, regardless of algorithm(s), is a fairly
> bad fit to LRU paging because the collection process may
> touch and drag in large amounts of non-resident data - displacing data
> actively in use. There have been proposals for various extensions to
> VMM APIs to give processes more direct control over their own paging,
> but, of course, these may be at odds with the global policies the VMM
> system may be trying to implement.

Thanks for an enlightening post!
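
For concreteness, here is a minimal sketch in C of the depth-first
regrouping described above, phrased as a semispace copier for a binary
tree. The Node layout, the forward field and the bump allocator are
invented for the illustration; a real collector also has to bound the
recursion, trace from roots and handle arbitrary object graphs:

#include <stddef.h>

/* Hypothetical node layout; forward is NULL until the node is copied. */
typedef struct Node {
    struct Node *left, *right;
    int payload;
    struct Node *forward;
} Node;

static Node  *to_space;  /* destination region, assumed large enough */
static size_t to_top;    /* bump-allocation index                    */

/* Copy a tree into to-space in depth-first order, so a node and the
 * subtree it points to end up adjacent in memory. */
static Node *copy_df(Node *n)
{
    if (n == NULL) return NULL;
    if (n->forward != NULL) return n->forward;  /* already moved */

    Node *c = &to_space[to_top++];              /* bump-allocate the copy */
    *c = *n;
    n->forward = c;

    c->left  = copy_df(n->left);   /* children land right after the parent */
    c->right = copy_df(n->right);
    return c;
}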

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
From: Robert Myers on
Terje Mathisen wrote:

>
> Since 60-70% of the decoding time was spent in the IMDCT (Inverse
> Modified Discrete Cosine Transform), the initial suggestion from Jeff
> Roberts was to simply write a SIMD version of the existing IMDCT (which
> was believed to be optimal, but written in plain C).
>

I've been out of the ridiculously fast Fourier transform business for a
while, but there were a lot of specialized tricks you could play that
could reduce the operations count if you were willing to search beyond
the recursive neatness of divide and conquer. I don't know how
well-known they ever were or how useful they ever were outside some very
specialized (and, at the time, classified) contexts.

Robert.
From: Terje Mathisen on
Robert Myers wrote:
> Terje Mathisen wrote:
>
>>
>> Since 60-70% of the decoding time was spent in the IMDCT (Inverse
>> Modified Discrete Cosine Transform), the initial suggestion from Jeff
>> Roberts was to simply write a SIMD version of the existing IMDCT
>> (which was believed to be optimal, but written in plain C).
>>
>
> I've been out of the ridiculously fast Fourier transform business for a
> while, but there were a lot of specialized tricks you could play that
> could reduce the operations count if you were willing to search beyond
> the recursive neatness of divide and conquer. I don't know how
> well-known they ever were or how useful they ever were outside some very
> specialized (and, at the time, classified) contexts.

The IMDCT is less regular than FFT, and a bit harder to play tricks
with, but for short lengths, like the fixed 17-sample transform used by
mp3, you basically unroll it completely, then combine as many operations
as possible.
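
For reference, the transform being unrolled is just the direct O(N^2)
sum below (a sketch in C; scaling and windowing conventions vary between
codecs, so treat it as a definition rather than a drop-in decoder stage):

#include <math.h>
#include <stddef.h>

/* Direct inverse MDCT: N/2 spectral coefficients in, N time samples out. */
static void imdct_direct(const double *X, double *y, size_t N)
{
    const double pi = 3.14159265358979323846;
    for (size_t n = 0; n < N; n++) {
        double sum = 0.0;
        for (size_t k = 0; k < N / 2; k++)
            sum += X[k] * cos((2.0 * pi / (double)N) *
                              ((double)n + 0.5 + (double)N / 4.0) *
                              ((double)k + 0.5));
        y[n] = sum;
    }
}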

Ogg Vorbis can work with more or less any pair of power-of-two lengths;
the default has been 256 and 2048, but a decoder has to be prepared to
handle any length.

The rewrite merged the two outermost recursion levels, combining some
operations and eliminating others.

It is possible to implement the IMDCT with a regular FFT, but the
required pre- and post-processing made that less interesting.

OTOH, afaik it should definitely be possible to plug length=256 and
vector=4 into the FFTW synthesizer and get a very big, completely
unrolled, minimum-operation-count piece of code out of it.

The three problems with this would be:

(a) code size
(b) the need to handle length=2048 (as well as any other power of two)
(c) licensing.

Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"