From: Peter Olcott on

"Ben Bacarisse" <ben.usenet(a)bsb.me.uk> wrote in message
news:0.28e56633a8ee0dd7b329.20100520163818BST.878w7ek2hh.fsf(a)bsb.me.uk...
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>
>> "Torben �gidius Mogensen" <torbenm(a)diku.dk> wrote in
>> message
>> news:7z7hmygb05.fsf(a)ask.diku.dk...
>>> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>>>
>>>> I was wondering if there is a generally faster way to
>>>> implement a
>>>> DFA recognizer than by using a state transition matrix.
>>>
>>> Not asymptotically faster, since it is linear (assuming
>>> matrix lookup
>>> is constant time). You can sometimes get faster code by
>>> translating
>>> the matrix into code (essentially a switch statement per
>>> state)
>>> instead of using an array, but it depends on the
>>> compiler which is
>>> faster. Flex does this.
>>
>> I use a state transition matrix to look up the switch
>> case
>> value. I can not imagine how to (for example) match 50
>> different keywords without the table and only use just
>> the
>> switch statement. How is this done?
>
> I suspect you may just be using a mixed version that is half table
> and half switch driven. The fully matrix version looks like this:
>
> state := initial_state;
> while !is_final(state) do
>     state := state_table[state, get_input()];
>
> while the full case version looks like this:
>
> state := initial_state;
> while !is_final(state) do
>     c := get_input()
>     switch state in
>         case 0:
>             switch c in
>                 case 'a': state := 1;
>                 case 'b': state := 2;
>             end
>         case 1:
>             switch c in
>                 case 'a': state := 2;
>             end
>         /* and so on... */
>     end
>
> corresponding to a table in which state_table[0, 'a'] is 1 and
> state_table[0, 'b'] and state_table[1, 'a'] are 2. (state_table[x, y]
> is x for all other x and y.)
>
> <snip>
> --
> Ben.

That is completely obvious now. Simply implement the state transition
matrix as a set of case statements corresponding to possible inputs,
each nested within a case of an outer case statement representing all
possible states.
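
For example, a minimal C sketch of this shape (the three-state DFA
below is invented purely for illustration; it accepts the language
ab*):

#include <stdio.h>

/* States: 0 = start, 1 = seen 'a' then any number of 'b' (accepting),
   2 = dead. */
static int run_dfa(const char *s)
{
    int state = 0;
    for (; *s != '\0'; s++) {
        switch (state) {
        case 0:
            switch (*s) {
            case 'a': state = 1; break;
            default:  state = 2; break;
            }
            break;
        case 1:
            switch (*s) {
            case 'b': state = 1; break;
            default:  state = 2; break;
            }
            break;
        default:            /* dead state absorbs everything */
            break;
        }
    }
    return state == 1;      /* accept iff we stop in state 1 */
}

int main(void)
{
    printf("%d %d %d\n", run_dfa("abbb"), run_dfa("a"), run_dfa("ba"));
    return 0;               /* prints: 1 1 0 */
}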

It is also reasonably obvious which one would be faster. If the case
within case form reduces the size of the state transition matrix
enough to fit within cache, and the table form cannot fit within
cache, then the spatial locality of reference might more than make up
for the additional case statement overhead.
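
For a rough sense of scale (the numbers are invented for
illustration): a 200-state DFA over the full 256-byte alphabet with
one-byte entries occupies 200 * 256 = 50 KB, comfortably larger than a
typical 32 KB L1 data cache, so which form fits is not a hypothetical
question.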


From: Ben Bacarisse on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> "Ben Bacarisse" <ben.usenet(a)bsb.me.uk> wrote in message
> news:0.28e56633a8ee0dd7b329.20100520163818BST.878w7ek2hh.fsf(a)bsb.me.uk...
<snip>
>>> "Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in
>>> message news:7z7hmygb05.fsf(a)ask.diku.dk...
<snip>
>>>> Not asymptotically faster, since it is linear (assuming matrix
>>>> lookup is constant time). You can sometimes get faster code by
>>>> translating the matrix into code (essentially a switch statement
>>>> per state) instead of using an array, but it depends on the
>>>> compiler which is faster. Flex does this.
<snip>
>> I suspect you may just be using a mixed version that is half table
>> and half switch driven. The fully matrix version looks like this:
>>
>> state := initial_state;
>> while !is_final(state) do
>>     state := state_table[state, get_input()];
>>
>> while the full case version looks like this:
>>
>> state := initial_state;
>> while !is_final(state) do
>>     c := get_input()
>>     switch state in
>>         case 0:
>>             switch c in
>>                 case 'a': state := 1;
>>                 case 'b': state := 2;
>>             end
>>         case 1:
>>             switch c in
>>                 case 'a': state := 2;
>>             end
>>         /* and so on... */
>>     end
<snip>
> That is completely obvious now. Simply implement the state transition
> matrix as a set of case statements corresponding to possible inputs,
> each nested within a case of an outer case statement representing all
> possible states.

You can do the equivalent of transposing the matrix: switch on the
input first and have a contained switch on the state. Which is better
will probably depend on lots of details. Humans usually find the first
form clearer, but if the code is being generated automatically that
hardly matters.
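
In outline, reusing the toy transition table from my earlier post
(state_table[0,'a'] = 1, state_table[0,'b'] = 2, state_table[1,'a'] = 2,
everything else unchanged), the transposed form might look like this
in C:

/* Dispatch on the input character first, then on the state. */
static int step(int state, int c)
{
    switch (c) {
    case 'a':
        switch (state) {
        case 0: return 1;
        case 1: return 2;
        }
        break;
    case 'b':
        switch (state) {
        case 0: return 2;
        }
        break;
    }
    return state;           /* no transition defined: stay put */
}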

> It is also reasonably obvious which one would be faster.

You are better at it than me, then. I never cease to be surprised when
I time code on modern hardware!

> If the case within case form reduces the size of the state transition
> matrix enough to fit within cache, and the table form cannot fit
> within cache, then the spatial locality of reference might more than
> make up for the additional case statement overhead.

If you say so, but I'd measure it.
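
Something as crude as this would do for a first look (the two run_*
functions are stand-in stubs, not real code; swap in the actual
table-driven and switch-driven recognizers and feed them realistic
input):

#include <stdio.h>
#include <time.h>

/* Stand-in stubs; replace with the real recognizers. */
static int run_table(const char *s)  { int n = 0; while (*s++) n++; return n; }
static int run_switch(const char *s) { int n = 0; while (*s++) n++; return n; }

static double time_one(int (*run)(const char *), const char *input, long reps)
{
    volatile int sink = 0;  /* keep the calls from being optimized away */
    clock_t t0 = clock();
    for (long i = 0; i < reps; i++)
        sink += run(input);
    (void)sink;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    const char *input = "some representative input";
    printf("table:  %.3f s\n", time_one(run_table,  input, 10000000L));
    printf("switch: %.3f s\n", time_one(run_switch, input, 10000000L));
    return 0;
}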

--
Ben.
From: Tim Little on
On 2010-05-20, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
> If the case within case form reduces the size of the state
> transition matrix enough to fit within cache, and the table form
> cannot fit within cache, then the spatial locality of reference might
> more than make up for the additional case statement overhead.

The general case within case form will almost certainly not be
smaller, as it will likely be implemented in the form of jump tables.
Every entry in the original transition matrix will have a counterpart
in one of the jump tables, and that entry will be a full instruction
pointer rather than a relatively small state number.

It is possible that a sufficiently smart compiler may identify that
some patterns exist and create special code for those. In that case
perhaps it might end up smaller.


- Tim
From: Chris F Clark on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> I was wondering if there is a generally faster way to
> implement a DFA recognizer than by using a state transition
> matrix.
>
> I already asked this question on several other groups and
> never received any definitive answer.

Did you ask on comp.compilers?

In any case, looking at the answers here, it looks like one is just
fiddling with the proportionality constant. A transition matrix and
nested switch statements are exactly equivalent in the sense that both
are linear (with slightly different constants); which constant is
better depends upon how big your DFA is, how clever the compiler is,
and what size of caches your machine has. For small enough DFAs the
case statement will generally be faster. For most compilers, though,
when one gets a large enough DFA, the optimizer and code generator
"break", sometimes literally, but more often they simply cannot track
enough information to generate optimal code, and the table method
actually gets faster.

Note, case statements aren't your only option. Most DFAs can be
converted into what look like hand-coded programs. That takes *a lot*
more work; it is in essence what people do when writing hand-written
recursive descent parsers.

ab* could be translated into something like:

if ( token() == 'a' ) {
    while ( token() == 'b' )
        ;                    /* consume the b's */
    accept();
} else {
    reject();                /* dual of accept(), by analogy */
}

Note, programs like this (which are hard to write correctly) *can*
produce even faster lexers than case statement ones, because this code
is more similar to what the optimizer and code generator were
originally written and tuned for.

That is still not the fastest one can do, depending on the situation.
It is probably the fastest if one is only feeding in inputs that
match, but it might be slower at rejecting non-matching inputs than
some non-obvious other methods. For example, the methodology of
Boyer-Moore string searches can be applied to certain DFAs to quickly
look for places where the DFA might fail. For accepting one has to
look at every input byte, but for rejecting there is no such
requirement, and the BM technique is suggestive of how one can quickly
find rejectable bytes in the input (if you don't have to process it
linearly).
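
To make that concrete, here is a minimal Boyer-Moore-Horspool searcher
in C (my own illustration, not taken from any particular lexer); the
skip table lets it jump over bytes without ever examining them, which
is exactly the liberty a strictly left-to-right DFA never takes:

#include <stddef.h>

/* Return the offset of the first occurrence of pat[0..m-1] in
   text[0..n-1], or -1 if there is none. */
static long bmh_search(const unsigned char *text, size_t n,
                       const unsigned char *pat, size_t m)
{
    size_t skip[256];
    size_t i;

    if (m == 0 || m > n)
        return -1;
    for (i = 0; i < 256; i++)
        skip[i] = m;                  /* default: shift past the window */
    for (i = 0; i + 1 < m; i++)
        skip[pat[i]] = m - 1 - i;     /* distance from the pattern's end */

    for (i = 0; i + m <= n; i += skip[text[i + m - 1]]) {
        size_t j = m;
        while (j > 0 && text[i + j - 1] == pat[j - 1])
            j--;
        if (j == 0)
            return (long)i;           /* match found */
    }
    return -1;
}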

However, since I assume you are looking for a simple DFA recognizer
that processes its input left-to-right, and probably want one in a
high-level language, you have already "found" your answer. If you are
willing to use assembly language, find Thomas Pennello's paper "Very
Fast LR Parsing". The heart of an LR parser is essentially a DFA
recognizer; Pennello went from the state matrix to optimal VAX
assembly language. Even so, I suspect that for a large enough DFA a
table interpreter could still have been faster due to cache issues.

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: christopher.f.clark(a)compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
From: Peter Olcott on

"Tim Little" <tim(a)little-possums.net> wrote in message
news:slrnhvc1m6.jrj.tim(a)soprano.little-possums.net...
> On 2010-05-20, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
>> If the case within case form reduces the size of the state
>> transition matrix enough to fit within cache, and the table form
>> cannot fit within cache, then the spatial locality of reference
>> might more than make up for the additional case statement overhead.
>
> The general case within case form will almost certainly not be
> smaller, as it will likely be implemented in the form of jump tables.
> Every entry in the original transition matrix will have a counterpart
> in one of the jump tables, and that entry will be a full instruction
> pointer rather than a relatively small state number.
>
> It is possible that a sufficiently smart compiler may identify that
> some patterns exist and create special code for those. In that case
> perhaps it might end up smaller.
>
>
> - Tim

I was thinking of the case of a lexical analyzer that needs to
recognize fifty keywords. In this case most of the alphabet does not
need to be explicitly represented; only the letters that occur in the
keywords to be matched need be represented.

If these sets of letters do not form contiguous sequences, the
compiler may implement the switch as a chain of if, else if, else
if... tests. If they do form contiguous sequences, the compiler must
still generate code to check for the beginning and end of each
sequence. Because of this it is at least theoretically possible for
the case within case form to be faster.
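
One way to get that compression while keeping the plain table form is
the equivalence-class trick that flex uses: map each of the 256 input
bytes to a small class number first, then index the transition matrix
by class rather than by byte. A toy C sketch (the classes, states, and
single keyword here are all invented for illustration):

#include <stdio.h>

/* Input classes and states for a toy recognizer of the keyword "if". */
enum { CL_OTHER, CL_I, CL_F, NCLASSES };
enum { S_START, S_I, S_IF, S_DEAD, NSTATES };

static unsigned char char_class[256];    /* every byte starts as CL_OTHER */

static const unsigned char next_state[NSTATES][NCLASSES] = {
    /*            OTHER    'i'     'f'    */
    /* START */ { S_DEAD,  S_I,    S_DEAD },
    /* I     */ { S_DEAD,  S_DEAD, S_IF   },
    /* IF    */ { S_DEAD,  S_DEAD, S_DEAD },
    /* DEAD  */ { S_DEAD,  S_DEAD, S_DEAD },
};

static int match_if(const char *s)
{
    int state = S_START;
    for (; *s; s++)
        state = next_state[state][char_class[(unsigned char)*s]];
    return state == S_IF;
}

int main(void)
{
    char_class['i'] = CL_I;
    char_class['f'] = CL_F;
    printf("%d %d\n", match_if("if"), match_if("in"));  /* prints: 1 0 */
    return 0;
}

Here the matrix shrinks from 4 * 256 entries to 4 * 3, at the cost of
one 256-byte class map that is shared by every state.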