Fastest DFA recognizer [Theory]

Prev: Atomic variable Vs. Atomic operation
Next: How to program critical section for reader-writer systems?

From: Peter Olcott on 20 May 2010 00:16

I was wondering if there is a generally faster way to
implement a DFA recognizer than by using a state transition
matrix.

I already asked this question on several other groups and
never received any definitive answer.

From: Tim Little on 20 May 2010 05:05

On 2010-05-20, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
> I was wondering if there is a generally faster way to implement a
> DFA recognizer than by using a state transition matrix.

I wouldn't expect so, as O(1) per symbol is about the fastest you
could reasonably expect.

- Tim

From: Torben �gidius Mogensen on 20 May 2010 05:47

"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> I was wondering if there is a generally faster way to implement a DFA
> recognizer than by using a state transition matrix.

Not asymptotically faster, since it is linear (assuming matrix lookup is
constant time). You can sometimes get faster code by translating the
matrix into code (essentially a switch statement per state) instead of
using an array, but it depends on the compiler which is faster. Flex
does this.

You can also halve the number of table lookups if you match two
characters at a time. This makes the table enormously larger, though.
If you use 8-bit characters, the table size is 256 x states. If you
match two characters at a time, it is 65536 x states, i.e., 256 times
larger.

Since matrix lookup is not constant time on modern machines, this might
actually make your code slower. So you can do the converse: Match half
a character (4 bits) at a time, which makes your table roughly 32 x
states (since you need to insert extra states). This reduces your table
by 8, which will more likely make it fit your cache. Obviously, if
there are only a few states, it will anyway.

Torben

From: Peter Olcott on 20 May 2010 10:14

"Torben �gidius Mogensen" <torbenm(a)diku.dk> wrote in message
news:7z7hmygb05.fsf(a)ask.diku.dk...
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>
>> I was wondering if there is a generally faster way to
>> implement a DFA
>> recognizer than by using a state transition matrix.
>
> Not asymptotically faster, since it is linear (assuming
> matrix lookup is
> constant time). You can sometimes get faster code by
> translating the
> matrix into code (essentially a switch statement per
> state) instead of
> using an array, but it depends on the compiler which is
> faster. Flex
> does this.

I use a state transition matrix to look up the switch case
value. I can not imagine how to (for example) match 50
different keywords without the table and only use just the
switch statement. How is this done?

>
> You can also halve the number of table lookups if you
> match two
> characters at a time. This makes the table enormously
> larger, though.
> If you use 8-bit characters, the table size is 256 x
> states. If you
> match two characters at a time, it is 65536 x states,
> i.e., 256 times
> larger.
>
> Since matrix lookup is not constant time on modern
> machines, this might
> actually make your code slower. So you can do the
> converse: Match half
> a character (4 bits) at a time, which makes your table
> roughly 32 x
> states (since you need to insert extra states). This
> reduces your table
> by 8, which will more likely make it fit your cache.
> Obviously, if
> there are only a few states, it will anyway.
>
> Torben

From: Ben Bacarisse on 20 May 2010 11:38

"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> "Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
> news:7z7hmygb05.fsf(a)ask.diku.dk...
>> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>>
>>> I was wondering if there is a generally faster way to implement a
>>> DFA recognizer than by using a state transition matrix.
>>
>> Not asymptotically faster, since it is linear (assuming matrix lookup
>> is constant time). You can sometimes get faster code by translating
>> the matrix into code (essentially a switch statement per state)
>> instead of using an array, but it depends on the compiler which is
>> faster. Flex does this.
>
> I use a state transition matrix to look up the switch case
> value. I can not imagine how to (for example) match 50
> different keywords without the table and only use just the
> switch statement. How is this done?

I suspect you may just be using a mixed version that is half table and
half switch driven. The fully matrix version looks like this:

state := initial_state;
while !is_final(state) do
state := state_table[state, get_input()];

while the full case version looks like this:

state := initial_state;
while !is_final(state) do
c := get_input()
switch state in
case 0:
switch c in
case 'a': state := 1;
case 'b': state := 2;
end
case 1:
switch c in
case 'a': state := 2;
end
/* and so on... */
end

corresponding to a table in which state_table[0, 'a'] is 1 and
state_table[0, 'b'] and state_table[1, 'a'] are 2. (state_table[x, y]
is x for all other x and y.)

<snip>
--
Ben.

| Next | Last
Pages: 1 2 3
Prev: Atomic variable Vs. Atomic operation
Next: How to program critical section for reader-writer systems?