From: Peter Olcott on
I was wondering if there is a generally faster way to
implement a DFA recognizer than by using a state transition
matrix.

I already asked this question on several other groups and
never received any definitive answer.


From: Tim Little on
On 2010-05-20, Peter Olcott <NoSpam(a)OCR4Screen.com> wrote:
> I was wondering if there is a generally faster way to implement a
> DFA recognizer than by using a state transition matrix.

I wouldn't expect so, as O(1) per symbol is about the fastest you
could reasonably expect.


- Tim
From: Torben �gidius Mogensen on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> I was wondering if there is a generally faster way to implement a DFA
> recognizer than by using a state transition matrix.

Not asymptotically faster, since it is linear (assuming matrix lookup is
constant time). You can sometimes get faster code by translating the
matrix into code (essentially a switch statement per state) instead of
using an array, but it depends on the compiler which is faster. Flex
does this.

You can also halve the number of table lookups if you match two
characters at a time. This makes the table enormously larger, though.
If you use 8-bit characters, the table size is 256 x states. If you
match two characters at a time, it is 65536 x states, i.e., 256 times
larger.

Since matrix lookup is not constant time on modern machines, this might
actually make your code slower. So you can do the converse: Match half
a character (4 bits) at a time, which makes your table roughly 32 x
states (since you need to insert extra states). This reduces your table
by 8, which will more likely make it fit your cache. Obviously, if
there are only a few states, it will anyway.

Torben
From: Peter Olcott on

"Torben �gidius Mogensen" <torbenm(a)diku.dk> wrote in message
news:7z7hmygb05.fsf(a)ask.diku.dk...
> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>
>> I was wondering if there is a generally faster way to
>> implement a DFA
>> recognizer than by using a state transition matrix.
>
> Not asymptotically faster, since it is linear (assuming
> matrix lookup is
> constant time). You can sometimes get faster code by
> translating the
> matrix into code (essentially a switch statement per
> state) instead of
> using an array, but it depends on the compiler which is
> faster. Flex
> does this.

I use a state transition matrix to look up the switch case
value. I can not imagine how to (for example) match 50
different keywords without the table and only use just the
switch statement. How is this done?

>
> You can also halve the number of table lookups if you
> match two
> characters at a time. This makes the table enormously
> larger, though.
> If you use 8-bit characters, the table size is 256 x
> states. If you
> match two characters at a time, it is 65536 x states,
> i.e., 256 times
> larger.
>
> Since matrix lookup is not constant time on modern
> machines, this might
> actually make your code slower. So you can do the
> converse: Match half
> a character (4 bits) at a time, which makes your table
> roughly 32 x
> states (since you need to insert extra states). This
> reduces your table
> by 8, which will more likely make it fit your cache.
> Obviously, if
> there are only a few states, it will anyway.
>
> Torben


From: Ben Bacarisse on
"Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:

> "Torben Ægidius Mogensen" <torbenm(a)diku.dk> wrote in message
> news:7z7hmygb05.fsf(a)ask.diku.dk...
>> "Peter Olcott" <NoSpam(a)OCR4Screen.com> writes:
>>
>>> I was wondering if there is a generally faster way to implement a
>>> DFA recognizer than by using a state transition matrix.
>>
>> Not asymptotically faster, since it is linear (assuming matrix lookup
>> is constant time). You can sometimes get faster code by translating
>> the matrix into code (essentially a switch statement per state)
>> instead of using an array, but it depends on the compiler which is
>> faster. Flex does this.
>
> I use a state transition matrix to look up the switch case
> value. I can not imagine how to (for example) match 50
> different keywords without the table and only use just the
> switch statement. How is this done?

I suspect you may just be using a mixed version that is half table and
half switch driven. The fully matrix version looks like this:

state := initial_state;
while !is_final(state) do
state := state_table[state, get_input()];

while the full case version looks like this:

state := initial_state;
while !is_final(state) do
c := get_input()
switch state in
case 0:
switch c in
case 'a': state := 1;
case 'b': state := 2;
end
case 1:
switch c in
case 'a': state := 2;
end
/* and so on... */
end

corresponding to a table in which state_table[0, 'a'] is 1 and
state_table[0, 'b'] and state_table[1, 'a'] are 2. (state_table[x, y]
is x for all other x and y.)

<snip>
--
Ben.