From: mirza2m on
Hi,
Here is link to the paper http://www.ee.iastate.edu/~aluru/publications/p/CPM-2003-LinearTimeSuffixArrays.pdf
I understand most of it but cannot figure out last part of sorting of
the S array using S-Distance array. More Specifically In Step 3 of
Figure 3 why did substring 5 move up higher than substring 2?

Thank you
From: user923005 on
On Jul 2, 7:24 am, mirza2m <babar.ans...(a)gmail.com> wrote:
> Hi,
> Here is link to the paperhttp://www.ee.iastate.edu/~aluru/publications/p/CPM-2003-LinearTimeSu...
> I understand most of it but cannot figure out last part of sorting of
> the S array using S-Distance array. More Specifically In Step 3 of
> Figure 3 why did substring 5 move up higher than substring 2?

This appears to be an efficient implementation of the idea:
http://code.google.com/p/libdivsufsort/

I have found it easy to build and test.
It has a very favorable MIT style license.
It sure is nice to live in the information age.
From: ko.pang.cn on
On Jul 2, 7:24 am, mirza2m <babar.ans...(a)gmail.com> wrote:
> Hi,
> Here is link to the paperhttp://www.ee.iastate.edu/~aluru/publications/p/CPM-2003-LinearTimeSu...
> I understand most of it but cannot figure out last part of sorting of
> the S array using S-Distance array. More Specifically In Step 3 of
> Figure 3 why did substring 5 move up higher than substring 2?
>
> Thank you

It is a mistake in the paper, as you correctly point out, 2 should
come before 5. But the order within a bucket is actually unimportant,
as long as they are in the same bucked as indicated in the paper.

libdivsufsort is a good implementation of my algorithm. However, if
the linear behavior is important to you, e.g. you expect the input to
have long repeats, which results in high average lcp between the
suffixes, you may want to alter it to make it run in linear time.
From: mirza2m on
On Jul 7, 2:32 pm, ko.pang...(a)gmail.com wrote:
> On Jul 2, 7:24 am, mirza2m <babar.ans...(a)gmail.com> wrote:
>
> > Hi,
> > Here is link to the paperhttp://www.ee.iastate.edu/~aluru/publications/p/CPM-2003-LinearTimeSu...
> > I understand most of it but cannot figure out last part of sorting of
> > the S array using S-Distance array. More Specifically In Step 3 of
> > Figure 3 why did substring 5 move up higher than substring 2?
>
> > Thank you
>
> It is a mistake in the paper, as you correctly point out, 2 should
> come before 5. But the order within a bucket is actually unimportant,
> as long as they are in the same bucked as indicated in the paper.
>
> libdivsufsort is a good implementation of my algorithm. However, if
> the linear behavior is important to you, e.g. you expect the input to
> have long repeats, which results in high average lcp between the
> suffixes, you may want to alter it to make it run in linear time.

Thank you all for your replies. I see the mistake was fixed in the
newer edition of this paper. Now I have another question.

In the following steps

(1) Classifying the types of all suffixes.
(2) Sorting all suffixes according to their first character.
(3) Constructing m lists according to the S-distance of each suffix,
and the
sorted order of their first character.
(4) Sorting all type S substrings by repeated bucketing using the m
lists.
(5) Constructing a new string T$B!l(B according to the bucket numbers of
type S
substrings.
(6) Recursively applying our algorithm, and obtaining the sorted order
of
type S suffixes.
(7) Constructing the suffix array from the sorted order of all type S
suffixes.

Is the only purpose for (5),(6) and (7) is to determine the sorted
order of type S suffixes that fall into the
same bucket after step (4) is done? In other words if each bucket
contained only one S type suffix. String T' would be equivalent to S
suffixes sorted?

Thank you