From: avilella on
Hi,

I am looking for a neat way to match a series of tokens against
another string. E.g.:

$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";

Because all the tokens in $qy1 occur in $tg1, I want the match to be
true. Whereas:


$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
$qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

Now $qy2 has a middle token that does not occur in $tg1, so the
match should be false.

Any suggestions?

Cheers,

Albert.
From: J. Gleixner on
avilella wrote:
> Hi,
>
> I am looking for a neat way of trying a match of a series of tokens to
> another string. E.g.:
>
> $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
> $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc"
>
> Because $qy1 contains the characters in $tg1, I want the match to be
> true. Whereas:
>
>
> $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
> $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc"
>
> Now $qy2 has a middle token that is not compatible with $tg, so the
> match should be false.
>
> Any suggestions?

Put a regular expression in place of the spaces in $qy1 and match the
result against $tg1. You could use ".*" (any run of characters) or '.'
(any single character).

perldoc perlre
perldoc perlop
....
m/PATTERN/msixogc
/PATTERN/msixogc
Searches a string for a pattern match, and in scalar context
....
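
The suggestion above, replacing the spaces with ".*", might be sketched
as follows. This is only one reading of it: tokens_match is a made-up
helper name, and the sketch assumes the tokens must appear in $tg1 in
the same order as in the query, with anything (or nothing) between them.

```perl
use strict;
use warnings;

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

# tokens_match is a hypothetical helper, not something from the thread.
# It builds one pattern from the query by joining the tokens with ".*".
sub tokens_match {
    my ($target, $query) = @_;

    # split ' ' breaks the query on runs of whitespace;
    # quotemeta keeps any regex metacharacters in a token literal.
    my $pattern = join '.*', map { quotemeta($_) } split ' ', $query;

    return $target =~ /$pattern/s ? 1 : 0;
}

print "qy1: ", (tokens_match($tg1, $qy1) ? "match" : "no match"), "\n";
print "qy2: ", (tokens_match($tg1, $qy2) ? "match" : "no match"), "\n";
```

With the strings from the original post this prints "qy1: match" and
"qy2: no match". Joining with '.' instead would require exactly one
character between consecutive tokens, which is probably too strict here.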



From: Dilbert on
On 22 Apr, 17:00, avilella <avile...(a)gmail.com> wrote:
> Hi,
>
> I am looking for a neat way of trying a match of a series of tokens to
> another string. E.g.:
>
> $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
> $qy1 = "abdca     dadcbacb       dbdcadbc      cbcad      dbcadbc"
>
> Because $qy1 contains the characters in $tg1, I want the match to be
> true. Whereas:
>
> $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
> $qy2 = "abdca     dadcbacb       aaaaaaaa      cbcad      dbcadbc"
>
> Now $qy2 has a middle token that is not compatible with $tg, so the
> match should be false.
>
> Any suggestions?

One way to look at this problem is through "Algorithm::Diff" glasses:

use strict;
use warnings;
use Algorithm::Diff qw(sdiff);

my $tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my $qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc";
my $qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc";

print "case-a: first string : '$tg1'\n";
print "case-a: second string : '$qy1'\n";
print "case-a: degree of diff : ", degree_of_difference($tg1, $qy1),
"\n";
print "\n";

print "case-b: first string : '$tg1'\n";
print "case-b: second string : '$qy2'\n";
print "case-b: degree of diff : ", degree_of_difference($tg1, $qy2),
"\n";
print "\n";

sub degree_of_difference {
    my ($string_x, $string_y) = @_;

    # strip all whitespace from both strings
    s{\s}''xmsg for $string_x, $string_y;

    # the longest string always comes first:
    if (length($string_x) < length($string_y)) {
        my $temp = $string_x;
        $string_x = $string_y;
        $string_y = $temp;
    }

    # compare the two strings character by character
    my @chain_x = split m{}xms, $string_x;
    my @chain_y = split m{}xms, $string_y;

    my @sd = sdiff(\@chain_x, \@chain_y);

    my $inserts = () = grep {$_->[0] eq '+'} @sd;
    my $deletes = () = grep {$_->[0] eq '-'} @sd;
    my $changes = () = grep {$_->[0] eq 'c'} @sd;
    my $unchanged = () = grep {$_->[0] eq 'u'} @sd;

    $inserts + $changes;
}

The output is:

case-a: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-a: second string : 'abdca dadcbacb dbdcadbc cbcad dbcadbc'
case-a: degree of diff : 0

case-b: first string : 'abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc'
case-b: second string : 'abdca dadcbacb aaaaaaaa cbcad dbcadbc'
case-b: degree of diff : 5

One could argue that the "degree-of-diff" = 0 in case-a implies that
the match is true.

With the same argument we find that "degree-of-diff" = 5 in case-b
implies that the match is false.

This is only one way to look at the problem; I am sure there are
many others.
From: sln on
On Thu, 22 Apr 2010 08:00:46 -0700 (PDT), avilella <avilella(a)gmail.com> wrote:

>Hi,
>
>I am looking for a neat way of trying a match of a series of tokens to
>another string. E.g.:
>
>$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
>$qy1 = "abdca dadcbacb dbdcadbc cbcad dbcadbc"
>
>Because $qy1 contains the characters in $tg1, I want the match to be
>true. Whereas:
>
>
>$tg1 = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
>$qy2 = "abdca dadcbacb aaaaaaaa cbcad dbcadbc"
>
>Now $qy2 has a middle token that is not compatible with $tg, so the
>match should be false.
>
>Any suggestions?
>
You could use index if the tokens are constant.

use strict;
use warnings;

my $String = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my @Toks = qw(abdca dadcbacb dbdcadbc aaaaaaaa cbcad dbcadbc);

print "\n$String\n\n";
for my $tok (@Toks) {
    # index returns -1 when the token does not occur in the string
    my $pos = index $String, $tok;
    if ($pos >= 0) {
        printf "found (%2d): %s\n", $pos, $tok;
    }
    else {
        printf "not found : %s\n", $tok;
    }
}
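
The loop above reports each token on its own. To get a single true/false
answer, one option is to pass index a start position so that each token
is searched for only after the end of the previous one. This is a sketch
that assumes the tokens must occur in order and without overlapping;
tokens_in_order is a made-up name.

```perl
use strict;
use warnings;

my $String = "abdcadbcdadcbacbacbadbdcadbcbdcdcbcadabadbcadbc";
my @Good = qw(abdca dadcbacb dbdcadbc cbcad dbcadbc);
my @Bad  = qw(abdca dadcbacb aaaaaaaa cbcad dbcadbc);

# tokens_in_order is a hypothetical helper, not part of the code above.
# Returns 1 if every token is found, each one starting at or after the
# end of the previous token's match; returns 0 otherwise.
sub tokens_in_order {
    my ($string, @tokens) = @_;
    my $from = 0;
    for my $tok (@tokens) {
        my $pos = index $string, $tok, $from;   # third argument: start offset
        return 0 if $pos < 0;
        $from = $pos + length $tok;
    }
    return 1;
}

print "good: ", (tokens_in_order($String, @Good) ? "true" : "false"), "\n";
print "bad : ", (tokens_in_order($String, @Bad)  ? "true" : "false"), "\n";
```

If the order of the tokens does not matter, drop the start offset and
just check that every index call returns 0 or more.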

-sln