From: Ilya Zakharevich on
On 2010-04-13, Kyle T. Jones <KBfoMe(a)realdomain.net> wrote:
>> Solution 1:
>> ---ab
>> a1bab
>>
>
>
> [
> ['+', 0, 'a'],
> ['+', 1, '1'],
> ['+', 2, 'b'],
> ]
>
>> Solution 2:
>> a-b--
>> a1bab
>>
>
>
> [
> ['+', 1, '1'],
> ['+', 3, 'a'],
> ['+', 4, 'b'],
> ]
>
>> Solution 3:
>> a---b
>> a1bab
>>
>
>
> [
> ['+', 1, '1'],
> ['+', 2, 'b'],
> ['+', 3, 'a'],
> ]

> Why are any of the three better?

Obviously, the metric the OP wants is: assign the "identity edit"
weight eps, and any other edit weight 1+eps, with the exception that N
consecutive edits of the same type get weight N+eps, not N + N*eps.
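A minimal sketch of that metric, written in Python for brevity rather than Perl. It assumes each alignment is spelled as a string of sdiff-style op codes ('u' unchanged, '+' insert, '-' delete, 'c' change). It also reads the run rule as applying to identity edits too, so every maximal run of same-type edits costs one eps and every non-identity edit costs 1. Returning the pair (edits, runs) and comparing pairs lexicographically is the eps -> 0 limit of edits + eps * runs. The function name ilya_cost is hypothetical.

```python
from itertools import groupby


def ilya_cost(ops):
    """Cost of an alignment given as sdiff-style op codes.

    Returns (edits, runs); comparing these tuples lexicographically
    is the eps -> 0 limit of  edits + eps * runs.
    """
    # Every non-identity edit costs 1.
    edits = sum(1 for op in ops if op != "u")
    # Every maximal run of identical op codes costs one eps.
    runs = sum(1 for _ in groupby(ops))
    return (edits, runs)


solutions = {"Solution 1": "+++uu", "Solution 2": "u+u++", "Solution 3": "u+++u"}
for name, ops in solutions.items():
    print(name, ilya_cost(ops))
# Solution 1 (3, 2)
# Solution 2 (3, 4)
# Solution 3 (3, 3)
```

Under this reading Solution 1 is cheapest. If identity edits are instead charged eps each with no run discount, Solutions 1 and 3 tie at 3 + 3*eps, so it is the run rule for identity edits that separates them.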

Looks reasonable (if one converts it into a cheap algorithm for finding
the best match...).
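One way such a cheap algorithm could look, sketched in Python under the same (edits, runs) reading of the metric: a dynamic program over (position in old, position in new, previous op). Minimising edits first forces the alignment to use an LCS (no 'c' ops are produced, so fewer edits means more matches); minimising runs second picks the least fragmented one. This is close in spirit to affine gap penalties in sequence alignment. It is not part of Algorithm::Diff, and run_aware_diff is a hypothetical name.

```python
from functools import lru_cache


def run_aware_diff(a, b):
    """Align a and b, minimising (edits, runs) lexicographically.

    Returns ((edits, runs), ops), where ops uses sdiff-style codes:
    'u' unchanged, '-' delete from a, '+' insert from b.
    Recursion depth grows with len(a) + len(b), so inputs should be short.
    """
    m, n = len(a), len(b)

    @lru_cache(maxsize=None)
    def best(i, j, last):
        # Cheapest way to align a[i:] with b[j:] when the previous op was `last`.
        if i == m and j == n:
            return (0, 0), ""
        options = []

        def take(op, ni, nj, edit):
            (edits, runs), rest = best(ni, nj, op)
            # A new run starts whenever the op code changes.
            options.append(((edits + edit, runs + (op != last)), op + rest))

        if i < m and j < n and a[i] == b[j]:
            take("u", i + 1, j + 1, 0)
        if i < m:
            take("-", i + 1, j, 1)
        if j < n:
            take("+", i, j + 1, 1)
        # Equal costs are broken by comparing the op strings.
        return min(options)

    return best(0, 0, None)


print(run_aware_diff("ab", "a1bab"))  # ((3, 2), '+++uu')
```

There are O(m*n) states, but each one rebuilds its op string; a real implementation would store back-pointers and reconstruct the path once at the end.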

Hope this helps,
Ilya
From: Ed on
On Apr 12, 4:24 pm, Dilbert <dilbert1...(a)gmail.com> wrote:

> Theoretically there are 3 solutions with LCS = 2:
>
> Solution 1:
> ---ab
> a1bab
>
> Solution 2:
> a-b--
> a1bab
>
> Solution 3:
> a---b
> a1bab
>
> I understand that any of those 3 solutions could be returned by
> Algorithm::Diff, but I would argue that solution 1 is "better" than
> solution 2 or 3, because solution 1 changes only once between '-' and
> [ab], whereas solutions 2 and 3 change more than once between '-' and
> [ab].

> my $d = Algorithm::Diff::sdiff(\@old, \@new);

> How can I teach Algorithm::Diff to choose Solution 1 (the best of the
> 3 possibilities)?

Look at traverse_balanced as a starting point. Basically you'd need
to write your own diff calculator based on the LCS, using whatever
method you feel is appropriate. Once you get to the point where
you're looking for the "best" of the possible solutions, you are in
new territory, since you'll have to consider the whole solution set.
I don't think there's anything in Algorithm::Diff that does that sort
of thing; I believe the code simply finds the first solution that
uses the given LCS.
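Taking "consider the solution set" literally, here is a brute-force sketch in Python (not an Algorithm::Diff feature): build the usual LCS length table, enumerate every alignment that keeps the full LCS, and keep the one with the fewest runs. For Dilbert's example it recovers exactly the three solutions listed in the quoted post. The helper names lcs_alignments, count_runs and top_row are hypothetical, and the number of alignments can grow exponentially, so this only suits tiny inputs.

```python
from itertools import groupby


def lcs_alignments(a, b):
    """Return every op string ('u', '-', '+') that aligns a with b via an LCS."""
    m, n = len(a), len(b)
    # L[i][j] = length of an LCS of a[i:] and b[j:].
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                L[i][j] = L[i + 1][j + 1] + 1
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])

    def walk(i, j):
        # Only take steps that keep the full LCS length reachable.
        if i == m and j == n:
            yield ""
            return
        if i < m and j < n and a[i] == b[j]:
            for rest in walk(i + 1, j + 1):
                yield "u" + rest
        if i < m and L[i + 1][j] == L[i][j]:
            for rest in walk(i + 1, j):
                yield "-" + rest
        if j < n and L[i][j + 1] == L[i][j]:
            for rest in walk(i, j + 1):
                yield "+" + rest

    return list(walk(0, 0))


def count_runs(ops):
    # Number of maximal runs of identical op codes.
    return sum(1 for _ in groupby(ops))


def top_row(a, ops):
    """Render the old sequence's side of an alignment ('-' marks a gap)."""
    out, i = [], 0
    for op in ops:
        if op == "+":
            out.append("-")
        else:
            out.append(a[i])
            i += 1
    return "".join(out)


solutions = lcs_alignments("ab", "a1bab")
print(sorted(top_row("ab", ops) for ops in solutions))  # ['---ab', 'a---b', 'a-b--']
best = min(solutions, key=count_runs)
print(top_row("ab", best))  # ---ab
```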