From: GZ on
On Jun 5, 8:42 pm, Ben Finney <ben+pyt...(a)benfinney.id.au> wrote:
> GZ <zyzhu2...(a)gmail.com> writes:
> > Let me think of a better way to express what I mean by a "smaller
> >diff." After Idiffthe two strings, I will have something like this:
>
> >   AAA
> > - BBB
> > + CCC
> > + DDD
> > - EEE
>
> > It means the first line does not change, the second line is replaced
> > by the third line, the forth line is new, and the fifth line is
> > deleted.
>
> Are you drawing a distinction between:
>
>   * “line FOO is replaced by line BAR”
>   * “line FOO is deleted, line BAR is added”
>
> Your wording seems to make that distinction, but I don't see how it's
> useful or meaningful in a discussion aboutdiff. Are they not exactly
> the same?
>
> --
>  \     “Injustice is relatively easy to bear; what stings is justice.” |
>   `\                                                 —Henry L. Mencken |
> _o__)                                                                  |
> Ben Finney

I should distinguish between modifications and additions. In my above
example, one line is modified/replaced, one line is added and one line
is deleted. There are a total of 3 edits. I am looking for an
alternative python library other than difflib that minimizes this
number (edit distance).
From: Bryan on
GZ wrote:
> I should distinguish between modifications and additions. In my above
> example, one line is modified/replaced, one line is added and one line
> is deleted. There are a total of 3 edits. I am looking for an
> alternative python library other than difflib that minimizes this
> number (edit distance).

Since you know the lingo, "edit distance", I expect you've already
Googled up the general results. Depending on what kinds of edits we
count, the problem of finding a minimal diff can be low-order poly-
time, NP-hard, or incomputable.

Gonzalo Navarro's 2001 survey, "A Guided Tour to Approximate String
Matching", is now available on-line for free:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.96.7225&rep=rep1&type=pdf
The paper notes, "issues have been put aside to keep a reasonable
scope," but presents more than most of us ever wanted to know about
the subject.

I don't know of a module that does what GZ asks. There are scores of
line-oriented diff implementations, and there are minimal-edit-
distance diffs, but the combination is rare at best. Problem domains
that call for a true minimal diff tend not to work in units of lines
of text.

--
--Bryan