From: marty.mcgowan on
On Apr 3, 1:11 pm, Eric <e...(a)deptj.eu> wrote:
> On 2010-04-03, Ed Morton <mortons...(a)gmail.com> wrote:
>
> > On 4/2/2010 9:25 AM, Thomas 'PointedEars' Lahn wrote:
> >> Hongyi Zhao wrote:
> >> I use the following code to obtain the lines existing in file2 but not in file1,
>
> >>> Ed Morton wrote:
>
> >>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
> ><snip>
>
> >> I would use diff | grep anyway.  RTFM.
>
> > I'm curious - what would that solution look like given the input files below?
>
> > $ cat file1
> > a
> > c
> > $ cat file2
> > c
> > a
> > b
> > $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
> > b
>
> > Regards,
>
> >      Ed.
>
> I've just reread the original question, and I guess it depends on why
> you are doing it, but I would definitely consider
>
> sort file1 > file1s
> sort file2 > file2s
> comm -13 file1s file2s
>
> comm needs the files to be sorted, but maybe they are anyway.
>
> Eric

Sometimes we can afford to "overwrite" the initial file; see below
for the source.

overwrite file1 sort file1
overwrite file2 sort file2

-----------
This is from Kernighan and Pike's "The UNIX Programming Environment" (1984).
=============================
# overwrite: copy standard input to output after EOF

opath=$PATH
PATH=/bin:/usr/bin

case $# in
0|1)    echo 'Usage: overwrite file cmd [args]' 1>&2; exit 2
esac

file=$1; shift
new=/tmp/overwr1.$$; old=/tmp/overwr2.$$
trap 'rm -f $new $old; exit 1' 1 2 15   # clean up

if PATH=$opath "$@" >$new
then
        cp $file $old   # save original
        trap '' 1 2 15  # we are committed
        cp $new $file
else
        echo "overwrite: $1 failed, $file unchanged" 1>&2
        exit 1
fi
rm -f $new $old
=============================
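To try the idea without installing the script, here is a rough
function-style sketch. It is my own condensation, not the book's text:
it assumes mktemp is available (the original uses /tmp/overwr1.$$),
drops the signal handling and the saved copy of the original, and the
names ow_file, ow_new and workdir are made up for the example.

```shell
# Hypothetical function form of overwrite: run the command, and replace
# the file with the command's output only if the command succeeded.
overwrite() {
    ow_file=$1; shift
    ow_new=$(mktemp) || return 1        # assumption: mktemp exists
    if "$@" > "$ow_new"; then
        cp "$ow_new" "$ow_file"         # command succeeded: replace the file
        rm -f "$ow_new"
    else
        echo "overwrite: $1 failed, $ow_file unchanged" >&2
        rm -f "$ow_new"
        return 1
    fi
}

# Sort a scratch copy of the sample file2 in place.
workdir=$(mktemp -d)
printf 'c\na\nb\n' > "$workdir/file2"
overwrite "$workdir/file2" sort "$workdir/file2"
sorted=$(cat "$workdir/file2")
printf '%s\n' "$sorted"
rm -rf "$workdir"
```

The point of collecting the output first is that sort has finished
reading the file before anything is written back to it.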
enjoy

-=+-- Marty
From: Ed Morton on
On 4/3/2010 12:11 PM, Eric wrote:
> On 2010-04-03, Ed Morton <mortonspam(a)gmail.com> wrote:
>> On 4/2/2010 9:25 AM, Thomas 'PointedEars' Lahn wrote:
>>> Hongyi Zhao wrote:
>>> I use the following code to obtain the lines existing in file2 but not in file1,
>>>
>>>> Ed Morton wrote:
>>>>>
>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>> <snip>
>>>
>>> I would use diff | grep anyway. RTFM.
>>>
>>
>> I'm curious - what would that solution look like given the input files below?
>>
>> $ cat file1
>> a
>> c
>> $ cat file2
>> c
>> a
>> b
>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>> b
>>
>> Regards,
>>
>> Ed.
>
> I've just reread the original question, and I guess it depends on why
> you are doing it, but I would definitely consider
>
> sort file1 > file1s
> sort file2 > file2s
> comm -13 file1s file2s
>
> comm needs the files to be sorted, but maybe they are anyway.

I'd consider that too.
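
For the sample files quoted above, a minimal sketch of that approach
(run in a scratch directory; workdir and only_in_file2 are just names
picked for the example):

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nc\n'    > "$workdir/file1"
printf 'c\na\nb\n' > "$workdir/file2"

# comm needs sorted input.
sort "$workdir/file1" > "$workdir/file1s"
sort "$workdir/file2" > "$workdir/file2s"

# -1 drops lines found only in file1s, -3 drops lines common to both,
# leaving the lines found only in file2s.
only_in_file2=$(comm -13 "$workdir/file1s" "$workdir/file2s")
printf '%s\n' "$only_in_file2"

rm -rf "$workdir"
```

Unlike the awk version, the result comes out in sorted order rather
than in file2's original order.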

I'd still like to see what the diff | grep solution looks like, though. The way
I THINK you'd have to implement it would be fairly unpleasant, but I might be
overlooking a clean approach.

Ed.

From: Thomas 'PointedEars' Lahn on
Ed Morton wrote:

[Quotation fixed]

> Thomas 'PointedEars' Lahn wrote:
>>> Ed Morton wrote:
>>>>> Hongyi Zhao wrote:
>>>>> I use the following code to obtain the lines existing in file2 but
>>>>> not in file1,
>>>> [...]
>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>> [...]
>> I would use diff | grep anyway. RTFM.
>
> I'm curious - what would that solution look like given the input files
> below?
>
> $ cat file1
> a
> c
> $ cat file2
> c
> a
> b
> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
> b

I had not understood the OP to mean any line (without regard to
order), and I did not run the awk-based proposal, so I was thinking of

diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d; s/^+//'

where grep(1) would be superfluous, indeed:

diff -u --suppress-common-lines file1 file2 |
sed -n '2d; /^+/ s/^+// p'

However, if order does not matter, this can be modified in bash(1) to

diff -u --suppress-common-lines <(sort file1) <(sort file2) |
sed -n '2d; /^+/ s/^+// p'

(Remove --suppress-common-lines for non-GNU diffs, respectively.)
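
A sketch of the sorted variant on the sample files from the quoted
post. It is written with temporary files instead of <(...) so that it
does not depend on bash, it leaves out --suppress-common-lines, and it
folds the sed script into a single s///p. The names workdir, file1s,
file2s and added are assumptions made for the example.

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nc\n'    > "$workdir/file1"
printf 'c\na\nb\n' > "$workdir/file2"

# Sort first so that line order no longer matters to diff.
sort "$workdir/file1" > "$workdir/file1s"
sort "$workdir/file2" > "$workdir/file2s"

# Line 2 of unified output is the "+++ file" header, so delete it;
# then print only the lines marked "+", with the marker stripped.
added=$(diff -u "$workdir/file1s" "$workdir/file2s" |
        sed -n -e 2d -e 's/^+//p')
printf '%s\n' "$added"

rm -rf "$workdir"
```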

It is quite possible that I have overlooked a solution that uses only
non-POSIX diff(1).


PointedEars
From: Ed Morton on
On 4/4/2010 6:39 PM, Thomas 'PointedEars' Lahn wrote:
> Ed Morton wrote:
>
> [Quotation fixed]
>
>> Thomas 'PointedEars' Lahn wrote:
>>>> Ed Morton wrote:
>>>>>> Hongyi Zhao wrote:
>>>>>> I use the following code to obtain the lines existing in file2 but
>>>>>> not in file1,
>>>>> [...]
>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>>> [...]
>>> I would use diff | grep anyway. RTFM.
>>
>> I'm curious - what would that solution look like given the input files
>> below?
>>
>> $ cat file1
>> a
>> c
>> $ cat file2
>> c
>> a
>> b
>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>> b
>
> I had not understood the OP to mean any line (without regard to
> order), and I did not run the awk-based proposal, so I was thinking of
>
> diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d; s/^+//'
>
> where grep(1) would be superfluous, indeed:
>
> diff -u --suppress-common-lines file1 file2 |
> sed -n '2d; /^+/ s/^+// p'
>
> However, if order does not matter, this can be modified in bash(1) to
>
> diff -u --suppress-common-lines <(sort file1) <(sort file2) |
> sed -n '2d; /^+/ s/^+// p'
>
> (Remove --suppress-common-lines for non-GNU diffs, respectively.)
>
> It is quite possible that I have overlooked a solution that uses only
> non-POSIX diff(1).

--suppress-common-lines apparently already relies on using non-POSIX diff (see
the POSIX diff man page at
http://www.opengroup.org/onlinepubs/009695399/utilities/diff.html). Maybe it's
GNU diff?

In any case, I've never heard of that option before so thanks for the tip.

Ed.
From: Thomas 'PointedEars' Lahn on
Ed Morton wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Ed Morton wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>>>> Ed Morton wrote:
>>>>>>> Hongyi Zhao wrote:
>>>>>>> I use the following code to obtain the lines existing in file2 but
>>>>>>> not in file1,
>>>>>> [...]
>>>>>> awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2 > tmp && mv tmp file2
>>>> [...]
>>>> I would use diff | grep anyway. RTFM.
>>>
>>> I'm curious - what would that solution look like given the input files
>>> below?
>>>
>>> $ cat file1
>>> a
>>> c
>>> $ cat file2
>>> c
>>> a
>>> b
>>> $ awk 'NR==FNR{a[$0]++;next} !a[$0]' file1 file2
>>> b
>>
>> I had not understood the OP to mean any line (without regard to
>> order), and I did not run the awk-based proposal, so I was thinking of
>>
>> diff -u --suppress-common-lines file1 file2 | grep ^+ | sed '1d; s/^+//'
>>
>> where grep(1) would be superfluous, indeed:
>>
>> diff -u --suppress-common-lines file1 file2 |
>> sed -n '2d; /^+/ s/^+// p'
>>
>> However, if order does not matter, this can be modified in bash(1) to
>>
>> diff -u --suppress-common-lines <(sort file1) <(sort file2) |
>> sed -n '2d; /^+/ s/^+// p'
>>
>> (Remove --suppress-common-lines for non-GNU diffs, respectively.)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> It is quite possible that I have overlooked a solution that uses only
>> non-POSIX diff(1).
>
> --suppress-common-lines apparently already relies on using non-POSIX diff

It is not a requirement to use that option for the solution to work, but it
makes things easier for sed(1).
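
As a sketch of what a variant without that option might look like
(tried only in my head against the sample files, and workdir, file1s,
file2s and added are names assumed for the example): the default diff
output format marks lines present only in the second file with "> ",
so sed merely has to strip that marker.

```shell
# Recreate the sample input in a scratch directory.
workdir=$(mktemp -d)
printf 'a\nc\n'    > "$workdir/file1"
printf 'c\na\nb\n' > "$workdir/file2"

# Sort first so that line order no longer matters to diff.
sort "$workdir/file1" > "$workdir/file1s"
sort "$workdir/file2" > "$workdir/file2s"

# Default-format diff prefixes lines found only in the second file
# with "> "; print just those, without the prefix.
added=$(diff "$workdir/file1s" "$workdir/file2s" | sed -n 's/^> //p')
printf '%s\n' "$added"

rm -rf "$workdir"
```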

> (see the POSIX diff man page at
> http://www.opengroup.org/onlinepubs/009695399/utilities/diff.html). Maybe
> it's GNU diff?

See above.

> In any case, I've never heard of that option before so thanks for the
> tip.

You're welcome.


PointedEars