From: Janis Papanagnou on
Atropo wrote:
> On Aug 4, 11:06 am, Janis Papanagnou <janis_papanag...(a)hotmail.com>
> wrote:
>> Atropo wrote:
>>
>>> Hi all.
>>> I got this file from sql output
>>> dates_out.txt:
>>> 26-03-2009 P
>>> 26-03-2009 B
>>> 26-03-2009 R
>>> 26-03-2009 L
>>> 26-03-2009 O
>>> I would like to check whether there are differences in the first field
>>> only. They were dates, but now they're only chars. I'm not quite
>>> sure about uniq -f 2; it shows 26-03-2009 P.
>>> If any is different, then send a mail.
>> You mean if there is more than one date in column 1 you want
>> some action?
> exactly
>
>> The following program prints the number of distinct values in
>> field 1...
>>
>> awk '{a[$1]}END{print length(a)}' dates_out.txt
>>
> /usr/xpg4/bin/awk '{a[$1]}END{print length(a)}' dates_out.txt
> the output is 0.

My bad. length() on arrays may not be supported by standard awk.
Either retry with GNU awk, or change the program to...

awk '{a[$1]}END{for (i in a) c++; print c}' dates_out.txt

>
> but first, why length(a), how could this evaluate differences?

The array a will hold exactly one entry for every distinct value of $1.
In gawk you get the size of the array a, i.e. the number of
elements in a, with length(a).
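
A minimal sketch of the counting, assuming invented sample data (the
temporary file and its three lines are made up for illustration, not
the original dates_out.txt). Merely referencing a[$1] creates the
element if it is missing, so a repeated date adds nothing, and the
for-in loop counts the keys without relying on length(a):

```shell
# Hypothetical sample data: two distinct dates in field 1.
tmp=$(mktemp)
printf '%s\n' '26-03-2009 P' '26-03-2009 B' '27-03-2009 R' > "$tmp"

# Referencing a[$1] creates the element once; a repeated date reuses it.
# "c + 0" prints 0 instead of an empty string for empty input.
distinct=$(awk '{ a[$1] } END { for (k in a) c++; print c + 0 }' "$tmp")
echo "$distinct"
rm -f "$tmp"
```

With these three lines the count should come out as 2.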

>
>> If you set the exit-code appropriately you can trigger sending
>> a mail or whatever...
> set the exit-code appropriately --- not sure what you mean.

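By using the exit statement in the awk program instead of
printing the number of distinct entries. awk then exits with
status 1 if there is more than one distinct value, and the shell
runs the command after || only on a non-zero status. With the above
changes for standard awk you'll get (for example)...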

awk '{a[$1]}END{for (i in a) c++; exit(c>1)}' dates_out.txt ||
{ echo Failed | mail -s Huhu you(a)org ;}
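
To try the exit-status idea without sending any mail, here is a sketch
under some assumptions: single_date is a hypothetical helper name, the
two temporary files hold invented data, and echo stands in for the
mail command.

```shell
# Hypothetical helper: status 0 when field 1 holds at most one distinct
# value, status 1 when it holds several.
single_date() {
    awk '{ a[$1] } END { for (k in a) c++; exit (c > 1) }' "$1"
}

same=$(mktemp); mixed=$(mktemp)
printf '%s\n' '26-03-2009 P' '26-03-2009 B' > "$same"
printf '%s\n' '26-03-2009 P' '27-03-2009 B' > "$mixed"

# The right-hand side of || runs only when awk exits non-zero.
status_same=0;  single_date "$same"  || status_same=$?
status_mixed=0; single_date "$mixed" || status_mixed=$?
echo "same: $status_same, mixed: $status_mixed"
rm -f "$same" "$mixed"
```

Only the file with two different dates should trigger the right-hand
side of ||.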

Hope that helps.

Janis

>
>> awk '{a[$1]}END{exit(length(a)!=1)}' dates_out.txt ||
>> { echo Failed | mail -s Huhu you(a)org ;}
>>
>> Janis
>
> I really appreciate your effort to help me, Janis, but I may be at a
> somewhat lower level of knowledge. I'm RTFMing, but I'm not there yet.
>
From: Atropo on
On Aug 4, 11:33 am, Janis Papanagnou <janis_papanag...(a)hotmail.com>
wrote:

I really admire you; the way you explain is easy to follow.
It works great.
Thanks
From: thdyoung on
I too found this instructive so thank you.

The extra illumination I'd like is this: how does the array 'know' to
accept only distinct values? Why doesn't each line with the same date in
$1 simply add to the array membership?

'a[$1]' says: the first field of each line of input will go into the
array. What about it prohibits another identical member? I.e. {1,1,1,1}
is a perfectly valid set, isn't it?

The contents of the array aren't necessarily lost when the next line
of input is processed (the persistence of the array contents is just
a given I guess tho' it's slightly puzzling since I don't understand
awk's ability to remember).

Tom

> > The array a will carry for every distinct value $1 one entry.
> > In gawk you get the size of the array a, i.e. the number of
> > elements in a, by  length(a) .

> >    awk '{a[$1]}END{for (i in a) c++; exit(c>1)}' dates_out.txt ||
> >      { echo Failed | mail -s Huhu you(a)org ;}

From: Janis Papanagnou on
[top-posting fixed below]

On 04/08/10 21:38, thdyoung(a)googlemail.com wrote:
>>> The array a will carry for every distinct value $1 one entry.
>>> In gawk you get the size of the array a, i.e. the number of
>>> elements in a, by length(a) .
>
>>> awk '{a[$1]}END{for (i in a) c++; exit(c>1)}' dates_out.txt ||
>>> { echo Failed | mail -s Huhu you(a)org ;}
>

> I too found this instructive so thank you.
>
> The extra illumination I'd like is this: how does the array 'know' to
> accept only distinct values? Why doesn't each line with the same date in
> $1 simply add to the array membership?

Each element of an awk associative array has exactly one key and
(optionally) one value.[*]

Actually, you can model the concepts of a set and of a map; consider
these definitions...

set[key]

map[key] = value

The first one will create an array element, accessible using 'key',
without assigning any value. This array named 'set' represents the concept
of a set. It does *not* represent the concept of a multi-set, because
each key can exist only once.

The second one will create an array element, accessible using 'key', and
it will assign a value. This array named 'map' represents the concept
of a map. It does *not* represent the concept of a multi-map, because
only one value can be assigned to each key.
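
A rough sketch of both forms side by side, on invented input lines.
The count array is my own addition here: a counter per key is a common
stand-in when you do want multi-set behaviour.

```shell
result=$(printf '%s\n' '26-03-2009 P' '26-03-2009 B' '27-03-2009 R' |
awk '
    {
        set[$1]          # set: only the key matters, no value assigned
        map[$1] = $2     # map: one value per key, a later line overwrites it
        count[$1]++      # counter per key, emulating a multi-set
    }
    END {
        for (k in set) n++
        print n, map["26-03-2009"], count["26-03-2009"]
    }')
echo "$result"
```

This should print "2 B 2": two distinct keys, the last value stored
for the first date, and the number of lines that carried that date.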

>
> 'a[$1]' says: the first field of each line of input will go into the
> array. What about it prohibits another identical member? I.e. {1,1,1,1}
> is a perfectly valid set, isn't it?

The input will *not* "go into the array". We could say that the _index_
will go into the array. If there's already an array element created by
this index (or key), no other one will be created. You can access only
one element by one key.
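
To watch this happen line by line, a small sketch on invented input.
The 'in' operator tests whether a key exists without creating it, so
the program can report whether a[$1] is about to create a new element
or reuse an existing one:

```shell
result=$(printf '%s\n' '26-03-2009 P' '26-03-2009 B' '27-03-2009 R' |
awk '{
    if ($1 in a) { print NR ": key " $1 " already exists, nothing added" }
    else         { print NR ": key " $1 " is new, element created" }
    a[$1]        # creates the element only if the key is not there yet
}')
echo "$result"
```

Line 2 repeats the date of line 1, so it should be reported as already
existing.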

>
> The contents of the array aren't necessarily lost when the next line
> of input is processed (the persistence of the array contents is just
> a given I guess tho' it's slightly puzzling since I don't understand
> awk's ability to remember).

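awk variables, including arrays, are global and keep their contents from
one input line to the next; that is how awk "remembers". And there's no
"second incarnation" of an array key or of an array element.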

Janis

[*] Multiple "dimensions" of arrays are implicitly mapped to a single one;
i.e. the keys x, y, and z in arr[x,y,z] are, actually, just a single key
that is the concatenated result of x and SUBSEP and y and SUBSEP and z.
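
A short sketch of that footnote, with literal strings "x", "y" and "z"
chosen purely for illustration: splitting the stored key on SUBSEP
recovers the three parts, and a key concatenated by hand reaches the
same element.

```shell
result=$(awk 'BEGIN {
    arr["x", "y", "z"] = 1
    for (k in arr) {
        n = split(k, part, SUBSEP)   # recover the pieces of the single key
        print n, part[1], part[2], part[3]
    }
    # Build the same key by explicit concatenation and look it up.
    key = "x" SUBSEP "y" SUBSEP "z"
    print ((key in arr) ? "same key" : "different key")
}')
echo "$result"
```

The loop finds exactly one key, made of three parts, and the
hand-built key matches it.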

From: Ed Morton on
On 8/4/2010 2:38 PM, thdyoung(a)googlemail.com wrote:
> I too found this instructive so thank you.
>
> The extra illumination I'd like is this: how does the array 'know' to
> accept only distinct values? Why doesn't each line with the same date in
> $1 simply add to the array membership?
>
> 'a[$1]' says: the first field of each line of input will go into the
> array. What about it prohibits another identical member? I.e. {1,1,1,1}
> is a perfectly valid set, isn't it?

See http://en.wikipedia.org/wiki/Associative_array.

Ed.
