From: rick on
Hi All,

I'm looking to put something together that would compare and integrate
columns of data from two files. In one file I have load average that I
got from sar and prettied up.
from file: load_avg
date time 1min 5min 15min
2006-11-07 10:00:01 0.02 0.02 0.00
2006-11-07 10:10:01 0.03 0.02 0.00
2006-11-07 10:20:01 0.01 0.02 0.00
2006-11-07 10:30:01 0.01 0.00 0.00
2006-11-07 10:40:01 0.00 0.00 0.00
2006-11-07 10:50:01 0.02 0.03 0.00
2006-11-07 11:00:01 0.07 0.06 0.01
2006-11-07 11:10:01 0.05 0.05 0.00
2006-11-07 11:20:01 0.01 0.04 0.00
2006-11-07 11:30:01 0.02 0.04 0.00
2006-11-07 11:40:01 0.24 0.06 0.02
2006-11-07 11:50:01 0.06 0.04 0.00

The other file is queries per second from apache logs.
from file: hits_per_second
date time qps
2006-11-07 10:59:36 2
2006-11-07 10:59:37 1
2006-11-07 10:59:38 1
2006-11-07 10:59:40 1
2006-11-07 10:59:41 1
2006-11-07 10:59:43 1
2006-11-07 10:59:44 1
2006-11-07 10:59:45 1
2006-11-07 10:59:46 1
2006-11-07 10:59:47 1
2006-11-07 10:59:49 1
2006-11-07 10:59:50 1
2006-11-07 10:59:51 2
2006-11-07 10:59:52 1
2006-11-07 10:59:53 1
2006-11-07 10:59:54 2
2006-11-07 11:00:40 2
2006-11-07 11:00:41 3
2006-11-07 11:00:43 1
2006-11-07 11:00:44 3
2006-11-07 11:00:45 2
2006-11-07 11:00:46 4
2006-11-07 11:00:48 1
2006-11-07 11:00:49 2
2006-11-07 11:00:50 4

I'd like to find a way (my attempts have been with awk) to get the load
averages added on after the qps column. Since my load avg is only done
every 10 minutes I want to put the load average for 10:50 tacked on to
any qps result in the 10:50:00-10:59:59 range. Example of what I'd
like to see.
(since 0.03 0.00 0.00 is the load avg from 10:50 & 0.07 0.06 0.01 is
the load avg from 11:00)
2006-11-07 10:59:47 1 0.03 0.00 0.00
2006-11-07 10:59:49 1 0.02 0.03 0.00
2006-11-07 10:59:50 1 0.02 0.03 0.00
2006-11-07 10:59:51 2 0.02 0.03 0.00
2006-11-07 10:59:52 1 0.02 0.03 0.00
2006-11-07 10:59:53 1 0.02 0.03 0.00
2006-11-07 10:59:54 2 0.02 0.03 0.00
2006-11-07 11:00:40 2 0.07 0.06 0.01
2006-11-07 11:00:41 3 0.07 0.06 0.01
2006-11-07 11:00:43 1 0.07 0.06 0.01
2006-11-07 11:00:44 3 0.07 0.06 0.01
2006-11-07 11:00:45 2 0.07 0.06 0.01
2006-11-07 11:00:46 4 0.07 0.06 0.01

I've tried several variations on the following, but I have no idea how
to do the range matching.
gawk 'NR==FNR{b[$2]=$3;next}{print $0 OFS b[$4;}' hits_per_second
load_avg

Ultimately I'm looking to get all of this data put into a graph where I
can compare the queries per second on the web server to the load
average on the server (each of the 3 load average columns will have a
peak and the qps will have a peak). I still have no idea how I'm going
to pull that off.

Any thoughts on this would be a great help.

From: Michael Heiming on
In comp.unix.shell rick <devrick88(a)gmail.com>:
> Hi All,

> I'm looking to put something together that would compare and integrate
> columns of data from two files. In one file I have load average that I
> got from sar and prettied up.
> from file: load_avg
> date time 1min 5min 15min
> 2006-11-07 10:00:01 0.02 0.02 0.00
> 2006-11-07 10:10:01 0.03 0.02 0.00
> 2006-11-07 10:20:01 0.01 0.02 0.00
> 2006-11-07 10:30:01 0.01 0.00 0.00
> 2006-11-07 10:40:01 0.00 0.00 0.00
[..]
> The other file is queries per second from apache logs.
> from file: hits_per_second
> date time qps
> 2006-11-07 10:59:36 2
> 2006-11-07 10:59:37 1
> 2006-11-07 10:59:38 1
> 2006-11-07 10:59:40 1
> 2006-11-07 10:59:41 1

> the load avg from 11:00)
> 2006-11-07 10:59:47 1 0.03 0.00 0.00
> 2006-11-07 10:59:49 1 0.02 0.03 0.00
> 2006-11-07 10:59:50 1 0.02 0.03 0.00

man join

Good luck

btw
Your example data doesn't match at all?

--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo zvpunry(a)urvzvat.qr | perl -pe 'y/a-z/n-za-m/'
#bofh excuse 114: electro-magnetic pulses from French above
ground nuke testing.
From: Stephane CHAZELAS on
2006-11-8, 14:13(-08), rick:
> Hi All,
>
> I'm looking to put something together that would compare and integrate
> columns of data from two files. In one file I have load average that I
> got from sar and prettied up.
> from file: load_avg
> date time 1min 5min 15min
[...]
> 2006-11-07 10:50:01 0.02 0.03 0.00
> 2006-11-07 11:00:01 0.07 0.06 0.01
[...]
>
> The other file is queries per second from apache logs.
> from file: hits_per_second
> date time qps
> 2006-11-07 10:59:36 2
> 2006-11-07 10:59:37 1
[...]
> I'd like to find a way (my attempts have been with awk) to get the load
> averages added on after the qps column. Since my load avg is only done
> every 10 minutes I want to put the load average for 10:50 tacked on to
> any qps result in the 10:50:00-10:59:59 range. Example of what I'd
> like to see.
> (since 0.03 0.00 0.00 is the load avg from 10:50 & 0.07 0.06 0.01 is
> the load avg from 11:00)
> 2006-11-07 10:59:47 1 0.03 0.00 0.00
> 2006-11-07 10:59:49 1 0.02 0.03 0.00
[...]

If your shell supports process substitution (zsh, bash, some
kshs):

join -t, <(sed 's/:./&,/' < hits_per_second) \
<(sed 's/\(:.\)[^ ]*/\1,/' < load_avg) | tr -d ,
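
To make the trick easier to follow, here is a rough sketch of what each stage of that pipeline yields on a few rows copied from the question. The temporary directory and the .keyed file names are only for illustration. Temporary files stand in for process substitution so it runs in a plain sh, and the header lines are left out because join only pairs lines whose keys match.

```shell
# Scratch directory and .keyed file names are illustrative, not part of the original command.
work=$(mktemp -d)
printf '%s\n' '2006-11-07 10:59:54 2' '2006-11-07 11:00:40 2' > "$work/hits_per_second"
printf '%s\n' '2006-11-07 10:50:01 0.02 0.03 0.00' \
              '2006-11-07 11:00:01 0.07 0.06 0.01' > "$work/load_avg"

# Put a comma after the tens-of-minutes digit; everything before it becomes the join key.
sed 's/:./&,/' < "$work/hits_per_second" > "$work/hits.keyed"
# first line is now: 2006-11-07 10:5,9:54 2

# Same key for load_avg, but the rest of the timestamp is dropped.
sed 's/\(:.\)[^ ]*/\1,/' < "$work/load_avg" > "$work/load.keyed"
# first line is now: 2006-11-07 10:5, 0.02 0.03 0.00

# Join on the text before the first comma, then delete the helper commas.
out=$(join -t, "$work/hits.keyed" "$work/load.keyed" | tr -d ,)
printf '%s\n' "$out"
# 2006-11-07 10:59:54 2 0.02 0.03 0.00
# 2006-11-07 11:00:40 2 0.07 0.06 0.01
```

Note that join expects both inputs to be sorted on the key, which timestamped logs like these normally already are.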


--
Stéphane
From: Ed Morton on
rick wrote:

> Hi All,
>
> I'm looking to put something together that would compare and integrate
> columns of data from two files. In one file I have load average that I
> got from sar and prettied up.
> from file: load_avg
> date time 1min 5min 15min
> 2006-11-07 10:00:01 0.02 0.02 0.00
> 2006-11-07 10:10:01 0.03 0.02 0.00
> 2006-11-07 10:20:01 0.01 0.02 0.00
> 2006-11-07 10:30:01 0.01 0.00 0.00
> 2006-11-07 10:40:01 0.00 0.00 0.00
> 2006-11-07 10:50:01 0.02 0.03 0.00
> 2006-11-07 11:00:01 0.07 0.06 0.01
> 2006-11-07 11:10:01 0.05 0.05 0.00
> 2006-11-07 11:20:01 0.01 0.04 0.00
> 2006-11-07 11:30:01 0.02 0.04 0.00
> 2006-11-07 11:40:01 0.24 0.06 0.02
> 2006-11-07 11:50:01 0.06 0.04 0.00
>
> The other file is queries per second from apache logs.
> from file: hits_per_second
> date time qps
> 2006-11-07 10:59:36 2
> 2006-11-07 10:59:37 1
> 2006-11-07 10:59:38 1
> 2006-11-07 10:59:40 1
> 2006-11-07 10:59:41 1
> 2006-11-07 10:59:43 1
> 2006-11-07 10:59:44 1
> 2006-11-07 10:59:45 1
> 2006-11-07 10:59:46 1
> 2006-11-07 10:59:47 1
> 2006-11-07 10:59:49 1
> 2006-11-07 10:59:50 1
> 2006-11-07 10:59:51 2
> 2006-11-07 10:59:52 1
> 2006-11-07 10:59:53 1
> 2006-11-07 10:59:54 2
> 2006-11-07 11:00:40 2
> 2006-11-07 11:00:41 3
> 2006-11-07 11:00:43 1
> 2006-11-07 11:00:44 3
> 2006-11-07 11:00:45 2
> 2006-11-07 11:00:46 4
> 2006-11-07 11:00:48 1
> 2006-11-07 11:00:49 2
> 2006-11-07 11:00:50 4
>
> I'd like to find a way (my attempts have been with awk) to get the load
> averages added on after the qps column. Since my load avg is only done
> every 10 minutes I want to put the load average for 10:50 tacked on to
> any qps result in the 10:50:00-10:59:59 range. Example of what I'd
> like to see.
> (since 0.03 0.00 0.00 is the load avg from 10:50 & 0.07 0.06 0.01 is
> the load avg from 11:00)
> 2006-11-07 10:59:47 1 0.03 0.00 0.00

I think you made a couple of mistakes in the 3 lines above.

> 2006-11-07 10:59:49 1 0.02 0.03 0.00
> 2006-11-07 10:59:50 1 0.02 0.03 0.00
> 2006-11-07 10:59:51 2 0.02 0.03 0.00
> 2006-11-07 10:59:52 1 0.02 0.03 0.00
> 2006-11-07 10:59:53 1 0.02 0.03 0.00
> 2006-11-07 10:59:54 2 0.02 0.03 0.00
> 2006-11-07 11:00:40 2 0.07 0.06 0.01
> 2006-11-07 11:00:41 3 0.07 0.06 0.01
> 2006-11-07 11:00:43 1 0.07 0.06 0.01
> 2006-11-07 11:00:44 3 0.07 0.06 0.01
> 2006-11-07 11:00:45 2 0.07 0.06 0.01
> 2006-11-07 11:00:46 4 0.07 0.06 0.01
>
> I've tried several variations on the following, but I have no idea how
> to do the range matching.
> gawk 'NR==FNR{b[$2]=$3;next}{print $0 OFS b[$4;}' hits_per_second
> load_avg
>
> Ultimately I'm looking to get all of this data put into a graph where I
> can compare the queries per second on the web server to the load
> average on the server (each of the 3 load average columns will have a
> peak and the qps will have a peak). I still have no idea how I'm going
> to pull that off.
>
> Any thoughts on this would be a great help.
>

Just strip the non-significant parts of the time stamp and use that as a
key to populate the averages from the first file, then to access them
for the second, e.g.:

awk '{t=$1$2; sub(/[0-9]:[^:]*$/,"",t)}
NR==FNR{avg[t]=$3" "$4" "$5;next}
{print $0,avg[t]}' load_avg hits_per_second
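
As a quick check of how the key works, the sketch below runs the same script on a cut-down copy of the two files. The scratch directory is only for illustration, and the first awk call is an extra debugging step that prints the key each timestamp collapses to.

```shell
# Scratch directory is illustrative; the rows are copied from the question.
work=$(mktemp -d)
cat > "$work/load_avg" <<'EOF'
date time 1min 5min 15min
2006-11-07 10:50:01 0.02 0.03 0.00
2006-11-07 11:00:01 0.07 0.06 0.01
EOF
cat > "$work/hits_per_second" <<'EOF'
date time qps
2006-11-07 10:59:54 2
2006-11-07 11:00:40 2
EOF

# Debugging aid: show the key derived from each timestamp (headers skipped).
keys=$(awk 'FNR>1 {t=$1$2; sub(/[0-9]:[^:]*$/,"",t); print $2, "->", t}' \
    "$work/load_avg" "$work/hits_per_second")
printf '%s\n' "$keys"
# 10:50:01 -> 2006-11-0710:5
# 11:00:01 -> 2006-11-0711:0
# 10:59:54 -> 2006-11-0710:5
# 11:00:40 -> 2006-11-0711:0

# The script from this post, unchanged apart from the file paths.
merged=$(awk '{t=$1$2; sub(/[0-9]:[^:]*$/,"",t)}
NR==FNR{avg[t]=$3" "$4" "$5;next}
{print $0,avg[t]}' "$work/load_avg" "$work/hits_per_second")
printf '%s\n' "$merged"
# date time qps 1min 5min 15min
# 2006-11-07 10:59:54 2 0.02 0.03 0.00
# 2006-11-07 11:00:40 2 0.07 0.06 0.01
```

The header lines share the key "datetime", so the column titles get merged too. A qps line whose ten-minute slot has no load_avg sample is printed with nothing after the qps value.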

Regards,

Ed.
From: Ed Morton on
Ed Morton wrote:

> rick wrote:
<snip>
>> Ultimately I'm looking to get all of this data put into a graph where I
>> can compare the queries per second on the web server to the load
>> average on the server (each of the 3 load average columns will have a
>> peak and the qps will have a peak). I still have no idea how I'm going
>> to pull that off.
>> Any thoughts on this would be a great help.
>>
>
> Just strip the non-significant parts of the time stamp and use that as a
> key to populate the averages from the first file, then to access them
> for the second, e.g.:
>
> awk '{t=$1$2; sub(/[0-9]:[^:]*$/,"",t)}
> NR==FNR{avg[t]=$3" "$4" "$5;next}
> {print $0,avg[t]}' load_avg hits_per_second
>
> Regards,
>
> Ed.

Oh, and for the graph, you could use "gnuplot". It has a home page and
a newsgroup, and many gnuplot users do their data processing with awk.
Google...
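
A possible starting point, as a sketch only: the file names merged.dat, plot.gp and load_vs_qps.png are made up for the example, the data is assumed to be in the merged format printed by the awk command with the header line removed, and the png terminal is assumed to be compiled into your gnuplot. qps goes on the left axis and the three load averages on the right one.

```shell
# Hypothetical file names; rows are in the merged "date time qps 1min 5min 15min" layout.
work=$(mktemp -d)
cat > "$work/merged.dat" <<'EOF'
2006-11-07 10:59:54 2 0.02 0.03 0.00
2006-11-07 11:00:40 2 0.07 0.06 0.01
2006-11-07 11:00:46 4 0.07 0.06 0.01
EOF

# The time format contains a space, so date and time occupy columns 1 and 2
# and qps is column 3 in the "using" specs.
cat > "$work/plot.gp" <<'EOF'
set xdata time
set timefmt "%Y-%m-%d %H:%M:%S"
set format x "%H:%M"
set ylabel "queries per second"
set y2label "load average"
set ytics nomirror
set y2tics
set terminal png
set output "load_vs_qps.png"
plot "merged.dat" using 1:3 with lines title "qps", \
     "merged.dat" using 1:4 axes x1y2 with lines title "1min", \
     "merged.dat" using 1:5 axes x1y2 with lines title "5min", \
     "merged.dat" using 1:6 axes x1y2 with lines title "15min"
EOF

# Only run gnuplot if it is installed; otherwise leave the script for later.
if command -v gnuplot >/dev/null 2>&1; then
    (cd "$work" && gnuplot plot.gp) || echo "gnuplot reported an error"
else
    echo "gnuplot not found; script left in $work/plot.gp"
fi
```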

Ed.