From: db on
Hi, I would like to get correlation coefficient that calculates obs
1~2, obs 1~3, obs 1~4, obs 1~5, obs 1~6 ... etc

Thanks, db


data input
input date :mmddyy10. time :time8.0 stock1 stock2;
format date:mmddyy10. time :time8.0;
cards;
01/04/2010 7:00:00 5989.5 1115.25
01/04/2010 7:01:00 5998.5 1116
01/04/2010 7:01:00 5998.5 1116
01/04/2010 7:02:00 5996.5 1115.75
01/04/2010 7:05:00 6000 1115.5
01/04/2010 20:55:00 6042 1128.5
01/04/2010 20:56:00 6040.5 1128.5
01/04/2010 20:57:00 6040.5 1128.75
01/04/2010 20:58:00 6040.5 1128.5
01/04/2010 20:59:00 6040.5 1128.5
01/04/2010 21:00:00 6036 1128.75
01/05/2010 7:00:00 6035.5 1128.5
01/05/2010 7:01:00 6037 1128.5
01/05/2010 7:02:00 6037 1128.5
01/05/2010 7:03:00 6038 1128.5
01/05/2010 7:06:00 6038.5 1128.25
01/05/2010 20:56:00 6042 1131.5
01/05/2010 20:57:00 6042.5 1132
01/05/2010 20:58:00 6042 1132
01/05/2010 20:59:00 6041.5 1131.75
01/05/2010 21:00:00 6041.5 1132.25
;
run;


output
date time stock1 stock2 R (by date)
01/04/2010 7:00:00 5989.5 1115.25
01/04/2010 7:01:00 5998.5 1116 .9999 ( R based on obs 1~2 )
01/04/2010 7:01:00 5998.5 1116 .9999 (R based on obs 1~3)
01/04/2010 7:02:00 5996.5 1115.75 .9999 (R based on obs 1~4)
01/04/2010 7:05:00 6000 1115.5 .9999 (R based on obs 1~5)
01/04/2010 20:55:00 6042 1128.5 .9999 (R based on obas 1~6)
01/04/2010 20:56:00 6040.5 1128.5 .9999 (R based on obas 1~7)
01/04/2010 20:57:00 6040.5 1128.75 .9999 (R based on obas 1~8)
01/04/2010 20:58:00 6040.5 1128.5 .9999 (R based on obas 1~9)
01/04/2010 20:59:00 6040.5 1128.5 .9999 (R based on obas 1~10)
01/04/2010 21:00:00 6036 1128.75 .9999 (R based on obs 1~ 11)
01/05/2010 7:00:00 6035.5 1128.5
01/05/2010 7:01:00 6037 1128.5 .9999 (R based on obs 12 ~13)
01/05/2010 7:02:00 6037 1128.5 .9999 ( R based on obs 12
~14)
01/05/2010 7:03:00 6038 1128.5 .9999 ( R based on obs 12
~15)
01/05/2010 7:06:00 6038.5 1128.25 .9999 ( R based on obs 12 ~16)
01/05/2010 20:56:00 6042 1131.5 .9999 ( R based on obs 12
~17)
01/05/2010 20:57:00 6042.5 1132 .9999 ( R based on obs 12 ~18)
01/05/2010 20:58:00 6042 1132 .9999 ( R based on obs
12 ~19)
01/05/2010 20:59:00 6041.5 1131.75 .9999 ( R based on obs 12 ~20)
01/05/2010 21:00:00 6041.5 1132.25 .9999 ( R based on obs 12 ~21)
From: Ya on
On Jun 14, 1:38 pm, db <daronnebonn...(a)gmail.com> wrote:
> Hi, I would like to get correlation coefficient that calculates obs
> 1~2, obs 1~3, obs 1~4, obs 1~5, obs 1~6 ... etc
>
> Thanks, db
>
> data input
> input date :mmddyy10.  time :time8.0  stock1 stock2;
> format date:mmddyy10.  time :time8.0;
> cards;
> 01/04/2010      7:00:00 5989.5  1115.25
> 01/04/2010      7:01:00 5998.5  1116
> 01/04/2010      7:01:00 5998.5  1116
> 01/04/2010      7:02:00 5996.5  1115.75
> 01/04/2010      7:05:00 6000    1115.5
> 01/04/2010      20:55:00        6042    1128.5
> 01/04/2010      20:56:00        6040.5  1128.5
> 01/04/2010      20:57:00        6040.5  1128.75
> 01/04/2010      20:58:00        6040.5  1128.5
> 01/04/2010      20:59:00        6040.5  1128.5
> 01/04/2010      21:00:00        6036    1128.75
> 01/05/2010      7:00:00 6035.5  1128.5
> 01/05/2010      7:01:00 6037    1128.5
> 01/05/2010      7:02:00 6037    1128.5
> 01/05/2010      7:03:00 6038    1128.5
> 01/05/2010      7:06:00 6038.5  1128.25
> 01/05/2010      20:56:00        6042    1131.5
> 01/05/2010      20:57:00        6042.5  1132
> 01/05/2010      20:58:00        6042    1132
> 01/05/2010      20:59:00        6041.5  1131.75
> 01/05/2010      21:00:00        6041.5  1132.25
> ;
> run;
>
> output
> date time stock1 stock2 R (by date)
> 01/04/2010      7:00:00 5989.5  1115.25
> 01/04/2010      7:01:00 5998.5  1116      .9999 ( R based on obs 1~2 )
> 01/04/2010      7:01:00 5998.5  1116      .9999 (R based on obs 1~3)
> 01/04/2010      7:02:00 5996.5  1115.75  .9999 (R based on obs 1~4)
> 01/04/2010      7:05:00 6000    1115.5            .9999 (R based on obs 1~5)
> 01/04/2010      20:55:00        6042    1128.5            .9999 (R based on obas 1~6)
> 01/04/2010      20:56:00        6040.5  1128.5     .9999 (R based on obas 1~7)
> 01/04/2010      20:57:00        6040.5  1128.75    .9999 (R based on obas 1~8)
> 01/04/2010      20:58:00        6040.5  1128.5     .9999 (R based on obas 1~9)
> 01/04/2010      20:59:00        6040.5  1128.5     .9999 (R based on obas 1~10)
> 01/04/2010      21:00:00        6036    1128.75         .9999 (R based on obs 1~ 11)
> 01/05/2010      7:00:00 6035.5  1128.5
> 01/05/2010      7:01:00 6037    1128.5           .9999 (R based on obs 12 ~13)
> 01/05/2010      7:02:00 6037    1128.5           .9999 ( R based on obs 12
> ~14)
> 01/05/2010      7:03:00 6038    1128.5            .9999 ( R based on obs 12
> ~15)
> 01/05/2010      7:06:00 6038.5  1128.25   .9999 ( R based on obs 12 ~16)
> 01/05/2010      20:56:00        6042    1131.5              .9999 ( R based on obs 12
> ~17)
> 01/05/2010      20:57:00        6042.5  1132         .9999 ( R based on obs 12 ~18)
> 01/05/2010      20:58:00        6042    1132                 .9999 ( R based on obs
> 12 ~19)
> 01/05/2010      20:59:00        6041.5  1131.75     .9999 ( R based on obs 12 ~20)
> 01/05/2010      21:00:00        6041.5  1132.25     .9999 ( R based on obs 12 ~21)

You can use proc sql to construct the needed dataset first, which also
creates a group var for later "by processing":

data have1;
set have;
by date time;
if first.date then n_=0;
n_+1;
run;


proc sql;
create table have2 as
select a.date,a.stock1,a.stock2,'1 to '||put(b.n_,z2.) as grp
from have1 a, have1 b
where a.date=b.date and a.n_<=b.n_
order by a.date,grp,a.time
;
quit;

have2 looks like this:

Obs date stock1 stock2 grp

1 01/04/2010 5989.5 1115.25 1 to 01
2 01/04/2010 5989.5 1115.25 1 to 02
3 01/04/2010 5998.5 1116.00 1 to 02
4 01/04/2010 5989.5 1115.25 1 to 03
5 01/04/2010 5998.5 1116.00 1 to 03
6 01/04/2010 5998.5 1116.00 1 to 03
7 01/04/2010 5989.5 1115.25 1 to 04
8 01/04/2010 5998.5 1116.00 1 to 04
9 01/04/2010 5998.5 1116.00 1 to 04
10 01/04/2010 5996.5 1115.75 1 to 04
11 01/04/2010 5989.5 1115.25 1 to 05
12 01/04/2010 5998.5 1116.00 1 to 05
13 01/04/2010 5998.5 1116.00 1 to 05
14 01/04/2010 5996.5 1115.75 1 to 05
15 01/04/2010 6000.0 1115.50 1 to 05
16 01/04/2010 5989.5 1115.25 1 to 06
17 01/04/2010 5998.5 1116.00 1 to 06
18 01/04/2010 5998.5 1116.00 1 to 06
19 01/04/2010 5996.5 1115.75 1 to 06
20 01/04/2010 6000.0 1115.50 1 to 06
21 01/04/2010 6042.0 1128.50 1 to 06
.....

With have2, you can use proc corr and "by processing" to get what you
need:

ods listing close;
ods output PearsonCorr=need(where=(variable='stock2'));

proc corr data=have2;
by date grp;
var stock1 stock2;
run;

ods listing;

proc print label;
var date grp stock1;
label stock1='R';
run;

Obs date grp R

1 01/04/2010 1 to 01 .
2 01/04/2010 1 to 02 1.00000
3 01/04/2010 1 to 03 1.00000
4 01/04/2010 1 to 04 0.99313
5 01/04/2010 1 to 05 0.67305
6 01/04/2010 1 to 06 0.98630
7 01/04/2010 1 to 07 0.99139
8 01/04/2010 1 to 08 0.99308
9 01/04/2010 1 to 09 0.99411
10 01/04/2010 1 to 10 0.99474
11 01/04/2010 1 to 11 0.99258
12 01/05/2010 1 to 01 .
13 01/05/2010 1 to 02 .
14 01/05/2010 1 to 03 .
15 01/05/2010 1 to 04 .
16 01/05/2010 1 to 05 -0.63134
17 01/05/2010 1 to 06 0.85891
18 01/05/2010 1 to 07 0.91995
19 01/05/2010 1 to 08 0.93372
20 01/05/2010 1 to 09 0.93834
21 01/05/2010 1 to 10 0.93508

HTH

Ya
From: jwl on
On Jun 14, 3:26 pm, Ya <huang8...(a)gmail.com> wrote:
> On Jun 14, 1:38 pm, db <daronnebonn...(a)gmail.com> wrote:
>
>
>
>
>
> > Hi, I would like to get correlation coefficient that calculates obs
> > 1~2, obs 1~3, obs 1~4, obs 1~5, obs 1~6 ... etc
>
> > Thanks, db
>
> > data input
> > input date :mmddyy10.  time :time8.0  stock1 stock2;
> > format date:mmddyy10.  time :time8.0;
> > cards;
> > 01/04/2010      7:00:00 5989.5  1115.25
> > 01/04/2010      7:01:00 5998.5  1116
> > 01/04/2010      7:01:00 5998.5  1116
> > 01/04/2010      7:02:00 5996.5  1115.75
> > 01/04/2010      7:05:00 6000    1115.5
> > 01/04/2010      20:55:00        6042    1128.5
> > 01/04/2010      20:56:00        6040.5  1128.5
> > 01/04/2010      20:57:00        6040.5  1128.75
> > 01/04/2010      20:58:00        6040.5  1128.5
> > 01/04/2010      20:59:00        6040.5  1128.5
> > 01/04/2010      21:00:00        6036    1128.75
> > 01/05/2010      7:00:00 6035.5  1128.5
> > 01/05/2010      7:01:00 6037    1128.5
> > 01/05/2010      7:02:00 6037    1128.5
> > 01/05/2010      7:03:00 6038    1128.5
> > 01/05/2010      7:06:00 6038.5  1128.25
> > 01/05/2010      20:56:00        6042    1131.5
> > 01/05/2010      20:57:00        6042.5  1132
> > 01/05/2010      20:58:00        6042    1132
> > 01/05/2010      20:59:00        6041.5  1131.75
> > 01/05/2010      21:00:00        6041.5  1132.25
> > ;
> > run;
>
> > output
> > date time stock1 stock2 R (by date)
> > 01/04/2010      7:00:00 5989.5  1115.25
> > 01/04/2010      7:01:00 5998.5  1116      .9999 ( R based on obs 1~2 )
> > 01/04/2010      7:01:00 5998.5  1116      .9999 (R based on obs 1~3)
> > 01/04/2010      7:02:00 5996.5  1115.75  .9999 (R based on obs 1~4)
> > 01/04/2010      7:05:00 6000    1115.5            .9999 (R based on obs 1~5)
> > 01/04/2010      20:55:00        6042    1128.5            .9999 (R based on obas 1~6)
> > 01/04/2010      20:56:00        6040.5  1128.5     ..9999 (R based on obas 1~7)
> > 01/04/2010      20:57:00        6040.5  1128.75    ..9999 (R based on obas 1~8)
> > 01/04/2010      20:58:00        6040.5  1128.5     ..9999 (R based on obas 1~9)
> > 01/04/2010      20:59:00        6040.5  1128.5     ..9999 (R based on obas 1~10)
> > 01/04/2010      21:00:00        6036    1128.75         .9999 (R based on obs 1~ 11)
> > 01/05/2010      7:00:00 6035.5  1128.5
> > 01/05/2010      7:01:00 6037    1128.5           .9999 (R based on obs 12 ~13)
> > 01/05/2010      7:02:00 6037    1128.5           .9999 ( R based on obs 12
> > ~14)
> > 01/05/2010      7:03:00 6038    1128.5            .9999 ( R based on obs 12
> > ~15)
> > 01/05/2010      7:06:00 6038.5  1128.25   .9999 ( R based on obs 12 ~16)
> > 01/05/2010      20:56:00        6042    1131.5              .9999 ( R based on obs 12
> > ~17)
> > 01/05/2010      20:57:00        6042.5  1132         .9999 ( R based on obs 12 ~18)
> > 01/05/2010      20:58:00        6042    1132                 .9999 ( R based on obs
> > 12 ~19)
> > 01/05/2010      20:59:00        6041.5  1131.75     .9999 ( R based on obs 12 ~20)
> > 01/05/2010      21:00:00        6041.5  1132.25     .9999 ( R based on obs 12 ~21)
>
> You can use proc sql to construct the needed dataset first, which also
> creates a group var for later "by processing":
>
> data have1;
>  set have;
> by date time;
> if first.date then n_=0;
> n_+1;
> run;
>
> proc sql;
> create table have2 as
> select a.date,a.stock1,a.stock2,'1 to '||put(b.n_,z2.) as grp
> from have1 a, have1 b
> where a.date=b.date and a.n_<=b.n_
> order by a.date,grp,a.time
> ;
> quit;
>
> have2 looks like this:
>
> Obs          date    stock1     stock2      grp
>
>   1    01/04/2010    5989.5    1115.25    1 to 01
>   2    01/04/2010    5989.5    1115.25    1 to 02
>   3    01/04/2010    5998.5    1116.00    1 to 02
>   4    01/04/2010    5989.5    1115.25    1 to 03
>   5    01/04/2010    5998.5    1116.00    1 to 03
>   6    01/04/2010    5998.5    1116.00    1 to 03
>   7    01/04/2010    5989.5    1115.25    1 to 04
>   8    01/04/2010    5998.5    1116.00    1 to 04
>   9    01/04/2010    5998.5    1116.00    1 to 04
>  10    01/04/2010    5996.5    1115.75    1 to 04
>  11    01/04/2010    5989.5    1115.25    1 to 05
>  12    01/04/2010    5998.5    1116.00    1 to 05
>  13    01/04/2010    5998.5    1116.00    1 to 05
>  14    01/04/2010    5996.5    1115.75    1 to 05
>  15    01/04/2010    6000.0    1115.50    1 to 05
>  16    01/04/2010    5989.5    1115.25    1 to 06
>  17    01/04/2010    5998.5    1116.00    1 to 06
>  18    01/04/2010    5998.5    1116.00    1 to 06
>  19    01/04/2010    5996.5    1115.75    1 to 06
>  20    01/04/2010    6000.0    1115.50    1 to 06
>  21    01/04/2010    6042.0    1128.50    1 to 06
> ....
>
> With have2, you can use proc corr and "by processing" to get what you
> need:
>
> ods listing close;
> ods output PearsonCorr=need(where=(variable='stock2'));
>
> proc corr data=have2;
> by date grp;
> var stock1 stock2;
> run;
>
> ods listing;
>
> proc print label;
> var date grp stock1;
> label stock1='R';
> run;
>
>  Obs          date      grp             R
>
>    1    01/04/2010    1 to 01      .
>    2    01/04/2010    1 to 02     1.00000
>    3    01/04/2010    1 to 03     1.00000
>    4    01/04/2010    1 to 04     0.99313
>    5    01/04/2010    1 to 05     0.67305
>    6    01/04/2010    1 to 06     0.98630
>    7    01/04/2010    1 to 07     0.99139
>    8    01/04/2010    1 to 08     0.99308
>    9    01/04/2010    1 to 09     0.99411
>   10    01/04/2010    1 to 10     0.99474
>   11    01/04/2010    1 to 11     0.99258
>   12    01/05/2010    1 to 01      .
>   13    01/05/2010    1 to 02      .
>   14    01/05/2010    1 to 03      .
>   15    01/05/2010    1 to 04      .
>   16    01/05/2010    1 to 05    -0.63134
>   17    01/05/2010    1 to 06     0.85891
>   18    01/05/2010    1 to 07     0.91995
>   19    01/05/2010    1 to 08     0.93372
>   20    01/05/2010    1 to 09     0.93834
>   21    01/05/2010    1 to 10     0.93508
>
> HTH
>
> Ya

Hi Ya,

Thank you so much for your logic! your logic works really good.
I need to run this logic on the dataset with about 1 million
observation. Is there a way to omit proc sql processing to calculate
cumulative correlation coefficient ?

Thanks, db