From: lisiqi77 on
Hey,

I have a dataset with 5 variables like the following:

code year month bench_mo x;

a 1994 1 12 0.21
a 1994 2 12 0.22
a 1994 3 12 0.26
...
a 1994 12 12 0.20
a 1995 1 12 0.11
...
a 1995 12 12 0.20
...
b 1994 1 8 0.20
b 1994 2 8 0.32
...

In other words, for each code, I have x data for every month of
yr1994-2005. And bench_month differs for each code.

My goal is to calculate the variance of x during a year with its
yearend defined as 10 months after the bench_month. For example, for
code a with bench_month equal 12, I'd like to know the variance of x
from 1994 month 11 to 1995 month 10, etc.

Please help if you have any good idea...

Thanks,
Tracy

From: Dale McLerran on
--- lisiqi77(a)YAHOO.COM wrote:

> Hey,
>
> I have a dataset with 5 variables like the following:
>
> code year month bench_mo x;
>
> a 1994 1 12 0.21
> a 1994 2 12 0.22
> a 1994 3 12 0.26
> ...
> a 1994 12 12 0.20
> a 1995 1 12 0.11
> ...
> a 1995 12 12 0.20
> ...
> b 1994 1 8 0.20
> b 1994 2 8 0.32
> ...
>
> In other words, for each code, I have x data for every month of
> yr1994-2005. And bench_month differs for each code.
>
> My goal is to calculate the variance of x during a year with its
> yearend defined as 10 months after the bench_month. For example, for
> code a with bench_month equal 12, I'd like to know the variance of x
> from 1994 month 11 to 1995 month 10, etc.
>
> Please help if you have any good idea...
>
> Thanks,
> Tracy
>

Tracy,

I'm not going to ask how this problem arises. I'll take it that
calculating the variance among the data points within the 1 year
window starting one month before the benchmark month to 10 months
after the benchmark month is acceptable for the problem that you
must address.

While you have not explicitly stated so, I will assume that the
benchmark month is in month 2 or later of year 1994 but no later
than month 2 of year 1995. If the benchmark month is outside that
window, then you would not be able to obtain a year's worth of
data commencing from the month prior to the benchmark month through
10 months after the benchmark month. Actually, this presents a bit
of a problem in that the benchmark month=2 could belong to either
1994 or 1995. I will assume that a benchmark month=2 belongs to
1994 only.

Given this assumptions and assuming that your data are sorted by
code, there is an easy solution to your problem. You have all the
information in the benchmark month to compose a WHERE condition
which would restrict the data to the appropriate interval. For
benchmark month=1, then we would require the month=12 observation
from 1994 and all 1995 observations up through month 11. All
other benchmark months would allow us to construct a condition
where we accept all 1994 observations with month greather than
or equal to one less than the benchmark month and any 1995
observations with month less than or equal to the benchmark month
minus 2. Throw in by variable processing to get the variance
separately for each code level, and you have your solution.


proc means data=mydata;
where ( benchmark=1 AND
( (year=1994 AND month=12) OR
(year=1995 AND month<=11) ) )
OR
( (year=1994 AND month>=(benchmark-1)) OR
(year=1995 AND month<=(benchmark-2) ) );
by code;
var x;
run;


Of course, you may want to use an OUTPUT statement or ODS to
capture the variance values and place them in a data set for
some subsequent need.

HTH,

Dale


---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra(a)NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
From: Muthia Kachirayan on
Tracy,

You say yearend defined as 10 months after bench_month. Then you want
variance for 12 months (1994/11 to 1995/10).

If you can provide sample data for 2 years and the output derived from that
data will help someone to provide you a good solution.

Muthia Kachirayan


On 4/20/06, lisiqi77(a)yahoo.com <lisiqi77(a)yahoo.com> wrote:
>
> Hey,
>
> I have a dataset with 5 variables like the following:
>
> code year month bench_mo x;
>
> a 1994 1 12 0.21
> a 1994 2 12 0.22
> a 1994 3 12 0.26
> ...
> a 1994 12 12 0.20
> a 1995 1 12 0.11
> ...
> a 1995 12 12 0.20
> ...
> b 1994 1 8 0.20
> b 1994 2 8 0.32
> ...
>
> In other words, for each code, I have x data for every month of
> yr1994-2005. And bench_month differs for each code.
>
> My goal is to calculate the variance of x during a year with its
> yearend defined as 10 months after the bench_month. For example, for
> code a with bench_month equal 12, I'd like to know the variance of x
> from 1994 month 11 to 1995 month 10, etc.
>
> Please help if you have any good idea...
>
> Thanks,
> Tracy
>
From: lisiqi77 on
Hi, Dale,

Thanks for your reply. Actually my data ranges from 1994 to 2004, which
somehow conplicates the problem. The solution you proposed works well
for two years of data, i.e., 1994 and 1995. But for my sample period,
would it be adequate?

And upon the request of Muthia, I include a more complete sample data
as follows:
code year month bench_mo x beg_mo end_mo
a 1994 1 8 0.84 7 6
a 1994 2 8 0.01 7 6
a 1994 3 8 0.38 7 6
a 1994 4 8 0.85 7 6
a 1994 5 8 0.68 7 6
a 1994 6 8 0.25 7 6
a 1994 7 8 0.88 7 6
a 1994 8 8 0.75 7 6
a 1994 9 8 0.13 7 6
a 1994 10 8 0.30 7 6
a 1994 11 8 0.04 7 6
a 1994 12 8 0.77 7 6
a 1995 1 8 0.57 7 6
a 1995 2 8 0.94 7 6
a 1995 3 8 0.03 7 6
a 1995 4 8 0.50 7 6
a 1995 5 8 0.69 7 6
a 1995 6 8 0.72 7 6
a 1995 7 8 0.62 7 6
a 1995 8 8 0.40 7 6
a 1995 9 8 0.08 7 6
a 1995 10 8 0.69 7 6
a 1995 11 8 0.39 7 6
a 1995 12 8 0.84 7 6
b 2001 1 1 0.24 12 11
b 2001 2 1 0.77 12 11
b 2001 3 1 0.49 12 11
b 2001 4 1 0.40 12 11
b 2001 5 1 0.71 12 11
b 2001 6 1 0.25 12 11
b 2001 7 1 0.71 12 11
b 2001 8 1 0.63 12 11
b 2001 9 1 0.35 12 11
b 2001 10 1 0.30 12 11
b 2001 11 1 0.63 12 11
b 2001 12 1 0.25 12 11
b 2002 1 1 0.15 12 11
b 2002 2 1 0.99 12 11
b 2002 3 1 0.37 12 11
b 2002 4 1 0.00 12 11
b 2002 5 1 0.50 12 11
b 2002 6 1 0.42 12 11
b 2002 7 1 0.09 12 11
b 2002 8 1 0.84 12 11
b 2002 9 1 0.97 12 11
b 2002 10 1 0.97 12 11
b 2002 11 1 0.85 12 11
b 2002 12 1 0.68 12 11

Note that I constructed two additional variables, beg_mo and end_mo
based on the defintion of yearend as 10+ of benchmark month. My goal is
to calculate the variance of X during the self-defined year. Those
months not in the year range are discarded. The output variance is
0.11 for code a and 0.14 for code b.

Thanks,
Tracy

From: Jiann-Shiun Huang on
Tracy:

See if the following code fits your need. Output follows the code.

data InData;
input code $ year month bench_mo x;
datalines;
a 1994 1 12 0.21
a 1994 2 12 0.22
a 1994 3 12 0.26
a 1994 4 12 0.21
a 1994 5 12 0.22
a 1994 6 12 0.26
a 1994 7 12 0.21
a 1994 8 12 0.22
a 1994 9 12 0.26
a 1994 10 12 0.20
a 1994 11 12 0.20
a 1994 12 12 0.20
a 1995 1 9 0.11
a 1995 2 9 0.13
a 1995 3 9 0.03
a 1995 4 9 0.11
a 1995 5 9 0.13
a 1995 6 9 0.03
a 1995 7 9 0.11
a 1995 8 9 0.13
a 1995 9 9 0.03
a 1995 10 9 0.11
a 1995 11 9 0.13
a 1995 12 9 0.20
b 1994 1 2 0.20
b 1994 2 2 0.32
b 1994 3 2 0.22
b 1994 4 2 0.20
b 1994 5 2 0.32
b 1994 6 2 0.22
b 1994 7 2 0.20
b 1994 8 2 0.32
b 1994 9 2 0.22
b 1994 10 2 0.20
b 1994 11 2 0.32
b 1994 12 2 0.22
b 1995 1 3 0.20
b 1995 2 3 0.32
b 1995 3 3 0.22
b 1995 4 3 0.20
b 1995 5 3 0.32
b 1995 6 3 0.22
b 1995 7 3 0.22
b 1995 8 3 0.20
b 1995 9 3 0.20
b 1995 10 3 0.32
b 1995 11 3 0.22
b 1995 12 3 0.22
b 1996 1 3 0.20
b 1996 2 3 0.32
b 1996 3 3 0.25
;
run;

proc sort data=InData;
by code year month;
run;

data GroupData(keep=code year month bench_mo x Group);
length ThisCode $ 1 LastCode $ 1;
retain ThisYear ThisBench LastYear LastBench ThisCode LastCode;
set InData;
by code year;
if first.code then do;
LastYear=year-1;
LastBench=1;
LastCode=code;
ThisYear=year;
ThisBench=bench_mo;
ThisCode=code;
end;
else do;
if first.year then do;
LastYear=ThisYear;
LastBench=ThisBench;
LastCode=ThisCode;
ThisYear=year;
ThisBench=bench_mo;
ThisCode=code;
end;
end;
if -1 le (12*(year-ThisYear)+month-ThisBench) le 10 then do;
Group=ThisCode || put(ThisYear,4.);
output;
end;
if -1 le (12*(year-LastYear)+month-LastBench) le 10 then do;
Group=LastCode || put(LastYear,4.);
output;
end;
run;

proc sort data=GroupData;
by Group year month;
run;

proc means data=GroupData MEAN VAR;
var x;
class Group;
run;

***** Output *****
The SAS System 11:03 Thursday, April 20, 2006 73

The MEANS Procedure

Analysis Variable : x

N
Group Obs Mean Variance
********************************************
a1994 12 0.1100000 0.0033273

a1995 5 0.1200000 0.0037000

b1994 12 0.2466667 0.0030061

b1995 12 0.2383333 0.0025061

b1996 2 0.2850000 0.0024500
********************************************



J S Huang
1-515-557-3987
fax 1-515-557-2422

>>> Muthia Kachirayan <muthia.kachirayan(a)GMAIL.COM> 04/20/06 6:51 PM >>>
Tracy,

You say yearend defined as 10 months after bench_month. Then you want
variance for 12 months (1994/11 to 1995/10).

If you can provide sample data for 2 years and the output derived from that
data will help someone to provide you a good solution.

Muthia Kachirayan


On 4/20/06, lisiqi77(a)yahoo.com <lisiqi77(a)yahoo.com> wrote:
>
> Hey,
>
> I have a dataset with 5 variables like the following:
>
> code year month bench_mo x;
>
> a 1994 1 12 0.21
> a 1994 2 12 0.22
> a 1994 3 12 0.26
> ...
> a 1994 12 12 0.20
> a 1995 1 12 0.11
> ...
> a 1995 12 12 0.20
> ...
> b 1994 1 8 0.20
> b 1994 2 8 0.32
> ...
>
> In other words, for each code, I have x data for every month of
> yr1994-2005. And bench_month differs for each code.
>
> My goal is to calculate the variance of x during a year with its
> yearend defined as 10 months after the bench_month. For example, for
> code a with bench_month equal 12, I'd like to know the variance of x
> from 1994 month 11 to 1995 month 10, etc.
>
> Please help if you have any good idea...
>
> Thanks,
> Tracy
>