From: "Howard Schreier <hs AT dc-sug DOT org>" on
On Thu, 20 Apr 2006 15:39:58 -0700, lisiqi77(a)YAHOO.COM wrote:

>Hey,
>
>I have a dataset with 5 variables like the following:
>
> code year month bench_mo x;
>
> a 1994 1 12 0.21
> a 1994 2 12 0.22
> a 1994 3 12 0.26
> ...
> a 1994 12 12 0.20
> a 1995 1 12 0.11
> ...
> a 1995 12 12 0.20
> ...
> b 1994 1 8 0.20
> b 1994 2 8 0.32
> ...
>
>In other words, for each code, I have x data for every month of
>yr1994-2005. And bench_month differs for each code.
>
>My goal is to calculate the variance of x during a year with its
>yearend defined as 10 months after the bench_month. For example, for
>code a with bench_month equal 12, I'd like to know the variance of x
>from 1994 month 11 to 1995 month 10, etc.
>
>Please help if you have any good idea...

SQL makes this fairly simple.

First generate test data:

data have;
input code $ bench_mo;
do year = 1994 to 2005;
do month = 1 to 12;
ym = year*100 + month;
x = round(ranuni(1),0.01);
output;
end;
end;
cards;
a 12
b 0
c -2
;

Variable YM is there only to help track behavior of the code. It can be
eliminated (along with references to it below).

Solution:

proc sql;
create table goal as
select code,
bench_mo,
year(intnx('month',mdy(month,1,year),2-bench_mo) ) as year,
var(x) as var,
n(x) as many,
min(ym) as from,
max(ym) as to
from have
group by code, bench_mo, calculated year
/* having n(x)=12 */
order by code, bench_mo, calculated year
;

The HAVING BY clause can be used to exclude the partial-year tails which
otherwise are part of the result.

The CALCULATED keyword is critical, since there is also a source column
neamed YEAR.

>
>Thanks,
>Tracy
From: Ya Huang on
What happens if bench_mo=2? It seems that both of the 12-month periods
in 1994 and 1995 will meet the criteria.



On Thu, 20 Apr 2006 18:48:17 -0700, lisiqi77(a)YAHOO.COM wrote:

>Hi, Dale,
>
>Thanks for your reply. Actually my data ranges from 1994 to 2004, which
>somehow conplicates the problem. The solution you proposed works well
>for two years of data, i.e., 1994 and 1995. But for my sample period,
>would it be adequate?
>
>And upon the request of Muthia, I include a more complete sample data
>as follows:
>code year month bench_mo x beg_mo end_mo
>a 1994 1 8 0.84 7 6
>a 1994 2 8 0.01 7 6
>a 1994 3 8 0.38 7 6
>a 1994 4 8 0.85 7 6
>a 1994 5 8 0.68 7 6
>a 1994 6 8 0.25 7 6
>a 1994 7 8 0.88 7 6
>a 1994 8 8 0.75 7 6
>a 1994 9 8 0.13 7 6
>a 1994 10 8 0.30 7 6
>a 1994 11 8 0.04 7 6
>a 1994 12 8 0.77 7 6
>a 1995 1 8 0.57 7 6
>a 1995 2 8 0.94 7 6
>a 1995 3 8 0.03 7 6
>a 1995 4 8 0.50 7 6
>a 1995 5 8 0.69 7 6
>a 1995 6 8 0.72 7 6
>a 1995 7 8 0.62 7 6
>a 1995 8 8 0.40 7 6
>a 1995 9 8 0.08 7 6
>a 1995 10 8 0.69 7 6
>a 1995 11 8 0.39 7 6
>a 1995 12 8 0.84 7 6
>b 2001 1 1 0.24 12 11
>b 2001 2 1 0.77 12 11
>b 2001 3 1 0.49 12 11
>b 2001 4 1 0.40 12 11
>b 2001 5 1 0.71 12 11
>b 2001 6 1 0.25 12 11
>b 2001 7 1 0.71 12 11
>b 2001 8 1 0.63 12 11
>b 2001 9 1 0.35 12 11
>b 2001 10 1 0.30 12 11
>b 2001 11 1 0.63 12 11
>b 2001 12 1 0.25 12 11
>b 2002 1 1 0.15 12 11
>b 2002 2 1 0.99 12 11
>b 2002 3 1 0.37 12 11
>b 2002 4 1 0.00 12 11
>b 2002 5 1 0.50 12 11
>b 2002 6 1 0.42 12 11
>b 2002 7 1 0.09 12 11
>b 2002 8 1 0.84 12 11
>b 2002 9 1 0.97 12 11
>b 2002 10 1 0.97 12 11
>b 2002 11 1 0.85 12 11
>b 2002 12 1 0.68 12 11
>
>Note that I constructed two additional variables, beg_mo and end_mo
>based on the defintion of yearend as 10+ of benchmark month. My goal is
>to calculate the variance of X during the self-defined year. Those
>months not in the year range are discarded. The output variance is
>0.11 for code a and 0.14 for code b.
>
>Thanks,
>Tracy
From: lisiqi77 on
Hi, Howard,

Your solution is magically simple! I tested it in my own dataset and it
worked well!

I think the clause " year(intnx('month',mdy(month,1,year),2-bench_mo)
) as year" is key but I actually don't quite understand how it works.
Do you mind explainning it in more details?

Thanks very much,
Tracy

From: "Howard Schreier <hs AT dc-sug DOT org>" on
On Mon, 24 Apr 2006 11:56:58 -0700, lisiqi77(a)YAHOO.COM wrote:

>Hi, Howard,
>
>Your solution is magically simple! I tested it in my own dataset and it
>worked well!
>
>I think the clause " year(intnx('month',mdy(month,1,year),2-bench_mo)
>) as year" is key but I actually don't quite understand how it works.
>Do you mind explainning it in more details?

Just work through an example. We can use the one in your initial post:

"My goal is to calculate the variance of x during a year with its
yearend defined as 10 months after the bench_month. For example, for
code a with bench_month equal 12, I'd like to know the variance of x
from 1994 month 11 to 1995 month 10, etc."

1994 month 11: MDY(11,1,1994) yields the number of days from 1 Jan. 1960 to
1 Nov. 1994. Then INTNX() shifts that back by 10 months (to the number of
days from 1 Jan. 1960 to 1 Jan. 1994), since 2-12 is -10. Finally, YEAR()
extracts 1994 from that.

1995 month 10: the same process yields 1994 as the year including 1 Dec.
1994.

Months between these 2 endpoints are mapped to the interior months (Feb.
through Nov.) of 1994. All yield 1994 as the new YEAR value.

Months outside this 12-month interval yield the appropriate years
before/after 1994.

>
>Thanks very much,
>Tracy