From: billyiqing on
Hi Patrick

The data looks something like this:

Ticker Date P
1 01Jan2010 5
1 02Jan2010 5
1 03Jan2010 6
1 04Jan2010 6.5
1 05Jan2010 6
2 01Jan2010 12
2 02Jan2010 12.4
2 03Jan2010 12.3
2 04Jan2010 13
2 05Jan2010 12.8
2 05Jan2010 14

And I want something like this:

Ticker Date P High
1 01Jan2010 5 5
1 02Jan2010 5 5
1 03Jan2010 6 6
1 04Jan2010 6.5 6.5
1 05Jan2010 6 6.5
2 01Jan2010 12 12
2 02Jan2010 12.4 12.4
2 03Jan2010 12.3 12.4
2 04Jan2010 13 13
2 05Jan2010 12.8 13
2 05Jan2010 14 14

The result data needs to contain ALL observations in the original
dataset, with an additional column containing the past 52-week high
for each observation.
Each ticker would have the stock price for the stock's entire period,
therefore the the Ticker-Date is unique. But neither the ticker itself
nor the date itself is unique.
The date spans across several years - sometimes really long periods.

Thank you SO much for your help!!!!
On Aug 6, 7:38 pm, Patrick <patrick.mat...(a)gmx.ch> wrote:
> Hi Bill
>
> Just to clarify before I'm coming up with something - please correct
> the following statements if wrong:
>
> - The result data set contains all the observations from the source
> data set which fall into the date range of the last 52 weeks.
> - The combination of ticker and date is not unique.
> - The variable 'high' (max. price) contains the max value of
> 'p' (price) for each combination of ticker and date.
>
> i.e. Have
>
> Ticker Date P
> 1 01jan2010 5
> 1 01jan2010 4
> 1 01jan2010 1
> 1 02jan2010 6
> 1 02jan2010 9
> 1 02jan2010 3
>
> Want:
> Ticker Date High
> 1 01jan2010 5
> 1 01jan2010 5
> 1 01jan2010 5
> 1 02jan2010 9
> 1 02jan2010 9
> 1 02jan2010 9
>
> Thanks
> Patrick

From: Patrick on
Hi Bill

Below code which (I think) solves the problem and should perform
reasonably well.
Everything else I can think of right now wouldn't add much performance
gain but complicate the code a lot.

If you're not familiar with the hash object:
Besides of the SASDoc there are also a few good white papers around. I
find also this quick reference very helpful:
http://support.sas.com/rnd/base/datastep/dot/hash-tip-sheet.pdf

data have;
infile datalines delimiter=' ';
input Ticker Date:date9. P;
format date date9.;
datalines;
1 01May2008 50
1 01Feb2009 3
1 01Mar2009 2
1 01Apr2009 4
1 01May2009 3
1 01Jun2009 1
1 01Jul2009 3
1 01Aug2009 4
1 01Jan2010 5
1 02Jan2010 5
1 03Jan2010 6
1 04Jan2010 6.5
1 05Jan2010 6
2 01Jan2010 12
2 02Jan2010 12.4
2 03Jan2010 12.3
2 04Jan2010 13
2 05Jan2010 12.8
2 05Jan2010 14
;
run;

/* number of weeks to look at */
%let J=52;

/* select all obs within date range and order result set by ticker and
date */
proc sql;
create view Vhave_Sorted as
select *
from have
order by ticker,date
;
quit;

data want(keep=Ticker Date p high);
set Vhave_Sorted;
by ticker date;
ThisDate=date;

/* declare hash table for look up */
if _n_=1 then
do;
declare hash h ();
rc=h.defineKey('ticker','ThisDate');
rc=h.defineData('p');
rc=h.defineDone();
end;

/* clear all entries in hash table if a new ticker starts */
if first.ticker then
do;
rc=rc = h.clear();
end;

/* add keys and data as in current record as defined to hash table
*/
rc=h.add();

/* lookup data in hash table and keep the max data value in the last
52 weeks */
/* starting from the current date
value */
do ThisDate=(date- 7*&j) to date;
rc=h.find();
high=max(high,p);
end;

run;

proc print data=want noobs;
run;

I was not 100% sure how exactly the date range has to be defined. You
might have to amend it to suit your needs.
What I did is just look back at the last 7*52 days:
-> do ThisDate=(date- 7*&j) to date;
.....or should it be 7*52+1 - or is it the last 52 weeks before the
date in current week (you would need intnx() for this).

I'm sure you'll manage to sort this out ;-)

Let me know if the code worked for you.

HTH
Patrick

From: Patrick on
Hi Andrew

Could you provide a code example which solves Bill's problem (he gave
us now some sample data)?

I'm really curious how this could be done with PROC EXPAND only.

Thanks
Patrick
From: Patrick on
Hi Bill

And last but not least (seem I got inspired...):
In case that you have a lot of RAM (>4GB addressable) and you don't
need your result data set sorted by ticker and date then loading the
full table into a hash for lookup would avoid the necessity for
sorting (which costs a lot of resources and time).

data have;
infile datalines delimiter=' ';
input Ticker Date:date9. P;
format date date9.;
datalines;
1 01Feb2009 3
1 01Mar2009 2
1 01Apr2009 4
1 01May2009 3
1 01Jun2009 1
1 01Jul2009 3
1 01Aug2009 4
1 01Jan2010 5
1 01May2008 50
1 02Jan2010 5
1 03Jan2010 6
1 04Jan2010 6.5
1 05Jan2010 6
2 01Jan2010 12
2 02Jan2010 12.4
2 03Jan2010 12.3
2 04Jan2010 13
2 05Jan2010 12.8
2 05Jan2010 14
;
run;

/* number of weeks to look at */
%let J=52;

data want(keep=Ticker Date p high);
set have;

/* declare hash table for look up */
if _n_=1 then
do;
declare hash h (dataset:'work.have');
rc=h.defineKey('ticker','Date');
rc=h.defineData('p');
rc=h.defineDone();
end;

/* lookup data in hash table and keep the max data value in the last
52 weeks */
/* starting from the current date
value */
ThisDate=date;
do date=(ThisDate- 7*&j) to date;
rc=h.find();
high=max(high,p);
end;
date=ThisDate;

run;

proc print data=want noobs;
run;


If you want to use this code then you have to pre-define the size of
the hash table - which can be tricky. The default is very likely to
small and not efficient.

Just two links dealing with this hashexp question:

http://books.google.com.au/books?id=PplQOP__uhcC&pg=PA397&lpg=PA397&dq=sas+hashexp+calculate&source=bl&ots=u3MMo0v2g1&sig=igTYtfGOKMDMm7EEksxHBuu-La4&hl=en&ei=kiJcTMnIOZKecdvbjPsB&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDEQ6AEwBQ#v=onepage&q&f=false

http://support.sas.com/kb/34/193.html


I'm guilty of not having defined the hash table size in the code I
provided. For the previous example the hash table stays quite small
(less than 3000 records for 7.5 years and one ticker). It might be
worth to define a hashexp of 2 only (not sure about this - that's why
I just ignored it).

HTH
Patrick