From: billyiqing on
Hi All,

I'm trying to create a SQL query, which finds the previous 52-week's
maximum price for each day. My stock prices are daily. The variables
are p (price), ticker and date. I have written the following code:

%let J=52;
proc sql noprint;
create table _52mom.umd4
as select a.ticker, a.date, max(p) as high
from _52mom.intreturns_s4 (keep = ticker date) as a,
_52mom.intreturns_s4 as b
where a.ticker = b.ticker and b.date between a.date and intnx('week',
a.date, -&J)
group by a.ticker, a.date
having count(a.date)>=200;
quit;

However, my dataset is quite large (the whole set is about 4.6G, only
3 variables - Price, Ticker, and Date). I have subdivided it into four
subsamples. The smallest is about 500M.

I have tested the above code on a 8M data, and it works out fine.
However, when I run it on the 500M, it becomes extremely slow (I have
been running for 6 hours now, still going, not sure when it will
finish). Also I'm not sure about the spillover table that proc SQL
creates, and how large that will become.

Could someone please kindly help me on this issue? I didn't think the
data would take this long to run, is there an easier/more efficient
way to do this? Sorry I am kind of new to SAS, don't know much about
it yet.

Thanks very much in advance!!!
From: Patrick on
Hi

Joining such a huge table with itself is for sure very resource
intensive.

Assuming that your source is a SAS table I'd expect that the code
below will perform better. If the max price was reached on several
days then you'll get several obs per ticker.

data intreturns_s4;
format date date9.;
stop=today();
drop stop;
do ticker=1 to 3;
do date='01JAN2009'd to stop;
p=ceil(ranuni(1)*100);
output;
end;
end;
run;

%let J=52;
data _null_;
start_date=intnx('week',today(), -&J,'b');
stop_date =intnx('week',today(), -1,'e');
put start_date= weekdate.;
put stop_date = weekdate.;
call symput('start_date',cats(start_date));
call symput('stop_date',cats(stop_date));
run;


proc sql;
create view work.intreturns_s4_SORTED as
select ticker, date, p
from work.intreturns_s4
where date between &start_date and &stop_date
order by ticker, p DESC
;
quit;

data work.umd4;
array p_retained {1} 8 _temporary_;
set intreturns_s4_SORTED;
by ticker;
if first.ticker then p_retained{1}=p;
if p=p_retained{1} then output;
run;

proc print data=work.umd4;
run;


If you need everything done in SQL then search this forum with
keywords SQL and TOP. There is more than one thread discussing how to
select the top record using SQL.

HTH
Patrick
From: Andrew Karp Sierra Info Services on
On Aug 5, 2:36 am, Patrick <patrick.mat...(a)gmx.ch> wrote:
> Hi
>
> Joining such a huge table with itself is for sure very resource
> intensive.
>
> Assuming that your source is a SAS table I'd expect that the code
> below will perform better. If the max price was reached on several
> days then you'll get several obs per ticker.
>
> data intreturns_s4;
>   format date date9.;
>   stop=today();
>   drop stop;
>   do ticker=1 to 3;
>     do date='01JAN2009'd to stop;
>       p=ceil(ranuni(1)*100);
>       output;
>     end;
>   end;
> run;
>
> %let J=52;
> data _null_;
>   start_date=intnx('week',today(), -&J,'b');
>   stop_date =intnx('week',today(), -1,'e');
>   put start_date= weekdate.;
>   put stop_date =   weekdate.;
>   call symput('start_date',cats(start_date));
>   call symput('stop_date',cats(stop_date));
> run;
>
> proc sql;
>   create view work.intreturns_s4_SORTED as
>     select ticker, date, p
>       from work.intreturns_s4
>         where date between &start_date and &stop_date
>       order by ticker, p DESC
>   ;
> quit;
>
> data work.umd4;
>   array p_retained {1} 8 _temporary_;
>   set intreturns_s4_SORTED;
>   by ticker;
>   if first.ticker then p_retained{1}=p;
>   if p=p_retained{1} then output;
> run;
>
> proc print data=work.umd4;
> run;
>
> If you need everything done in SQL then search this forum with
> keywords SQL and TOP. There is more than one thread discussing how to
> select the top record using SQL.
>
> HTH
> Patrick

I think this is something PROC EXPAND, in the ETS module, is uniquely
qualified to do without a lot of tedious Data Step and / or SQL
coding. Check out my paper, "Time Series Magic: Using PROC EXPAND
with Time Series Data" which is available for free download at
http://www.sierrainformation.com . Click on the Free Downloads link
on the left hand side of the page and take things from there.

Thanks,

Andrew Karp
Sierra Information Services
http://www.sierrainformation.com
From: billyiqing on
Thank you very much for your help Patrick!

However, one thing that I didn't make clear in the original thread was
that I need a rolling window, i.e. I need to calculate this 52-week
high for EVERY observation in my dataset. Hence the big join in my
SQL.

Any thoughts on this?

Thanks
Bill

On Aug 5, 6:22 pm, billyiqing <billyiq...(a)gmail.com> wrote:
> Hi All,
>
> I'm trying to create a SQL query, which finds the previous 52-week's
> maximum price for each day. My stock prices are daily. The variables
> are p (price), ticker and date. I have written the following code:
>
> %let J=52;
> proc sql noprint;
>     create table _52mom.umd4
>         as select a.ticker, a.date, max(p) as high
>         from _52mom.intreturns_s4 (keep = ticker date) as a,
> _52mom.intreturns_s4 as b
>         where a.ticker = b.ticker and b.date between a.date and intnx('week',
> a.date, -&J)
>         group by a.ticker, a.date
>         having count(a.date)>=200;
> quit;
>
> However, my dataset is quite large (the whole set is about 4.6G, only
> 3 variables - Price, Ticker, and Date). I have subdivided it into four
> subsamples. The smallest is about 500M.
>
> I have tested the above code on a 8M data, and it works out fine.
> However, when I run it on the 500M, it becomes extremely slow (I have
> been running for 6 hours now, still going, not sure when it will
> finish). Also I'm not sure about the spillover table that proc SQL
> creates, and how large that will become.
>
> Could someone please kindly help me on this issue? I didn't think the
> data would take this long to run, is there an easier/more efficient
> way to do this? Sorry I am kind of new to SAS, don't know much about
> it yet.
>
> Thanks very much in advance!!!

From: Patrick on
Hi Bill

Just to clarify before I'm coming up with something - please correct
the following statements if wrong:

- The result data set contains all the observations from the source
data set which fall into the date range of the last 52 weeks.
- The combination of ticker and date is not unique.
- The variable 'high' (max. price) contains the max value of
'p' (price) for each combination of ticker and date.

i.e. Have

Ticker Date P
1 01jan2010 5
1 01jan2010 4
1 01jan2010 1
1 02jan2010 6
1 02jan2010 9
1 02jan2010 3

Want:
Ticker Date High
1 01jan2010 5
1 01jan2010 5
1 01jan2010 5
1 02jan2010 9
1 02jan2010 9
1 02jan2010 9


Thanks
Patrick