Simple Linear Regression By SAS BASE [SAS]

From: Kaz on 7 Jun 2010 21:03

Dear SAS experts,

I am currently trying to calculate the coefficient, intercept and R-square
for the S&P500 against each of 10,000 shares.

e.g.
proc reg data=share; model apple=sp500; run;
proc reg data=share; model ge=sp500; run;
proc reg data=share; model microsoft=sp500; run;
..... continues 10,000 times.....

All the data are available, but I am wondering whether I really have to run
"proc reg" 10,000 times.
If you know a better way to calculate simple linear regressions between the
S&P500 and 10,000 shares using base SAS functions, could you please advise me?

Thank you for your support in advance.

Kazutoshi

From: Reeza on 7 Jun 2010 21:27

I'm wondering why you'd want to do that. If it's a licensing issue,
I'd suggest using R instead.

You could do it with IML or even Base SAS, but I wouldn't recommend it.
Use the right tool for the right job...

From: Kaz on 7 Jun 2010 21:49

Writing a macro is not the problem; the problem is the time it takes to run
proc reg 10,000 times.
The following macro takes about 2 seconds for each proc reg.
If you have a faster method to do this, I would like to know.

%macro a;
  /* fit one simple regression per share: share1, share2, ... */
  %do i=1 %to 100;
    proc reg data=share;
      model share&i=sp500;
    run;
  %end;
%mend a;
%a;

From: Reeza on 8 Jun 2010 00:20

It can be done in base SAS. You basically need a few summary stats per
variable: n, sum(x), sum(x^2), sum(x*y), sum(y), sum(y^2).

The calculations are then as follows:
Coefficient = (n*sum(x*y) - sum(x)*sum(y)) / (n*sum(x^2) - sum(x)^2)
Intercept = sum(y)/n - Coefficient*sum(x)/n

R squared is more complicated and can be found on this link:
http://mathbits.com/Mathbits/TISection/Statistics2/correlation.htm
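
For reference, here are the standard closed forms for a simple linear regression written in terms of those raw sums (x is the S&P500 series, y is one share, n is the number of paired observations). In the one-predictor case R squared is just the squared Pearson correlation, so it needs nothing beyond the same five sums:

```latex
\hat{\beta} = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
\hat{\alpha} = \frac{\sum y}{n} - \hat{\beta}\,\frac{\sum x}{n},
\qquad
R^{2} = \frac{\left(n\sum xy - \sum x \sum y\right)^{2}}
             {\left(n\sum x^{2} - \left(\sum x\right)^{2}\right)
              \left(n\sum y^{2} - \left(\sum y\right)^{2}\right)}
```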

You can use arrays and retained sums to calculate all of this with one pass
through the data, but it won't be pretty, and I'm not sure it would be faster.
My guess is it would take longer than 3 hours to program and
test....my guess :)
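
A rough, untested sketch of what that one-pass data step might look like. It assumes the variables are named share1-share100 and sp500 as in the macro earlier in the thread; the output dataset name (betas), the array names and the output variable names are all made up for illustration. Temporary arrays are retained automatically, so no RETAIN statement is needed.

```sas
/* Sketch only: accumulate the raw sums for every share in one pass. */
/* Dataset BETAS and all array/variable names here are hypothetical. */
data betas (keep=share nobs coefficient intercept rsquare);
  length share $32;
  set share end=last;

  array y{*} share1-share100;
  /* one accumulator per share, all starting at zero */
  array cnt{100}   _temporary_ (100*0);
  array sumx{100}  _temporary_ (100*0);
  array sumy{100}  _temporary_ (100*0);
  array sumxx{100} _temporary_ (100*0);
  array sumxy{100} _temporary_ (100*0);
  array sumyy{100} _temporary_ (100*0);

  do i = 1 to dim(y);
    /* use a row for share i only if both values are present */
    if nmiss(y{i}, sp500) = 0 then do;
      cnt{i}   = cnt{i}   + 1;
      sumx{i}  = sumx{i}  + sp500;
      sumy{i}  = sumy{i}  + y{i};
      sumxx{i} = sumxx{i} + sp500*sp500;
      sumxy{i} = sumxy{i} + sp500*y{i};
      sumyy{i} = sumyy{i} + y{i}*y{i};
    end;
  end;

  /* after the last row, turn the sums into one output row per share */
  if last then do i = 1 to dim(y);
    share = vname(y{i});
    nobs  = cnt{i};
    cxx = cnt{i}*sumxx{i} - sumx{i}**2;
    cxy = cnt{i}*sumxy{i} - sumx{i}*sumy{i};
    cyy = cnt{i}*sumyy{i} - sumy{i}**2;
    if cxx > 0 and cyy > 0 then do;
      coefficient = cxy / cxx;
      intercept   = (sumy{i} - coefficient*sumx{i}) / cnt{i};
      rsquare     = cxy**2 / (cxx*cyy);
    end;
    else call missing(coefficient, intercept, rsquare);
    output;
  end;
run;
```

For 10,000 shares the array dimensions and the variable list would change to match. Raw sums of squares can lose precision when the values are large relative to their spread, so it would be worth checking a few shares against proc reg before trusting the output.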

You could trick proc means into giving you some of the summary stats
by using a WEIGHT statement to do the squaring, but it has to be run once
for each variable, so I doubt that's any faster....

Using the NOPRINT option in proc reg and writing the estimates to an output
dataset may be fastest overall. I'm curious to see what others
have to say on this.
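
A minimal sketch of that idea, assuming the share1-share100 naming from the macro earlier in the thread (the output dataset name est is made up). A MODEL statement with several dependent variables fits a separate regression for each one, so a single proc reg call can replace the macro loop, and NOPRINT suppresses the printed output that accounts for much of the run time:

```sas
/* Sketch only: one PROC REG call, one regression per dependent variable. */
/* OUTEST= writes the estimates to a dataset; RSQUARE adds _RSQ_ to it.   */
proc reg data=share outest=est rsquare noprint;
  model share1-share100 = sp500;
run;
quit;

/* EST has one row per share: _DEPVAR_ names the share, Intercept and */
/* sp500 hold the two estimates, and _RSQ_ holds R-square.            */
proc print data=est (keep=_depvar_ intercept sp500 _rsq_);
run;
```

One caveat: when all the variables share one MODEL statement, proc reg drops an observation from every regression if any of those variables is missing on that row. If the shares have gaps on different dates, the estimates will differ from those of the one-share-at-a-time macro.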

HTH,
Reeza

From: Kaz on 8 Jun 2010 02:48

Hi Reeza,

Thanks for your note.
Yes, I do realize that beta = covariance/variance.
That's why I am hoping someone knows an easier way to calculate all the stats
than running proc reg many times.
But getting R^2 and alpha (the intercept) as well may not be easy to
do in a data step.

I just wonder if anyone could suggest a better way.
Proc corr or proc iml may be a better solution, but I am not sure....

Thank you,
Kaz
