From: "Nick ." on
Hello,
Can one of you (Jonas, Dave, Peter, or someone else) explain what this code is doing? I am trying to follow this thread and I am already lost. A sample dataset would help a lot!
NICK


----- Original Message -----
From: "Jonas Bilenas"
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: jackknife concept
Date: Wed, 10 May 2006 08:16:26 -0400


On Tue, 9 May 2006 17:49:53 -0400, Luo, Peter wrote:

> David, for what Jonas was trying to do, i.e. to get some 'error' estimates
> for model predictors, is N sub-samples or N bootstrapping samples the
> better method?
>
I modified the code a bit, based on suggestions from David. Similar but
different results:

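%macro boot(iter);

   /* Draw &iter bootstrap samples: sampling with replacement (method=urs),
      each the same size as the original data (samprate=1) */
   proc surveyselect data=reg out=outdata
      rep=&ITER method=urs samprate=1 outhits;
   run;

   %do i=1 %to &iter;
      ods listing close;
      /* Fit the model to replicate &i and capture its coefficients;
         &ivs (the predictor list) must be defined before %boot is called */
      ods output ParameterEstimates=bout;
      proc logistic data=outdata;
         where replicate=&i;
         model bad=&ivs;
      run;
      ods output close;

      /* One row per replicate, one column per coefficient */
      proc transpose data=bout out=bt&i;
         var estimate;
         id variable;
      run;
      /* Stack replicates 2..&iter onto bt1 */
      %if "&i" ne "1" %then %do;
         proc append base=bt1 data=bt&i;
         run;
      %end;
   %end;
   ods listing;

   /* Summarize each coefficient across the replicates */
   proc means data=bt1 mean min max std n nmiss;
   run;
%mend;

%boot(20);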

From: NordlDJ on

Jonas,

You haven't incorporated one of the most important suggestions that David
made, which is to use BY processing in Proc Logistic. That will eliminate
having to continually open and close the file of bootstrap samples, and the
file will only have to be read through once. Remove the %DO loop and
replace the WHERE statement with a BY statement. You can also eliminate the
Proc Transpose and the Proc Append. Something like the following (I'm not
sure where the macro variable &ivs is defined):

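%macro boot(iter);
   /* &iter bootstrap samples, stacked in one data set and
      identified by the variable Replicate */
   proc surveyselect data=reg out=outdata
      rep=&ITER method=urs samprate=1 outhits;
   run;

   ods listing close;
   ods output ParameterEstimates=bout;

   /* one model fit per replicate, in a single pass through the data */
   proc logistic data=outdata;
      by replicate;
      model bad=&ivs;
   run;

   ods output close;
   ods listing;

   /* nway keeps only the per-variable rows in estimate_summary */
   proc means data=bout nway mean min max std n nmiss;
      class variable;
      var estimate;
      output out=estimate_summary;
   run;
%mend;

%boot(20);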

Hope this is helpful,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
From: "Nick ." on
Dan,

What is the objective of this macro? It will run 20 times and give you statistics on 20 different data sets? What is the objective of this macro, for those of us who don't understand what Jonas is trying to implement and how to interpret it?
NICK

----- Original Message -----
From: "Nordlund, Dan (DSHS)"
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: jackknife concept
Date: Wed, 10 May 2006 12:25:43 -0700


<<<snip>>>

From: NordlDJ on
> -----Original Message-----
> From: Nick . [mailto:ni14(a)mail.com]
> Sent: Wednesday, May 10, 2006 12:56 PM
> To: Nordlund, Dan (DSHS); SAS-L(a)LISTSERV.UGA.EDU
> Subject: Re: Re: jackknife concept
>
> Dan,
>
> What is the objective of this macro? It will run 20 times, it will give
> you statistics on 20 differnt data sets? What is the objective of this
> macro for those of us who don't understand what Jonas is trying to
> implement and how to interprete?
> NICK
<<<snip>>>

Nick,

I haven't got the time, space, or probably even the skill to adequately
explain bootstrapping, but I will try to briefly respond. I am sure that
David will be kind enough to correct me if I go too far astray. :-)

First, the fact that I used a macro here was simply because I was responding
to what had been written. Unless I was going to try to create a much more
general boot macro with many parameters for flexibility (and I wouldn't
because it's already been done) I would just write the basic code here with
the number of replications hard coded.

I am oversimplifying here, but bootstrapping is based on the assumption that
your original sample is representative of the population from which it is
drawn. So sampling with replacement from your original sample will produce
a sample similar to what you could have gotten if you took a new sample from
the parent population. Now take many bootstrap samples and compute a
desired statistic, say the mean, on each one. Then you can empirically
estimate what the sampling distribution of the statistic is, rather than
assuming that the distribution is normal or some other distribution and
estimating the standard errors using standard formulas. Bootstrapping can
also be useful in those situations where you don't have an analytical
solution for the standard error of your statistic.
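
The procedure described above, applied to the mean, might be sketched as follows. This is only an illustration under stated assumptions and not part of the original reply: the data set names (sample, bootsamp, bootmeans), the variable names (id, x, xbar), the seed value, the sample size of 50, and the choice of 200 replicates are all made up for the sketch.

```sas
/* Hypothetical toy data: 50 draws from a skewed distribution */
data sample;
   call streaminit(2006);
   do id = 1 to 50;
      x = rand('exponential');
      output;
   end;
run;

/* 200 bootstrap samples, each the size of the original data,
   drawn with replacement (method=urs, samprate=1) */
proc surveyselect data=sample out=bootsamp seed=2006
      method=urs samprate=1 outhits rep=200 noprint;
run;

/* Compute the statistic (here the mean) once per bootstrap sample */
proc means data=bootsamp noprint;
   by replicate;
   var x;
   output out=bootmeans mean=xbar;
run;

/* The std. dev. of the 200 means is an empirical estimate of the
   standard error of the mean */
proc means data=bootmeans n mean std;
   var xbar;
run;
```

For the mean, the result can be checked against the usual formula (sample standard deviation divided by the square root of n); the two should be in the same neighborhood, though not identical.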

Here is a toy example logistic regression that you could play with.

**create sample data;
data test;
   do i=1 to 100;
      y=i GT 50;
      x0=i+20*normal(1234);
      x1=uniform(4321) > .5;
      x2=normal(1234);
      x3=normal(1234);
      output;
   end;
run;

/**run your initial logistic regression. It might be
   instructive to compare the mean of the bootstrap sample
   estimated coefficients (below) to the estimates here
**/
proc logistic data=test;
   model y=x0 x1 x2 x3;
run;

/**create 20 bootstrap samples;
   in real life you would probably want many more;
**/
proc surveyselect data=test out=outdata
   rep=20 method=urs samprate=1 outhits;
run;

ods listing close;
ods output ParameterEstimates=bout;

/**run your analysis using BY processing;
   the ODS output statement will collect 20 sets of
   regression coefficients into one dataset, bout;
**/
proc logistic data=outdata;
   by replicate;
   model y=x0 x1 x2 x3;
run;

ods output close;
ods listing;

/**compute the mean and std. dev. of the 20 regression coefficients
   for each variable. The std. dev. is *an* estimate of the standard error
   of estimate for the original regression coefficients. You might
   then use these standard errors (or an empirically estimated confidence
   interval) to assess whether your estimated coefficients are different
   from zero
**/
proc means data=bout nway mean min max std n nmiss;
   class variable;
   var estimate;
   output out=estimate_summary;
run;
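
The "empirically estimated confidence interval" mentioned in the last comment could be sketched roughly as below. This is an illustration only, not part of the original post: it assumes the data set bout created by the example above, and the names bout_sorted, boot_ci, and the prefix pct_ are invented here. With only 20 replicates the 2.5th and 97.5th percentiles are very crude, so treat the output as a demonstration of the mechanics rather than a usable interval.

```sas
/* bout is ordered by replicate; BY processing needs it ordered by variable */
proc sort data=bout out=bout_sorted;
   by variable;
run;

/* 2.5th and 97.5th percentiles of the bootstrap estimates for each
   coefficient: a simple percentile interval */
proc univariate data=bout_sorted noprint;
   by variable;
   var estimate;
   output out=boot_ci pctlpts=2.5 97.5 pctlpre=pct_;
run;

proc print data=boot_ci noobs;
run;
```

An interval that excludes zero is the empirical counterpart of a coefficient that differs from zero at roughly the 5% level, subject to all the caveats about the number of replicates.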

I hope this description has not been too far off the mark. Do not go out
and try to bootstrap your own estimates using this partial, simplified
explanation. I haven't dealt with a whole host of issues, including but not
limited to things like bias estimation and whether you should be resampling
cases or residuals.
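
As one hedged illustration of the bias issue mentioned above: a common rough estimate of bias is the mean of the bootstrap estimates minus the estimate from the original sample. The sketch below assumes the data sets test and bout from the toy example; orig, boot_stats, compare, boot_mean, boot_se, and bias are names invented for this sketch, and it is not a substitute for a proper treatment of bias correction.

```sas
/* Refit the original model and capture its coefficients */
ods output ParameterEstimates=orig;
proc logistic data=test;
   model y=x0 x1 x2 x3;
run;

/* One row per coefficient: mean and std. dev. over the bootstrap replicates */
proc means data=bout nway noprint;
   class variable;
   var estimate;
   output out=boot_stats mean=boot_mean std=boot_se;
run;

proc sort data=orig;
   by variable;
run;

proc sort data=boot_stats;
   by variable;
run;

/* Put the original and bootstrap results side by side */
data compare;
   merge orig(keep=variable estimate stderr)
         boot_stats(keep=variable boot_mean boot_se);
   by variable;
   bias = boot_mean - estimate;   /* rough bootstrap estimate of bias */
run;

proc print data=compare noobs;
run;
```

Comparing stderr (the model-based standard error) with boot_se in the same listing also shows how far the bootstrap estimate of the standard error departs from the analytical one.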

I hope this has been helpful for following this discussion thread,

Dan

Daniel J. Nordlund
Research and Data Analysis
Washington State Department of Social and Health Services
Olympia, WA 98504-5204
From: Jonas Bilenas on
On Wed, 10 May 2006 12:25:43 -0700, Nordlund, Dan (DSHS)
<NordlDJ(a)DSHS.WA.GOV> wrote:

<<<snip>>>

Thanks. The first time I tried David's code I didn't get statistics on
the variable coefficients. This was helpful.

Jonas V. Bilenas
JP Morgan CHASE Bank
Decision Science