jackknife concept [SAS]

From: David L Cassell on 9 May 2006 14:36

Peter Flom wrote:
> >>> Aparna <aparnasprasad(a)GMAIL.COM> 5/9/2006 8:08 am >>>
>hi. Can anybody please explain the concept of the jackknife and the bootstrap to me?
>Are they concerned with regression alone? Can they be used for a
>prediction interval? Thanks.
>
>This is a BIG topic, with a huge literature. Here is a VERY brief intro.
>Both the bootstrap and the jackknife are resampling methods. They have
>many uses, but a lot of these center around finding variance estimates for
>statistics where there is no formula, or where the assumptions are
>violated. So, to answer your question: No, they are not limited to
>regression, and yes, they can be used to help with predictions. (I am not
>sure what you mean by 'prediction interval', but if you mean 'predict some
>value and estimate a confidence interval', then yes, they can do that.)
>
>I am under the impression that the bootstrap is now much more used than the
>jackknife, and also that the jackknife was a sort of 'poor man's bootstrap'
>that was used more when computers weren't so blazingly fast. The essential
>idea behind the bootstrap is more or less as follows:

I'm going to disagree a bit here. Yes, bootstrapping is apparently
used a lot more than jackknifing. Quenouille invented the jackknife in 1949,
when (obviously) the computational facilities were limited to hand-crank
machines unless you happened to be in with the ENIAC boys.
:-) Quenouille had a nonparametric estimate of bias, but no one
called it the jackknife until later.

I don't think of it as a poor man's bootstrap. They both do a
nonparametric linearization of functionals of the parameter estimates.
They both assume the data are i.i.d. ~ F, and that the data are
exchangeable. It has been proven that the jackknife variance
estimate has a slight bias (on the high side).
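The jackknife mechanics are easy to see on a small example. What follows is a minimal SAS sketch (an illustration added here, not code from the thread), assuming a hypothetical data set WORK.HAVE with a numeric variable X filled with simulated values. It builds all n leave-one-out replicates in one DATA step, recomputes the statistic with BY-group processing, and applies the usual jackknife variance formula: (n-1)/n times the sum of squared deviations of the leave-one-out estimates from their mean.

```sas
/* Simulated stand-in data (hypothetical): WORK.HAVE with a numeric X. */
data have;
  do i = 1 to 30;
    x = rannor(2006);
    output;
  end;
  drop i;
run;

/* Build all n leave-one-out replicates in one step:            */
/* replicate r contains every observation except observation r. */
data jk;
  do replicate = 1 to n;
    do j = 1 to n;
      set have point=j nobs=n;
      if j ne replicate then output;
    end;
  end;
  stop;
run;

/* Recompute the statistic (here the mean) on every replicate in one pass. */
proc means data=jk noprint;
  by replicate;
  var x;
  output out=jk_est mean=theta_i;
run;

/* Jackknife variance: (n-1)/n * sum over i of (theta_i - theta_bar)**2. */
/* CSS= is exactly that corrected sum of squares.                        */
proc means data=jk_est noprint;
  var theta_i;
  output out=jk_sum n=n css=css_theta;
run;

data jk_var;
  set jk_sum;
  jk_var = (n - 1) / n * css_theta;
  jk_se  = sqrt(jk_var);
  keep n jk_var jk_se;
run;
```

For the sample mean, the jackknife standard error reduces algebraically to the familiar s/sqrt(n), which makes the sketch easy to check. To put it to real use, swap the MEAN= statistic for one without a convenient formula, such as a ratio; avoid non-smooth statistics like the median, for which the jackknife variance estimate is known to be unreliable.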

>Take a sample
>Now, resample from that sample, with replacement. Do this many times.
>Use these resamples to estimate parameters and variances.

Yes. Only don't do it the way the SAS bootstrapping macro code does it.
Ick. Use an appropriate tool to build all the samples at once, and then
use by-processing to do all the analyses in one pass. I complain about this
a lot, don't I?
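Here is a minimal sketch of that "build everything at once, then BY-process" pattern for a statistic with no handy variance formula: the median. It is an illustration added here, not code from the thread. WORK.HAVE, its simulated variable X, the seeds, and B=1000 replicates are all assumptions. PROC SURVEYSELECT with METHOD=URS and SAMPRATE=1 draws n observations with replacement for each replicate, one PROC MEANS step computes every replicate's median, and the spread of those medians gives the bootstrap standard error and a simple percentile interval.

```sas
/* Simulated stand-in data (hypothetical): WORK.HAVE with a skewed X. */
data have;
  do i = 1 to 50;
    x = rangam(2006, 2);
    output;
  end;
  drop i;
run;

/* 1. Build all B bootstrap samples at once: URS = with replacement, */
/*    SAMPRATE=1 gives samples of size n, OUTHITS writes one row per */
/*    draw, and REP= adds the REPLICATE variable.                    */
proc surveyselect data=have out=boot seed=2006
     method=urs samprate=1 outhits rep=1000;
run;

/* 2. One pass over all replicates with BY-group processing. */
proc means data=boot noprint;
  by replicate;
  var x;
  output out=boot_est median=theta_b;
run;

/* 3. Bootstrap SE = standard deviation of the replicate medians;   */
/*    the 2.5th and 97.5th percentiles give a percentile interval. */
proc univariate data=boot_est noprint;
  var theta_b;
  output out=boot_summary std=boot_se pctlpts=2.5 97.5 pctlpre=p;
run;
```

BOOT_SUMMARY then holds BOOT_SE, P2_5, and P97_5. The percentile interval is the crudest bootstrap interval; Efron and Tibshirani discuss better-behaved alternatives such as BCa intervals.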

>For a fairly readable, if somewhat dated, introduction, read
>
>Efron and Tibshirani, "An Introduction to the Bootstrap".
>
>For a more recent, but much more technically demanding review, there is
>
>Davison and Hinkley, "Bootstrap Methods and Their Application".
>
>There are tons and tons of articles as well, and many other books. But E
>and T is the seminal book.

I like E & T.

For a technical but interesting (and short) book on the subject, try
Brad Efron's "The Jackknife, the Bootstrap, and Other Resampling Plans"
from SIAM Press.

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330

From: David L Cassell on 9 May 2006 14:47

Jonas Bilenas replied:
>I typically will use the bootstrap approach as opposed to holdout samples
>to validate my models. One rule is that if the coefficients change sign,
>then that variable should be dropped. Here is an example using logistic
>regression. Variable selection (not using stepwise) was built on the entire
>sample. This will be featured in my new book I am working on for SAS
>Press, SAS Applications in Credit Industry.
>
>%macro bootstrap(mod_data,iter,);
> ods listing close;
>
> %do i = 1 %to &iter;
> ods output clear;
> ods output ParameterEstimates=b&i;
> proc logistic data=&mod_data;
> model bad=&ivs_trim;
> where ranuni(0)<=.9;
> run;quit;
> ods output close;
> run;
> proc transpose data=b&i out=bt&i;
> var estimate;
> id variable;
> run;
> %if "&i" ne "1" %then %do;
> proc append base=bt1 data=bt&i;
> run;
> %end;
> %end;
>
> ods listing;
> proc means data=bt1 mean min max std n nmiss;
> run;
>%mend;
>%bootstrap(reg1,20);
>
>Here is the truncated output:
>
>The MEANS Procedure
>
>Variable                  Mean     Minimum     Maximum
>Intercept            0.9560223   0.6456173   1.3784958
>tof24                0.6999331   0.5134410   0.8170087
>cd_util             -0.4577382  -0.7089199  -0.2893133
>nhistd3             -0.2086835  -0.3138207  -0.0920624
>nocd                 0.7812227   0.5508036   1.1057233
>nodel                0.4298646   0.3049216   0.5502467
>nonpromoinq         -0.0532590  -0.0666753  -0.0292599
>ntrades1             0.0432712   0.0239419   0.0573544
>ntrades2            -0.1167981  -0.1399701  -0.0960097
>ntrades2_2           0.0024367   0.0016911   0.0031254
>average_hc_cd_p22    0.1913840   0.1463953   0.2747190

Jonas, I hate to be a pain in the keister, but...

But I'm going to be one anyway. (Mah nishtanah hahlielah hazeh?
Why is this night different from any other? :-) :-) )

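But what you have is a random holdout, NOT a bootstrap in
the technical sense of the term: each fit uses roughly 90% of the
observations, drawn without replacement, whereas a bootstrap draws n
observations with replacement. It also does not have the theoretical
support that a true bootstrap does.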
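To make the distinction concrete, this sketch (added for illustration; the 1000-row table of IDs, the seeds, and the 20 replicates are hypothetical) draws both kinds of sample with PROC SURVEYSELECT and counts rows and distinct IDs per replicate. METHOD=SRS with SAMPRATE=0.9 approximates Jonas's WHERE RANUNI(0)<=.9 subsample, except that SURVEYSELECT fixes the size at exactly 90% rather than letting it vary.

```sas
/* Hypothetical input: 1000 rows identified by ID. */
data have;
  do id = 1 to 1000;
    output;
  end;
run;

/* Random holdout: 90% of the rows, no row used twice. */
proc surveyselect data=have out=holdout seed=1
     method=srs samprate=0.9 rep=20;
run;

/* Bootstrap: n draws with replacement, so some rows repeat and others drop out. */
proc surveyselect data=have out=boot seed=1
     method=urs samprate=1 outhits rep=20;
run;

/* Rows and distinct IDs per replicate for each scheme. */
proc sql;
  create table compare as
  select 'holdout' as scheme length=9, replicate,
         count(*) as n_rows, count(distinct id) as n_distinct
  from holdout
  group by replicate
  union all
  select 'bootstrap', replicate,
         count(*), count(distinct id)
  from boot
  group by replicate;
quit;
```

Each holdout replicate should show 900 rows and 900 distinct IDs, while each bootstrap replicate shows 1000 rows but only about 632 distinct IDs: the expected fraction of distinct observations in a bootstrap sample is 1 - (1 - 1/n)^n, roughly 1 - 1/e for large n. That duplication is what makes each replicate a sample of size n from the empirical distribution.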

Here's how I would do a bootstrap for your situation above
(note that I just whipped this up based on your code, and it is
untested).

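/* Draw &ITER bootstrap samples of size n, with replacement (URS); */
/* OUTHITS writes one row per draw, REP= adds the REPLICATE variable. */
proc surveyselect data=&MOD_DATA out=outdata
rep=&ITER method=urs samprate=1 outhits;
run;

ods listing close;
ods output ParameterEstimates=bout;

/* BY-group processing fits every replicate in one pass. */
proc logistic data=outdata;
by replicate;
model bad=&IVS_TRIM;
run;

ods output close;
ods listing;

/* Summarize each coefficient separately across the replicates. */
proc means data=bout mean min max std n nmiss;
class variable;
var estimate;
run;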

Feel free to use as much or as little of my code as you want. If you
want to use SASFILE to speed up the PROC SURVEYSELECT, do that
as well.
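For anyone who wants to try the SASFILE suggestion, a minimal sketch follows. The input table REG1, its contents, and the macro variable values are hypothetical stand-ins for Jonas's data. Whether holding the table in memory actually speeds things up depends on its size and on how SURVEYSELECT reads it, so treat this as something to benchmark rather than a guaranteed win.

```sas
/* Simulated stand-in for the modeling data (hypothetical). */
data reg1;
  do id = 1 to 500;
    output;
  end;
run;

%let mod_data = reg1;  /* hypothetical macro variable values */
%let iter     = 200;

/* Load the input table into memory so repeated reads avoid disk I/O. */
sasfile work.&mod_data load;

proc surveyselect data=work.&mod_data out=outdata
     rep=&iter method=urs samprate=1 outhits;
run;

/* Release the memory once the replicates are built. */
sasfile work.&mod_data close;
```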

David

From: "Luo, Peter" on 9 May 2006 17:49

David, for what Jonas was trying to do, i.e. to get some 'error' estimates
for model predictors, is N sub-samples or N bootstrapping samples the better
method?

From: Jonas Bilenas on 10 May 2006 08:22

I also created stepwise bootstrap code to show how unstable stepwise
regression is. Here is the code for logistic regression. You can modify
the macro to take the data set as a parameter.

%macro bootsw(iter);

proc surveyselect data=fff.data_gb out=outdata
rep=&ITER method=urs samprate=1 outhits;
run;

%do i=1 %to &iter;
ods listing close;
ods output OddsRatios=vars;
proc logistic data=outdata;
where replicate=&i;
model bad=&ivs/sle=.05 sls=.05 selection=stepwise;
run;
ods output close;

%if "&i" = "1" %then %do;
data sw;
set vars;
run;
%end;
%else %do;
proc append base=sw data=vars;
run;
%end;

%end;
%mend;

%bootsw(20);
ods listing;

proc freq data=sw;
tables effect;
run;
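Following David's advice, the same stepwise experiment can be run without the %DO loop: build the replicates once and let a single PROC LOGISTIC step with BY REPLICATE repeat the stepwise search in each one. This is a sketch, not Jonas's code. The simulated data set DATA_GB (standing in for FFF.DATA_GB), the candidate list in &IVS, the seed, and EVENT='1' are assumptions.

```sas
/* Simulated stand-in for FFF.DATA_GB (hypothetical): a 0/1 BAD and */
/* five candidate predictors, of which only X1 and X2 affect BAD.   */
data data_gb;
  call streaminit(2006);
  array x{5};
  do i = 1 to 1000;
    do k = 1 to 5;
      x{k} = rand('normal');
    end;
    p   = 1 / (1 + exp(-(-1 + 0.8*x1 - 0.5*x2)));
    bad = (rand('uniform') < p);
    output;
  end;
  drop i k p;
run;

%let ivs = x1 x2 x3 x4 x5;  /* hypothetical candidate list */

proc surveyselect data=data_gb out=outdata seed=2006
     rep=20 method=urs samprate=1 outhits;
run;

/* One PROC LOGISTIC call: the stepwise search reruns in each replicate. */
ods listing close;
ods output OddsRatios=sw;

proc logistic data=outdata;
  by replicate;
  model bad(event='1') = &ivs / selection=stepwise sle=.05 sls=.05;
run;

ods output close;
ods listing;

/* Count how many replicates kept each effect. */
proc freq data=sw;
  tables effect;
run;
```

Effects that show up in only some of the 20 replicates are the ones whose selection is unstable.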

From: Jonas Bilenas on 10 May 2006 08:16

On Tue, 9 May 2006 17:49:53 -0400, Luo, Peter <pluo(a)DRAFTNET.COM> wrote:

>David, for what Jonas was trying to do, i.e. to get some 'error' estimates
>for model predictors, is N sub-samples or N bootstrapping samples the better
>method?
>
I modified the code a bit, based on suggestions from David. Similar but
different results:

%macro boot(iter);

proc surveyselect data=reg out=outdata
rep=&ITER method=urs samprate=1 outhits;
run;

%do i=1 %to &iter;
ods listing close;
ods output ParameterEstimates=bout;
proc logistic data=outdata;
where replicate=&i;
model bad=&ivs;
run;
ods output close;

proc transpose data=bout out=bt&i;
var estimate;
id variable;
run;
%if "&i" ne "1" %then %do;
proc append base=bt1 data=bt&i;
run;
%end;
%end;
ods listing;

proc means data=bt1 mean min max std n nmiss;
run;
%mend;

%boot(20);
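Jonas's rule of dropping a variable whose coefficient changes sign can be checked directly from the bootstrap estimates. The sketch below (an illustration, not code from the thread) fits all replicates with BY-group processing and uses PROC SQL to produce one row per coefficient with the bootstrap mean, the bootstrap standard error (the standard deviation of the replicate estimates), and a SIGN_CHANGE flag. The simulated data set REG, the predictor list in &IVS, the seed, and the 20 replicates are assumptions.

```sas
/* Simulated stand-in for REG (hypothetical): X1 matters, X2 and X3 */
/* are noise, so their coefficients may change sign.                */
data reg;
  call streaminit(2006);
  do i = 1 to 800;
    x1 = rand('normal');
    x2 = rand('normal');
    x3 = rand('normal');
    bad = (rand('uniform') < 1 / (1 + exp(-(-1 + 0.9*x1))));
    output;
  end;
  drop i;
run;

%let ivs = x1 x2 x3;  /* hypothetical predictor list */

proc surveyselect data=reg out=outdata seed=2006
     rep=20 method=urs samprate=1 outhits;
run;

ods listing close;
ods output ParameterEstimates=bout;

proc logistic data=outdata;
  by replicate;
  model bad(event='1') = &ivs;
run;

ods output close;
ods listing;

/* One row per coefficient: bootstrap mean and SE, plus a flag for */
/* coefficients that are negative in some replicates and positive  */
/* in others.                                                      */
proc sql;
  create table boot_summary as
  select variable,
         count(*)       as n_rep,
         mean(estimate) as boot_mean,
         std(estimate)  as boot_se,
         (min(estimate) < 0 and max(estimate) > 0) as sign_change
  from bout
  group by variable;
quit;

proc print data=boot_summary noobs;
run;
```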
