From: bnz6 on
Jack,

I may not have a clear understanding of what you are asking or of what you
are attempting to do, but it looks like SAS is working exactly as
designed here.

Looking at the code you submitted, you are requesting that SAS perform a
stratified (on ClientID) simple random sample of size 1 without
replacement from your dataset, with 2 replicates drawn independently.
This is exactly what you are given in your output dataset.
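
One way to confirm this is to cross-tabulate client ID by replicate. The sketch below assumes the output dataset is named SELECT_tmp, as in the submitted code, and that it contains the Replicate variable PROC SURVEYSELECT adds when REP= is specified.

```sas
/* Count the selected records per client and per replicate in the
   output dataset from the submitted code (SELECT_tmp). */
proc freq data=SELECT_tmp;
  tables ClientID * Replicate / norow nocol nopercent;
run;
```

With 10 client IDs and 2 replicates, each cell of the table should show a count of 1, for 20 selected records in total.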

The code you submitted provides a seed value of 1234567890, which will
ALWAYS produce the same results over and over again on this dataset.
This explains why you ALWAYS get 2 ctrlid's for the last client ID. The
replication selection just happens to select ZA8A9ABAA twice (for
replicates 1 and 2) and will never change, unless you change the seed
value.

If you change the seed value, you may or may not get ZA8A9ABAA twice.
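
A minimal sketch of such a rerun follows. It assumes the sorted tmp dataset from the original post; the seed 987654321 and the output name select_tmp2 are arbitrary choices for illustration, not values from the original code.

```sas
/* Same design as the submitted code, with a different seed. */
proc surveyselect data=tmp
                  method=srs
                  sampsize=1
                  rep=2
                  seed=987654321    /* arbitrary illustrative seed */
                  out=select_tmp2;  /* hypothetical output dataset name */
  strata ClientID;
run;

/* List the record selected for each client in each replicate. */
proc print data=select_tmp2 noobs;
  var Replicate ClientID CTRLID;
run;
```

Omitting SEED= altogether makes PROC SURVEYSELECT take its initial seed from the system clock, so repeated runs can then select different records.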

I hope this helps.

Sincerely yours,

Mark J. Lamias
SAIC Statistical Consultant
Office of Informatics
National Center for Preparedness, Detection, and Control of Infectious
Diseases
Coordinating Center for Infectious Diseases
US Centers for Disease Control and Prevention
w: (404) 639-0747
m: (404) 543-1394
f: (404) 639-1391

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
waterleaf sas
Sent: Wednesday, January 03, 2007 12:27 PM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: proc surveyselect problem

Please help me with the following code.

I use SRS under proc surveyselect with rep = 2; however, the last
clientid (15352) always selects a duplicate ctrlid. The weird thing is: if I
delete all records of clientid = 15053 (or another clientid), it works fine.

Thanks
Jack


data tmp;
input ClientID CTRLID $ 9-18;
cards;
1010    ZBABAZZ77
1010    B9ZA0A80Z
1010    B9AA493Z6
1010    B900903A3
1010    B94096638
1021    4Z6394ZZ7
1021    ZB87AB849
1021    B94A4B864
1021    B9BA80373
1021    B94098AA3
1021    B89A0A93B
1021    B94A49430
1021    B9ZA848AA
1021    B93ZZ6770
1021    B94303607
1021    ZZA6697Z8
1021    B9AZ0A736
1021    B93BZ0604
1021    B900940BZ
1021    B9094Z8A9
1021    B90Z89Z3Z
7227    B9BA4A0Z9
7227    B906ZAZ48
7227    B9B0A9693
7227    Z6489BB0Z
7227    Z6Z934983
7227    B93A099Z0
7227    Z6Z8BAZ37
7227    4604774BZ
7227    34B8ZB34
7227    B9Z64066A
7227    B9A07BB43
14244   Z64897089
14244   B9409744A
14244   B93700A09
14244   AB8749A9B
14244   Z6B93Z047
14553   B94Z8B3BB
14553   ZBAB9AZ36
14553   B89A6A0Z7
14553   B9A48A936
14553   33B7Z4498
14553   B9046Z780
14553   B9BA8Z708
14674   B9BZ063ZZ
14674   B9BA4BAB9
14674   B9444ZA04
14674   B8940B806
14674   B9ZZZ06ZZ
14674   B9A364986
14674   B90B8ZZ6B
14674   B9B36Z848
14778   ZB7434A67
14778   B903807Z6
14778   B9AA477BA
14821   Z6B7BBB47
14821   Z6B693979
14821   B90A6Z088
14821   Z6673Z94A
15053   B940B3BAB
15053   Z6A93B49B
15053   Z679BA986
15352   ZA8A9ABAA
15352   B933Z047B
;
run;

PROC SORT;
BY clientid;
RUN;

PROC SURVEYSELECT DATA = tmp
SAMPSIZE = 1
METHOD = SRS REP = 2
SEED = 1234567890
OUT = SELECT_tmp;
STRATA clientid;
RUN;

proc freq;
tables ctrlid;
run;

From: David L Cassell on
waterleaf.sas(a)GMAIL.COM wrote:
>
>Please help me with the following code.
>
>I use SRS under proc surveyselect with rep = 2; however, the last
>clientid (15352) always selects a duplicate ctrlid. The weird thing is: if I
>delete all records of clientid = 15053 (or another clientid), it works fine.
>
>Thanks
>Jack
>
>
>data tmp;
>input ClientID CTRLID $ 9-18;
>cards;

[DATASET ELIDED BY ME]

>;
>run;
>
>PROC SORT;
>BY clientid;
>RUN;
>
>PROC SURVEYSELECT DATA = tmp
>SAMPSIZE = 1
>METHOD = SRS REP = 2
>SEED = 1234567890
>OUT = SELECT_tmp;
>STRATA clientid;
>RUN;
>
>proc freq;
>tables ctrlid;
>run;

Perhaps, if you explain what it is that you are REALLY trying to
accomplish, someone here could help you more. As things are,
the proc is doing exactly what you told it to do.

Why do you want 2 reps?

Why do you want only 1 sample per clientid? That seems like
a really bad sample design.

What sort of sample are you really trying to achieve?

And do you need to worry about the fact that multiple reps
may or may not give you the same point multiple times, since
the selection process is independent on each replication? It
seems like you were not expecting this to happen.
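
To put a number on that independence: when each replicate draws 1 record by SRS from a stratum of N records, the chance that two independent replicates pick the same record is 1/N. The sketch below tabulates this per client, assuming the tmp dataset from the quoted code; the column names n_records and p_same_pick are made up for illustration.

```sas
/* Stratum sizes and the probability that two independent
   size-1 SRS draws select the same record (1 / stratum size). */
proc sql;
  select ClientID,
         count(*)                 as n_records,
         1 / calculated n_records as p_same_pick format=6.3
  from tmp
  group by ClientID;
quit;
```

For clientid 15352, which has only 2 records, this probability is 1/2, so a repeated CTRLID there is not surprising under any given seed. If two different CTRLIDs per client were the goal, SAMPSIZE=2 without REP= would select them without replacement within each stratum.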

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
