From: renxue on
I have a dataset like this (more than one patient_id);
patient_id start_date_time stop_date_time
11 1/1/2008 6/1/2008
11 3/1/2008 4/1/2008
11 4/1/2008 10/1/2008
11 6/1/2008 10/1/2008
11 12/1/2008 1/20/2008
11 15/1/2008 22/1/2008
11 24/1/2008 27/1/2008

I want to sum the days of all date ranges for every patient_id .
That means :stop_date_time - start_date_time;
howerver there are many date ranges overlaped,I only want to sum them
once.
for example you can see the above date cover dates below:
2008/01/01-2008/01/10
2008/01/12-2008/01/22
2008/01/24-2008/01/27
then the sum equation was 9+10+3=22
I want to get the result of 22 ,then how to write the program when
the date ranges are overlaped ?
From: Arthur Tabachneck on
Zijun,

There are a number of ways to solve the problem. Take a look at:
http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0106A&L=sas-l&D=1&H=0&O=D&T=1&P=38600
and
http://www2.sas.com/proceedings/sugi29/260-29.pdf

The following is based on code that Ian Whitlock suggested in the
first of those links:

data have;
input patient_id
(start_date_time stop_date_time) (ddmmyy10.);
format start_date_time stop_date_time date9.;
cards;
11 1/1/2008 6/1/2008
11 3/1/2008 4/1/2008
11 4/1/2008 10/1/2008
11 6/1/2008 10/1/2008
11 12/1/2008 20/1/2008
11 15/1/2008 22/1/2008
11 24/1/2008 27/1/2008
12 1/1/2008 6/1/2008
12 3/1/2008 4/1/2008
12 4/1/2008 10/1/2008
12 6/1/2008 10/1/2008
12 12/1/2008 20/1/2008
12 15/1/2008 22/1/2008
12 24/1/2008 28/1/2008
;

proc sort data = have;
by patient_id
start_date_time
descending stop_date_time;
run;

data want ( keep = patient_id curstart curend) ;
retain curstart curend;
format curstart curend date9.;
set have;
by patient_id;
if first.patient_id then do;
days=0;
curstart = start_date_time;
curend = stop_date_time;
end;
if stop_date_time < stop_date_time then error ;
if stop_date_time > curend
and curstart <= start_date_time <= curend + 1 then
curend = stop_date_time;
if start_date_time > curend then do;
output;
curstart = start_date_time;
curend = stop_date_time;
end;
if last.patient_id then output;
run;

data want;
set want;
by patient_id;
retain days;
if first.patient_id then days=curend-curstart;
else days=sum(days,curend-curstart);
run;

HTH,
Art
---------------
On Aug 2, 6:35 am, renxue <zijun2...(a)gmail.com> wrote:
> I have a  dataset like this (more than one patient_id);
> patient_id  start_date_time stop_date_time
> 11             1/1/2008  6/1/2008
> 11             3/1/2008  4/1/2008
> 11             4/1/2008  10/1/2008
> 11             6/1/2008  10/1/2008
> 11             12/1/2008 1/20/2008
> 11             15/1/2008  22/1/2008
> 11             24/1/2008  27/1/2008
>
> I want to sum the days of all date ranges  for every patient_id .
> That means :stop_date_time  -  start_date_time;
> howerver there are many date ranges overlaped,I only want to  sum them
> once.
> for example you can see the above date cover dates below:
> 2008/01/01-2008/01/10
> 2008/01/12-2008/01/22
> 2008/01/24-2008/01/27
> then the sum equation was  9+10+3=22
> I want to get the result of 22 ,then  how to write the program when
> the date ranges are overlaped ?

From: Lou on

"renxue" <zijun2000(a)gmail.com> wrote in message
news:8e9e2393-60b9-473a-a33b-1c2265357f0f(a)m35g2000prn.googlegroups.com...
>I have a dataset like this (more than one patient_id);
> patient_id start_date_time stop_date_time
> 11 1/1/2008 6/1/2008
> 11 3/1/2008 4/1/2008
> 11 4/1/2008 10/1/2008
> 11 6/1/2008 10/1/2008
> 11 12/1/2008 1/20/2008
> 11 15/1/2008 22/1/2008
> 11 24/1/2008 27/1/2008
>
> I want to sum the days of all date ranges for every patient_id .
> That means :stop_date_time - start_date_time;
> howerver there are many date ranges overlaped,I only want to sum them
> once.
> for example you can see the above date cover dates below:
> 2008/01/01-2008/01/10
> 2008/01/12-2008/01/22
> 2008/01/24-2008/01/27
> then the sum equation was 9+10+3=22
> I want to get the result of 22 ,then how to write the program when
> the date ranges are overlaped ?

A question - why are you counting 9 days for 2008/01/01 - 2008/01/10 instead
of 10, and similarly for the other intervals?

Also, there's an anamalous line in your posted data: 12/1/2008 1/20/2008

All the other dates look like they're in dd/mm/yyyy format except this one
end date. Assuming 1/20/2008 should be 20/1/2008, I'd probably attack it
something like this.
data fee;
input @1 id 2. @4 start ddmmyy10. @14 end ddmmyy10.;
do date = start to end;
output;
end;
cards;
11 1/1/2008 6/1/2008
11 3/1/2008 4/1/2008
11 4/1/2008 10/1/2008
11 6/1/2008 10/1/2008
11 12/1/2008 20/1/2008
11 15/1/2008 22/1/2008
11 24/1/2008 27/1/2008
;
proc sort data = fee nodupkey;
by id date;
proc means data = fee noprint;
by id;
var date;
output out = fie
n = n;
quit;

Which incidentally, gives us a total of 25 days for id 11. If you really
don't want to count the first day of the interval, use "do date = start + 1
to end;" instead.