From: Steve_Bates on
Hi All SASers,

I have a problem creating relationship periods for couples on our
database that I simply cannot solve. Due to the vagaries of our social
welfare database we have real problems ascertaining accurate
relationship periods between two clients.

The front end software only records a partner if the partner is
present when a benefit period is canceled. There are also other issues
that come into play but these dont affect the problem I'm dealing
with. Bottom line is I need to get both clients and partners
relationship history in order to work out the correct relationship
period(s)

What I want to be able to do is extract accurate relationship periods
using SAS code. I've prepared some test code to illustrate below:

data relationships ;
length swn pnswn client_start client_end 8 client partner $12 ;
infile datalines delimiter=',';
input swn client $ pnswn partner $ client_start client_end ;
informat client_start client_end date9. ;
format client_start client_end date9. ;
datalines;
111,John Doe,222,Jane Citizen,20APR2004,11NOV2005
111,John Doe,222,Jane Citizen,24AUG2006,14JUL2008
222,Jane Citizen,111,John Doe,19JUL2004,11NOV2005
222,Jane Citizen,111,John Doe,24AUG2006,14JUL2008
222,Jane Citizen,111,John Doe,01JAN2010,20FEB2010
222,Jane Citizen,111,John Doe,05MAR2010,20MAR2010
;

SWN is the social welfare number of the client and PNSWN is the social
welfare number of the partner at cancellation. From the example above
I would want to extract the following relationship periods:

19JUL2004 - 11NOV2005
24AUG2006 - 14JUL2008
01JAN2010 - 20FEB2010
05MAR2010 - 20MAR2010

In other words if there is overlap between the two periods I want to
extract the overlap period only but if there is no overlap then the
whole period needs to be extracted (as seen in the last two obs).

The situation in the last two obs occurs when one client has a partner
included and excluded within one benefit type ie is on single rate
Sickness Benefit, has partner included and then excluded while still
on Sickness Benefit. When the benefit cancels there is no partner ergo
the pnswn is not recorded. The above data is extracted from benefit
data.

Clients can (and do) have multiple periods in and out of relationships
and these differ greatly so any solution needs to account for this.

I've tried proc transposes, arrays you name it but have been unable to
crack it. Any help much appreciated as its driving me crazy!

warm regards
Steve
From: Patrick on
Hi Steve

I believe the following code does what you're looking for (but some
more test cases wouldn't hurt...).

data have ;
length id swn pnswn client_start client_end 8 client partner
$12 ;
infile datalines delimiter=',';
input swn client $ pnswn partner $ client_start client_end ;
informat client_start client_end date9. ;
format client_start client_end date9. ;
id=_n_;
datalines;
111,John Doe,222,Jane Citizen,20APR2004,11NOV2005
111,John Doe,222,Jane Citizen,24AUG2006,14JUL2008
222,Jane Citizen,111,John Doe,19JUL2004,11NOV2005
222,Jane Citizen,111,John Doe,24AUG2006,14JUL2008
222,Jane Citizen,111,John Doe,01JAN2010,20FEB2010
222,Jane Citizen,111,John Doe,05MAR2010,20MAR2010
;
run;

proc sql;
create view V_OverlappingObs as
select m.id,
max(p.client_start,m.client_start) as Period_Start
format=date9.,
min(p.client_end,m.client_end) as Period_End format=date9.
from have as M,have as P
where m.swn=p.pnswn and (P.client_start <= M.client_end and
P.client_end >= M.client_start)
;
create table want as
select l.*,
coalesce(r.Period_Start,l.client_start) as Period_Start
format=date9.,
coalesce(r.Period_End,l.client_end) as Period_End
format=date9.
from have as l left join V_OverlappingObs as r
on l.id=r.id
;
quit;

proc print data=want ;
run;


HTH
Patrick
From: Steve_Bates on
Hi Patrick,

Thank you very much for this, really appreciated. I'll throw a heap of
real clients at it my end this morning and let you know the results.
Initial test looks great. I thought the answer lay in SQL as it's used
for a lot of similar problems but I dont have the expertise to get my
particular problem solved.

Will post later today, thanks again.
cheers
Steve
From: Steve_Bates on
Hi,

Tested the code and it worked perfect thanks although I got the logoc
slightly wrong. Since it picks up the partner at cancellation the end
date actually had to be the latest (and not strictly the overlap
period). I modified your code and it seems to work. I'm out of the
office till Monday now but I will post the code then. I'm pretty sure
you could do it more elegantly but all the tests I threw at it
appeared to work.

Thanks again for your help in this, I had never heard of the coalesce
function in SQL so will do some research into this.

cheers
Steve
From: Steve_Bates on
Hi Patrick,

Sorry for the late response but I have been on sick leave this week. I
modified your code as follows to pick up the latest date, it seems to
work with the tests I've thrown against it so far and I've had other
analysts throw examples at it as well - do you agree? I've used an
example which gives the correct dates with my adaptation. Again this
was due to me getting the logic incorrect initially!

data have ;
length id swn pnswn client_start client_end 8 client partner
$12 ;
infile datalines delimiter=',';
input swn client $ pnswn partner $ client_start client_end ;
informat client_start client_end date9. ;
format client_start client_end date9. ;
id=_n_;
datalines;
111,John Doe,222,Jane Citizen,11Feb1991,12Dec1998
111,John Doe,222,Jane Citizen,14Dec1998,24Sep2001
111,John Doe,222,Jane Citizen,05Nov2001,07Jul2002
111,John Doe,222,Jane Citizen,06Dec2002,02Jul2003
222,Jane Citizen,111,John Doe,11Feb1991,12Dec1998
222,Jane Citizen,111,John Doe,14Dec1998,24Sep2001
222,Jane Citizen,111,John Doe,05Nov2001,21Aug2004
;
run ;

proc sql;
create table V_OverlappingObs as
select m.id,
max(p.client_start,m.client_start) as Period_Start
format=ddmmyy10.,
max(p.client_end,m.client_end) as Period_End format=ddmmyy10.
from have as M,have as P
where m.swn=p.pnswn and (P.client_start <= M.client_end and
P.client_end >= M.client_start)

order by period_end, period_start ;
quit ;

data V_OverlappingObs ;
set V_OverlappingObs ;
by period_end period_start ;
if first.period_end ;
run ;

proc sql ;
create table my_version as
select l.*,
coalesce(r.Period_Start,l.client_start) as Period_Start
format=ddmmyy10.,
coalesce(r.Period_End,l.client_end) as Period_End
format=ddmmyy10.
from have as l inner join V_OverlappingObs as r
on l.id=r.id
order by period_end, period_start
;
quit;


cheers
Steve