From: hewei2004 on
Hi,

Suppose that I have the following file

datasource brand quarter sale
1 toyota 1977q1 15
1 toyota 1977q2 25
1 toyota 1977q3 35
1 ford 1977q1 45
1 ford 1977q2 55
1 ford 1977q3 65
2 toyota 1977q1 15
2 toyota 1977q2 25
2 toyota 1977q3 99
3 toyota 1977q3 99

I want to create a file that merges the information in datasource=1
and datasource=2 and datasource=3, and create a variable called
conflict_flag when datasource1 or datasource2 or datasource3
disagrees.


note that datasource=1 agrees with datasource=2 agrees on 1977q1 and
1977q2, thus conflict_flag=0
note that datasource=1 agrees with datasource=2 disagrees on 1977q3,
thus conflict_flag=1

The desired output is:
conflict_flag brand quarter sale
0 toyota 1977q1 15
0 toyota 1977q2 25
1 toyota 1977q3 [mode of (35, 99, 99 ), which is 99 of
course]
0 ford 1977q1 45
0 ford 1977q2 55
0 ford 1977q3 65
From: Arthur Tabachneck on
Hewei,

What if there are multiple modes?

Art
-------------
On Jul 7, 5:08 pm, hewei2004 <hewei2...(a)gmail.com> wrote:
> Hi,
>
>   Suppose that I have the following file
>
> datasource brand quarter  sale
> 1               toyota 1977q1 15
> 1               toyota 1977q2 25
> 1               toyota 1977q3 35
> 1               ford 1977q1 45
> 1               ford 1977q2 55
> 1               ford 1977q3 65
> 2               toyota 1977q1 15
> 2               toyota 1977q2 25
> 2               toyota 1977q3 99
> 3               toyota 1977q3 99
>
> I want to create a file that merges the information in datasource=1
> and datasource=2 and datasource=3, and create a variable called
> conflict_flag when datasource1 or datasource2 or datasource3
> disagrees.
>
> note that datasource=1 agrees with datasource=2 agrees on 1977q1 and
> 1977q2, thus conflict_flag=0
> note that datasource=1 agrees with datasource=2 disagrees on 1977q3,
> thus conflict_flag=1
>
> The desired output is:
> conflict_flag brand quarter  sale
> 0               toyota 1977q1 15
> 0               toyota 1977q2 25
> 1               toyota 1977q3 [mode of (35, 99, 99 ), which is 99 of
> course]
> 0               ford 1977q1 45
> 0               ford 1977q2 55
> 0               ford 1977q3 65

From: hewei2004 on
On Jul 7, 5:44 pm, Arthur Tabachneck <art...(a)netscape.net> wrote:
> Hewei,
>
> What if there are multiple modes?
>
> Art
> -------------

Then use the smallest of the mode please.
Thank you.
From: Arthur Tabachneck on
Hewei,

There is probably a much easier and straight forward way of doing what
you want but, since no one else has responded, I'll offer the
following possible solution:

data have;
input datasource brand $ quarter $ sale;
cards;
1 toyota 1977q1 15
1 toyota 1977q2 25
1 toyota 1977q3 35
1 ford 1977q1 45
1 ford 1977q2 55
1 ford 1977q3 65
2 toyota 1977q1 15
2 toyota 1977q2 25
2 toyota 1977q3 99
3 toyota 1977q3 99
;

proc sort data=have;
by brand quarter;
run;

proc freq data=have noprint;
tables sale/out=mode (
rename=(sale=want_sale)
drop=percent);
by brand quarter;
run;

proc sort data=mode;
by brand quarter descending count want_sale;
run;

proc sort data=mode nodupkey;
by brand quarter;
run;

data have1 have2 have3;
set have;
if datasource eq 1 then output have1;
else if datasource eq 2 then output have2;
else output have3;
run;

data want;
merge have1 (rename=(sale=sale1))
have2 (rename=(sale=sale2))
have3 (rename=(sale=sale3));
by brand quarter;
run;

data want;
merge want mode;
by brand quarter;
run;

data want (drop=aa: count i sale: datasource);
array aa_sales(3);
array a_sales(*) sale1-sale3;
set want;
do i=1 to dim(a_sales);
aa_sales(i)=a_sales(i);
end;
call sortn (of aa_sales[*]);
if mean(of sale:) ne aa_sales(dim(a_sales)) then
conflict_flag=1;
else conflict_flag=0;
run;

HTH,
Art
-----------
On Jul 7, 8:00 pm, hewei2004 <hewei2...(a)gmail.com> wrote:
> On Jul 7, 5:44 pm, Arthur Tabachneck <art...(a)netscape.net> wrote:
>
> > Hewei,
>
> > What if there are multiple modes?
>
> > Art
> > -------------
>
> Then use the smallest of the mode please.
> Thank you.

From: Richard A. DeVenezia on
On Jul 7, 5:08 pm, hewei2004 <hewei2...(a)gmail.com> wrote:
> Hi,
>
>   Suppose that I have the following file
>
> datasource brand quarter  sale
> 1               toyota 1977q1 15
> 1               toyota 1977q2 25
> 1               toyota 1977q3 35
> 1               ford 1977q1 45
> 1               ford 1977q2 55
> 1               ford 1977q3 65
> 2               toyota 1977q1 15
> 2               toyota 1977q2 25
> 2               toyota 1977q3 99
> 3               toyota 1977q3 99
>
> I want to create a file that merges the information in datasource=1
> and datasource=2 and datasource=3, and create a variable called
> conflict_flag when datasource1 or datasource2 or datasource3
> disagrees.
>
> note that datasource=1 agrees with datasource=2 agrees on 1977q1 and
> 1977q2, thus conflict_flag=0
> note that datasource=1 agrees with datasource=2 disagrees on 1977q3,
> thus conflict_flag=1
>
> The desired output is:
> conflict_flag brand quarter  sale
> 0               toyota 1977q1 15
> 0               toyota 1977q2 25
> 1               toyota 1977q3 [mode of (35, 99, 99 ), which is 99 of
> course]
> 0               ford 1977q1 45
> 0               ford 1977q2 55
> 0               ford 1977q3 65

A self MERGE with partitioning WHERE= options can be used to perform
the conflict discovery.
Note that there is not a MODE function.

This sample uses explicit logic coding to compute the MODE of three
values.
----------
data have;
input
datasource brand $ quarter $ sale ;
datalines;
1 toyota 1977q1 15
1 toyota 1977q2 25
1 toyota 1977q3 35
1 ford 1977q1 45
1 ford 1977q2 55
1 ford 1977q3 65
2 toyota 1977q1 15
2 toyota 1977q2 25
2 toyota 1977q3 99
3 toyota 1977q3 99
run;

proc sort data=have;
by datasource brand quarter;
run;

data compare;
merge
have (where=(datasource=1) rename=(sale=sale1) in = in1)
have (where=(datasource=2) rename=(sale=sale2) in = in2)
have (where=(datasource=3) rename=(sale=sale3) in = in3)
;
by brand quarter;

if min(of sale1-sale3) = max(of sale1-sale3) then
sale = min(of sale1-sale3);
else
if in1 & in2 & in3 then do;
* 3-item mode, explicit cases;
if sale1 = sale2 then
sale = sale1;
else
if sale1 = sale3 then
sale = sale1;
else
if sale2 = sale3 then
sale = sale2;
else
sale = min (of sale1-sale3);

conflict_flag = 1;
end;
else do;
sale = min (of sale1-sale3);
conflict_flag = 1;
end;

drop datasource sale1-sale3;
run;
----------


Richard A. DeVenezia
http://www.devenezia.com