From: TripleK on
All,

Can we merge a dataset with a view and create a permanent dataset?
Please see the code below. Any help would be appreciated.

Thanks
Kiran.

rsubmit;
data dispositions_2010;
   set mad_p.dispositions_2010(where=(le_leadid ne . and
      CI_CONTACTTYPE_DESC ne 'Account Review' and
      CHI_STORE_NUM in (9550,9551)));
run;

proc sql;
   create view inserts_2010 as
   select le_leadid, le_leadinsertdate, bu_bucketdesc, le_housingstat,
          RE_TIER_DESC, AUTO_TIER_DESC, LE_BRANCHNUM, le_source
   from mad_p.inserts_2010
   order by le_leadid, le_leadinsertdate;

   select * from inserts_2010;
quit;

data save.dispositions;
   merge dispositions_2010(in=a) inserts_2010(in=b);
   by le_leadid le_leadinsertdate;
   if a;
run;
endrsubmit;
From: Patrick on
Yes, you can. I do it all the time.

Your code looks right to me. Is something not working?

In case "dispositions_2010" isn't needed later on as work table then
you could also use a view here - and merge the 2 views.
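
For example, a minimal sketch of the first step as a data step view,
reusing the WHERE clause from your post (note that a MERGE still needs
both inputs sorted by the BY variables, so this assumes
mad_p.dispositions_2010 already is):

data dispositions_2010 / view=dispositions_2010;
   set mad_p.dispositions_2010(where=(le_leadid ne . and
      CI_CONTACTTYPE_DESC ne 'Account Review' and
      CHI_STORE_NUM in (9550,9551)));
run;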

From: TripleK on
On Aug 5, 3:46 pm, Patrick <patrick.mat...(a)gmx.ch> wrote:
> Yes, you can. I do it all the time.
>
> Your code looks right to me. Is something not working?
>
> In case "dispositions_2010" isn't needed later on as a work table,
> then you could also use a view here and merge the two views.


Patrick, thanks for your response. The last step, writing into the
permanent dataset, is taking more time (not completing). I submitted
the job 3 hours ago and it still hasn't completed. But when I was
merging with datasets it used to take 2 hours. The dispositions have
40k records with 50 variables and the inserts have 50 million records
with 7 variables. So does merging with views take more time?

-Kiran.
From: Patrick on
Hi Kiran

A SQL view is in a way nothing else than some encapsulated SQL code
which gets executed when you call the view.

The good thing about a view is that it's executed in memory, which
reduces disk I/O - so it should actually perform better than first
writing a table to disk and then reading this table again for a SAS
merge.

The problem you could have:
As these are 50M observations, you might not have enough memory to
execute the view - and then memory has to write and read its
contents to disk (paging). This can result in performance that is
worse than creating a table first.

Just some thoughts:
- In case these 2 tables are in a DB, then you might be better off
formulating this join as a SQL left join (otherwise the whole 50M rows
get loaded into SAS first). If only the "inserts" table is in a DB,
then first upload your small table into the DB and do the join there.
See the first sketch after this list.

- The resource-intensive step is sorting these 50M records. It might
be worth creating an index over your join keys (on both tables) and
then using this index for joining (there are examples in the SAS doc
of how to do this). See the second sketch after this list.

- Another way could be (if both tables are in SAS): In a data step you
create a hash table from the small table "dispositions_2010", you
iterate through the big table "inserts" as is (no pre-sorting;
set inserts(keep=<keys> <additional wanted vars>);), and you only
write records to the output table where there was a key match with the
lookup table (rc=h.find(); output if rc=0). See the third sketch after
this list.
In case you're not 100% sure that every record in the hash table has a
matching record in the inserts table, then you might have to re-join
the original table "dispositions_2010" with the result table (which
should have the same structure as "inserts" plus the additional
variables, and maybe some missing records -> a SQL
<dispositions_2010> union corr <result table> should fix this).
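
A sketch of the first idea, assuming both tables sit in the same
database library (libref mad_p here) so SAS can pass the join through
to the DB:

proc sql;
   create table save.dispositions as
   select l.*, r.bu_bucketdesc, r.le_housingstat, r.re_tier_desc,
          r.auto_tier_desc, r.le_branchnum, r.le_source
   from mad_p.dispositions_2010 as l
   left join mad_p.inserts_2010 as r
     on l.le_leadid = r.le_leadid
    and l.le_leadinsertdate = r.le_leadinsertdate
   where l.le_leadid is not null
     and l.ci_contacttype_desc ne 'Account Review'
     and l.chi_store_num in (9550,9551);
quit;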
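
A sketch of the index idea, assuming mad_p is a plain SAS library.
With a composite index on the BY variables, the MERGE can retrieve the
rows in key order without a separate sort:

proc sql;
   create index leadkey on dispositions_2010 (le_leadid, le_leadinsertdate);
   create index leadkey on mad_p.inserts_2010 (le_leadid, le_leadinsertdate);
quit;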
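
And a sketch of the hash lookup, assuming the key combination is
unique in "dispositions_2010" (otherwise the hash would need the
multidata option) and that the key variables have the same types in
both tables:

data save.dispositions;
   if 0 then set dispositions_2010;   /* make the lookup variables known to the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: 'dispositions_2010');
      h.defineKey('le_leadid', 'le_leadinsertdate');
      h.defineData(all: 'yes');   /* retrieve all 50 variables on a match */
      h.defineDone();
   end;
   /* read the big table as is - no pre-sorting */
   set mad_p.inserts_2010(keep=le_leadid le_leadinsertdate bu_bucketdesc
      le_housingstat re_tier_desc auto_tier_desc le_branchnum le_source);
   rc = h.find();
   if rc = 0 then output;   /* keep only the key matches */
   drop rc;
run;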

HTH
Patrick


From: Patrick on
For the hash approach:
As there are 50 vars in "dispositions_2010", I actually think it would
be better to only load the keys of "dispositions_2010" into a hash,
and then use rc=h.check().
The result table could then be used in a SQL join with the original
"dispositions_2010" -> dispositions_2010 as l left join <result table>
as r on l.le_leadid = r.le_leadid and
l.le_leadinsertdate = r.le_leadinsertdate
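
A sketch of this variant; check() only tests whether the key exists,
so nothing gets copied into the PDV, and the intermediate table name
"matched_inserts" is just a placeholder:

data matched_inserts;
   if _n_ = 1 then do;
      /* load only the two key variables of the small table */
      declare hash h(dataset:
         'dispositions_2010(keep=le_leadid le_leadinsertdate)');
      h.defineKey('le_leadid', 'le_leadinsertdate');
      h.defineData('le_leadid', 'le_leadinsertdate');
      h.defineDone();
   end;
   set mad_p.inserts_2010(keep=le_leadid le_leadinsertdate bu_bucketdesc
      le_housingstat re_tier_desc auto_tier_desc le_branchnum le_source);
   if h.check() = 0 then output;   /* key exists in the lookup table */
run;

proc sql;
   create table save.dispositions as
   select l.*, r.bu_bucketdesc, r.le_housingstat, r.re_tier_desc,
          r.auto_tier_desc, r.le_branchnum, r.le_source
   from dispositions_2010 as l
   left join matched_inserts as r
     on l.le_leadid = r.le_leadid
    and l.le_leadinsertdate = r.le_leadinsertdate;
quit;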