From: David Johnson on
Actually Dave, I'm not convinced that it isn't a SAS problem.

I have a track open with SAS on a similar issue involving a rename from
the restructuring of a small data set. It is one within some hundreds of
data sets created and modified in a work library as part of 46 programs
included in a batch sequence.

At irregular times the "rename of temporary member" message comes up, SAS
goes into syntax-checking mode and sets OBS to 0, and the batch manager
detects the error and terminates the sequence.

It doesn't appear to happen in the same place twice. It isn't practical to
replace every data step with a new output table followed by a delete step,
and it isn't an issue with space or permissions, so most of the usual
diagnoses are irrelevant.

SYNCHIO is turned on, and the tables use V7-compliant naming and
structure, so a V6.12 library definition is out as well.

It has been plaguing us for months, and seems very similar to the issue
described here by Curtis, and similar issues by other correspondents for
quite some time.

The difference is that the work directory is on a virtual drive created on
a RAID 5 array in a high-end workstation. Yes, I know RAID slows
performance on work libraries, and it isn't my choice, but it's the way
this machine has been built, and I don't have the option to change it to
JBOD.

The core issue seems to be that the V8 engine, when talking to a network
drive, a NAS drive or a RAID array, expects an operation to be finished
before it has physically completed.

How many times have we had to code delays into programs to deal with OS
response times? I have a hunch this is similar, and the SAS V8 engine is
expecting something Windows architecture cannot always deliver.


Incidentally, since it is irregular, and since it is a batch process with
small included code objects, I am looking at the batch manager
resubmitting the same code block if it fails with an error of this type.
Now I only need to be able to reset the SAS error flags. Unfortunately,
since I can't predict when the error will occur, it is going to be some
time before I will know if the changes to the batch manager work.
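
A minimal sketch of what such a resubmit could look like if the batch
manager is itself driven from SAS macro code, which is an assumption. The
macro name %run_with_retry, its parameters and the file name step01.sas are
hypothetical. It relies on two things worth verifying on your own release:
that OPTIONS OBS=MAX REPLACE undoes the settings imposed by syntax-check
mode, and that the writable automatic macro variable SYSCC is above 4 after
an error. It also assumes each included file ends its last step with RUN;
or QUIT;.

```sas
/* Hypothetical retry wrapper; names and defaults are illustrative only. */
%macro run_with_retry(file=, max_tries=3, wait_seconds=5);
  %local try rc;
  %let try = 1;
  %do %until (&syscc <= 4 or &try > &max_tries);
    /* Undo what syntax-check mode sets (OBS=0, NOREPLACE) and
       clear the session condition code before each attempt. */
    options obs=max replace;
    %let syscc = 0;

    %include "&file";

    %if &syscc > 4 %then %do;
      %put NOTE: attempt &try of &file ended with SYSCC=&syscc;
      /* Pause so the storage layer can settle before the rerun. */
      %let rc = %sysfunc(sleep(&wait_seconds, 1));
    %end;
    %let try = %eval(&try + 1);
  %end;
%mend run_with_retry;

%run_with_retry(file=step01.sas)
```

If SYSCC is still above 4 once the loop ends, the block failed on every
attempt and the batch manager can terminate the sequence as it does today.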

Kind regards

David




On Fri, 14 Jan 2005 14:56:18 -0800, David L. Cassell
<cassell.david(a)EPAMAIL.EPA.GOV> wrote:

>Curtis Amick <curtis(a)SC.RR.COM> wrote:
>> Got a difficult problem here. Recently my company upgraded network
>> storage to an EMC NAS (Network Attached Storage), from a non-NAS
>> system. Now, those of us who store SAS data sets on the network are
>> encountering a serious problem. When updating data sets, sometimes
>> (rarely) those data sets will be deleted. The error message looks like:
>> "ERROR: Rename of temporary member for (data set name) failed. File
>> may be found in a directory (your directory)" and the permanent data
>> set is gone.
>>
>> This happens randomly, and (apparently) only when the data set already
>> exists. That is, when doing like this:
>>
>> DATA NETDRIVE.DATASET; SET DATASET2; RUN;
>>
>> If netdrive.dataset already exists (it's being "updated" by
>> work.dataset2), then this error *might* occur. If netdrive.dataset does
>> not yet exist (it's being created by work.dataset2), then problem will
>> not occur.
>>
>> From SI Tech Support: They've seen this before (see SAS NOTE 005781,
>> link here: http://support.sas.com/techsup/unotes/SN/005/005781.html ),
>> but can't fix it because (according to TS rep) once SAS wants to write
>> to NAS, they "hand it off" to the network. And that's when the problem
>> occurs.
>>
>> Here's what I think: When SAS updates a data set, it creates a
>> temporary data set to work on, keeping the original intact. When the
>> step ends, (think PROC SORT DATA=ND.dataset; RUN; (this killed me on
>> Saturday. Had a macro that sorted 20+ data sets, and lost 4!!! of
>> them.)) the original data set is over-written by the temp, taking on
>> the name of the original. And I'm thinking it's during that
>> writing/re-naming process that the storage system is losing our data
>> sets. (SI calls it a "timing issue"). Doesn't happen when working on
>> local drives, and, like I mentioned earlier, hasn't happened yet when
>> *creating* permanent data sets; only when updating.
>>
>> Some suggestions (from SITS): change engines (v8, v612) (doesn't work,
>> not feasible), use -SYNCHIO (have tried it; doesn't seem to help),
>> remove SAS data sets from on-line virus scanning in the NAS (our IS
>> dept is leery of that one). Personally, I'd like to go back to previous
>> storage (non-NAS, IS dept isn't thrilled with that one).
>>
>> Probably can get around this problem by programming like so:
>> DATA ABC;
>>   SET ND.DATASET;
>>   /* play with data set ABC... */
>> RUN;
>>
>> PROC DATASETS LIBRARY=ND NOLIST;
>>   DELETE DATASET;
>> QUIT;
>>
>> DATA ND.DATASET;
>>   SET ABC;
>> RUN;
>>
>> But I'd prefer something cleaner, less intrusive (especially for our
>> less "sophisticated" users). Plus, we've got LOTS of programs that are
>> run daily, weekly, monthly, etc that contain steps like: "proc sort
>> data=ND.xxxx; run;" and/or "data ND.xxxx; set ND.xxxx abc; run;" and/or
>> (well, you get the picture).
>>
>> To the point: has anyone else had this problem, and (if so) what did
>> you do to solve it?
>
>I haven't seen this problem before. But I'd just like to vent.
>
>How can your IS people not be responsive on this? Go to your bosses and
>show them how EXPENSIVE this is going to be. If your IS won't or can't
>fix this problem (scrap NAS or get it fixed), then you and all your
>other SAS people will have to re-write every bit of your SAS code to
>only create new data sets: this means sorting from the old set to a new
>one using the OUT= option. This will explode the disk space
>requirements on the network, costing the company *more* money, on top
>of the cost of all the programmer hours to alter and then test and then
>debug all the SAS code. Make it into a business case, and show your
>bosses that this problem with NAS is going to cost them hundreds of
>thousands of dollars in this fiscal year alone, as well as wrecking the
>schedule for any new programming projects (factor in all costs for that
>as well).
>
>There is no excuse for your IS not to have EMC all over this. EMC has a
>rep as a really responsive solutions provider, and I can't believe they
>got that rep by letting stuff like this happen.
>
>I wish I had better advice, but this isn't a SAS problem.
>
>David
>--
>David Cassell, CSC
>Cassell.David(a)epa.gov
>Senior computing specialist
>mathematical statistician
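
As an illustration of the OUT= rewrite described in the quoted reply, an
in-place sort becomes a sort into a new member. The member name
DATASET_SORTED and the BY variable ID are placeholders, and whether this
actually sidesteps the rename failure rests only on Curtis's observation
that the error has not appeared when creating new data sets.

```sas
/* In-place form that replaces ND.DATASET at the end of the step: */
/*   proc sort data=nd.dataset; by id; run;                       */

/* Rewritten to create a new member instead of updating the old one. */
/* ND, DATASET, DATASET_SORTED and ID are placeholder names.         */
proc sort data=nd.dataset out=nd.dataset_sorted;
  by id;
run;
```

Downstream steps would then read ND.DATASET_SORTED, which is where the
extra disk space in the quoted estimate comes from.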
From: Richard A. DeVenezia on
David Johnson wrote:
> Actually Dave, I'm not convinced that it isn't a SAS problem.
>
> I have a track open with SAS on a similar issue involving a rename
> from the restructuring of a small data set. It is one within some
> hundreds of data sets created and modified in a work library as part
> of 46 programs included in a batch sequence.
>
> At irregular times, the "rename of temporary member" message comes
> up, SAS goes into syntax checking mode and sets Obs to 0, the batch
> manager detects an error and terminates the sequence.

Your SAS session is likely stressing the hardware, and/or your device
subsystem/driver arrangement/configuration cannot keep up with what it is
being called to do.

http://www.devenezia.com/downloads/sas/rename-error/
I never got to the bottom of it.

Some candidates are low-level caches or timing races in the device driver
issuing/handling system semaphores.

--
Richard A. DeVenezia


From: "Johnson, David" on
Thank you Richard,

Until now, if a program exhibited misbehaviour above a given level
(input data error, undefined outcome warning, SAS Warning, SAS Error)
then the Batch Manager would identify that the issue had occurred and
terminate the batch.

As misfortune would have it, this usually occurred before the three
longest running jobs had completed, leaving the largest amount of
processing to be completed manually during the day. Naturally, no
problems would manifest when the machine was being more closely
monitored.


Where am I going with that? If there is a systemic issue, then we might
expect it to manifest repeatedly, and 10-12 hours later the cause of the
problem may have gone. So, if I can resubmit failing processes
immediately then I might either see the rerun complete effortlessly, or
the problem persist.

While it seems that one may have a hardware or configuration problem as
you suggest, there is a confounding influence that is either often
absent or often has little effect. Knowing what it is would make it
more likely that I could create the problem on demand, and would allow
me to test various fixes.


I think I have completed recoding the Batch Manager to resubmit rather
than abandon failed code and if I can get any further I'll advise.


I had scoped changes to my resource monitoring program to also trap
active process/thread and memory data but haven't coded up the APIs yet
to do that. I might do that now, just to exclude the suspicion that
another process like an AV application may be causing the problem. I
note in Curtis' original note that this was a possibility offered by
SAS.

Kind regards

David


-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L(a)LISTSERV.UGA.EDU] On Behalf Of
Richard A. DeVenezia
Sent: Wednesday, 24 January 2007 11:13 AM
To: SAS-L(a)LISTSERV.UGA.EDU
Subject: Re: ERROR: Rename... Losing data sets from network drives

From: LouisBB on
Dear Curtis,

If the trouble is in the number of simultaneously open files, you could
take a look at the section "SPLITTING A SAS FILE DYNAMICALLY USING THE
.OUTPUT() METHOD" of SUGI 31 paper 241, "Data Step Hash Objects as
Programming Tools" by Paul M. Dorfman and Koen Vyverman.
http://www2.sas.com/proceedings/sugi31/241-31.pdf

They give an example for sorted input data:
data _null_ ;
  dcl hash hid (ordered: 'a') ;
  hid.definekey ('id', 'transid', 'amt', '_n_') ;
  hid.definedata ('id', 'transid', 'amt' ) ;
  hid.definedone ( ) ;
  do _n_ = 1 by 1 until ( last.id ) ;
    set sample ;
    by id ;
    hid.add() ;
  end ;
  hid.output (dataset: 'OUT' || put (id, best.-l)) ;
run ;
And one for unsorted input data, using the hash of hashes method.

I hope this can offer an alternative, assuming you are using SAS 9.

LouisBB.

"Richard A. DeVenezia" <rdevenezia(a)wildblue.net> wrote in message
news:51nmnvF1l6lvrU1(a)mid.individual.net...
> David Johnson wrote:
>> Actually Dave, I'm not convinced that it isn't a SAS problem.
>>
>> I have a track open with SAS on a similar issue involving a rename
>> from the restructuring of a small data set. It is one within some
>> hundreds of data sets created and modified in a work library as part
>> of 46 programs included in a batch sequence.
>>
>> At irregular times...


From: David L Cassell on
david.johnson(a)CBA.COM.AU wrote back:
>
>Actually Dave, I'm not convinced that it isn't a SAS problem.
>
>I have a track open with SAS on a similar issue involving a rename from
>the restructuring of a small data set. It is one within some hundreds of
>data sets created and modified in a work library as part of 46 programs
>included in a batch sequence.
>
>At irregular times, the "rename of temporary member" message comes up, SAS
>goes into syntax checking mode and sets Obs to 0, the batch manager
>detects an error and terminates the sequence.
>
>It doesn't appear to be the same place twice, it isn't practical to
>replace every data step with a new output table followed by a delete step,
>and it isn't an issue with space or permissions. So most of the usual
>diagnoses are irrelevant.
>
>Synchio is turned on, and the tables use V7 compliant naming and
>structure, so V6.12 library definition is out as well.
>
>It has been plaguing us for months, and seems very similar to the issue
>described here by Curtis, and similar issues by other correspondents for
>quite some time.
>
>The difference is that the work directory is on a virtual drive created on
>a Raid 5 array in a high end workstation. Yes, I know Raid slows
>performance on work libraries, and it isn't my choice, but it's the way
>this machine has been built, and I don't have the option to change it to
>JBOD.
>
>The core issue seems to be: the V8 engine, when talking to a Network
>drive, a NAS drive or a Raid array is expecting a process to be finished
>before it has physically completed.
>
>How many times have we had to code delays into programs to deal with OS
>response times? I have a hunch this is similar, and the SAS V8 engine is
>expecting something Windows architecture cannot always deliver.
>
>
>Incidentally, since it is irregular, and since it is a batch process with
>small included code objects, I am looking at the batch manager
>resubmitting the same code block if it fails with an error of this type.
>Now I only need to be able to reset the SAS error flags. Unfortunately,
>since I can't predict when the error will occur, it is going to be some
>time before I will know if the changes to the batch manager work.
>
>Kind regards
>
>David

I suspect that it *is* a SAS-related problem. But that does not make
it a SAS problem. Right? Do you have other apps which sufficiently
stress the disk I/O and buffering of the system? You might have to
write one yourself in C, because SAS is pretty darn efficient at read/write,
and it may be overtaxing your system components.

If nothing else - even highly tuned code to pump streams of data in
and out of your I/O subsystems - can cause this problem, then I
would have to point a finger at SAS. But if other high-end I/O apps
can cause similar problems, then it's the system.

Pinning this down may be a *major* pain in the NAS. :-)

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
