RMAN not retrying on media manager errors [Oracle]

From: Ian Chard on 6 Apr 2010 10:00

Hi,

I'm using Oracle 10g, RMAN and TDPO (the TSM client for Oracle). I'm
trying to do a PITR to a non-production machine, but RMAN is intolerant
of any media manager problems that crop up, so things like this

ORA-19511: Error received from media manager layer, error text:
ANS1017E (RC-50) Session rejected: TCP/IP connection failure

and this

ORA-19511: Error received from media manager layer, error text:
ANS1314E (RC14) File data currently unavailable on server

result in an immediate 'failover to previous backup'. The first error
was caused by a transient network problem; I suspect the second was just
bad luck as the file was being reclaimed by the TSM server when TDPO
asked for it.

Both these errors would have gone away if RMAN had tried again, so is
there any way I can tell it to retry on error? If not, is there
something else I could do to improve the situation?
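
One way to approximate the retry Ian is asking for is to wrap each restore step in an external retry loop that only retries when the media manager error looks transient. The sketch below is a minimal example, not an RMAN or TDPO feature. It assumes a hypothetical `step` callable that raises `RuntimeError` carrying the RMAN/media manager error text when it fails, rather than RMAN silently failing over to an older backup. It also assumes that the two ANS codes quoted above are the only transient ones, which is Ian's judgement rather than an official TSM classification.

```python
import time
from typing import Callable

# Assumption: the media manager codes treated as transient, taken from the
# two errors quoted in this post. Not an official TSM classification.
TRANSIENT_CODES = ("ANS1017E", "ANS1314E")


def is_transient(error_text: str) -> bool:
    """Return True if the error text mentions a code we treat as transient."""
    return any(code in error_text for code in TRANSIENT_CODES)


def run_with_retry(step: Callable[[], None], attempts: int = 3, delay_s: float = 30.0) -> None:
    """Call step(); on a transient media manager error, wait and try again.

    step is a hypothetical callable (e.g. one that runs a single RMAN restore
    command) that raises RuntimeError with the error text when it fails.
    Non-transient errors, and the last failed attempt, are re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            step()
            return
        except RuntimeError as exc:
            if attempt == attempts or not is_transient(str(exc)):
                raise
            time.sleep(delay_s * attempt)  # linear backoff between attempts
```

Retrying only on known-transient codes avoids looping on real corruption or missing-backup errors, which should still surface immediately.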

Thanks
- Ian

--
Ian Chard, Senior Unix and Network Gorilla | E: ian.chard(a)sers.ox.ac.uk
Systems and Electronic Resources Service | T: 80587 / (01865) 280587
Oxford University Library Services | F: (01865) 242287

From: Robert Klemme on 7 Apr 2010 14:41

On 06.04.2010 16:00, Ian Chard wrote:
> I'm using Oracle 10g, RMAN and TDPO (the TSM client for Oracle). I'm
> trying to do a PITR to a non-production machine, but RMAN is intolerant
> of any media manager problems that crop up, so things like this
>
> ORA-19511: Error received from media manager layer, error text:
> ANS1017E (RC-50) Session rejected: TCP/IP connection failure
>
> and this
>
> ORA-19511: Error received from media manager layer, error text:
> ANS1314E (RC14) File data currently unavailable on server
>
> result in an immediate 'failover to previous backup'. The first error
> was caused by a transient network problem; I suspect the second was just
> bad luck as the file was being reclaimed by the TSM server when TDPO
> asked for it.
>
> Both these errors would have gone away if RMAN had tried again, so is
> there any way I can tell it to retry on error? If not, is there
> something else I could do to improve the situation?

I worked with Oracle 10g, RMAN and Tivoli on Linux a few years ago.
We had such frequent hangs (the error message was buried somewhere and
RMAN just sat there doing nothing) that I created a DB metric to detect
that situation. The fix back then was to kill RMAN manually. :-(
That's of course not a solution to your problem, but it might indicate
that this kind of integration does not work as well in practice as it
sounds on paper. Does anybody else have experience with that
combination?
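
The manual kill Robert describes could be partly automated by launching RMAN from a wrapper that enforces a wall-clock deadline. This is a minimal sketch and coarser than his DB metric: it bounds total runtime rather than detecting a stall. The command line in the usage note (`restore.rman`) is a hypothetical example.

```python
import subprocess
import sys
from typing import Optional, Sequence


def run_with_deadline(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd; if it has not exited after timeout_s seconds, kill it.

    Returns the exit code, or None if the deadline was hit.
    subprocess.run() kills the child itself before raising TimeoutExpired.
    """
    try:
        return subprocess.run(list(cmd), timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        print(f"killed after {timeout_s}s: {' '.join(cmd)}", file=sys.stderr)
        return None
```

For example, `run_with_deadline(["rman", "target", "/", "cmdfile=restore.rman"], timeout_s=4 * 3600)` would give a hypothetical restore script four hours before it is killed.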

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
