From: groups.broberg on 16 Jun 2008 17:48

On Jun 13, 10:32 pm, AGT <usenetpersonger...(a)gmail.com> wrote:
> <groups.brob...(a)gmail.com> wrote in message
> news:a8f8b4b1-adbb-4916-9b23-8d72f6d221ba(a)f36g2000hsa.googlegroups.com...
> Writes:
>
> >> Oracle support is not giving us satisfactory results. Perhaps you can
> >> give some answers?
>
> >> We've recently upgraded our system to Oracle 10.2.0.3.0, running on
> >> Solaris (sparc) 10 inside a ZFS zone (our previous system was Oracle
> >> 9.2.0.4.0 running on sparc Solaris 8, and was running on that for the
> >> last 5 years). Since the upgrade 6 weeks ago, we've had two instances
> >> where our applications (running in the same O/S environment on a
> >> different node on the cluster) have locked up - existing connections
> >> to Oracle become unresponsive when executing SQL (with no error
> >> message - they just block), and attempts to create new connections are
> >> met with the error:
> >> "ORA-03135: connection lost contact".
>
> How hard would it be to eliminate the zoning..?
> I don't think this is related to the zones or ZFS, but if you could test w/o
> these changes then you'd know for sure.
>
> I don't know why you would do this in the first place. Zones are
> appropriate for some things, and you generally get better
> throughput from ZFS than from UFS, but zones just stir up the pot for me.
>
> Dedicate the box(es) to Oracle only - don't even make a special
> project for it - just use the default. Keep things as simple as possible.
>
> Maybe you have reasons for all this fancy overhead, but so far I see none :

The zones are here to stay. We sell a turnkey solution that runs on self-contained hardware, so our apps plus the database all live on one box (really a cluster - two boxes, with one as a failover node). The zones make it much easier to administer & monitor all the components of the system (db + apps) with a unified mechanism.
Additionally, we perform our hot backups using zones & snapshots, which is much faster and less intrusive than the rman jobs we used to run: our backup window is now a second or two (while the snapshot is taken), after which the snapshot of the db zone can be backed up at any time in the following 24 hours. This works much better for us, as different clients have different backup strategies. It's simpler to let them point to a net-mountable volume that contains all the files they need to archive with whatever backup strategy they're using for their enterprise.

In any case, we haven't figured out how to replicate the scenario yet - it only occurs in the production environment, and never showed up during our testing. We typically tested a month's worth of operation at an accelerated rate (anywhere from 4x to 40x normal speed, so tests finished in 2 - 7 days). We could start running some tests at 1x speed, but given the intermittent rate of failure, it would be hard to draw any conclusions from a non-zone-based system that didn't fail after running for a month or two.

We may as well kick off a several-month simulation, though, in case we start to see this problem occur with regularity.
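One way to catch the symptom from the original report (existing connections that block on SQL with no error message) during a long-running simulation is a watchdog: run a cheap health-check statement in a worker thread and flag it when it misses a deadline. The following is a minimal Python sketch under that assumption; `run_with_deadline` is a hypothetical helper, and the `check` callable is assumed to wrap whatever driver call the application actually uses (for example, a trivial query on an already-open connection).

```python
import threading
import time
from typing import Callable, Optional, Tuple


def run_with_deadline(check: Callable[[], object],
                      timeout_s: float) -> Tuple[str, Optional[object]]:
    """Run `check` in a worker thread and wait at most `timeout_s` seconds.

    Returns ("ok", result), ("error", exception) or ("blocked", None).
    """
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["result"] = check()
        except Exception as exc:  # a driver error is reported, not raised
            outcome["error"] = exc

    # Daemon thread: a call stuck in a network read cannot be cancelled from
    # Python, so it is abandoned rather than allowed to keep the process alive.
    worker_thread = threading.Thread(target=worker, daemon=True)
    worker_thread.start()
    worker_thread.join(timeout_s)
    if worker_thread.is_alive():
        return "blocked", None
    if "error" in outcome:
        return "error", outcome["error"]
    return "ok", outcome.get("result")


# Usage sketch: a stand-in "statement" that hangs longer than the deadline.
if __name__ == "__main__":
    status, _ = run_with_deadline(lambda: time.sleep(3), timeout_s=0.5)
    print(time.strftime("%Y-%m-%d %H:%M:%S"), status)  # status is "blocked"
```

The distinction between "error" and "blocked" matters here: the reported failure raised no error on existing connections, so only a timeout-based check would notice it. The deadline should sit well above the statement's normal response time, and the timestamp of the first "blocked" result is what you would line up against other logs.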
From: Robert Klemme on 17 Jun 2008 02:17

On 16.06.2008 23:48, groups.broberg(a)gmail.com wrote:
> In any case, we haven't figured out how to replicate the scenario yet
> - it only occurs in the production environment, and never showed up
> during our testing. We typically tested a month's worth of operation
> at an accelerated rate (anywhere from 4x to 40x the normal speeds, so
> tests finished in 2 - 7 days). We could start running some tests at
> 1x speeds, but given the intermittent rate of failure, it would be
> hard to draw any conclusions from a non-zone-based system that didn't
> fail after running for a month or two.
>
> We may as well kick off a several-month simulation, though, in case we
> start to see this problem occur with regularity.

I don't know Solaris too well, but is there any chance to have some monitoring run on the production box that exhibited the error over the course of a week, and try to capture the circumstances of the error surfacing that way? Maybe you can collect some network statistics along with other data, and later analyze it and find the error.

Kind regards

robert
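As a starting point for the kind of monitoring suggested above, here is a minimal Python sketch that probes the listener port at a fixed interval and logs a millisecond-resolution timestamp, the outcome and the connect time for every attempt, so the onset of an outage can later be lined up against other statistics. It is illustrative only: `probe_listener` and `monitor` are hypothetical helpers, a plain TCP connect only shows that something accepts connections on the port (it does not speak Oracle Net or run SQL), and the host and port you pass are assumptions about your setup.

```python
import socket
import time
from datetime import datetime, timezone
from typing import Callable, Tuple


def probe_listener(host: str, port: int,
                   timeout_s: float = 5.0) -> Tuple[bool, float, str]:
    """Try one TCP connect; return (reachable, elapsed_seconds, detail)."""
    start = time.monotonic()
    try:
        # create_connection resolves the name and opens a TCP connection;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:  # refused, timed out, unreachable, name lookup failed
        return False, time.monotonic() - start, type(exc).__name__


def monitor(host: str, port: int, interval_s: float, samples: int,
            sink: Callable[[str], None] = print) -> None:
    """Take `samples` probes `interval_s` apart, passing one log line per probe to `sink`."""
    for i in range(samples):
        ok, elapsed, detail = probe_listener(host, port)
        stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
        sink(f"{stamp} {host}:{port} ok={ok} connect={elapsed * 1000:.1f}ms {detail}")
        if i < samples - 1:
            time.sleep(interval_s)
```

For example, `monitor("dbhost", 1521, interval_s=1.0, samples=86400)` would take roughly one probe per second for about a day and print each line; `dbhost` is a placeholder, and 1521 is only the common default listener port. Passing a `sink` instead of collecting a list keeps memory flat over a week-long run.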
From: mpacheco_brazil on 17 Jun 2008 09:07

Not sure if it will help, or if it is even related, but the default value of the environment parameter USE_SHARED_SOCKET changed between 9i and 10g: in 9i the default was FALSE, and now it is TRUE.

Do you use any kind of gateway to other databases, or a firewall, in production?
From: AGT on 17 Jun 2008 11:41

On Tue, 17 Jun 2008 08:17:40 +0200, Robert Klemme wrote:
> On 16.06.2008 23:48, groups.broberg(a)gmail.com wrote:
>> In any case, we haven't figured out how to replicate the scenario yet
>> - it only occurs in the production environment, and never showed up
>> during our testing. We typically tested a month's worth of operation
>> at an accelerated rate (anywhere from 4x to 40x the normal speeds, so
>> tests finished in 2 - 7 days). We could start running some tests at
>> 1x speeds, but given the intermittent rate of failure, it would be
>> hard to draw any conclusions from a non-zone-based system that didn't
>> fail after running for a month or two.

Sounds like

>> We may as well kick off a several-month simulation, though, in case we
>> start to see this problem occur with regularity.

You could, or see below.

> I don't know Solaris too well, but is there any chance to have some
> monitoring run on the production box that exhibited the error over the
> course of a week, and try to capture the circumstances of the error surfacing
> that way?

What if it all happens in a millisecond or less... Hard to capture that, even with dtrace.

> Maybe you can collect some network statistics along with
> other data, and later analyze it and find the error.

There's many a release of "Solaris" for many HW platforms. All of them must be patched regularly, especially if "something funny is going on". One of my clients utterly refuses to patch the OS, claiming that would introduce a "change", yet they apply every Oracle patch without question.

As I said, I have serious doubts that zoning or ZFS would cause this, but a missing kernel/driver patch or a simple tweak in /etc/system might make the ghost go away.

For near-instant rollback, I suggest enabling Live Upgrade, just in case a "change" is problematic.