From: Matt on
Oracle 10.2.0.3 EE on RHEL 4

In a critical 24/7 database we have a very large transactional table
which is partitioned by date (weekly) and then further subpartitioned
by list (we have 3 subpartitions per week). The nature of the list
partitioning means we have one very large subpartition and 2 much
smaller subpartitions for every week.

The current purge strategy is to delete data from the 'large'
subpartition for any weekly partitions older than 6 weeks. This
causes the global (primary key) index to be much much large than it
needs to be because the some index leaf blocks are having almost all
of their entries deleted (leaving a very small number of rows and
loads of empty space).

So I've decided to rebuild the index to reclaim 90% of the wasted
space because this is causing the index datafile to grow
uneccessarily. Rebuilding the index shrinks the index size down from
30GB to 4GB.

However the uptime of this system is essential so I am being paranoid
about causing an unplanned outage. I've tested the online index
rebuild in a staging environment and I'm happy with it. No
transactions failed during testing and the rebuild finished in less
than 30 minutes.

Anyone have any thoughts about what other potential risk areas I need
to check to make sure this change doesn't cause any problems.

The rebuild will happen in a relatively quiet window when the
transaction load is 3 per second.

Thanks
From: vsevolod afanassiev on
In my experience this should work without problems.
Possible issues:
1. Oracle needs exclusive lock for very brief period at the beginning
and end of the
rebuild process. What matters is not number of transactions per second
but
transaction durations. If transactions are brief (seconds) them it
should be
fine, if you have any long transactions (hours) then rebuild process
may get delayed.
2. Index rebuild creates a challenge for space monitoring as during
rebuilt you
have two copies of the index. So make sure that free space in the
tablespace (if you don't use
autoextend) or on the filesystem (if using autoextend) is sufficient.
3. During rebuild Oracle creates Index-Organized table to keep track
of the changes occurring in the
base table. From memory it gets created in the default tablespace of
the user that owns the index.
Make sure there is enough free space.
4. Make sure the is enough space for archivelogs that rebuild will
generate (assuming db is in
archivelog mode).
5. I suggest testing the process for various error conditions: make it
run out of space,
kill with alter session and with 'kill -9' etc and check that Oracle
reverts to the original index.
From: Matt on
> 1. Oracle needs exclusive lock for very brief period at the beginning
> and end of the rebuild process. What matters is not number of transactions per second
> but transaction durations. If transactions are brief (seconds) them it
> should be fine, if you have any long transactions (hours) then rebuild process
> may get delayed.

The transactions execute in about 10-20ms so this should be no problem
for us.

> 2. Index rebuild creates a challenge for space monitoring as during
> rebuilt you have two copies of the index. So make sure that free space in the
> tablespace (if you don't use autoextend) or on the filesystem (if using autoextend) is sufficient.

The newly built index is about 4GB so we need this amount extra in the
tablespace. This index is growing at a rate of 3GB per month anyway
so the rebuild will inhibit this growth (at least in the medium term).

> 3. During rebuild Oracle creates Index-Organized table to keep track
> of the changes occurring in the base table. From memory it gets created in the default tablespace of
> the user that owns the index. Make sure there is enough free space.

During testing the temporary IOT was only a few megabytes (enough to
hold about 5,000 primary key entries).

> 5. I suggest testing the process for various error conditions: make it
> run out of space, kill with alter session and with 'kill -9' etc and check that Oracle
> reverts to the original index.

Excellent idea, thanks. I'll plan to do some failure testing in the
staging environment.

From: Matt on
The rebuild successfully survived various failures (kill -9,
tablespace full etc) but the kill -9 failure required a manual clean
up of the data dictionary before another online rebuild could be
attempted. This is because the journal table remains after the crash
and the next rebuild attempt tries to create a journal table which
already exists.

This can be cleaned up by running:

begin
dbms_repair.ONLINE_INDEX_CLEAN(<object_id>);
end;

thanks again for your response.