From: Tom Lane on
I'm in process of reviewing the restartable-recovery patch,
http://archives.postgresql.org/pgsql-patches/2006-07/msg00356.php
and I'm wondering if we really need to invent a "standby mode" boolean
to get the right behavior. The problem I see with that flag is that
it'd be static over a run, whereas the behavior we want is dynamic.
It seems entirely likely that a slave will be started from a base backup
that isn't quite current, and will need to run through some archived WAL
segments quickly before it catches up to the master. So during the
catchup period we'd prefer that it not do restartpoints one-for-one
with the logged checkpoints, whereas after it's caught up, that's what
we want.

I'm thinking that we could instead track the actual elapsed time since
the last restartpoint, and do a restartpoint when we encounter a
checkpoint WAL record and the time since the last restartpoint is
at least X. I'd be inclined to just use checkpoint_timeout for X,
although perhaps there's an argument to be made for making it
separately settable.
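
To make the rule concrete, here is a minimal Python sketch of it. This is not code from the patch: the class name RestartpointPolicy, the injected clock, and the 300-second default standing in for checkpoint_timeout are all assumptions made for illustration.

```python
import time


class RestartpointPolicy:
    """Decide whether a replayed checkpoint record should become a restartpoint."""

    def __init__(self, min_interval_s=300.0, clock=time.monotonic):
        # min_interval_s plays the role of X (assumed to mirror checkpoint_timeout).
        self.min_interval_s = min_interval_s
        # The clock is injectable so the policy can be exercised without waiting.
        self.clock = clock
        self.last_restartpoint = clock()

    def on_checkpoint_record(self):
        """Call once per checkpoint WAL record; True means a restartpoint is due."""
        now = self.clock()
        if now - self.last_restartpoint >= self.min_interval_s:
            self.last_restartpoint = now
            return True
        return False
```

While the slave is catching up, checkpoint records go by much faster than X in wall-clock time, so most are skipped. Once it is replaying at the master's pace they arrive roughly X apart, so nearly every one becomes a restartpoint, with no static flag involved.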

Thoughts?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

From: Simon Riggs on
On Mon, 2006-08-07 at 09:48 -0400, Tom Lane wrote:
> I'm in process of reviewing the restartable-recovery patch,
> http://archives.postgresql.org/pgsql-patches/2006-07/msg00356.php
> and I'm wondering if we really need to invent a "standby mode" boolean
> to get the right behavior. The problem I see with that flag is that
> it'd be static over a run, whereas the behavior we want is dynamic.
> It seems entirely likely that a slave will be started from a base backup
> that isn't quite current, and will need to run through some archived WAL
> segments quickly before it catches up to the master. So during the
> catchup period we'd prefer that it not do restartpoints one-for-one
> with the logged checkpoints, whereas after it's caught up, that's what
> we want.

That's a great observation. It also ties in neatly with the last piece
of functionality I've been trying to add.

Let's have it run at full speed, i.e. restartpoint every 100 checkpoints
up until we hit end-of-logs, then if we are not in standby_mode the
recovery will just end. [Also: Currently, we do not retry a request for
an archive file during recovery, though for balance with archive we
should retry 3 times.]
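
As a rough Python sketch of those two catch-up rules (illustrative only; the helper names fetch_with_retry and is_restartpoint_due are hypothetical, and the fetch callable stands in for whatever actually requests the archive file):

```python
def fetch_with_retry(fetch, segment, attempts=3):
    """Call fetch(segment) up to `attempts` times; True as soon as one call succeeds."""
    for _ in range(attempts):
        if fetch(segment):
            return True
    # Every attempt failed: the caller treats this as end-of-logs.
    return False


def is_restartpoint_due(checkpoints_seen, every=100):
    """During catch-up, take a restartpoint only on every `every`-th checkpoint."""
    return checkpoints_seen > 0 and checkpoints_seen % every == 0
```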

If we are in standby mode, then rather than ending recovery we go into a
wait loop. We poll for the next file, then sleep for 1000 ms, then poll
again. When a file arrives we mark a restartpoint each checkpoint.
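
A minimal Python sketch of such a wait loop, under the assumption that "the next file" is a path we can test for existence. The function name wait_for_segment and the should_stop hook are hypothetical; the 1-second default matches the 1000 ms sleep above.

```python
import os
import time


def wait_for_segment(path, poll_interval_s=1.0, should_stop=lambda: False,
                     sleep=time.sleep):
    """Poll until `path` exists (return True) or should_stop() says to give up (False)."""
    while True:
        if os.path.exists(path):
            return True
        # should_stop is a hook for ending the wait, e.g. when failover is requested.
        if should_stop():
            return False
        sleep(poll_interval_s)
```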

We need the standby_mode to signify the difference in behaviour at
end-of-logs, but we may not need a parameter of that exact name.

The piece I have been puzzling over is how to initiate a failover when
in standby_mode. I've not come up with a better solution than checking
for the existence of a trigger file each time round the next-file wait
loop. This would use a naming convention to indicate the port number,
allowing us to uniquely identify a cluster on any single server. That's
about as portable and generic as you'll get.
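
For illustration, the check could look something like this in Python. The file name pattern "pgsql.trigger.<port>" and both function names are made up here; no naming convention has been settled.

```python
import os


def trigger_file_path(trigger_dir, port):
    """Build the per-cluster trigger file path (hypothetical naming convention)."""
    # The port number distinguishes clusters that share one server.
    return os.path.join(trigger_dir, f"pgsql.trigger.{port}")


def failover_requested(trigger_dir, port):
    """True once someone has created the trigger file for this cluster."""
    return os.path.exists(trigger_file_path(trigger_dir, port))
```

An operator would then initiate failover for the cluster on port 5432 simply by creating that one file, leaving any other cluster on the same machine untouched.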

We could replace the standby_mode with a single parameter to indicate
where the trigger file should be located.

This is then the last piece in the standby server puzzle.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



From: Tom Lane on
Simon Riggs <simon(a)2ndquadrant.com> writes:
> If we are in standby mode, then rather than ending recovery we go into a
> wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> again. When a file arrives we mark a restartpoint each checkpoint.

> We need the standby_mode to signify the difference in behaviour at
> end-of-logs, but we may not need a parameter of that exact name.

> The piece I have been puzzling over is how to initiate a failover when
> in standby_mode. I've not come up with a better solution than checking
> for the existence of a trigger file each time round the next-file wait
> loop. This would use a naming convention to indicate the port number,
> allowing us to uniquely identify a cluster on any single server. That's
> about as portable and generic as you'll get.

The original intention was that all this sort of logic was to be
external in the recovery_command script. I'm pretty dubious about
freezing it in the C code when there's not yet an established
convention for how it should work. I'd kinda like to see a widely
accepted recovery_command script before we move the logic inside
the server.

regards, tom lane


From: Simon Riggs on
On Mon, 2006-08-07 at 11:37 -0400, Tom Lane wrote:
> Simon Riggs <simon(a)2ndquadrant.com> writes:
> > If we are in standby mode, then rather than ending recovery we go into a
> > wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> > again. When a file arrives we mark a restartpoint each checkpoint.
>
> > We need the standby_mode to signify the difference in behaviour at
> > end-of-logs, but we may not need a parameter of that exact name.
>
> > The piece I have been puzzling over is how to initiate a failover when
> > in standby_mode. I've not come up with a better solution than checking
> > for the existence of a trigger file each time round the next-file wait
> > loop. This would use a naming convention to indicate the port number,
> > allowing us to uniquely identify a cluster on any single server. That's
> > about as portable and generic as you'll get.
>
> The original intention was that all this sort of logic was to be
> external in the recovery_command script. I'm pretty dubious about
> freezing it in the C code when there's not yet an established
> convention for how it should work. I'd kinda like to see a widely
> accepted recovery_command script before we move the logic inside
> the server.

OK, I'll submit a C program called pg_standby so that we have an
approved and portable version of the script, allowing it to be
documented more easily.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



From: Bruce Momjian on
Simon Riggs wrote:
> On Mon, 2006-08-07 at 11:37 -0400, Tom Lane wrote:
> > Simon Riggs <simon(a)2ndquadrant.com> writes:
> > > If we are in standby mode, then rather than ending recovery we go into a
> > > wait loop. We poll for the next file, then sleep for 1000 ms, then poll
> > > again. When a file arrives we mark a restartpoint each checkpoint.
> >
> > > We need the standby_mode to signify the difference in behaviour at
> > > end-of-logs, but we may not need a parameter of that exact name.
> >
> > > The piece I have been puzzling over is how to initiate a failover when
> > > in standby_mode. I've not come up with a better solution than checking
> > > for the existence of a trigger file each time round the next-file wait
> > > loop. This would use a naming convention to indicate the port number,
> > > allowing us to uniquely identify a cluster on any single server. That's
> > > about as portable and generic as you'll get.
> >
> > The original intention was that all this sort of logic was to be
> > external in the recovery_command script. I'm pretty dubious about
> > freezing it in the C code when there's not yet an established
> > convention for how it should work. I'd kinda like to see a widely
> > accepted recovery_command script before we move the logic inside
> > the server.
>
> OK, I'll submit a C program called pg_standby so that we have an
> approved and portable version of the script, allowing it to be
> documented more easily.

I think we are still waiting for this. I am also waiting for more PITR
documentation to go with the recent patches.

--
Bruce Momjian bruce(a)momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
