From: Denis BUCHER on
Dear all,

a) I have a very strange problem with postfix, it worked for more than
one year without problem, and now, for about one month some incoming
emails started to fail in the night (approx. between 4AM and 5AM) but
not every day, with errors like :

* Out: 451 4.3.0 <email(a)domain.ch>: Temporary lookup failure
* Out: 451 4.3.0 Error: queue file write error

It's very strange for me, therefore any help on that subject would be
greatly appreciated !

b) I found this interesting link, but it doesn't contain a solution, it
just says "sending server will anyway retry later" :
http://flakshack.com/anti-spam/wiki/index.php?page=Queue+File+Write+Errors

c) More info :
That's what I see in the logs approximately at the same time these
errors are seen :

Jun 11 04:45:21 cirrus postfix/trivial-rewrite[3636]: warning:
dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
cn=****, ou=****
, dc=hsolutions, dc=ch: -5 (Timed out)
Jun 11 04:45:25 cirrus postfix/trivial-rewrite[3648]: warning:
dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
cn=****, ou=****, dc=hsolutions, dc=ch: -5 (Timed out)
Jun 11 04:45:26 cirrus postfix/smtpd[3411]: warning: dict_ldap_lookup:
Search error -5: Timed out

or :
Jun 11 04:46:14 cirrus postfix/trivial-rewrite[3636]: fatal:
ldap:/etc/postfix/ldap-aliases.cf(0,lock|fold_fix): table lookup problem
Jun 11 04:46:20 cirrus postfix/trivial-rewrite[3648]: fatal:
ldap:/etc/postfix/ldap-aliases.cf(0,lock|fold_fix): table lookup problem

or:
Jun 11 04:46:30 cirrus postfix/master[17478]: warning: process
/usr/libexec/postfix/trivial-rewrite pid 3636 exit status 1
Jun 11 04:46:31 cirrus postfix/master[17478]: warning:
/usr/libexec/postfix/trivial-rewrite: bad command startup -- throttling
Jun 11 04:46:32 cirrus postfix/master[17478]: warning: process
/usr/libexec/postfix/trivial-rewrite pid 3648 exit status 1

or even :
Jun 11 04:47:08 cirrus postfix/smtpd[3639]: warning: dict_ldap_lookup:
Search error -5: Timed out
Jun 11 04:47:09 cirrus postfix/trivial-rewrite[3481]: warning:
dict_ldap_lookup: Search error -5: Timed out
Jun 11 04:47:11 cirrus postfix/cleanup[3447]: warning: dict_ldap_lookup:
Search error -5: Timed out
Jun 11 04:47:12 cirrus postfix/smtpd[3639]: NOQUEUE: reject: RCPT from
host[IP]: 451 4.3.0 <email(a)domain.ch>: Temporary lookup failure;
from=<owner-selinux(a)tarius.tycho.ncsc.mil> to=<email(a)domain.ch>
proto=SMTP helo=<mail184.messagelabs.com>
Jun 11 04:47:13 cirrus postfix/trivial-rewrite[3481]: fatal:
ldap:/etc/postfix/ldap-domains.cf(0,lock|fold_fix): table lookup problem
Jun 11 04:47:14 cirrus postfix/cleanup[3447]: warning: 8E616B80020:
virtual_alias_maps map lookup problem for root(a)host
Jun 11 04:47:15 cirrus postfix/pickup[3593]: 8C561B80020: uid=0 from=<root>
Jun 11 04:47:16 cirrus postfix/cleanup[3447]: 8C561B80020:
message-id=<20100611024715.8C561B80020(a)host>
Jun 11 04:47:16 cirrus postfix/cleanup[3750]: warning: problem talking
to service rewrite: Connection reset by peer
Jun 11 04:47:16 cirrus postfix/master[17478]: warning: process
/usr/libexec/postfix/trivial-rewrite pid 3481 exit status 1
Jun 11 04:47:34 cirrus postfix/qmgr[17481]: 2916A7C0006:
from=<postmaster(a)domain.ch>, size=13273, nrcpt=1 (queue active)
Jun 11 04:47:40 cirrus postfix/cleanup[3761]: A0A18B8002E:
message-id=<20100611024714.A0A18B8002E(a)host>
Jun 11 04:47:41 cirrus postfix/smtpd[3639]: disconnect from
mail184.messagelabs.com[193.109.254.3]
Jun 11 04:47:49 cirrus postfix/cleanup[3750]: warning: dict_ldap_lookup:
Search error -5: Timed out
Jun 11 04:47:49 cirrus postfix/cleanup[3750]: warning: A16B9B8000B:
virtual_alias_maps map lookup problem for email(a)domain.ch
Jun 11 04:47:49 cirrus postfix/qmgr[17481]: A0A18B8002E:
from=<double-bounce(a)host>, size=778, nrcpt=1 (queue active)
Jun 11 04:47:59 cirrus postfix/trivial-rewrite[3764]: warning:
dict_ldap_lookup: Search error -5: Timed out
Jun 11 04:48:00 cirrus postfix/trivial-rewrite[3764]: fatal:
ldap:/etc/postfix/ldap-aliases.cf(0,lock|fold_fix): table lookup problem
Jun 11 04:48:01 cirrus postfix/trivial-rewrite[3768]: warning:
dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
cn=mailadmin, ou=***, dc=***, dc=ch: -5 (Timed out)
Jun 11 04:48:01 cirrus postfix/trivial-rewrite[3768]: fatal:
ldap:/etc/postfix/ldap-aliases.cf(0,lock|fold_fix): table lookup problem
Jun 11 04:48:01 cirrus postfix/cleanup[3762]: warning: problem talking
to service rewrite: Connection reset by peer
Jun 11 04:48:01 cirrus postfix/master[17478]: warning: process
/usr/libexec/postfix/trivial-rewrite pid 3764 exit status 1
Jun 11 04:48:04 cirrus postfix/master[17478]: warning: process
/usr/libexec/postfix/trivial-rewrite pid 3768 exit status 1

Thanks a lot in advance for any help !

Denis

From: Seth Mattinen on
On 6/12/2010 16:03, Denis BUCHER wrote:
>
> c) More info :
> That's what I see in the logs approximately at the same time these
> errors are seen :
>
> Jun 11 04:45:21 cirrus postfix/trivial-rewrite[3636]: warning:
> dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
> cn=****, ou=****
> , dc=hsolutions, dc=ch: -5 (Timed out)
> Jun 11 04:45:25 cirrus postfix/trivial-rewrite[3648]: warning:
> dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
> cn=****, ou=****, dc=hsolutions, dc=ch: -5 (Timed out)
> Jun 11 04:45:26 cirrus postfix/smtpd[3411]: warning: dict_ldap_lookup:
> Search error -5: Timed out
>


Well, according to this your LDAP server isn't working.

~Seth

From: Sahil Tandon on
On Sun, 13 Jun 2010, Denis BUCHER wrote:

> a) I have a very strange problem with postfix, it worked for more
> than one year without problem, and now, for about one month some
> incoming emails started to fail in the night (approx. between 4AM
> and 5AM) but not every day, with errors like :
>
> * Out: 451 4.3.0 <email(a)domain.ch>: Temporary lookup failure
> * Out: 451 4.3.0 Error: queue file write error

[ .. ]

> Jun 11 04:45:21 cirrus postfix/trivial-rewrite[3636]: warning:
> dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
> cn=****, ou=****

Your LDAP server stops responding; fix that.

--
Sahil Tandon <sahil(a)FreeBSD.org>

From: Jerrale Gayle on
On 6/12/2010 7:48 PM, Sahil Tandon wrote:
> On Sun, 13 Jun 2010, Denis BUCHER wrote:
>
>
>> a) I have a very strange problem with postfix, it worked for more
>> than one year without problem, and now, for about one month some
>> incoming emails started to fail in the night (approx. between 4AM
>> and 5AM) but not every day, with errors like :
>>
>> * Out: 451 4.3.0<email(a)domain.ch>: Temporary lookup failure
>> * Out: 451 4.3.0 Error: queue file write error
>>
> [ .. ]
>
>
>> Jun 11 04:45:21 cirrus postfix/trivial-rewrite[3636]: warning:
>> dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
>> cn=****, ou=****
>>
> Your LDAP server stops responding; fix that.
>
>


The default config for Logrotate starts at, guess when, 4 AM!!!!

During this time, LDAP's logs get rotated and then LDAP is suppose to be
stopped and started at the END. Your logrotate COULD, however, be
stopping LDAP service for the entire duration of logrotate and THEN
started at the end of Logrotate. I think this is your problem!

Jerrale

From: Denis BUCHER on
Dear Jerrale, Sahil and Seth,

Thanks a lot for your answers :

Le 13.06.2010 03:15, Jerrale Gayle a �crit :
>>> a) I have a very strange problem with postfix, it worked for more
>>> than one year without problem, and now, for about one month some
>>> incoming emails started to fail in the night (approx. between 4AM
>>> and 5AM) but not every day, with errors like :
>>>
>>> * Out: 451 4.3.0<email(a)domain.ch>: Temporary lookup failure
>>> * Out: 451 4.3.0 Error: queue file write error
>> [ .. ]
>>
>>> Jun 11 04:45:21 cirrus postfix/trivial-rewrite[3636]: warning:
>>> dict_ldap_connect: Unable to bind to server ldap://localhost:389 as
>>> cn=****, ou=****
>> Your LDAP server stops responding; fix that.
>
>
> The default config for Logrotate starts at, guess when, 4 AM!!!!

This looks interesting !

> During this time, LDAP's logs get rotated and then LDAP is suppose to be
> stopped and started at the END. Your logrotate COULD, however, be
> stopping LDAP service for the entire duration of logrotate and THEN
> started at the end of Logrotate. I think this is your problem!

I also think it has something to do with backups or something like that,
and as logrotate is not in "user crontab" but somewhere in the system I
missed it.

But after verification (I thought you found the problem) but logrotate
is only started once a week, and the dates doesn't match the problem
(even if you were right, the hours match perfectly !) and LDAP logs are
not rotated at all...

I'm sure you had the right idea : "find what is happening between 4AM
and 5:30AM to understand what is the problem", but what else could it be
? I checked with backups, with other tasks in crontab... I'm sure it's
something like that (???)

Thanks a lot for you help, and if you have an other idea, it would be
great :-))

Have a nice week-end !

Denis