From: Laurent BARRAILLE on
On 10/05/2010 19:14, Jim Kusznir wrote:
> Hi all:
>
> I've got a couple of Ubuntu 9.10 machines that are suffering from a
> recurring failure of winbind that essentially crashes the machine. When
> the system is in the "crashed state", one can ping the system, but all
> forms of login fail.
That's expected: once winbind stops working, every service that uses PAM
is out of service.
> It will not even respond to tftpd requests; ssh
> connections "time out", but the initial port is opened (just no
> connect). Rebooting does NOT recover from this, in order to recover,
> I need to:
>
> 1) reboot into single user mode
>
Do you have enough free space on your partitions at this point?
> 2) edit /etc/nsswitch.conf and remove winbind
> 3) remove winbind from all pam.d/*
> 4) boot normally
> 5) stop samba and winbind
> 6) delete /var/lib/samba/* and /var/cache/samba/*
> 7) start samba
> 8) rejoin domain
> 9) start winbind
> 10) undo #2 and 3 above
>
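Steps 5 to 9 above could be scripted roughly as follows. This is only a dry-run sketch: the init script paths and the Administrator join account are assumptions, not taken from the original post, and `step` and `reset_winbind` are hypothetical helper names. Nothing is executed unless RUN=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of steps 5-9. Each command is printed, and only
# executed when RUN=1 is set in the environment.
# Assumptions: init script names and the Administrator join account.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reset_winbind() {
    step /etc/init.d/winbind stop                             # step 5
    step /etc/init.d/samba stop                               # step 5
    step sh -c 'rm -rf /var/lib/samba/* /var/cache/samba/*'   # step 6
    step /etc/init.d/samba start                              # step 7
    step net ads join -U Administrator                        # step 8
    step /etc/init.d/winbind start                            # step 9
}
```

Calling reset_winbind with RUN unset only lists the commands, which makes it easy to review them before running anything destructive.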
> After this, winbind will work for a week or two. If I stop after step
> 4 above the system is usable, but without domain users able to log in.
> My diagnostics show that net ads users (and all other "samba"
> commands) work just fine and find all users. All winbind-specific
> commands (wbinfo -u, etc) fail. Oh, if I leave the system up in the
> crashed state, it begins to fill up logs to the tune of 32gigs in a
> few days. The above procedure repeats approximately once every 5 days
> on our main production system. I have a second workstation that sees
> very little use, and it has suffered the same crash, but far less
> frequently. I have also tried inserting step 6.5 where I delete the
> machine account on the DC, but that doesn't change anything. Also,
> our Ubuntu 9.04 system running the same configuration files has no
> issues. We have not tried 10.04.
>
> This problem has been plaguing our operations for over two months now,
> so any assistance would be greatly appreciated.
>
> Some log file snippets:
>
> (from some point "in the middle" of the crash):
> May 7 15:32:45 casas-lin winbindd[20677]: sys_select: pipe failed (Too many open files)
>
"Too many open files" means your system has reached its limit on open files.

Try the lsof command to see which process is holding too many files open.

lsof|wc -l

shows how many files are open,

lsof|less

lists all the open files, and

cat /proc/sys/fs/file-max

shows the system-wide limit.
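As a complement to the commands above: the log reports error 24, which on Linux is EMFILE, the per-process descriptor limit, rather than the system-wide file-max. A minimal sketch for checking each winbindd process against its own limit, assuming a Linux /proc filesystem that provides /proc/PID/limits (`fd_count` and `fd_limit` are hypothetical helper names):

```shell
#!/bin/sh
# Count the file descriptors a process currently holds (Linux /proc).
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Read the soft "Max open files" limit that applies to that process.
fd_limit() {
    awk '/^Max open files/ {print $4}' "/proc/$1/limits" 2>/dev/null
}

# Report usage for every running winbindd (prints nothing if none is running).
for pid in $(pidof winbindd 2>/dev/null); do
    echo "winbindd pid $pid: $(fd_count "$pid") of $(fd_limit "$pid") descriptors in use"
done
```

Run as root so that /proc/PID/fd is readable. A count that climbs steadily toward the limit over several days would point to a descriptor leak in that process.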

> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/events.c:287(s3_event_debug)
> May 7 15:32:45 casas-lin winbindd[20677]: s3_event: sys_select() failed: 24:Too many open files
> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/select.c:64(sys_select)
> May 7 15:32:45 casas-lin winbindd[20677]: [2010/05/07 15:32:45, 0] lib/debug.c:663(reopen_logs)
> May 7 15:32:45 casas-lin winbindd[20677]: Unable to open new log file /var/log/samba/log.wb-CASAS: Too many open files
> ------
> From startup (step 4 above):
> May 10 08:36:50 casas-lin kernel: May 10 08:38:42 casas-lin winbindd[1571]: [2010/05/10 08:38:42, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:38:42 casas-lin winbindd[1571]: signing_good: BAD SIG: seq 41
> May 10 08:42:25 casas-lin winbindd[1562]: [2010/05/10 08:42:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:42:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1571 is not responding. Closing connection to it.
> May 10 08:42:25 casas-lin winbindd[1571]: [2010/05/10 08:42:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
> May 10 08:42:25 casas-lin winbindd[1571]: Got sig[15] terminate (is_parent=0)
> May 10 08:42:25 casas-lin winbindd[1825]: [2010/05/10 08:42:25, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> May 10 08:42:25 casas-lin winbindd[1825]: cli_pipe_verify_schannel: auth_len 56.
> May 10 08:43:37 casas-lin winbindd[1825]: [2010/05/10 08:43:37, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:43:37 casas-lin winbindd[1825]: signing_good: BAD SIG: seq 23
> May 10 08:47:25 casas-lin winbindd[1562]: [2010/05/10 08:47:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:47:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1825 is not responding. Closing connection to it.
> May 10 08:47:25 casas-lin winbindd[1825]: [2010/05/10 08:47:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
> May 10 08:47:25 casas-lin winbindd[1825]: Got sig[15] terminate (is_parent=0)
> May 10 08:47:25 casas-lin winbindd[1832]: [2010/05/10 08:47:25, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> May 10 08:47:25 casas-lin winbindd[1832]: cli_pipe_verify_schannel: auth_len 56.
> May 10 08:48:38 casas-lin winbindd[1832]: [2010/05/10 08:48:38, 0] libsmb/smb_signing.c:255(signing_good)
> May 10 08:48:38 casas-lin winbindd[1832]: signing_good: BAD SIG: seq 23
> May 10 08:52:25 casas-lin winbindd[1562]: [2010/05/10 08:52:25, 0] winbindd/winbindd_dual.c:186(async_request_timeout_handler)
> May 10 08:52:25 casas-lin winbindd[1562]: async_request_timeout_handler: child pid 1832 is not responding. Closing connection to it.
> May 10 08:52:25 casas-lin winbindd[1832]: [2010/05/10 08:52:25, 0] winbindd/winbindd.c:190(winbindd_sig_term_handler)
>
> ---------
> log.wb-CASAS (my domain is CASAS.WSU.EDU)
> [2010/05/10 09:12:26, 1] libsmb/clikrb5.c:697(ads_krb5_mk_req)
> ads_krb5_mk_req: krb5_get_credentials failed for ad1$@CASAS (KDC reply did not match expectations)
> [2010/05/10 09:12:26, 1] libsmb/cliconnect.c:745(cli_session_setup_kerberos)
> cli_session_setup_kerberos: spnego_gen_negTokenTarg failed: KDC reply did not match expectations
> [2010/05/10 09:12:26, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> cli_pipe_verify_schannel: auth_len 56.
> [2010/05/10 09:12:26, 1] rpc_client/cli_pipe.c:948(cli_pipe_validate_current_pdu)
> cli_pipe_validate_current_pdu: RPC fault code DCERPC fault 0x00000721 received from host ad1.casas.wsu.edu!
> -------
> log-wb-CASAS.old (during "crashed state"):
> [2010/04/19 08:17:23, 1] libsmb/clikrb5.c:697(ads_krb5_mk_req)
> ads_krb5_mk_req: krb5_get_credentials failed for ad1$@CASAS (Cannot resolve network address for KDC in requested realm)
> [2010/04/19 08:17:23, 1] libsmb/cliconnect.c:745(cli_session_setup_kerberos)
> cli_session_setup_kerberos: spnego_gen_negTokenTarg failed: Cannot resolve network address for KDC in requested realm
> [2010/04/19 08:17:23, 0] rpc_client/cli_pipe.c:687(cli_pipe_verify_schannel)
> cli_pipe_verify_schannel: auth_len 56.
> [2010/04/19 08:17:23, 1] rpc_client/cli_pipe.c:948(cli_pipe_validate_current_pdu)
> cli_pipe_validate_current_pdu: RPC fault code DCERPC fault 0x00000721 received from host ad1.casas.wsu.edu!
> ------------
> My configuration
> ------------
> smb.conf
> ------------
> [global]
> security = ads
> netbios name = casas-lin
> realm = CASAS.WSU.EDU
> workgroup = CASAS
> password server = ad1.casas.wsu.edu
> workgroup = CASAS
> idmap uid = 10000-20000
> idmap gid = 10000-20000
> idmap backend = rid:CASAS.WSU.EDU=10000-20000
> winbind enum users = yes
> winbind enum groups = yes
> winbind use default domain = yes
> #template homedir = /home/%U
> template homedir = /net/files/home/%U
> template shell = /bin/bash
> ; client use spnego = yes
> domain master = no
> --------------
> /etc/krb5.conf
> -------------
> [logging]
> default =FILE:/var/log/krb5libs.log
> kdc =FILE:/var/log/krb5kdc.log
> admin_server =FILE:/var/log/kadmind.log
>
> [libdefaults]
> default_realm = CASAS.WSU.EDU
> dns_lookup_realm = false
> dns_lookup_kdc = true
> ticket_lifetime = 24h
> forwardable = yes
>
> [realms]
> EXAMPLE.COM = {
> kdc = kerberos.example.com:88
> admin_server = kerberos.example.com:749
> default_domain = example.com
> }
>
> CASAS.WSU.EDU = {
> kdc = ad1.casas.wsu.edu
> admin_server = ad1.casas.wsu.edu
> kdc = ad1.casas.wsu.edu
> }
>
> CASAS = {
> kdc = ad1.casas.wsu.edu
> admin_server = ad1.casas.wsu.edu
> kdc = ad1.casas.wsu.edu
> }
>
> [domain_realm]
> .example.com = EXAMPLE.COM
> example.com = EXAMPLE.COM
>
> casas.wsu.edu = CASAS.WSU.EDU
> .casas.wsu.edu = CASAS.WSU.EDU
> [appdefaults]
> pam = {
> debug = false
> ticket_lifetime = 36000
> renew_lifetime = 36000
> forwardable = true
> krb4_convert = false
> }
> ---------------
> /etc/pam.d/common-account
> ---------------
> account [success=1 new_authtok_reqd=done default=ignore] pam_unix.so
> account requisite pam_deny.so
> account required pam_permit.so
> account sufficient pam_winbind.so
> account required pam_krb5.so minimum_uid=1000
> ------------
> /etc/pam.d/common-auth
> ------------
> auth [success=3 default=ignore] pam_winbind.so krb5_auth krb5_ccache_type=FILE
> auth [success=2 default=ignore] pam_krb5.so minimum_uid=1000 try_first_pass
> auth [success=1 default=ignore] pam_unix.so nullok_secure try_first_pass
> auth requisite pam_deny.so
> auth required pam_permit.so
> ------------
> /etc/pam.d/common-password
> ------------
> password requisite pam_winbind.so
> password requisite pam_krb5.so minimum_uid=1000 use_authtok
> password [success=1 default=ignore] pam_unix.so obscure use_authtok try_first_pass sha512
> password requisite pam_deny.so
> password required pam_permit.so
> password optional pam_gnome_keyring.so
> -------------
> /etc/nsswitch.conf
> -------------
> passwd: compat winbind
> group: compat winbind
> shadow: compat
>
> hosts: files dns mdns4
> networks: files
>
> protocols: db files
> services: db files
> ethers: db files
> rpc: db files
>
> netgroup: nis
> ----------------
>
> Thanks!
> --Jim
>
--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/options/samba
From: Jim Kusznir on
Some more info:

On my (working) Ubuntu 9.04 system, the load often sits at around
50%, with winbind and syslogd using up that CPU. In
/var/log/syslog, I get fairly continuous logging of:

May 11 09:06:39 casas-thin-serv winbindd[11370]: rpc_api_pipe: host ad1.casas.wsu.edu, pipe \NETLOGON, fnum 0x400f returned critical error. Error was NT_STATUS_PIPE_DISCONNECTED
May 11 09:06:39 casas-thin-serv winbindd[11370]: [2010/05/11 09:06:39, 0] rpc_client/cli_pipe.c:rpc_api_pipe(914)

Authentication and other details work, but this is eating up a lot of
CPU and disk space (logs) for nothing....and I'm suspicious that this
might be connected to the issue.
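To put a number on how fast these messages arrive, something like the following could be used. It is a rough sketch, assuming each syslog entry sits on a single line in the standard "Mon DD HH:MM:SS host ..." format; `pipe_errors_per_minute` is a hypothetical helper name.

```shell
#!/bin/sh
# Count NT_STATUS_PIPE_DISCONNECTED lines per minute in syslog-format
# input on stdin. Fields 1-3 are "Mon DD HH:MM:SS"; the seconds are
# dropped so that entries are bucketed by minute.
pipe_errors_per_minute() {
    awk '/NT_STATUS_PIPE_DISCONNECTED/ {
             key = $1 " " $2 " " substr($3, 1, 5)
             count[key]++
         }
         END { for (k in count) print count[k], k }' | sort -k2
}

# Usage (not run here):
#   pipe_errors_per_minute < /var/log/syslog | tail
```

Each output line is a count followed by the minute it belongs to, which makes it easier to see whether the rate is steady or comes in bursts.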

My AD controller (ad1.casas.wsu.edu) is a Win Serv 2008r2 box with the
schema set to 2003 (IIRC...I know I did not set it to 2008, as I tried
that first, and had lots of breakage). This system exists mostly to
serve winbind clients, plus 1-3 Windows boxes...

--Jim
From: Jim Kusznir on
Am I the only one experiencing such breakage from winbind? I'm
starting to doubt whether it actually works at all, and if I can't get it
working better "real soon now", I'm going to have to ditch it
altogether. I really can't afford half of my CPU resources tied up in
logging messages, or my critical servers crashing once a week due to
winbind. I can't believe something this bad would be turned out by
the Samba team; their stuff is usually top notch. Yet I've followed
all the instructions on the website, I've tried a few different times,
I've reformatted and reinstalled my network a couple of times, and I've
been seeking help, asking people to point out what I'm doing
wrong...and it still doesn't work.

Any more suggestions? Anyone actually using winbind successfully?

--Jim

From: Jim Kusznir on
Ack, this message got buried in my mail reader...Thanks for the reply.

My entire smb.conf is included in my original message to the list;
I'll paste it again here:

smb.conf
------------
[global]
security = ads
netbios name = casas-lin
realm = CASAS.WSU.EDU
workgroup = CASAS
password server = ad1.casas.wsu.edu
workgroup = CASAS
idmap uid = 10000-20000
idmap gid = 10000-20000
idmap backend = rid:CASAS.WSU.EDU=10000-20000
winbind enum users = yes
winbind enum groups = yes
winbind use default domain = yes
#template homedir = /home/%U
template homedir = /net/files/home/%U
template shell = /bin/bash
; client use spnego = yes
domain master = no
--------------

Thanks for the help!!

BTW: I tried the Ubuntu team; they just ignored me.

--Jim

On Fri, May 14, 2010 at 6:35 AM, Eliel <slayer.r0x(a)gmail.com> wrote:
> Share the smb.conf of your workstations and let's see what can be done.
> Did you change the limit on open files?
> Did you see any zombie processes running on the machine?
>
> As I said before, this is something you should ask the Ubuntu team.
> I'm using winbind on Debian workstations, and it just works fine. It "never"
> crashes. It has been running for 3 months in a row now, and counting.
>
> Let's take a look at what you're doing, and then try to solve your problem.
>
> Regards
>
From: Chris Smith on
On Thu, May 13, 2010 at 1:12 PM, Jim Kusznir <jkusznir(a)gmail.com> wrote:
> Any more suggestions?  Anyone actually using winbind successfully?

What changes if you change:

/etc/nsswitch.conf
-------------
passwd: compat winbind
group: compat winbind

to:
-------------
passwd: compat
group: compat

?

Does it still crash?

Chris
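One way to try the change above without hand-editing the file is sketched below. It is a minimal sketch, assuming a POSIX shell and sed; `strip_winbind` is a hypothetical helper name, and the output file path in the usage comment is only an example.

```shell
#!/bin/sh
# Print the nsswitch.conf given as $1 with "winbind" removed from the
# passwd and group lines only; every other line passes through unchanged.
strip_winbind() {
    sed -e '/^passwd:/ s/[[:space:]]*winbind//' \
        -e '/^group:/ s/[[:space:]]*winbind//' "$1"
}

# Usage (not run here): write the result to a scratch file and review the
# difference before copying it over the real /etc/nsswitch.conf.
#   strip_winbind /etc/nsswitch.conf > /tmp/nsswitch.conf.test
#   diff /etc/nsswitch.conf /tmp/nsswitch.conf.test
```

Once the edited file is in place, `getent passwd` should list only local accounts, which confirms that NSS lookups no longer go through winbind.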