From: Bela Lubkin <filbo@armory.com>
Subject: Re: Dying processes (inetd, cron, syslogd, sshd)
Date: 8 Aug 2005 06:10:34 -0400
Message-ID: <200508080310.aa22956@deepthought.armory.com>
References: <1122912804.902870.308720@g44g2000cwa.googlegroups.com> <1123492892.449563.152090@g43g2000cwa.googlegroups.com>

keith@actual-systems.com wrote:

> Anyone have any ideas on this problem?

I posted on August 1st, but never saw it come back to me.  This time
I'm Bcc'ing you so you'll see it even if USENET swallows it again...

keith@actual-systems.com wrote:

> We are having problems on various SMP machines (5.0.6a + rs506a
> installed) where at times of large load most of the running processes
> just seem to stop (e.g. inetd, cron, syslogd, sshd, ...).  This always
> seems to occur at times of large stress to the disks, but we have never
> managed to put our fingers on exactly what is causing it.  When it does
> happen not only does the inetd process die, but also cron and syslog,
> which makes it very tricky for us to put anything in place to try and
> catch what is happening.
>
> We are able to ping the machine when it does happen and also login at
> the console and over a modem but not over a telnet or ssh connection.
>
> We have had an issue open with SCO before who advised us to install
> scodb and set it to trigger when the inetd process stops - and when it
> does to get a sysdump.  We have tried this, but the sysdump created was
> too big for swap - do you know of any way from within scodb to reduce
> the size of the sysdump created?
>
> This machine (which has had the problem once a day for the last three)
> is used as a backup server in our office.  All that runs on it is two
> rsyncs of our main machine - one for mail/uucp spools, and one for the
> main data.  The problem always has occurred during these rsyncs, normally
> when transferring a large file.

scodb can't reduce the size of a crash dump, but you can force the dumps
to fit by limiting the amount of memory seen by the kernel.  To do this,
append " mem=1m-100m" to DEFBOOTSTR in /etc/default/boot (substituting a
bit less than the actual size of your dump area in place of "100m").
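
As a rough sketch of the sizing arithmetic only: the helper below turns a
dump area size into the " mem=1m-<N>m" fragment.  The function name, the
512-byte block unit and the 5% safety margin standing in for "a bit less
than" are all assumptions made for illustration; none of them come from
SCO's tools, so check how your system reports the dump area size first.

```python
def mem_boot_arg(dump_blocks, block_size=512, margin_pct=5):
    """Build a ' mem=1m-<N>m' fragment sized a bit below the dump area.

    dump_blocks -- size of the dump area in blocks (assumed unit)
    block_size  -- bytes per block (512 is an assumption)
    margin_pct  -- hypothetical safety margin, in percent
    """
    dump_mb = dump_blocks * block_size // (1024 * 1024)
    limit_mb = dump_mb * (100 - margin_pct) // 100
    if limit_mb < 2:
        raise ValueError("dump area too small to limit memory usefully")
    return " mem=1m-%dm" % limit_mb


# Example: a 105MB dump area expressed as 215040 blocks of 512 bytes.
print(mem_boot_arg(215040))
```

The resulting string would be appended to whatever DEFBOOTSTR already
contains in /etc/default/boot, leaving the existing value in place.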

The load you describe would probably run in 12MB of RAM, but don't limit
memory more than you have to.  The problem might be memory size-related.
You want to keep as much as you can of the machine's normal memory size.

[new material begins]

> What would be the outcome if you had one process that kept on wanting
> more and more resource?

There are some problem scenarios like that.  A common one is a process
spinning out of control, allocating more and more memory.  It will
eventually use all available memory; its next allocation attempt will
fail, and in most cases it will then die.  Unless you have changed the
defaults, such a process usually writes a core dump.

On OSR5, during the dumping of a process's core, the process continues
to own all of its memory until the dump is complete.  This means that
the machine remains critically out of memory for a long time.  The
process may have grown nearly as large as your combined RAM + swap.  To
dump it, not only does the kernel have to write that much data, it also
may have to page a large portion of it in from swap.  This can take many
minutes with large memory and a slow disk...

During that period, other processes that try to allocate memory will
usually fail.  Their subsequent behavior depends on their error
handling.  Some will dump core, some will exit gracefully, some may even
stay up.  And some may get into weird catatonic states.

> Do other processes hold onto the resource they have or will they
> eventually get 'bullied' out of the resource they are using and
> essentially stop (which theoretically would give the results I am
> seeing.)

For memory, a "hog" process will cause others to get written out to
swap, but those processes still "own" their memory (it will get paged
back in if they need to access it).  The troubles happen when a process
tries to allocate more memory while the system is strapped.

There are probably other resources where similar things could happen.

> Any ideas?  Or does anyone have any ideas as to how I would track down
> what was causing this to happen?

If you had a process spin out and dump, it would leave a huge core file
that you would be able to find.

If a process spins out and dies _without_ leaving a dump, a more subtle
trace is left.  Normally, OSR5 doesn't use any swap at all; `swap -l`
will have identical values in the "blocks" and "free" columns.
("Normal" modern systems have enough RAM that they never need to invoke
the tremendous performance loss of swapping.)  After such an incident,
`swap -l` will show quite a bit of swap in use.  This represents pages
that got pushed out, and whose processes have never actually needed to
access them since the incident.

What does `crash` "p" show in the "EVENT" column for the hung processes?

>Bela<
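
The `swap -l` check described in the post (compare the "blocks" and
"free" columns) could be scripted roughly as follows.  This is only a
sketch: the function name is made up, the sample text is hypothetical,
and the exact column layout of real `swap -l` output on your system is an
assumption, which is why the columns are located by header name rather
than by position.

```python
def swap_in_use(swap_l_output):
    """Return {path: blocks_in_use} parsed from `swap -l` style text.

    Assumes a header row that names "blocks" and "free" columns and one
    whitespace-separated row per swap area, with the path first.
    """
    lines = [ln.split() for ln in swap_l_output.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    blocks_col = header.index("blocks")
    free_col = header.index("free")
    return {row[0]: int(row[blocks_col]) - int(row[free_col]) for row in rows}


# Hypothetical sample; real output may be laid out differently.
SAMPLE = """\
path                dev  swaplo blocks   free
/dev/swap           1,41      0 786432 524288
"""

for path, used in swap_in_use(SAMPLE).items():
    note = "swap has been used" if used else "no swap in use"
    print(path, used, note)
```

A nonzero difference on an otherwise idle machine would be the "subtle
trace" mentioned above: pages pushed out during an incident and never
touched since.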