From: Bela Lubkin <belal@sco.com>
Subject: Re: Panic save very sloooooow
Date: Fri, 15 Feb 2002 09:29:01 GMT
References: <3c6b0b39.29156534@news.texas.net>

Carl Sawyer wrote:

> I am trying to figure out why we are getting panics every 3 to 5 days.
> Swap space was not big enough to hold a memory image (2 gig) so last
> weekend we put in a spare 10 gig and created an empty 5 gig partition
> as our new default dump device (as per SCO TA).
>
> Today we got our first panic. Operator said "Y" to dump to the
> default and hit ENTER when it said to insert media. And nothing
> happened for about eight minutes when it displayed one period. I was
> not there, operator said we were getting some disk lights, but I
> cannot attest to frequency. An hour later it had displayed a total of
> nine periods and management made us reboot the system so users could
> get back in.
>
> A few days ago I posted here looking for a way to induce panics so I
> could test this procedure but I decided that I should not do the
> recommended changes on a production system. I did, however, use
> sysdump to write a full memory image to /dev/dump and it took about
> five minutes to run. Then I tested a script I wrote to preserve this
> image in a dated disk file and this, too, took about 5 min.
>
> Details:
> SCO OSR 5.0.4 with all applicable patches, etc.
> Four 800 mhz CPUs
> 2 gig RAM
> Two SCSI RAID controllers:
> - DAC960pg
> - Adaptec/DPT 3400s - this is where the disk with the dump device
> lives, though it is not part of an array.
>
> Questions:
> 1) What does each period mean when it is saving the panic image?
Each dot represents 1/80 of your total memory, so a complete dump should fill one entire line of the console screen.

> 2) Any ideas why it is so slow?

Each dot is about 25MB on your system, so it might be slower than you might be used to from another system. On the other hand, 8 minutes is far too long for a mere 25MB.

Are there any other devices on the Adaptec/DPT 3400s controller where the dump device lives? In other words: do you have any evidence that this adapter is working normally?

You say you "created an empty 5 gig partition as our new default dump device (as per SCO TA)". First, which TA? Second, show us how you've informed the kernel of this -- excerpt from /etc/default/boot or /etc/conf/cf.d/sassign. Finally, show us the device node for the device you're dumping to, and output from `hwconfig -h`.

The dump could be slow due to the nature of the panic: if it was caused by a problem in the disk drivers in the kernel, they might be in an insane state. Or there may be a problem with interrupt delivery. I believe the kernel dump code uses some tricks to continue writing even if interrupts are not being received from the host adapter to which the dump is being written; this would probably be slow (I'm not sure, have never seen it in action).
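To put numbers on the dot arithmetic above, here is a rough Python sketch. It uses only figures quoted in this thread (2 gig of RAM, 80 dots per dump, about 8 minutes for the first dot, about 5 minutes for the manual sysdump run); the function names are made up for illustration and are not part of any SCO tool.

```python
# Back-of-the-envelope dump timing, using only figures quoted in the thread.
# All names here are hypothetical helpers, not part of any SCO utility.

TOTAL_RAM_BYTES = 2 * 1024**3   # "2 gig RAM"
DOTS_PER_DUMP = 80              # each dot is 1/80 of total memory


def bytes_per_dot(total_ram_bytes, dots=DOTS_PER_DUMP):
    """Amount of memory written for each dot printed on the console."""
    return total_ram_bytes / dots


def throughput(nbytes, seconds):
    """Average write rate in bytes per second."""
    return nbytes / seconds


def full_dump_hours(seconds_per_dot, dots=DOTS_PER_DUMP):
    """Projected time for a complete dump at a steady per-dot pace."""
    return seconds_per_dot * dots / 3600


per_dot = bytes_per_dot(TOTAL_RAM_BYTES)
panic_rate = throughput(per_dot, 8 * 60)            # one dot in ~8 minutes
manual_rate = throughput(TOTAL_RAM_BYTES, 5 * 60)   # full image in ~5 minutes

print("MB per dot:          %.1f" % (per_dot / 2**20))
print("panic dump KB/s:     %.1f" % (panic_rate / 1024))
print("manual sysdump MB/s: %.2f" % (manual_rate / 2**20))
print("slowdown factor:     %.0f" % (manual_rate / panic_rate))
print("full dump, hours:    %.1f" % full_dump_hours(8 * 60))
```

With these inputs each dot is about 25.6MB, the panic dump was moving roughly 55KB per second, and a full dump at that pace would take more than ten hours. That is around 128 times slower than the manual sysdump to /dev/dump, which supports the suspicion that something was wrong with the write path rather than with the disk's normal speed.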
> 3) Any other ideas or suggestions (I'm open to anything!)?

Can you tell whether the partial dump attempt wrote anything? Probably not, but you can next time. Create a filesystem on the dump partition or division; mount it; copy some files on; look at them. Do not set the system up to automatically mount this filesystem (i.e. don't run `mkdev fs` or whatever the scoadmin equivalent is). Just create enough stuff to be recognizable. Then unmount it. If the system panics, it should overwrite that "watchdog" filesystem. If the slow dumping behavior persists, you'll probably kick out of it before the dump completes, but at least you can tell whether it was doing anything, by trying to mount that filesystem. If the dump was writing anything it'll have trashed the beginning of the filesystem and it'll be completely unrecoverable.

Another suggestion: link the kernel debugger into the kernel. Edit /etc/conf/sdevice.d/scodb, change "N" to "Y". Now relink the kernel _twice_ (it'll change some data table sizes the first time), then reboot.

Once scodb is in, you have a new way to test, and new behavior in a panic.

To test: boot the system to single-user mode. Hit control-X on the console to get into scodb. At the scodb prompt, type "sysdump()". This calls the kernel function that writes a panic dump. It should go through the usual motions. Does it? Do the dots move across the screen at a reasonable pace? (Let it finish if it's fast; hit RESET or power if it's slow, and go figure out why...)

New behavior: if the system panics, it will drop to a scodb prompt. "?" gives brief help. You can get a call traceback with "stack". If you can transcribe the panic traceback, someone here might be able to see what the problem is.

NOTE: if the console is in graphics mode (X or DOS Merge, for instance), a panic will drop to scodb, but you won't be able to see what you're doing. It's best to leave the console in character mode when you're trying to catch a panic in scodb.
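The "watchdog" check above relies on trying to mount the filesystem afterwards. A variation on the same idea, sketched here in Python purely as an illustration, is to record a checksum of the first part of the dump partition beforehand and compare it after an aborted dump. This is not the method described above, just an alternative way to ask the same question. The function names are hypothetical, the device path is whatever your dump partition really is, and the sketch assumes a Python interpreter is available on the machine doing the check.

```python
# Sketch: detect whether the start of a dump partition was overwritten.
# Hypothetical helper names; the device path must be supplied by the caller.
import hashlib
import os
import tempfile


def fingerprint(path, nbytes=1024 * 1024):
    """Return the SHA-256 hex digest of the first nbytes of path."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()


def start_was_overwritten(path, recorded, nbytes=1024 * 1024):
    """True if the first nbytes no longer match the recorded fingerprint."""
    return fingerprint(path, nbytes) != recorded


if __name__ == "__main__":
    # Demonstration on an ordinary file standing in for the dump partition.
    workdir = tempfile.mkdtemp()
    fake_device = os.path.join(workdir, "fake_dump_device")
    with open(fake_device, "wb") as f:
        f.write(b"watchdog" * 1024)

    before = fingerprint(fake_device)
    print("changed before dump:", start_was_overwritten(fake_device, before))

    # Simulate a partial dump scribbling over the first sector.
    with open(fake_device, "r+b") as f:
        f.write(b"\x00" * 512)
    print("changed after dump: ", start_was_overwritten(fake_device, before))
```

In real use you would save the fingerprint somewhere other than the dump partition, since a dump that does write will destroy anything stored there.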
> 4) I am thinking of paying Caldera to help us dig through the saved
> dump files (once we have a couple) and diagnose the problem - any
> thoughts on this?

Sounds good to me. ;-}

>Bela<