APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed




News Group Posts

panic dump




From: Bela Lubkin <belal@sco.com>
Subject: Re: Panic save very sloooooow
Date: Fri, 15 Feb 2002 09:29:01 GMT
References: <3c6b0b39.29156534@news.texas.net> 

Carl Sawyer wrote:

> I am trying to figure out why we are getting panics every 3 to 5 days.
> Swap space was not big enough to hold a memory image (2 gig) so last
> weekend we put in a spare 10 gig and created an empty 5 gig partition
> as our new default dump device (as per SCO TA).
> 
> Today we got our first panic.  Operator said "Y" to dump to the
> default and hit ENTER when it said to insert media.  And nothing
> happened for about eight minutes when it displayed one period.  I was
> not there, operator said we were getting some disk lights, but I
> cannot attest to frequency.  An hour later it had displayed a total of
> nine periods and management made us reboot the system so users could
> get back in.
> 
> A few days ago I posted here looking for a way to induce panics so I
> could test this procedure but I decided that I should not do the
> recommended changes on a production system.  I did, however, use
> sysdump to write a full memory image to /dev/dump and it took about
> five minutes to run.  Then I tested a script I wrote to preserve this
> image in a dated disk file and this, too, took about 5 min.
> 
> Details:
> SCO OSR 5.0.4 with all applicable patches, etc.
> Four 800 mhz CPUs
> 2 gig RAM
> Two SCSI RAID controllers: 
> - DAC960pg
> - Adaptec/DPT 3400s - this is where the disk with the dump device
> lives, though it is not part of an array.
> 
> Questions:
> 1) What does each period mean when it is saving the panic image?





Each dot represents 1/80 of your total memory, so a complete dump should
fill one entire line of the console screen.

> 2) Any ideas why it is so slow?

Each dot is about 25MB on your system (2GB / 80), so each dot may take
longer to appear than you are used to on a system with less memory.  On
the other hand, 8 minutes is far too long for a mere 25MB.
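
To put numbers on that, here is a rough sketch of the arithmetic in
Python.  It assumes only what is stated in this thread: 80 dots per
complete dump, 2 gig of RAM, about eight minutes for the first dot, and
about five minutes for a full sysdump to /dev/dump on the healthy
system.  The function and key names are made up for illustration.

```python
# Back-of-the-envelope dump arithmetic.  Assumes 80 dots per complete
# dump, as described in the post; names here are illustrative only.
DOTS_PER_DUMP = 80


def bytes_per_dot(total_ram_bytes):
    """Amount of memory represented by one dot on the console."""
    return total_ram_bytes / DOTS_PER_DUMP


def estimate(total_ram_bytes, seconds_per_dot):
    """Return dot size, implied write rate, and time for a full dump."""
    per_dot = bytes_per_dot(total_ram_bytes)
    return {
        "mb_per_dot": per_dot / 2**20,
        "kb_per_sec": per_dot / 1024 / seconds_per_dot,
        "hours_for_full_dump": DOTS_PER_DUMP * seconds_per_dot / 3600,
    }


if __name__ == "__main__":
    ram = 2 * 2**30  # 2 gig of RAM, from the original question

    # Observed during the panic: roughly 8 minutes for the first dot.
    slow = estimate(ram, 8 * 60)
    # Observed with sysdump on the running system: ~5 minutes in total.
    normal = estimate(ram, (5 * 60) / DOTS_PER_DUMP)

    for label, result in (("panic", slow), ("sysdump", normal)):
        print(label, {k: round(v, 2) for k, v in result.items()})
```

At eight minutes per dot the implied rate is around 55KB/s and a full
dump would take more than ten hours, against several MB/s for the
sysdump test.  That gap is why the adapter and driver state are the
first things to look at.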

Are there any other devices on the Adaptec/DPT 3400s controller where
the dump device lives?  In other words: do you have any evidence that
this adapter is working normally?

You say you "created an empty 5 gig partition as our new default dump
device (as per SCO TA)".  First, which TA?  Second, show us how you've
informed the kernel of this -- excerpt from /etc/default/boot or
/etc/conf/cf.d/sassign.  Finally, show us the device node for the device
you're dumping to, and output from `hwconfig -h`.

The dump could be slow due to the nature of the panic: if it was caused
by a problem in the disk drivers in the kernel, they might be in an
insane state.  Or there may be a problem with interrupt delivery.  I
believe the kernel dump code uses some tricks to continue writing even
if interrupts are not being received from the host adapter to which the
dump is being written; this would probably be slow (I'm not sure, have
never seen it in action).



> 3) Any other ideas or suggestions (I'm open to anything!)?

Can you tell whether the partial dump attempt wrote anything?  Probably
not, but you can next time.  Create a filesystem on the dump partition
or division; mount it; copy some files on; look at them.  Do not set the
system up to automatically mount this filesystem (i.e. don't run `mkdev
fs` or whatever the scoadmin equivalent is).  Just create enough stuff
to be recognizable.  Then unmount it.

If the system panics, it should overwrite that "watchdog" filesystem.
If the slow dumping behavior persists, you'll probably kick out of it
before the dump completes, but at least you can tell whether it was
doing anything, by trying to mount that filesystem.  If the dump wrote
anything, it will have trashed the beginning of the filesystem, so the
mount should fail and the filesystem will be completely unrecoverable.
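
A variation on the same idea, for anyone who would rather not rely on
a failed mount: record a fingerprint of the first part of the dump
device before the panic and compare it afterwards.  This is not the
mount test described above, only a sketch of an alternative check.  It
assumes Python is available on some system that can read the device,
and the function names and the 1MB sample size are arbitrary choices.

```python
import hashlib

# Hypothetical helper names; the 1 MB sample size is an arbitrary choice.
SAMPLE_BYTES = 1 << 20


def fingerprint(path, nbytes=SAMPLE_BYTES):
    """Return a SHA-256 hex digest of the first nbytes of path."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(nbytes)).hexdigest()


def dump_wrote_something(path, saved_digest, nbytes=SAMPLE_BYTES):
    """True if the start of path no longer matches the saved digest."""
    return fingerprint(path, nbytes) != saved_digest


# Usage sketch (path is an example; substitute your own dump device):
#   before = fingerprint("/dev/dump")     # run once, save the result
#   ... system panics and is rebooted ...
#   print(dump_wrote_something("/dev/dump", before))
```

Keep the saved digest somewhere other than the dump device.  A
mismatch only says that something overwrote the start of the device;
it says nothing about how far the dump got.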

Another suggestion: link the kernel debugger into the kernel.  Edit
/etc/conf/sdevice.d/scodb, change "N" to "Y".  Now relink the kernel
_twice_ (it'll change some data table sizes the first time), then
reboot.

Once scodb is in, you have a new way to test, and new behavior in a
panic.  To test: boot the system to single-user mode.  Hit control-X on
the console to get into scodb.  At the scodb prompt, type "sysdump()".
This calls the kernel function that writes a panic dump.  It should go
through the usual motions.  Does it?  Do the dots move across the screen
at a reasonable pace?  (Let it finish if it's fast; hit RESET or power
if it's slow, and go figure out why...)

New behavior: if the system panics, it will drop to a scodb prompt. "?"
gives brief help.  You can get a call traceback with "stack".  If you
can transcribe the panic traceback, someone here might be able to see
what the problem is.

NOTE: if the console is in graphics mode (X or DOS Merge, for instance),
a panic will drop to scodb, but you won't be able to see what you're
doing.  It's best to leave the console in character mode when you're
trying to catch a panic in scodb.

> 4) I am thinking of paying Caldera to help us dig through the saved
> dump files (once we have a couple) and diagnose the problem - any
> thoughts on this?

Sounds good to me.  ;-}

>Bela<
 


This post tagged:

       - Bela
       - SCO_OSR5



