APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed

News Group Posts

X windows hang 5.0.7 X11, crash debugging, video interrupts, high resolution timers, kernel panic crash

From: Bela Lubkin <belal@sco.com>
Subject: Re: Xwindow hang on osr507
Date: Wed, 8 Oct 2003 08:39:36 GMT
References: <3F81C954.12D82EC5@tenzing.org>
<20031006203426.GL672@jpradley.jpr.com>
<20031006232654.GA714@sco.com>
<3F83582E.4C0B941C@tenzing.org>

Roger Cornelius wrote:
> > > | I have two dissimilar 5.0.7 systems which exhibit the same problem.
> > > | When exiting from a console X session, X hangs approximately 75% of the
> > > | time.  It appears to be exiting, but I end up with a blank root window
> > > | with the crosshatch pattern and an "x" as the mouse pointer.  I can move
> > > | the pointer but nothing else.  Alt-Fkey or ctrl-prtscreen will switch
> > > | away, but I just get a blank screen.  Attempting to switch to another
> > > | tty again results in a beep.
> > > |
> > > | The systems:
> > > | IBM x345
> > > | SCO odt window manager
> > > | On board video identified by mkdev graphics as:
> > > | ATI RAGE PRO/LT-PRO/XL/Mobility (P/M/M1)
> > > | Also tried an ATI Xpert@Play card with same results.
> > > |
> > > | Dell Precision 330
> > > | fvwm2 window manager
> > > | Matrox Millenium G200 (configured for Matrox G100/G200/G400 series
> > > | adapters)
> > > |
> > > | Both systems have osr507mp and osr507up installed.
> > > |
> > > | I've tried various resolution configurations in mkdev graphics but no
> > > | change in the problem.
> > > |
> > > | After the hang and from another login, I can kill the X process which
> > > | results in a black or sometimes garbled screen.  I can log in again,
> > > | though I can't see what's happening on the screen.  On the Dell box, I
> > > | can then log out and the screen returns to normal.  On the IBM box,
> > > | logging out just gives me another blank screen.

I asked you to try editing each entry in the active grafinfo file to
add:

> >   MEMORY(VID, 0x000A0000,0x0020000);    /* Standard VGA video memory window */

after the existing "MEMORY" line(s) in each mode.  You say:

> This changed the behaviour on the IBM system and possibly fixed it on
> the Dell.  For the latter, the couple of opportunities I've had to exit
> X worked correctly.

Perhaps you could cycle it a few more times for confidence?  If it's as
random as it seemed, just running the X server and exiting as quickly as
possible ought to be a decent "smoke test".
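To put a rough number on "a few more times": if the fault were still present at the originally reported rate (about 75% of exits hanging), a run of n clean exits would happen by pure luck with probability 0.25 raised to the n. The Python sketch below assumes that each exit fails independently and that the 75% estimate holds; `chance_of_lucky_streak` is a hypothetical helper name.

```python
def chance_of_lucky_streak(clean_exits: int, failure_rate: float = 0.75) -> float:
    """Probability of `clean_exits` consecutive clean exits if nothing was fixed.

    Assumes each X server exit fails independently with probability
    `failure_rate` (the rough 75% figure from the original report).
    """
    return (1.0 - failure_rate) ** clean_exits


for n in (2, 5, 10):
    print(f"{n:2d} clean exits in a row by luck: {chance_of_lucky_streak(n):.6f}")
```

Two clean exits in a row would turn up about 6% of the time even with nothing fixed (0.25 squared is 0.0625). Five in a row would be under 0.1% (1/1024), which is a much more convincing smoke test.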

>                      For the former, I exited X three times today.  The
> first time, I was returned to the shell prompt as should be normal.  The
> second time, I got a blank, black screen, like JPR described, which I
> used to log in blind, then ran clean_screen which got the video back. 
> The third time, I got a kernel panic and reboot.

So previously the X server was hanging on exit (not affecting the whole
machine) about 75% of the time.  I assume that 75% is a very rough
estimate.  Now, out of 3 samples, one exited cleanly and the other two went
wrong (in different ways).  So without further examination of the
failure modes, I would tend to conclude that whatever was causing the
problem is still happening.  Only the failure modes have changed.  That
is, if you were to run 100 cycles under the new setup, you would see
about 25 successful exits, about 75 failures -- same as before.
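A quick binomial check supports reading the three IBM exits this way. Treating each exit as an independent trial that comes out clean with probability 0.25 (again an assumption built on the rough 75% figure), the chance of exactly k clean exits in 3 tries is the binomial term C(3, k) * 0.25^k * 0.75^(3 - k). A minimal Python sketch, with `prob_clean_exits` as a hypothetical name:

```python
from math import comb


def prob_clean_exits(k: int, trials: int = 3, p_clean: float = 0.25) -> float:
    """Binomial probability of exactly k clean exits in `trials` attempts.

    Assumes independent exits, each clean with probability `p_clean`.
    """
    return comb(trials, k) * p_clean**k * (1 - p_clean) ** (trials - k)


for k in range(4):
    print(f"{k} clean exit(s) out of 3: {prob_clean_exits(k):.3f}")
```

Zero and one clean exit both come out at 27/64 (about 0.42), so one clean exit in three is exactly what the old failure rate predicts. This particular result gives no evidence that the grafinfo change helped.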

Since the new failure modes include a worse outcome (a panic vs. a mere
unusable screen), you should probably undo the patch on the IBM.

Repeating part of the original message:

> > > | After the hang and from another login, I can kill the X process which
> > > | results in a black or sometimes garbled screen.  I can log in again,
> > > | though I can't see what's happening on the screen.  On the Dell box, I
> > > | can then log out and the screen returns to normal.  On the IBM box,
> > > | logging out just gives me another blank screen.

Let's go back to the original grafinfo file.  After a "bad" exit, you
seem to be saying the X server is still running.  You can see this from
a network login, so the rest of the system is fine.

I don't quite understand from this description what happens on the IBM
when you run a new X server.  Are you saying that it too is blank, or
that it displays normally?  In other words, has the console become
totally unusable at this point, or are you able to return to a usable X
server as often as you want, but not to text mode?

Anyway, next time the exit hang happens, examine that X server's process
tree.  In particular, does it have a subprocess called `vbiosd`?  What
happens if you kill _that_ rather than the X server -- does X then
finish exiting in a more normal manner?
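If the console is unusable, this check can be made from the network login by reading `ps -ef` output. The Python sketch below assumes System V `ps -ef` columns (UID, PID, PPID first) and that you already know the X server's PID from that listing. `vbiosd_descendants` is a hypothetical helper. It only reports candidate PIDs and leaves the decision to `kill` one to you.

```python
from typing import Dict, List


def vbiosd_descendants(ps_output: str, x_pid: int) -> List[int]:
    """Return PIDs of processes below `x_pid` whose ps line mentions vbiosd.

    Assumes System V `ps -ef` output: the first three columns are UID, PID
    and PPID.  Everything after PPID is kept as free text, because STIME
    can contain a space (e.g. "Oct 06") and would shift later columns.
    """
    parent_of: Dict[int, int] = {}
    text_of: Dict[int, str] = {}
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line or anything that doesn't parse
        pid, ppid = int(fields[1]), int(fields[2])
        parent_of[pid] = ppid
        text_of[pid] = fields[3]

    def is_below_x(pid: int) -> bool:
        # Walk up the PPID chain; `seen` guards against loops (e.g. PID 0).
        seen = set()
        while pid in parent_of and pid not in seen:
            seen.add(pid)
            pid = parent_of[pid]
            if pid == x_pid:
                return True
        return False

    return sorted(p for p, text in text_of.items()
                  if "vbiosd" in text and is_below_x(p))
```

For example, save the `ps -ef` output to a file and call `vbiosd_descendants(open("ps.txt").read(), xpid)`, where `xpid` is the X server's PID as `ps` shows it. The file name is only illustrative. Walking the whole PPID chain, not just direct children, matches the idea of examining the X server's process *tree*.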

I'm thinking that you may end up with a still blank or trashed screen,
but at least your ability to flip multiscreens should return.  It might
be that you can flip, but still can't see what you're doing.  Even so,
you should be able to distinguish, e.g., a multiscreen that was sitting
at a shell prompt (where typing `echo '\07'` will beep) from one that
was sitting at a login prompt.

Once the X server has exited relatively gracefully, try to get to a
shell prompt and run /etc/clean_screen.  If you can't get to a shell
prompt on the console, run it from the network login as
`clean_screen < /dev/tty02` (substituting the name of the tty on which X
was running -- or, if you've flipped multiscreens, the one you think is
currently "displayed").

I'm trying both to develop a viable workaround for temporary use and to
better understand the problem so that we can solve it permanently
without a clumsy workaround.  So please describe the results very
carefully.

Now, back to the panic:

>                                                   Here are [what I think
> are] the important parts of the output of crash's panic command:
> 
> Unexpected trap in kernel mode:
> cr0 0x8001003B     cr2  0x0011001C     cr3 0x00002000     tlb  0x00000000
> ss  0x00000001     uesp 0x0080A2CC     efl 0x00010286     ipl  0x00000000
> cs  0x00000158     eip  0xF005919A     err 0x00000002     trap 0x0000000E
> eax 0x00002000     ecx  0x00000001     edx 0x00000014     ebx  0xE0000E1C
> esp 0xE0000DE0     ebp  0xE0000E0C     esi 0x00000001     edi  0x00000000
> ds  0x00000160     es   0x00000160     fs  0x00000000     gs   0x00000000
> cpu 0x00000001

...

> Kernel Stack before Trap:
> STKADDR   FRAMEPTR  FUNCTION   POSSIBLE ARGUMENTS
> e0000de0  e0000e0c  v86vint    (u+0xe1c,0)

Hmmm.  Well, it panic'd while running code under an interrupt that was
being serviced in virtual 8086 mode.  Presumably that would be an
interrupt that was provoked by something the adapter's BIOS did while
coming down from graphics mode; and should have been handled by code
within the BIOS.  The panic was a trap E (a page fault, i.e. an illegal memory reference);
the bad reference address was 0x11001C (CR2).  That address isn't a
sensible address for BIOS code to be accessing.  We have no basis to
determine whether this is a BIOS bug or a bug in the simulated 8086
environment under which the Unix kernel is running the BIOS.
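The register dump can be read a little further using standard x86 conventions. Trap 0x0E is the page-fault vector, and the low bits of the page-fault error code mean: bit 0 set = protection violation (clear = page not present), bit 1 set = write (clear = read), and bit 2 set = user mode (clear = supervisor mode). The Python sketch below assumes the dump's `err` field is that raw hardware error code. It also compares CR2 with the highest address real-mode segment:offset arithmetic can form (0xFFFF:0xFFFF, i.e. 0x10FFEF), assuming the BIOS inside the V86 task addresses memory the ordinary real-mode way. It does not say whose bug it is.

```python
PAGE_FAULT = 0x0E
REAL_MODE_MAX = (0xFFFF << 4) + 0xFFFF  # 0xFFFF:0xFFFF == 0x10FFEF


def describe_page_fault(err: int) -> str:
    """Decode the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if err & 0x1 else "page not present"
    access = "write" if err & 0x2 else "read"
    mode = "user mode" if err & 0x4 else "supervisor mode"
    return f"{access} in {mode}, {cause}"


# Values copied from the crash dump quoted above.
trap, err, cr2 = 0x0000000E, 0x00000002, 0x0011001C

if trap == PAGE_FAULT:
    print("page fault:", describe_page_fault(err))
    print(f"CR2 {cr2:#x} above real-mode limit {REAL_MODE_MAX:#x}?",
          cr2 > REAL_MODE_MAX)
```

On these values the dump describes a supervisor-mode write to a not-present page, at an address slightly above anything real-mode code could form. That fits the observation that 0x11001C is not a sensible address for BIOS code.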

This does remind me of another thing that you should try, though.  In
fact, it's something that all three of the original posters should try.  Many
modern systems have a BIOS setup item that boils down to "Should an
interrupt vector be assigned to the video board?".  In most cases this
should be set to "no" for Unix.  To be precise, I do not know of any
case where it needs to be "yes", but I could easily believe that some
video BIOSes might require it and I simply haven't run into one.  This
is another one of those things that you'll learn about right away: if
you turn it off and the board/BIOS really need it, getting _into_ X will
fail and you'll back out the change.

Yet a third thing that you could try is to disable the high-precision
timer interrupts that were first introduced in OSR506.  To do this, boot
with "defbootstr clock.disable_short_timers=1".  The BIOS code may be
getting an unexpectedly high-speed stream of timer interrupts, which
could get it in trouble.

> I'll post again as I have more details, but I won't have console access
> to the IBM again until Thursday.

I've given you several conflicting ideas to try.  When you have access,
you'll have to decide what to fiddle with.  I don't think it would be
wise to try more than one of these ideas at the same time, because you
wouldn't be able to tell which behavior changes were caused by what.

I think my order of attack would be:

  1. Revert to the original grafinfo -- the change didn't help in this
     case, and made the failure mode worse at times

  2. Disable VGA IRQ in BIOS setup; test

  3. Unless that made X unusable, leave it off even if it didn't help,
     because it leaves more IRQs free for other devices

  4. Try "defbootstr clock.disable_short_timers=1"; test

  5. If that doesn't fix the problem, reboot without it and forget about
     that setting

  6. If neither of those fixes the problem, work towards a workaround
     based on killing `vbiosd` and running `clean_screen`

  7. Comment on all the steps you took so we learn what was really
     relevant...

>Bela<


If this page was useful to you, please click to help others find it:  

Your +1's can help friends, contacts, and others on the web find the best stuff when they search.

Comments?



Click here to add your comments



Don't miss responses! Subscribe to Comments by RSS or by Email

Click here to add your comments


If you want a picture to show with your comment, go get a Gravatar



Have you tried Searching this site?

Unix/Linux/Mac OS X support by phone, email or on-site: Support Rates

This is a Unix/Linux resource website. It contains technical articles about Unix, Linux and general computing related subjects, opinion, news, help files, how-to's, tutorials and more. We appreciate comments and article submissions.

Publishing your articles here

Jump to Comments



Many of the products and books I review are things I purchased for my own use. Some were given to me specifically for the purpose of reviewing them. I resell or can earn commissions from the sale of some of these items. Links within these pages may be affiliate links that pay me for referring you to them. That's mostly insignificant amounts of money; whenever it is not I have made my relationship plain. I also may own stock in companies mentioned here. If you have any question, please do feel free to contact me.

Specific links that take you to pages that allow you to purchase the item I reviewed are very likely to pay me a commission. Many of the books I review were given to me by the publishers specifically for the purpose of writing a review. These gifts and referral fees do not affect my opinions; I often give bad reviews anyway.

We use Google third-party advertising companies to serve ads when you visit our website. These companies may use information (not including your name, address, email address, or telephone number) about your visits to this and other websites in order to provide advertisements about goods and services of interest to you. If you would like more information about this practice and to know your choices about not having this information used by these companies, click here.

g_face.jpg

This post tagged:

       - Bela
       - SCO_OSR5




Unix/Linux Consultants

Skills Tests

Guest Post Here