
An example of using the scodb debugger (SCO Unix)

Taken from a newsgroup post




From: Bela Lubkin <filbo@armory.com>
Subject: Re: Dying processes (inetd, cron, syslogd, sshd)
Date: 15 Aug 2005 14:53:55 -0400
Message-ID: <200508151153.aa00461@deepthought.armory.com> 
References: <1122912804.902870.308720@g44g2000cwa.googlegroups.com>
            <1124115144.362415.307700@o13g2000cwo.googlegroups.com>

keith@actual-systems.com wrote:

> We were tripped into scodb as we had set up a watch point on the inetd
> process.
>
> We work out the pid of inetd using crash (p | grep inetd) - then we
> convert it to hex and go into scodb.
>
> -------------
> # crash
> dumpfile = /dev/mem, namelist = /unix, outfile = stdout
>
> p | grep inetd
>   29 s   256     1   256   0   76   0   selwait   inetd   load nxec
>
> q
>
> (PID 29 = 1D in hex)
>
> Ctrl+Alt+D
> debug0:1> bp wl &proc[1d].p_utime
> debug0:2> q

Ok, that should work.  You can skip the `crash` part, use "ps()" in
scodb and pick out inetd's PID there (already printed in hex).  But you
can't pipe it to "grep inetd", so that's a bit of a negative...

> Then once this problem occurred we were in debug mode and I did the
> following to get the dumpfile I have.
>
> -------------------
> debug0:4> bp dis *
> debug0:5> sysdump()
> debug0:6> bc *
> debug0:7> q
> # sysdump -i /dev/swap -fbhm -o /usr/tmp/dump
> ---------------------
>
> Now the following is the output when I used scodb to look at the
> dumpfile I have - I would have followed your advice about getting the
> PID of inetd, but I think the process must have already stopped by the
> time I had the dump, so I just did 'stack' to give you everything and
> then 'p' from within crash to get all the running processes.

Again, "ps()" in scodb will do that.  The process was dying when the
dump was taken.

Your three pieces of evidence here are inconsistent with each other.
You've set up to break when PID 0x1D is dying.  The `scodb -d dump`
output from after the breakpoint says it's showing PID 0x147.  And the
`crash -d dump` output has an inetd PID of 0x34.  So at best I have to
assume these are from three different events.

> back:/usr/tmp # scodb -d dump
> dumpfile = dump
> namelist = dump
> stunfile = dump
> varifile = dump
> PID 0147: /etc/inetd
> scodb:1>
> scodb:1> stack
>  E0000DC0  exit2(0, 9) <- psig+18F
>  E0000DE8  psig(? 8057644, 0, 0, F01D4D98) <- systrap+39F
>  E0000E10  systrap(E0000E1C) <- scall_nokentry+14
> scodb:2> q

`exit2(0, 9)' is, I believe, delivering a signal 9 (SIGKILL) to the
process.  There are few things inside the kernel that generate SIGKILL.
This suggests a user process is deliberately killing the dying
processes.
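
An aside from the editor, not part of the original post: the
decimal-to-hex step mentioned in the transcript above ("PID 29 = 1D in
hex") doesn't have to be done by hand.  A minimal shell sketch, using
the number crash reported for inetd in that listing:

  # Print the hex form of the decimal number crash reports, ready for
  # use in scodb expressions such as &proc[...].p_utime.
  pid=29                                 # value from crash's "p | grep inetd" line
  printf 'hex for scodb: %x\n' "$pid"    # prints "hex for scodb: 1d"

As Bela notes, running "ps()" inside scodb sidesteps the conversion
entirely, since that listing is already in hex.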
> # crash -d dump
> dumpfile = dump, namelist = dump, outfile = stdout
>
> p
> PROC TABLE SIZE = 83
> SLOT  ST    PID   PPID  PRI  EVENT       NAME              FLAGS
>    0  s       0      0   95  runout      sched             load sys lock nwak
>    1  r       1      0   80              init              load nwak
>    2  s       2      0   95  vhand       vhand             load sys lock nwak nxec
>    3  r       3      0   95              bdflush           load sys lock nwak nxec
>    4  r       4      0   36  CPU1        idle proc         load sys lock nxec 2000000
>    5  p       5      0   36  CPU2        idle proc         load sys lock nxec 2000000
>    6  s       6      0   95  kmd_id      kmdaemon          load sys lock nwak nxec
>    7  r       7      1   95              htepi_daemon      load sys lock nwak
>    8  s       8      0   95  pbintrpool  strd              load sys lock nwak nxec
>   10  r      55      1   73              ifor_pmd      <-- load nxec exit
>   11  z      57     55   76              zombie        <-- nou nxec exit
>   12  r      52      1   80              syslogd       <-- load nwak nxec exit
>   13  s      43      1   95  0xc1031150  htepi_daemon      load sys lock nwak
>   14  r     524      1   75              getty             load
>   15  r      85      1   75              strerr        <-- load exit
>   18  r     525      1   75              getty             load
>   20  r     526      1   75              getty             load
>   21  r     276      1   81              cron          <-- load nwak nxec exit
>   25  s     206      1   95  0xc1035150  htepi_daemon      load sys lock nwak
>   26  r     210      1   95              htepi_daemon      load sys lock nwak
>   29  z     568      1   76              zombie        <-- nou exit
>   31  z     384      1   76              zombie        <-- nou nxec exit
>   33  z     382      1   76              zombie        <-- nou nxec exit
>   34  p     327      1   76              inetd         <-- load nxec exit
>   35  r   17987      1   80              getty             load nwak
>   41  r     392      1   75              lockd         <-- load nxec exit
>   42  r     393    392   75              lockd             load nxec
>   43  r     394    392   75              lockd             load nxec
>   44  r     395    392   75              lockd             load nxec
>   45  r     396    392   75              lockd             load nxec
>   47  r     434      1   80              calserver     <-- load nwak nxec exit
>   48  r   17869      1   76              rsync         <-- load exit
>   49  r     442      1   81              caldaemon     <-- load nwak exit
>   50  r     449      1   66              prngd         <-- load nxec exit
>   53  r     527      1   75              getty             load
>   54  r     507      1   76              sshd          <-- load nxec exit
>   55  r     528      1   75              getty             load
>   56  r     529      1   75              getty             load
>   57  r     530      1   75              getty             load
>   58  r     531      1   75              getty             load
>   59  r     532      1   75              getty             load
>   60  r     533      1   75              getty             load
>   61  r     534      1   75              getty             load
>   62  r     535      1   80              sdd           <-- load nwak exit
>   63  z   17871  17869   88              zombie        <-- nou exit
>   64  z   17240    276   73              zombie        <-- nou exit
>   65  r   17872      1   75              rcmd          <-- load nxec exit
>   68  s   17900  17869   95  0xfc2e3b90  rsync             load nwak nxec

I deleted some less interesting columns from the table.  Notice the
large number of processes with the "exit" flag.  I marked them with
"<--" to make them visually obvious.  These are all dying.  Does this
list more or less match your experience of what dies in one of these
events?  (Your mental list should be a little longer, since there are a
few here that have gotten far enough along in dying that we can't see
what they were running, just "zombie".)

To capture a better dump next time, I suggest rigging the system up so
that it breaks into scodb when _any_ of these processes die.  That
means you'll catch it as early as possible -- possibly while the
"attacker" process is still issuing the SIGKILLs.

> And also looking at the memory as suggested shows it wasn't critically
> close to its limits (unless this is showing the freemem etc. after the
> inetd process has stopped.)
>
> scodb:2> freemem
> 6EEA0
> scodb:3> freeswap
> 100000
> scodb:4> availrmem
> 6FB0C
> scodb:5> availsmem
> 16E941

Right, that makes it look like it has nothing to do with memory.

> As always your help here is very much appreciated.

I'm wondering if we should take it offline -- I feel like I should be
looking directly at one of these dumps (hopefully one collected as
early as possible in an event).

Instead of setting up individual breakpoints for all the processes you
expect to die, you can use the fact that it's SIGKILL to your
advantage.
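
An aside from the editor, not part of the original post: if you did
want to try the per-process breakpoints Bela mentions before falling
back on the SIGKILL trick described next, a small sketch like this can
generate the scodb commands to type at the debug0:n> prompt.  The slot
numbers below are just the ones marked "<--" in the crash listing
above; on a live system they would have to be read from a fresh "p"
listing, and the watchpoint form simply reuses keith's
&proc[slot].p_utime example:

  # Emit one "bp wl" watchpoint command, in the form keith used above,
  # per process-table slot of interest.  Slots are given in decimal,
  # as crash prints them; scodb expressions want them in hex.
  for slot in 10 12 15 21 34 41 54; do
      printf 'bp wl &proc[%x].p_utime\n' "$slot"
  done

There is no shell pipeline at the debugger prompt, so the generated
lines are only a crib sheet to type from.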
Take a look at the code for kill():

  scodb> u kill+175
  kill+175  movl  %eax,4(%edi)
  kill+178  cmpl  %eax,9
  kill+17D  jne   kill+18C
  kill+17F  movb  %al,-1(%ebp)

If your kernel matches the one I'm looking at here, kill+17F is an
address only reached during a `kill(pid, 9)' call.  (If your kernel is
different: do "u kill", then type "/,9".  It will scroll forward to the
right point in the disassembly.  Hit <Return> a few times after that,
then "q" to exit the disassembler.)

Set a breakpoint on that ("bp kill+17F").  Test it by:

  $ sleep 100 &
  [1] 27388
  $ kill -9 27388

... and you should drop into the debugger.  Quit.

There shouldn't be a lot of SIGKILLing going on in a normal working
system.  Hopefully this won't trigger until the problem event.  If it
does, you might have to find another way.  But if it does false
trigger, you should also try to understand who is killing -9 whom --
it might be an important clue.

Suppose a particular daemon has a habit of creating a child, letting it
run for a while, then killing it.  Suppose its memory of the child PID
is corrupted somehow.  e.g. what if it does:

  do_something &
  childpid=$!
  ...
  ps -ef | grep $childpid | awk '{ print "kill -9 " $2 }' | sh

This is horribly bad code, do not use it as an example of anything
besides trouble...  So this "works fine" for a while, when $childpid is
12345.  But one day it runs its child right when the system PID counter
has rolled over, gets PID 23.  `ps -ef | grep 23` is likely to match
several processes other than the intended victim.

Anyway, if you trap on delivery of signal 9, you're likely to catch the
problem happening.  And if it happens without hitting that trigger, you
learn that it _isn't_ a process-to-process kill, it's actually coming
directly out of the kernel -- also very interesting news.

>Bela<
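
An aside from the editor, not part of Bela's reply: for contrast with
the deliberately bad ps/grep fragment he warns about, the safe shape of
that parent/child pattern is simply to signal the remembered PID
directly.  A minimal sketch, with do_something standing in for any
long-running child:

  do_something &         # hypothetical long-running child
  childpid=$!
  # ... later, when the parent decides the child has run long enough ...
  kill -9 "$childpid"    # signal exactly the saved PID; no ps/grep matching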
