Date: Wed, 20 Feb 2002 06:07:45 -0500
From: Tony Lawrence <tony@aplawrence.com>
Subject: Re: How to trouble-shoot "file corruption"?

JP wrote:

> My client has a major business application written in Providex and run on an
> SCO box. A year ago, they built a clone Pentium III with hardware RAID to
> house the application. This SCO (5.0.5) box had file corruption almost
> every day that required some data files to be rebuilt. Having suffered for
> a year, they bought a brand new HP LH3000 PC with integrated RAID and a pair
> of fast mirrored drives. They also upgraded the SCO to 5.0.6a.
> Unfortunately, after being excited for 2 months, the same problem occurred again.
>
> The application vendor suggested that it was a hardware problem. However,
> after performing some diagnosis under the direction of HP, no problems were
> reported. Although the files seemed to be scrambled, there were no signs of
> file system corruption after running fsck. I am not quite convinced that it
> was a hardware problem.
>
> Are there any ways to find out if it is a program bug, an OS problem or
> hardware issue? Any ideas?
There's no easy, absolute way to prove that the app is messing up the file, but you can make some efforts toward that end.

Start by renaming the current data file and then copying it back to its original name. The renamed version keeps the same inode and the same disk blocks that the old file had, and the app no longer touches it, so it should scramble if the disk hardware is messing up those areas of the disk. You check that by running sum on it regularly; the result shouldn't change.

To check whether a disk controller is scrambling things (I have had this), you need to write some test programs that create similar sized files and write to them in patterns similar to the app's. I find the easiest way to do this is to write the same random data to 3 or 4 files, with the writes randomly spaced over a good chunk of time, and then run sum on the files, which of course should all sum exactly the same. If they don't, the controller starts to look very suspect.

Otoh, memory could do this. *Very* unlikely, because this depends on the bad memory never being used by the OS (which would always cause a panic if used for code and surely would cause strange behavior otherwise) and always being used by the app *only* for data. But I guess anything is possible under some contrived circumstance, so to eliminate that, we write a little program that just fires up and allocates a nice block of memory similar to what the app uses (you can get that from ps -el). This program writes different patterns into its space and reads them back (it's a memory tester, so you need different patterns to check for stuck bits, bleeding bits, etc.). You need to start it before starting the real app. It's probably a silly exercise, but if you have to PROVE something to someone..

After all that, you are going to get the argument that some other program on the system is writing into the file (can you tell I've been through this once or twice?). If you have lots of disk space you can turn on auditing and show that no other program ever touched the data file.
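For the controller test described above (same random data to several files, then compare checksums), here is a minimal sketch in Python. It is only an illustration under assumptions, not the test program from the original post: the function names, chunk size and pause length are made up for the example, and SHA-256 stands in for sum. The sizes and write pattern would need tuning to resemble what the real app does.

```python
import hashlib
import os
import random
import time


def write_identical_files(paths, total_bytes, chunk_size=8192,
                          max_pause=0.0, seed=None):
    """Write the same pseudo-random data to every file in paths.

    Chunks are written in shuffled order, with a random pause of up to
    max_pause seconds between rounds, to spread the writes over time.
    """
    rng = random.Random(seed)
    # Pre-size each file so chunks can be written at arbitrary offsets.
    for p in paths:
        with open(p, "wb") as f:
            f.truncate(total_bytes)
    offsets = list(range(0, total_bytes, chunk_size))
    rng.shuffle(offsets)
    for off in offsets:
        n = min(chunk_size, total_bytes - off)
        chunk = rng.getrandbits(8 * n).to_bytes(n, "little")
        for p in paths:
            with open(p, "r+b") as f:
                f.seek(off)
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to disk
        if max_pause:
            time.sleep(rng.uniform(0, max_pause))


def checksum(path):
    """Return a hex digest of the file contents (stand-in for sum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def all_match(paths):
    """True if every file has the same checksum."""
    return len({checksum(p) for p in paths}) == 1
```

A run might look like write_identical_files(["t1.dat", "t2.dat", "t3.dat"], 50_000_000, max_pause=2.0) followed by all_match(...) at intervals. One caveat: a checksum taken right after writing is likely to be served from the buffer cache, so it is worth repeating the comparison later (or after a reboot) to be reasonably sure the data is being read back from the disk itself.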
If you don't have the disk space for auditing, you have to take running snapshots with fuser or lsof. That may not satisfy a very stubborn vendor who is convinced that *their* programs never screw up.
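The running snapshots could be collected with something like the following sketch. It assumes lsof is installed and supports its -F field output (p = PID, c = command name, L = login name); the function names, log format and polling interval are invented for the example. As noted above, polling can miss a program that opens, writes and closes the file between two snapshots, so this is evidence rather than proof.

```python
import shutil
import subprocess
import time


def parse_lsof_fields(text):
    """Parse `lsof -F pcL` output into a list of dicts, one per process."""
    records, current = [], None
    for line in text.splitlines():
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":  # a 'p' line starts a new process record
            current = {"pid": int(value), "command": "", "user": ""}
            records.append(current)
        elif current is not None and tag == "c":
            current["command"] = value
        elif current is not None and tag == "L":
            current["user"] = value
        # other field types (e.g. 'f' for file descriptor) are ignored
    return records


def snapshot_openers(data_file, lsof="lsof"):
    """Return the processes that have data_file open right now."""
    if shutil.which(lsof) is None:
        raise RuntimeError("lsof not found on PATH")
    result = subprocess.run([lsof, "-F", "pcL", "--", data_file],
                            capture_output=True, text=True)
    # lsof exits non-zero when nobody has the file open; that is not
    # treated as an error here, the output is simply empty.
    return parse_lsof_fields(result.stdout)


def watch(data_file, interval=5.0, rounds=None, log=print):
    """Log a timestamped line for every opener, every interval seconds."""
    n = 0
    while rounds is None or n < rounds:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for r in snapshot_openers(data_file):
            log("%s pid=%d cmd=%s user=%s"
                % (stamp, r["pid"], r["command"], r["user"]))
        time.sleep(interval)
        n += 1
```

Redirecting the output of watch("/path/to/datafile") to a log file gives a record of which commands had the file open and when, which can then be lined up against the time the corruption was noticed.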
And when all is said and done, it's almost always "their" fault. Not that I'm complaining about the income opportunities, of course :-)

--
Tony Lawrence
SCO/Linux Support Tips, How-To's, Tests and more: