APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed
News Group Posts

Troubleshooting data file corruption: testing disks and memory


Date: Wed, 20 Feb 2002 06:07:45 -0500
From: Tony Lawrence <tony@aplawrence.com>
Subject: Re: How to trouble-shoot "file corruption"?

JP wrote:

> My client has a major business application written in Providex and run on an
> SCO box.  A year ago, they built a clone Pentium III with hardware RAID to
> house the application.  This SCO (5.0.5) box had file corruption almost
> every day that required some data files to be rebuilt.  Having suffered for
> a year, they bought a brand new HP LH3000 PC with integrated RAID and a pair
> of fast mirrored drives.  They also upgraded the SCO to 5.0.6a.
> Unfortunately, after 2 months of excitement, the same problem occurred again.
> 
> The application vendor suggested that it was a hardware problem.  However,
> after performing some diagnosis under the direction of HP, no problems were
> reported.  Although the files seemed to be scrambled, there were no signs of
> file system corruption after running fsck.  I am not quite convinced that it
> was a hardware problem.
> 
> Are there any ways to find out if it is a program bug, an OS problem or
> hardware issue?  Any ideas?

There's no easy, absolute way to prove that the app is messing up the 
file, but you can make some efforts toward that end.  Start by renaming 
the current data file and then copying it back to its original name.  The 
renamed file keeps the same inode and disk blocks that the original 
had, while the app now works on a fresh copy in different blocks.  If 
the disk hardware is messing up those areas of the disk, the renamed 
file should still get scrambled even though nothing writes to it (you 
check that by running sum on it regularly- it shouldn't change).
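
The rename-and-copy check can be sketched roughly as follows.  This is 
my own illustration in Python, not something from the original thread, 
and it makes some assumptions: the app is stopped while the file is 
parked, both names live on the same filesystem (otherwise the rename 
cannot keep the inode), and a SHA-256 digest stands in for sum.  The 
function names and paths are hypothetical.

```python
import hashlib
import os
import shutil


def file_digest(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file (a stand-in for `sum`)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def park_original(data_path, parked_path):
    """Rename the data file, copy it back, return the parked file's digest.

    The rename leaves the original inode and blocks under parked_path;
    the copy gives the application a fresh file under the old name.
    Assumes both paths are on the same filesystem and the app is stopped.
    """
    os.rename(data_path, parked_path)
    shutil.copy2(parked_path, data_path)
    return file_digest(parked_path)


def still_intact(parked_path, baseline_digest):
    """True if the parked file, which nobody should write to, is unchanged."""
    return file_digest(parked_path) == baseline_digest
```

Run still_intact from cron or a loop.  If the parked file ever changes 
while only the copy is in use, suspicion shifts toward the disk rather 
than the app.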

To check whether a disk controller is scrambling things (I have had this), 
you need to write some test programs that create similar sized files and 
write to them in similar patterns to the app.  I find the easiest way 
to do this is to write the same random data to 3 or 4 files with the 
writes randomly spaced over a good chunk of time- then run sum on the 
files, which of course should all sum exactly the same.  If they don't, 
the controller starts to look very suspect.
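
A minimal sketch of that kind of test program, in Python (my choice of 
language, not the original poster's).  The file names, sizes and pause 
lengths are assumptions you would tune to resemble the real app, and 
SHA-256 is used in place of sum.

```python
import hashlib
import os
import random
import time


def write_identical_files(paths, total_bytes, chunk_size=8192,
                          max_pause=0.0, seed=None):
    """Write the same random chunks to every path, pausing between rounds."""
    rng = random.Random(seed)          # only used to pick the pause lengths
    handles = [open(p, "wb") for p in paths]
    written = 0
    try:
        while written < total_bytes:
            chunk = os.urandom(min(chunk_size, total_bytes - written))
            for fh in handles:
                fh.write(chunk)
                fh.flush()
                os.fsync(fh.fileno())  # ask the OS to push data to the device
            written += len(chunk)
            time.sleep(rng.uniform(0, max_pause))
    finally:
        for fh in handles:
            fh.close()


def digests(paths):
    """Map each path to the SHA-256 digest of its contents."""
    result = {}
    for p in paths:
        h = hashlib.sha256()
        with open(p, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        result[p] = h.hexdigest()
    return result


def all_match(paths):
    """True if every file has exactly the same contents."""
    return len(set(digests(paths).values())) == 1
```

One caveat: a read right after the write may be served from the buffer 
cache rather than the disk, so the comparison means more after a reboot 
or once the cache has turned over.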

Otoh, memory could do this.  *Very* unlikely, because it depends on 
the bad memory never being used by the OS (which would almost certainly 
cause a panic if used for code and surely would cause strange behavior 
otherwise) and always being used by the app *only* for data.  But I guess 
anything is possible under some contrived circumstance, so to eliminate 
that, write a little program that just fires up and allocates a nice 
block of memory similar to what the app uses (you can get that from ps 
-el).  This test program writes different patterns into its space and 
reads them back (it's a memory tester, so you need different patterns to 
check stuck bits, bleeding bits, etc.).  You need to start it before 
starting the real app.  It's probably a silly exercise, but if you have 
to PROVE something to someone...
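
For illustration only, here is roughly what such a pattern tester 
looks like in Python.  This is my sketch, not the poster's program.  
The four patterns are a common but assumed choice (all zeros, all ones 
and the two alternating-bit bytes), and a user-level program like this 
cannot pick which physical memory it gets and may be reading back from 
CPU cache, so treat a clean run as weak evidence.

```python
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)   # zeros, ones, alternating bits


def fill(buf, pattern, chunk=1 << 16):
    """Overwrite every byte of buf with pattern, in place."""
    block = bytes([pattern]) * chunk
    for start in range(0, len(buf), chunk):
        end = min(start + chunk, len(buf))
        buf[start:end] = block[:end - start]


def find_bad_offsets(buf, pattern):
    """Return offsets whose byte does not read back as pattern."""
    if buf.count(bytes([pattern])) == len(buf):
        return []                     # fast path: everything matches
    return [i for i, b in enumerate(buf) if b != pattern]


def check_block(size_bytes, patterns=PATTERNS, passes=1):
    """Allocate size_bytes, write each pattern, and collect mismatches.

    Returns a list of (pattern, offset) pairs; empty means no errors seen.
    """
    buf = bytearray(size_bytes)
    failures = []
    for _ in range(passes):
        for pattern in patterns:
            fill(buf, pattern)
            failures.extend((pattern, off)
                            for off in find_bad_offsets(buf, pattern))
    return failures
```

You would call check_block with a size close to what the app uses and 
leave it looping.  A serious tester would more likely be written in C so 
it can lock its pages in memory.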

After all that, you are going to get the argument that some other 
program on the system is writing into the file (can you tell I've been 
through this once or twice?).  If you have lots of disk space you can 
turn on auditing and show that no other program ever touched the data 
file.  If you don't have the space, you have to take running snapshots 
with fuser or lsof- that may not satisfy a very stubborn vendor who is 
convinced that *their* programs never screw up.
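
The running snapshots could be collected with something like the 
sketch below.  It is my own Python illustration and assumes lsof is 
installed and on the PATH; it relies on lsof's -t option, which prints 
bare process IDs.  The log format, function names and interval are made 
up for the example.

```python
import subprocess
import time


def pids_with_file_open(path, run=subprocess.run):
    """Return the sorted PIDs that `lsof -t path` reports.

    lsof exits non-zero when nothing has the file open; that is treated
    as an empty result here rather than an error.
    """
    result = run(["lsof", "-t", path], capture_output=True, text=True)
    return sorted(int(tok) for tok in result.stdout.split() if tok.isdigit())


def snapshot_line(path, pids, now=None):
    """Format one timestamped log line for a snapshot."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s %s pids=%s" % (stamp, path,
                              ",".join(map(str, pids)) or "none")


def watch(path, log_path, interval=5.0, count=None, run=subprocess.run):
    """Append a snapshot to log_path every interval seconds.

    count=None keeps going until interrupted.
    """
    taken = 0
    with open(log_path, "a") as log:
        while count is None or taken < count:
            pids = pids_with_file_open(path, run)
            log.write(snapshot_line(path, pids) + "\n")
            log.flush()
            taken += 1
            if count is None or taken < count:
                time.sleep(interval)
```

Sampling like this can miss a process that opens, writes and closes the 
file between two snapshots, which is exactly why it is weaker evidence 
than a full audit trail.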

And when all is said and done, it's almost always "their" fault.  


Not that I'm complaining about the income opportunities, of course :-)


-- 
Tony Lawrence
SCO/Linux Support Tips, How-To's, Tests and more: 





This post tagged:

       - Disks/Filesystems
       - Memory
       - Troubleshooting



