APLawrence - Information and Resources for Unix and Linux Systems, Bloggers and the self-employed















News Group Posts

defragmenting Unix filesystems




Newsgroups: comp.unix.sco.misc
From: bill@wjv.com.REMOVEME (Bill Vermillion)
Subject: Re: Defrag freeware ?
Date: Tue, 16 Nov 1999 02:03:46 GMT

In article <80ovid$3n4$1@soap.pipex.net>, Marc Redmile-Gordon
<marc@carsplus.co.uk> wrote:

>Tom has informed me of my erroneous thinking, but, the way I "see"
>disk writes & deletions ( just as displayed in a "FAT" windows
>defrag tool ) - it seems inevitable that after a period of time the
>disk will become fragmented.

>What makes Unix's file system so different from the FAT system ?

[This started off to be a short reply - but got a bit long winded,
so if you aren't interested in filesystems you may hit 'n' now.
It's long, but it isn't HTML - wjv]

One way I would describe this is that Unix has a file 'system', and
DOS has a file handler.   There is a world of difference between
the strategies behind the two.



The FAT file-system is really nothing more than the original file
system that came out with CP/M - in the mid 1970s - with changes
along the way to handle newer/larger devices, but still hewing to
most of the original design.

On a FAT disk there is an area of the disk reserved for file
names, and a block of bytes associated with each name which points
to the blocks on the disk that are allocated to that file. The only
way it knows how to add more data is to find the first free block on
the disk and start filling it up from there.

Let's say file "A" looks like this on the disk:  AAAAAAAAAA
Then we create file B, so the disk looks like  AAAAAAAAAABBBB, and
then add more to A, and it's now  AAAAAAAAAABBBBAAAA.  This continues
as you delete files and add files until the disk looks like alphabet
soup.  (Of course if you never delete or extend any files you won't
have fragmentation.)
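
The "first free block" behaviour described above can be sketched in a
few lines. This is a toy model, not real FAT code: the disk is a list
of slots, "." means free, and the function names are made up for the
illustration.

```python
def first_free_write(disk, name, nblocks):
    """Put nblocks of file `name` into the lowest-numbered free slots."""
    for i, slot in enumerate(disk):
        if nblocks == 0:
            break
        if slot == ".":
            disk[i] = name
            nblocks -= 1
    if nblocks:
        raise RuntimeError("disk full")


def delete(disk, name):
    """Free every slot that belongs to file `name`."""
    for i, slot in enumerate(disk):
        if slot == name:
            disk[i] = "."


def fragments(disk, name):
    """Count the separate runs of blocks that belong to `name`."""
    runs, prev = 0, None
    for slot in disk:
        if slot == name and prev != name:
            runs += 1
        prev = slot
    return runs


disk = ["."] * 24
first_free_write(disk, "A", 10)
first_free_write(disk, "B", 4)
first_free_write(disk, "A", 4)      # A grows, but B is in the way
print("".join(disk))                # AAAAAAAAAABBBBAAAA......

delete(disk, "B")
first_free_write(disk, "C", 6)      # C fills B's hole, then spills over
print("".join(disk))                # AAAAAAAAAACCCCAAAACC....
print(fragments(disk, "A"), fragments(disk, "C"))   # 2 2
```

Both A and C end up in two pieces even though the disk is far from
full, which is the alphabet soup effect in miniature.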

The advent of DOS 2.0 brought forth the hierarchical file system,
as up to that point you could not store more than 512 files on a
disk. The hierarchy made it possible to increase the number of files
available for storage, which was needed to handle the new 'hot'
items: the $2000 5MB hard disks that were just coming out.

However if you look at the IBM floppies, or the hard disk, with a
binary editor you can see a pattern of e5e5e5e5, and if you
delete a file the first letter of the file name is replaced with
e5.

Where did this come from?  That is the worst case pattern for single
density disks. By 'worst case pattern' I mean that if you write
this pattern and there is any problem with the disk, then this is the
pattern that is most likely to fail first. Double density uses a
different pattern but the MSDOS world still kept the old convention.
That meant that testing with less than the worst-case pattern could
pass disks that would later fail. (Now you know why you had problems
with floppies in that OS!)
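
To make the e5 deletion marker concrete, here is a small sketch. It
assumes the usual 32-byte FAT directory entry with an 8+3 character
name at the front; the helper names are invented for the example and
everything after the name is left as zeros.

```python
DELETED = 0xE5  # first byte of a deleted directory entry


def make_entry(name, ext):
    """Build a 32-byte directory entry holding only the 8.3 name."""
    raw = bytearray(32)
    raw[0:8] = name.upper().ljust(8)[:8].encode("ascii")
    raw[8:11] = ext.upper().ljust(3)[:3].encode("ascii")
    return raw


def mark_deleted(entry):
    """'Delete' the file: only the first name byte is overwritten."""
    entry[0] = DELETED


def is_deleted(entry):
    return entry[0] == DELETED


entry = make_entry("report", "txt")
mark_deleted(entry)
print(is_deleted(entry))        # True
print(bytes(entry[1:8]))        # b'EPORT  '
```

Only the first letter is lost; the rest of the name is still sitting
in the entry.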

It has other 'fun' features (until the NT file system or the 32
bit system), such as allocating up to 16Kbytes for a 1 byte file.  It
has been hamstrung by the old design.  And this is called a 'system'.
Hah.
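
The 16Kbytes-for-1-byte arithmetic is just rounding up to a whole
allocation unit. A minimal sketch, with the 16K unit size taken from
the paragraph above and the function name made up:

```python
def allocated_bytes(file_size, cluster_size):
    """Space taken on disk when files are stored in whole clusters."""
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size


CLUSTER = 16 * 1024

for size in (1, 16 * 1024, 16 * 1024 + 1):
    used = allocated_bytes(size, CLUSTER)
    print(size, used, used - size)
# 1 16384 16383
# 16384 16384 0
# 16385 32768 16383
```

The last column is the wasted space, and it is worst for files that
are just over a multiple of the unit size.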

The System V file system also had problems earlier, but advances
over the past few years have eliminated most of them.

Originally Sys V had a free list which contained a list of free
blocks. This was not a list of ALL the blocks but the first group
of free blocks (ISTR it was 100 but may have been more). When the
last block on this list was used, the system would gather more free
blocks to add to the list. But if you deleted files, the just
deleted blocks were added to the free list as well.  That meant
that if the free list had blocks 100 to 200 on it, and you had
allocated 0 thru 99 and then deleted them, you would be seeking to
a lower number and then a higher number when a new file was
created - and you have just started fragmenting the drive.
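
Here is a rough model of that behaviour. It assumes the free list acts
like a simple stack (most recently freed block handed out first), which
is a simplification of the real on-disk structure; the class and the
rebuild() method are illustrative only, with rebuild() standing in for
the kind of free list rebuild described in the next paragraph.

```python
class FreeList:
    def __init__(self, first, last):
        # Stack of free block numbers; the end of the list is the top.
        # Built so that the lowest block number is handed out first.
        self.stack = list(range(last, first - 1, -1))

    def alloc(self):
        return self.stack.pop()

    def free(self, block):
        self.stack.append(block)

    def rebuild(self):
        # Re-sort so the lowest numbered blocks come out first again.
        self.stack.sort(reverse=True)


fl = FreeList(100, 109)        # free list currently holds 100..109
fl.free(40)                    # a small file at blocks 40-41 is deleted
fl.free(41)
print([fl.alloc() for _ in range(5)])   # [41, 40, 100, 101, 102]
```

The new five-block file starts at a low block number, runs backwards,
and then jumps up to block 100, so one file is already in pieces.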

Those of us using these in the old days used to have cron
run    fsck -S    on the filesystem overnight.  The capital S option
says rebuild the freelist IF and ONLY IF the rest of the file
system is OK.  That kept the fragmentation at bay awhile longer, as
the data was put on the lower numbered blocks first.

In the meantime the Berkeley Software Distribution (BSD) tried to
overcome this with their Fast File System.

This system organized the hard drive into zones, each of which
had several cylinders in it.  (A cylinder is a track, plus all the
tracks directly beneath it: the one on the bottom of the top
platter and those on both sides of all lower platters.)

This meant that the only delay when moving from one sector to
another in a single cylinder would essentially be the time it took
to switch the data from one head to another.  This also took
disk rotation into account: the time the data circuit needed to
switch to the next head, once the last sector on a given head was
written, was computed in advance.

Since the switch required a certain amount of time, a delay was
added so that when the head was ready to write again, it
would start with the 'first' sector on that track, which was
rotationally further along than the 'first' sector on the previous
track.  This essentially eliminated all mechanical delay and, with
bigger blocks and only headswitching delay, improved things
drastically.
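
The size of that rotational offset is simple arithmetic: how many
sectors pass under the head while the switch is going on. A sketch,
where the drive figures (3600 rpm, 17 sectors per track, 1.5 ms head
switch) are assumptions picked for the example and not taken from any
particular drive:

```python
import math


def skew_sectors(rpm, sectors_per_track, head_switch_ms):
    """Sectors to offset the next track's 'first' sector by."""
    ms_per_rev = 60_000 / rpm
    ms_per_sector = ms_per_rev / sectors_per_track
    return math.ceil(head_switch_ms / ms_per_sector)


print(skew_sectors(3600, 17, 1.5))   # 2
```

With these numbers one revolution takes about 16.7 ms, so without the
offset the drive would just miss the first sector of the next track
and wait nearly a full turn for it to come around again.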

The system also allocated files across the disk into each cylinder
group.  That means that data would be scattered across the disk,
but kept together from the very first usage.  It was attempting to
keep all data for a given file contiguous, but leave enough space
for all new files to be contiguous if at all possible.
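
A toy version of the idea follows. It is not the actual BSD allocator,
whose placement policy is more involved; it only assumes that each new
file goes into whichever cylinder group has the most room and is laid
down contiguously there. All names and sizes are made up.

```python
def pick_group(free_per_group):
    """Index of the group with the most free blocks (first on ties)."""
    return max(range(len(free_per_group)), key=lambda g: free_per_group[g])


def place_files(sizes, groups, blocks_per_group):
    """Return (group, start_block, size) for each file, in order."""
    free = [blocks_per_group] * groups
    placement = []
    for size in sizes:
        g = pick_group(free)
        if free[g] < size:
            raise RuntimeError("no group has room")
        start = g * blocks_per_group + (blocks_per_group - free[g])
        free[g] -= size
        placement.append((g, start, size))
    return placement


print(place_files([10, 10, 10, 5, 20], groups=4, blocks_per_group=100))
# [(0, 0, 10), (1, 100, 10), (2, 200, 10), (3, 300, 5), (3, 305, 20)]
```

Each file is in one piece, and every group still has a large free
stretch behind its files for them to grow into.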

This goes counter to the DOS implementation where all files are kept
close together.

This is because DOS based systems are 'synchronous' operating
systems - which means they can only handle one task at a time. The
computer can do nothing else while it is completing each task.
Writing a large file to disk slows things down. Having to seek
across a disk to find the pieces also slows things down. That
makes defragmentation and drives with fast seek times almost
mandatory. That's where a local company made good by designing
a disk controller with enough memory so that it looked like a
hard-drive to the OS and the OS could go back to work. That company
is DPT (since acquired by Adaptec - it's in the process now).

In modern Unix systems - which vary among installations and
vendors - the files are written in an entirely different
manner.

The Unix systems are asynchronous file systems (for the most part).
That means when Unix tells you the file is written - by giving you
your prompt back - or the program control back to you - it probably
still is sitting in cache in the OS.  Every little bit the OS
gathers up the data in cache and writes it to disk.  Different
implementations - depending a lot on the controllers - will
start at the beginning of a disk and just go to cylinders in
numerical order instead of seeking all over.
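
A minimal sketch of that gather-then-write behaviour, assuming a
cache keyed by block number that is flushed in ascending order. The
class is invented for the illustration and a real buffer cache does a
great deal more.

```python
class WriteBackCache:
    def __init__(self):
        self.dirty = {}       # block number -> data waiting in cache
        self.disk_log = []    # order in which blocks reached the disk

    def write(self, block, data):
        # Returns at once; the caller gets control back immediately
        # and nothing has touched the disk yet.
        self.dirty[block] = data

    def flush(self):
        # The periodic sweep: one pass across the disk, low to high.
        for block in sorted(self.dirty):
            self.disk_log.append(block)
        self.dirty.clear()


cache = WriteBackCache()
for block in (907, 12, 455, 13):
    cache.write(block, b"data")
print(cache.disk_log)   # []  (still only in cache)
cache.flush()
print(cache.disk_log)   # [12, 13, 455, 907]
```

The writes arrived in scattered order but go out in one sweep, so the
heads are not dragged back and forth for every request.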

The fast file systems (a name started by BSD and now covering many)
allocate larger blocks - typically 8192 bytes at a time - and try
to keep the blocks in a file contiguous.  Any data that doesn't
fill a full block is written to a 'fragment'.  When more data is
added to a file that has data in a fragment and it reaches 8192
bytes (or more), a new 8K segment is added.  This means that all
reads are in 8K segments.  (Some Unix variants targeted to
multi-media and broadcast streaming video allocate over 1MB at a
time.)
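
The block-plus-fragment split can be worked out like this. The 8K
block size comes from the text; the 1K fragment size is an assumption
made for the example, as is the function name.

```python
BLOCK = 8192   # full block size (8K)
FRAG = 1024    # fragment size: assumed here, not stated in the post


def layout(file_size):
    """Return (full_blocks, fragments) needed to hold file_size bytes."""
    full_blocks, tail = divmod(file_size, BLOCK)
    frags = -(-tail // FRAG)            # ceiling division
    if frags == BLOCK // FRAG:          # tail needs a whole block anyway
        full_blocks, frags = full_blocks + 1, 0
    return full_blocks, frags


for size in (100, 8000, 8192, 10_000):
    print(size, layout(size))
# 100 (0, 1)
# 8000 (1, 0)
# 8192 (1, 0)
# 10000 (1, 2)
```

A 100 byte file costs one small fragment instead of a whole 8K block,
while large files still get read in full 8K pieces.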

The last white paper I read on SCO's fast file implementation
says it differs mainly in that BSD used 8K blocks, while the EAFS
system kept the block size but allocates up to 32 contiguous blocks
at once.  This overcomes the BSD problem of reallocating out of the
fragments.  This gets away from the original block-at-a-time
method.

The current Sys5 file systems also gained speed by using a bit-map
instead of the free list.  Other things, such as sorting synchronous
file requests before asynchronous requests, also permitted return of
control to the user faster than in other systems.
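
One reason a bit-map helps is that it makes a contiguous run of free
blocks easy to find, which a list of single free blocks does not. A
sketch, using a plain Python list of 0 (free) and 1 (used) in place of
real bits; this is an illustration and not how any vendor's code
actually looks.

```python
def find_run(bitmap, want):
    """Start of the first run of `want` free blocks, or -1 if none."""
    run_start, run_len = 0, 0
    for i, used in enumerate(bitmap):
        if used:
            run_len = 0
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == want:
            return run_start
    return -1


def allocate(bitmap, want):
    """Mark a contiguous run as used and return its first block."""
    start = find_run(bitmap, want)
    if start >= 0:
        for i in range(start, start + want):
            bitmap[i] = 1
    return start


bitmap = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
print(allocate(bitmap, 3))   # 4
print(allocate(bitmap, 2))   # 8
print(allocate(bitmap, 2))   # -1
```

The single free block at position 2 is skipped for the larger
requests, so files that ask for several blocks get them in one piece.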

To sum it up, while the Unix file systems have progressed in design
over the past 20 or so years with new methods of implementing the
system to speed performance, organize storage, and cache items, the
standard FAT system on DOS has really only undergone changes to
be able to handle increasingly larger file sizes, by increasing the
minimum number of bytes that are allocated to a file.

You did ask:
>What makes Unix's file system so different from the FAT system ?

That's just a brief overview.   Since my home machine has been *ix
based since 1983 - after spending 9 slow months with DOS 2 on a
jenyouine IBM peecee - I'm slightly prejudiced.

Bill
-- 
Bill Vermillion   bv @ wjv.com 
