APLawrence.com -  Resources for Unix and Linux Systems, Bloggers and the self-employed

manipulating image files

From: "Brian K. White" <brian@aljex.com>
Subject: Re: Can I compress GIF files the same way I do JPG files ?
Date: 4 Nov 2005 18:16:03 -0500
Message-ID: <024101c5e195$b5db1de0$6600000a@venti> 
References: <C6uaf.12856$Zv5.6812@newssvr25.news.prodigy.net> 

----- Original Message ----- 
From: "E Arredondo" <atk@sbcglobal.net>
Newsgroups: comp.unix.sco.misc
Sent: Thursday, November 03, 2005 3:08 PM
Subject: Can I compress GIF files the same way I do JPG files ?

>I normally run this command from the shell to create thumbnails :
> /usr/local/bin/djpeg one.jpg | /usr/local/bin/pnmscale 0.2 | 
> /usr/local/bin/cjpeg > tnailone.jpg
> Is there something similar for GIF files, or should I find a GIF to JPG 
> translator so I can use the above command to create my thumbnails ?
> /usr/local/bin/djpeg one.gif | /usr/local/bin/pnmscale 0.2 | 
> /usr/local/bin/cjpeg > tnailone.gif
> Thanks

I use ImageMagick, which has a program "convert", which does everything in
one shot.
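For the GIF case in the question, netpbm also has a counterpart to the djpeg/cjpeg pipeline. This is a minimal sketch. It assumes netpbm's giftopnm, pnmscale, ppmquant and ppmtogif are installed in /usr/local/bin next to djpeg and cjpeg. Newer netpbm releases prefer the names pamscale, pnmquant and pamtogif. The gif_thumb function and the NETPBM variable are made up for illustration. The ppmquant step is needed because scaling blends pixels into new colors, and GIF allows at most 256.

```sh
# Hypothetical helper: GIF thumbnail via netpbm, mirroring
#   djpeg one.jpg | pnmscale 0.2 | cjpeg > tnailone.jpg
# Assumes the netpbm tools live in $NETPBM (default /usr/local/bin).
NETPBM=${NETPBM:-/usr/local/bin}

gif_thumb() {
    # $1 = source GIF, $2 = destination GIF, $3 = scale factor (default 0.2)
    # Decode GIF -> scale -> reduce back to <= 256 colors -> encode GIF.
    "$NETPBM/giftopnm" "$1" |
        "$NETPBM/pnmscale" "${3:-0.2}" |
        "$NETPBM/ppmquant" 256 |
        "$NETPBM/ppmtogif" > "$2"
}
```

Usage: gif_thumb one.gif tnailone.gif 0.2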

I have it in a CGI script that displays thumbnails of scanned documents. The
script creates a thumbnail on the fly if it doesn't already exist, is older
than the matching full-size image, or is zero bytes (a trick I use as a
placeholder for a page that has been scanned but not yet uploaded to the
server).
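The regenerate-only-when-needed rule can be sketched in shell as follows. This is not the actual script. The names needs_thumb and refresh_thumb are hypothetical, the 280x160 size comes from the command quoted further down, and the sketch assumes a shell whose test supports -nt (bash and dash both do).

```sh
# Succeeds (status 0) when the thumbnail must be (re)built:
#   - it is missing or zero bytes (test -s is false for both), or
#   - the full-size image is newer than the thumbnail.
needs_thumb() {
    # $1 = full-size image, $2 = thumbnail
    [ ! -s "$2" ] || [ "$1" -nt "$2" ]
}

# Rebuild the thumbnail with ImageMagick's convert only when needed.
refresh_thumb() {
    # $1 = full-size image, $2 = thumbnail
    if needs_thumb "$1" "$2"; then
        convert "$1" -geometry 280x160 "$2"
    fi
}
```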

# curl -s http://www.aljex.com/bkw/sco/ImageMagick.tar.bz2 |bzcat |tar xvf -

Snipped from the real script:
# convert -geometry 280x160 "${IMGPATH}-${DOCPAGE}.${FMT}" \
    "${THMPATH}-${DOCPAGE}.${FMT}" >>"$TL" 2>&1
or, with plain file names:
# convert -geometry 280x160 fullsize.png thumbnail.png

Try just "convert --version" first.

You may need to update your oss646 and gwxlibs before it runs.

Note that there is a "convert" that is part of the SCO development system,
which you might have, and which might be in your PATH ahead of
/usr/local/bin/convert. After reading its man page I decided I'll probably
never use it, so I just renamed it.
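Another way to avoid picking up the wrong convert is to ask each candidate for its version and keep the first one that reports ImageMagick. This is a sketch. The function find_im_convert and the candidate paths are hypothetical. It also assumes the other convert is harmless when run with -version, so check its man page first.

```sh
# Print the first candidate whose "-version" output mentions ImageMagick.
# Runs in a subshell ( ... ) so the loop variable does not leak.
find_im_convert() (
    for c in "$@"; do
        if [ -x "$c" ] && "$c" -version 2>/dev/null | grep -q ImageMagick; then
            printf '%s\n' "$c"
            exit 0
        fi
    done
    exit 1    # no ImageMagick convert among the candidates
)

# Hypothetical usage:
#   CONVERT=$(find_im_convert /usr/local/bin/convert /usr/bin/convert) || exit 1
#   "$CONVERT" fullsize.png -geometry 280x160 thumbnail.png
```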

Note that the above command can be used in at least two versions, which do
different amounts of work and produce different quality results. Keep the
command the same but replace "-geometry" with "-sample". -sample does a lot
less work (it just picks pixels instead of filtering them), so it is faster
and loads your CPU less, but it produces a much cruder result.
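Here are the two variants side by side as a hypothetical helper, thumb. It assumes ImageMagick's convert is the one found first in PATH. The options are placed after the input file, which is where ImageMagick 6 and later expect image operators.

```sh
# Build a 280x160 thumbnail either the careful way or the fast way.
thumb() {
    # $1 = source image, $2 = destination, $3 = "fast" to use -sample
    if [ "$3" = fast ]; then
        # pixel sampling: no filtering, cheap on CPU, cruder result
        convert "$1" -sample 280x160 "$2"
    else
        # filtered resize (like -resize): slower, better looking
        convert "$1" -geometry 280x160 "$2"
    fi
}
```

Usage: thumb fullsize.png thumb.png for quality, thumb fullsize.png thumb.png fast for speed.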

You can also convert from one image type to another at the same time;
convert picks the output format from the file extension:
# convert -geometry 280x160 fullsize.png thumb.gif

For some operations it uses Ghostscript to do the work, so you should install
that too. (The version of afpl-gs on my site might be the most up-to-date
version available at the moment.)

It can combine a series of images into a single multipage ps, pdf, tiff,
even pcl.
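For example, you can collect the pages of one document into a single multipage file. This is a sketch. The function combine_pages and the PREFIX-NNN.png page naming are hypothetical. Because convert takes the output type from the extension, the same call can write .tif, .pdf or .ps.

```sh
# Combine every page image of one document into one multipage file.
combine_pages() {
    # $1 = document prefix (pages are PREFIX-NNN.png), $2 = output file
    convert "$1"-*.png "$2"
}
```

Usage: combine_pages /u/images/ACCT42 /u/tmp/ACCT42.tif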

All this is expensive, though. Some operations can take 30 seconds of 100%
CPU per page on a fast server, and use HUGE amounts of temp space in
/usr/tmp. That is probably on your root partition, which may not have the
room for that kind of abuse. And sometimes it fails and leaves those huge
files behind, bringing you closer to a crash from a full root filesystem.
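One way to keep those temp files off the root filesystem, and to clean up after failed runs, is to give each run its own scratch directory through ImageMagick's MAGICK_TMPDIR environment variable. This is a sketch. The names with_scratch and SCRATCH_ROOT and the path /bigdisk/imtmp are hypothetical. A run that is killed outright never reaches the cleanup line, so it can still leave files behind.

```sh
# Root for per-run scratch directories (hypothetical roomy filesystem).
SCRATCH_ROOT=${SCRATCH_ROOT:-/bigdisk/imtmp}

# Run a command with MAGICK_TMPDIR pointing at a fresh scratch directory,
# delete that directory afterwards, and return the command's exit status.
with_scratch() (
    scratch=$(mktemp -d "$SCRATCH_ROOT/im.XXXXXX") || exit 1
    MAGICK_TMPDIR=$scratch
    export MAGICK_TMPDIR
    if "$@"; then status=0; else status=$?; fi
    rm -rf "$scratch"    # removes leftovers even when the command failed
    exit "$status"
)
```

Usage: with_scratch convert page-*.png statement.pdf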

So do not get carried away with the magic ability to convert anything to
anything. Pick one compromise image format that will serve most needs as it
is, scan into that format in the first place, and convert as little as
possible. It works OK in small numbers, but it won't scale at all. For
example, you can't run a report that, for each of the thousands of customers
in your customer file, converts 20 PCL and PNG files of a statement into a
PDF, MIME-encodes it into an email and mails it. But you can send an HTML
email with the same images MIME-encoded as inline attachments, or send an
email with a special URL that hits a CGI script that displays the same
images.

I have been using two different formats:
200dpi 1-bit PNG
pros: smallest file size, 30 to 60k typical, sometimes over 70k, rarely
reaching 100k even for very dense, detail-filled sheets; exactly fax quality
(except that scanners are generally better than fax machines, both in the
initial scan and because there is no data loss as happens over the phone
line); high enough resolution for pretty fine print; converting to a TIFF of
the same resolution and color depth for feeding into vsi-fax is a fast,
lightweight conversion; web browsers all display PNGs natively, so no image
viewer is needed, unlike TIFF; compression is lossless, unlike JPEG; the
format is unencumbered, unlike GIF (except NOW GIF is OK too, but I was
doing this long before that).
cons: the 1 bit means that sometimes a low-contrast sheet will wash out as
too much white or too much black (there is no grey), and you can't read the
crucial text of the document.
Playing with the threshold value of the scanner driver can sometimes
alleviate this. Sometimes it's not good enough.

100dpi 4-bit greyscale JPEG, 45% quality setting
pros: low-contrast pages can be read (think of a light foreground, like the
middle sheet of a weak dot-matrix carbonless form, on a dark background, like
the green/blue/pink sheets of a dot-matrix carbonless form); every PC can
view them in the browser, with no viewer app or plugin needed.
cons: 100dpi is very low: still OK for the screen, prints OK-ish on high-res
(600dpi) printers, prints horribly on 300dpi or lower printers, and faxes
utterly, utterly horribly; the conversion for faxing is more expensive,
because it needs both resizing/resampling and a bit-depth change; and even
with the heavy compromises of the low dpi, the lossy JPEG compression and the
very low JPEG quality setting (which makes it even more lossy, more an
approximation of the original data than a record of it), the file size is
still about 3 times larger than the 200dpi 1-bit PNGs.

When you scan thousands of sheets per day, for years, you had better do your
math and be prepared with hard drive space, tape backup space, and a method
for archiving old images out of live production after a few years. Or bank
on the pace of technology keeping up, so that by the time you would need to
archive, you can instead just upgrade and keep it all. So far for us, about
5 years after we added scanning to our application, most (not all) of our
customers who scan have been able to upgrade at the same times they normally
would have, and technology has been advancing a little faster than they
accumulate images. So when they fill up and get a new server, the hard
drives are enough bigger and cheaper that they can keep everything they've
scanned and still have room to continue at that pace for another 5 years, by
which time it will be upgrade time again anyway, if not earlier. A couple of
customers scanned at a much higher rate, so we added larger hard drives to
their existing box (added, not replaced, so it was painless) and replaced the
tape drive with a larger, faster one.
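Doing the math can be as simple as multiplying pages per day by the size per page and the number of scanning days. The inputs below are hypothetical examples drawn from the ranges above, not measurements. The 45 KB figure sits within the typical 30 to 60k for a 1-bit PNG, and the JPEG is taken as about 3 times larger. yearly_mb is a made-up helper that uses integer shell arithmetic.

```sh
# Rough yearly growth of the image store, in MB (integer arithmetic).
yearly_mb() {
    # $1 = pages per day, $2 = average KB per page, $3 = scanning days per year
    echo $(( $1 * $2 * $3 / 1024 ))
}

# Hypothetical example: 2000 pages/day, 250 working days a year.
#   yearly_mb 2000 45 250    -> 21972  (about 21 GB a year as 1-bit PNG)
#   yearly_mb 2000 135 250   -> 65917  (about 64 GB a year as JPEG)
```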

In my case, I use my existing application as the filing and retrieval
mechanism for the images, so each page is named by the application at scan
time according to whatever file/record/account number etc. the user is
sitting on. The application actually sends the instruction to the user's PC
telling it to scan and what file name to save it as. For this to be
practical, every page is a separate file. A "document" is just a set of file
names that match a pattern. This makes it easy for different users to delete
and add individual pages at different times. I decided that dealing with
multipage files would not be useful, however much users sometimes think they
want a PDF. I can and do give them web pages, CGI scripts, and HTML MIME
emails that present the "document" roughly the same way a PDF would, without
requiring the end user to have Acrobat installed, wait for it to load up,
click on an attachment, or deal with security settings that may disallow the
attachment. I have also figured out a reasonable, "good enough" answer for
printing, based on HTML, which can also be put into an email. Printing
(pagination in particular) was the one big legitimate need for PDF.
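Because a document is only a file-name pattern, listing, deleting or adding pages is ordinary file handling, and an HTML view is just a loop over the files. This sketch makes several assumptions, and all of them are illustrative, not the author's actual scheme. The PREFIX-NNN.EXT naming uses zero-padded page numbers so the glob sorts in page order. The names list_pages and doc_html are made up. CSS page breaks approximate pagination when printing. File names go into the HTML unescaped, which is only safe because the application generates them.

```sh
# List the page files of one document, in page order.
list_pages() (
    # $1 = document prefix, $2 = extension (png, jpg)
    for f in "$1"-*."$2"; do
        [ -e "$f" ] || continue    # the glob matched nothing
        printf '%s\n' "$f"
    done
)

# Emit a minimal HTML page showing every page image, with a print page break
# after each one.
doc_html() {
    # $1 = document prefix, $2 = extension, $3 = URL directory for the images
    printf '<html><head><style>\n'
    printf 'img { display: block; width: 100%%; page-break-after: always; }\n'
    printf '</style></head><body>\n'
    list_pages "$1" "$2" | while IFS= read -r f; do
        printf '<img src="%s/%s">\n' "$3" "${f##*/}"
    done
    printf '</body></html>\n'
}
```

Deleting page 2 of a document is then just rm on that one file; the next doc_html call simply no longer shows it.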

I'm not sure I would try as hard to avoid PDF if I were designing the system
from scratch today. These days a lot of scanners can create PDFs, and a lot
of people are prepared for and expect to deal with them. I don't know about
the faxing issue. Maybe vsifax faxes them natively these days, or maybe
HylaFAX does. Maybe I could accept the reduced ability (or inability) to
selectively remove or replace a single page within a document if that were
my only problem. I still think I'd rather receive a hundred HTML emails with
embedded inline images than a hundred emails with PDF attachments every day
from my various vendors and customers, if I had to open and view them all as
part of my all-day, every-day job. It's a million times faster. You have to
think in those terms, not open one test email and say it's fine to click on
the attachment, possibly OK a security dialog, wait for Acrobat...

And I'm sorry for veering off into what is obviously a pet peeve of mine :)

Brian K. White  --  brian@aljex.com  --  http://www.aljex.com/bkw/
filePro  BBx    Linux  SCO  FreeBSD    #callahans  Satriani  Filk!
