From: Terry Smythe on 24 Oct 2007 14:54

Can somebody point me to a web site with advice on how to balance good PDF quality against a tolerable file size?

I'm starting a project to convert an association's monthly journals, going back to 1964, into PDF, initially for web display and later for DVDs sent to members.

There are 500+ issues. Each issue averages 50 pages, 8 1/2 x 11, with a color cover and grayscale images and line drawings scattered throughout. Ideally the text should be capturable by OCR.

I'm currently scanning with PaintShopPro9 through an HP4670 "See-Thru" scanner at 300dpi, saving most pages as JPG and text-only pages as TIF (Fax-CCITT3). I then use Acrobat 8 to create a PDF document by inserting all the images, followed by Acrobat OCR, followed by Acrobat optimization.

All images come through at about 2400 pixels wide. Using these images with a current 68-page issue, my PDF document emerges at 32 megs after OCR and optimization.

If I go one step further and resize all images down to 1500 pixels wide, file size shrinks to 7.3 megs after OCR and optimization.

Before image resizing, the PDF image is crisp and sharp. After resizing, the image is somewhat fuzzy, but OCR capture appears unaffected and the document prints out quite nicely.

Have I found an optimal process that gives me the best I can hope for? Or have others found a better process for this kind of application?

BTW, I'm working with my own personal unbroken 43-year journal collection, and I'm not prepared to guillotine off the spines for auto-feed scanning. This is why I'm using the HP 4670, which is well suited to this situation.

I'm also not prepared to OCR scan the original journals into Microsoft Word, followed by extensive editing. If I were to take this route, I do have ABBYY FineReader OCR Professional 8, and have experimented with this approach. It is not at all swift, particularly with many of the earlier journals, which are not good quality.

Thoughts of others for this application?
Regards, Terry Smythe Winnipeg, Canada smythe(a)shaw.ca
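The figures in Terry's post can be turned into a rough per-page and whole-archive estimate. This is only a back-of-envelope sketch: it assumes the 8.5 inch page width fills the full image width, that the one 68-page issue is representative of the rest, and that the archive is exactly 500 issues of 50 pages. The function names are made up for illustration.

```python
PAGE_WIDTH_IN = 8.5  # page width from the post, assumed to span the whole image


def effective_dpi(pixels_wide: float, page_width_in: float = PAGE_WIDTH_IN) -> float:
    """Resolution implied by an image width, if the page fills the image."""
    return pixels_wide / page_width_in


def mb_per_page(pdf_mb: float, pages: int) -> float:
    """Average PDF size per page for one measured issue."""
    return pdf_mb / pages


def project_gb(per_page_mb: float, issues: int = 500, pages_per_issue: int = 50) -> float:
    """Rough archive size, assuming every page costs the same as the sample."""
    return per_page_mb * issues * pages_per_issue / 1000.0


if __name__ == "__main__":
    # (image width in pixels, measured PDF size in MB for the 68-page issue)
    for width, size_mb in ((2400, 32.0), (1500, 7.3)):
        per_page = mb_per_page(size_mb, 68)
        print(f"{width} px wide: ~{effective_dpi(width):.0f} dpi, "
              f"{per_page:.2f} MB/page, ~{project_gb(per_page):.1f} GB for the archive")
```

On those assumptions the 2400-pixel scans work out to roughly 282 dpi and about 12 GB for the whole archive, while the 1500-pixel versions work out to roughly 176 dpi and under 3 GB, so only the smaller setting would fit on a single-layer 4.7 GB DVD.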
From: Barry Watzman on 24 Oct 2007 16:34

I use an HP 5490C scanner, which has an ADF (automatic document feeder), and Adobe Acrobat 6. If you do this without an ADF it will take forever, but if you do it with an ADF, you will have to "unbind" the material, which is best done with a printer's paper shear.

5490s can be bought on eBay really cheap these days, often under $20, but be sure to get one with the power supply and ADF feed tray. (Note: the 5490C is just a 5470C with a C9866A document feeder, but the power supply has to be changed when you add the ADF. The new power supply comes with a 5490C or a C9866A and is different from the one that comes with a 5470C.)

Make up two separate profiles, one for the cover and one for the content, by scanning a single example of each manually. This is only to make the profile: set the size, gamma, highlight and shadow so that there is no clipping (of blacks or whites) and no "bleed through".

Acrobat will do the scans just fine, scanning the odd pages (fronts) first and the even pages (backs) separately, and then properly merging them into a single document with pages in the proper sequence. If the material is not double sided, it's even easier. Do the scans all at 300 dpi unless there is a specific reason to use a higher resolution.

This doesn't do OCR, but Acrobat can do OCR (although I've never used it), or you can feed the PDF files into OmniPage later.
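The merge step Barry describes, where fronts and backs are scanned in two passes and then combined, amounts to interleaving two page lists. A minimal sketch of that idea, not Acrobat's actual implementation: the function name and the `backs_reversed` flag are hypothetical, and the flag assumes the stack was simply flipped over for the second pass so the backs arrive last page first.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def merge_duplex(fronts: Sequence[T], backs: Sequence[T],
                 backs_reversed: bool = True) -> List[T]:
    """Interleave front-side and back-side scans into reading order."""
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must contain the same number of sheets")
    # Hypothetical flag: undo the reversal caused by flipping the stack.
    ordered_backs = list(reversed(backs)) if backs_reversed else list(backs)
    merged: List[T] = []
    for front, back in zip(fronts, ordered_backs):
        merged.extend((front, back))
    return merged


if __name__ == "__main__":
    fronts = ["p1", "p3", "p5"]
    backs = ["p6", "p4", "p2"]  # second pass, stack flipped over
    print(merge_duplex(fronts, backs))
```

The length check is there because a misfeed in either pass would otherwise shift every later page silently.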
From: Chuck Tribolet on 25 Oct 2007 00:46

If the pages are mostly text, use png, not tif and especially not jpg.
From: Barry Watzman on 25 Oct 2007 13:55

JPEG works fine for text as long as you don't get overly aggressive with the compression. Trying to save an 8.5" x 11" page of monochrome text at 300 dpi in a 10K JPEG file is a recipe for disaster, but at 100K or a bit more there is no problem, even with subsequent OCR.
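To put Barry's 10K and 100K figures in perspective, the uncompressed size of a letter page at 300 dpi can be worked out directly. A small sketch, assuming 8-bit grayscale (or 1-bit bilevel) pixels and treating "K" as 1000 bytes; both are assumptions, since the post does not state the bit depth, and the helper names are illustrative only.

```python
def raw_page_bytes(width_in: float = 8.5, height_in: float = 11.0,
                   dpi: int = 300, bits_per_pixel: int = 8) -> int:
    """Uncompressed size of one scanned page in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def compression_ratio(raw_bytes: int, file_kb: float) -> float:
    """How many times smaller the file is than the raw pixels (1 KB = 1000 bytes)."""
    return raw_bytes / (file_kb * 1000)


if __name__ == "__main__":
    gray = raw_page_bytes()                  # 2550 x 3300 pixels, 8-bit
    bilevel = raw_page_bytes(bits_per_pixel=1)
    print(f"8-bit grayscale page: {gray:,} bytes")
    print(f"1-bit bilevel page:   {bilevel:,} bytes")
    for kb in (10, 100):
        print(f"{kb}K file: about {compression_ratio(gray, kb):.0f}:1 against grayscale")
```

Under these assumptions a 100K file is roughly 84:1 against the 8.4 MB grayscale page, while a 10K file would need roughly 840:1.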
From: Chuck Tribolet on 26 Oct 2007 21:28

But it would be smaller and sharper as a png.
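Chuck's "sharper" point follows from PNG being lossless: it uses DEFLATE compression, so every pixel survives a round trip, and long uniform runs such as white margins compress well. A rough stdlib-only sketch of that behaviour, with loud caveats: the page is a synthetic 1-bit bitmap with an invented ink pattern, and the code calls raw zlib rather than a real PNG encoder (no filtering, no chunks). The size it prints says nothing reliable about real scans, which contain noise and compress far less well.

```python
import zlib

WIDTH_PX, HEIGHT_PX = 2550, 3300     # 8.5 x 11 inches at 300 dpi
ROW_BYTES = (WIDTH_PX + 7) // 8      # 1 bit per pixel, packed into bytes


def synthetic_text_page() -> bytes:
    """Fake bilevel page: white rows with bands of a made-up ink pattern."""
    white_row = b"\xff" * ROW_BYTES
    # Invented byte pattern standing in for glyph strokes, with white margins.
    ink = b"\xc3\x99\xe7\x81" * 64
    ink_row = b"\xff" * 30 + ink + b"\xff" * (ROW_BYTES - 30 - len(ink))
    rows = []
    for y in range(HEIGHT_PX):
        in_text_line = 300 <= y < 3000 and (y % 50) < 30
        rows.append(ink_row if in_text_line else white_row)
    return b"".join(rows)


def deflate_size(data: bytes) -> int:
    """Size in bytes after DEFLATE at the highest zlib level."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    page = synthetic_text_page()
    packed = zlib.compress(page, 9)
    print(f"raw bitmap: {len(page):,} bytes, deflated: {len(packed):,} bytes")
    # Lossless: decompression restores the bitmap bit for bit.
    print("round trip identical:", zlib.decompress(packed) == page)
```

Whether PNG actually comes out smaller than a moderately compressed JPEG for a given scan is something to measure on real pages from the journals rather than infer from this sketch.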