WAC Workshop. March 2005. Written and Presented by: Lori Bailey.
Table of Contents:
Editing Your PDF for Accessibility
Required Software.
Scanned PDF documents represent the most nefarious type of document in terms of accessibility to users of assistive technology. Why? Because, in most cases, a document scanned directly to PDF or scanned and then converted to PDF will be transferred as a large image file. Each page will contain one large image with all text, tables, images, and graphics grouped into that image. Text on the page is not searchable or selectable. To the assistive technology user, the document appears completely blank.
To make a scanned document accessible, you must convert the image of the document into "real" text. That is, the text must be selectable and scalable. This is usually done through OCR (Optical Character Recognition). If the PDF version is also to be your accessible version, you'll need to add accessibility mark-up: adding "tags" to your PDF, adding alternative text for images, graphs, and charts, and adding header information to data tables. In addition, text created from a scanned image of a document is often broken into unexpected segments, and these segments may be out of order relative to the expected read order of the document. Once your document is converted, you'll need to perform several checks to ensure that the correct read order is established.
Basically, you have two choices for creating your PDF document from paper. You can scan your document into an image file (typically a TIFF) and then convert the image file into a PDF. Alternatively, you can scan directly into PDF using the "Create PDF from Scanner" option in Acrobat, your scanner's PDF conversion option, or commercially available conversion software.
PDF experts tend to suggest the first option: scanning to a TIFF and then importing into Acrobat or your PDF creation software. By separating the steps, you can focus first on creating clean, high-quality scans of the document and then worry about converting to an accessible PDF. If you process your documents directly to PDF, you may need to do several rescans at different DPI and other settings before you have a PDF that can be successfully manipulated. However, once our settings had been established, we found little difference between creating a TIFF and scanning directly to PDF.
Regardless of how you scan your document, you will need to do some follow-up after the PDF version has been created to add accessible features. This can be a very simple process for simple documents or a lengthy, complex process for complex ones; much depends on what software you have available.
Each scanner is different and uses different software, different defaults, and different preset configurations. You'll probably need to experiment to find out which settings work best for the types of documents you are scanning and converting. In the examples below, we used an Epson Perfection 1660 Photo scanner and customized the settings to 400 DPI, Black and White Photo output.
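To get a feel for what a DPI setting implies, the following minimal sketch computes the pixel dimensions and uncompressed size of one scanned page. The US Letter page size (8.5 x 11 inches), the function name, and the two bit depths are assumptions made for illustration; only the 400 DPI figure comes from the example above. Whether a given "black and white" setting produces 1-bit or 8-bit output depends on the scanner software, so both are shown.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to whole bytes; file headers and compression are ignored.
    return width_px, height_px, (total_bits + 7) // 8

# Assumed page size: US Letter, 8.5 x 11 inches, scanned at 400 DPI.
width_px, height_px, size_1bit = scan_size(8.5, 11, 400, 1)
_, _, size_8bit = scan_size(8.5, 11, 400, 8)

print(width_px, height_px)  # 3400 4400
print(size_1bit)            # 1870000 bytes at 1 bit per pixel
print(size_8bit)            # 14960000 bytes at 8 bits per pixel
```

Doubling the DPI quadruples the pixel count, which is one reason to settle on the lowest setting that still gives clean OCR results.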
To ensure your document is accessible to users of assistive technology, you'll need to edit the PDF document:
OCR Tip: After performing OCR, switch to Select Text mode and try to select text in your document. The text that is highlighted has been interpreted by Acrobat. Any text that cannot be highlighted failed to be converted. Also, notice if text is highlighted in an odd order or if some blocks of text are skipped. This indicates problems with read order.
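The read-order problem described in this tip can be sketched in code. The example below is only an illustration and does not reflect how Acrobat stores or orders text: it assumes a single-column page, a hypothetical `TextBlock` record holding each block's position, and an expected read order of top to bottom, then left to right.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    left: float  # distance from the left edge of the page
    top: float   # distance from the top edge of the page

def expected_order(blocks):
    """Sort blocks top-to-bottom, then left-to-right (single-column assumption)."""
    return sorted(blocks, key=lambda b: (b.top, b.left))

def out_of_order(blocks):
    """Return the blocks whose stored position differs from the expected one."""
    return [
        stored for stored, expected in zip(blocks, expected_order(blocks))
        if stored is not expected
    ]

# Hypothetical blocks, listed in the order the OCR step stored them.
stored_blocks = [
    TextBlock("Title", 72, 72),
    TextBlock("Second paragraph", 72, 300),
    TextBlock("First paragraph", 72, 150),
]

for block in out_of_order(stored_blocks):
    print("Check read order near:", block.text)
```

Multi-column layouts, sidebars, and tables need a more careful ordering rule than this, which is why a manual check of the highlighted order remains worthwhile.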
As Acrobat performs its OCR process, it creates a list of "suspect" words and characters that could not be clearly identified. You can see all the suspect items at once: from the DOCUMENT menu, choose PAPER CAPTURE and FIND ALL OCR SUSPECTS. Acrobat highlights all the suspect items in the document.
You must address each OCR suspect. Any OCR suspect that you ignore will not be converted into readable text and will be ignored by screen readers.
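As a rough illustration of the idea behind a suspect list, the sketch below flags recognized words whose confidence score falls below a cutoff. This is not Acrobat's implementation: the per-word confidence scores, the 0.80 threshold, and the `find_suspects` name are all assumptions chosen for the example.

```python
def find_suspects(words, threshold=0.80):
    """Return (position, word) pairs whose confidence falls below the threshold."""
    return [
        (position, word)
        for position, (word, confidence) in enumerate(words)
        if confidence < threshold
    ]

# Hypothetical OCR output: each word paired with a confidence between 0 and 1.
recognized = [
    ("Accessible", 0.98),
    ("rnodern", 0.41),  # likely a misread of "modern"
    ("PDF", 0.95),
    ("c1ean", 0.62),    # likely a misread of "clean"
]

print(find_suspects(recognized))  # [(1, 'rnodern'), (3, 'c1ean')]
```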
You can walk through the OCR suspects one by one:
After you have performed OCR and addressed all the suspect characters, you can do a quick check to ensure that the text of your document is available to screen readers: Save as text (accessible).
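Once the text file has been saved, one way to review it is to confirm that a few phrases you know appear in the paper original are present. The sketch below is a minimal example under stated assumptions: the function name and sample strings are hypothetical, and the comparison ignores case and line breaks because saved text often wraps differently from the page.

```python
def missing_phrases(saved_text, expected_phrases):
    """Return expected phrases not found in the text, ignoring case and spacing."""
    normalized = " ".join(saved_text.lower().split())
    return [
        phrase for phrase in expected_phrases
        if " ".join(phrase.lower().split()) not in normalized
    ]

# Hypothetical contents of the saved text file.
saved_text = "Web Accessibility Center\nScanned PDF and   Accessibility\n"
expected = [
    "Web Accessibility Center",
    "scanned pdf and accessibility",
    "Required Software",
]

print(missing_phrases(saved_text, expected))  # ['Required Software']
```

To run this against a real file, read its contents into `saved_text` first. Any phrase reported as missing points to a region of the page worth re-checking for failed OCR.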
Once you are certain that the necessary text is available in the document, you can add tags to your document. Adding tags creates a duplicate of your document that is marked up for accessibility. Only the very latest assistive technology can read an untagged PDF. In addition, an untagged PDF cannot be reflowed to fit the available screen size and cannot contain additional information, such as alternative text for images. Thus, only a tagged PDF can be considered accessible.
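Conceptually, the tags form a tree that describes the document's structure, and information such as alternative text hangs off individual tags. The sketch below models that idea with a simplified, hypothetical `Tag` class (not Acrobat's internal format) and walks the tree to find Figure tags that still lack alternative text. The tag names follow standard PDF structure types; the sample tree and alt text are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Tag:
    kind: str                        # e.g. "H1", "P", "Figure", "Table"
    alt_text: Optional[str] = None   # alternative text, if any
    children: List["Tag"] = field(default_factory=list)

def figures_missing_alt(tag, path=""):
    """Walk the tag tree and return the paths of Figure tags with no alt text."""
    here = f"{path}/{tag.kind}"
    missing = []
    if tag.kind == "Figure" and not (tag.alt_text or "").strip():
        missing.append(here)
    for child in tag.children:
        missing.extend(figures_missing_alt(child, here))
    return missing

# Hypothetical tag tree for a short document.
document = Tag("Document", children=[
    Tag("H1"),
    Tag("P"),
    Tag("Figure", alt_text="Bar chart of scan times by DPI"),
    Tag("Sect", children=[Tag("Figure")]),
])

print(figures_missing_alt(document))  # ['/Document/Sect/Figure']
```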
You can use Acrobat's automated feature to add tags to your document:
After adding tags, you can do a few quick checks to ensure your document will work well with assistive technology. You can also use these techniques at any point in your conversion process to check the accessibility of your document.
Highlighting content is a simple method to confirm:
Document reflow assists users who enlarge the text or who are using small screens or low resolutions by reformatting the document to fit the available screen. Without reflow, users may be forced to scroll horizontally as well as vertically.
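As an analogy for what reflow does, the sketch below re-wraps the same paragraph to two different widths using Python's standard `textwrap` module. It illustrates the general idea only and is not how Acrobat reflows a tagged PDF; the column widths and the `reflow` helper are assumptions chosen for the example.

```python
import textwrap

def reflow(paragraph, columns):
    """Re-wrap a paragraph so that no line exceeds the available width."""
    return textwrap.wrap(" ".join(paragraph.split()), width=columns)

paragraph = (
    "Document reflow assists users who enlarge the text or who are using "
    "small screens, by reformatting the document to fit the available screen."
)

# The same text fits a wide and a narrow display without horizontal scrolling.
for width in (60, 30):
    lines = reflow(paragraph, width)
    print(f"--- {width} columns, {len(lines)} lines ---")
    print("\n".join(lines))
```

A scanned page image cannot be treated this way, because its line breaks are fixed in the picture; the text has to exist as real, ordered text before it can be re-wrapped.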
To check for reflow:
The best way to check a document's accessibility is to use the same assistive technology your users will use to access the document. However, if you don't have access to a screen reader or screen enlarger, you can still get a sense of how those technologies will interpret your document by listening to it being read by Acrobat's "Read Out Loud" feature. Although not practical for lengthy documents, such as dissertation chapters or articles, this is a good strategy for shorter documents that will receive high circulation on your web site or will be required reading for your users.
To read out loud:
For longer documents, you may want to narrow your reading to only a few key pages: in particular, those pages that contain graphics, tables, columns, or text boxes.
Any problems you find during your checks will most likely need to be addressed by editing the tagged version of your PDF. For detailed guidance on how to edit tags and mark up images, tables, and links for accessibility, see the WAC Handout: "Checking Your PDF for Accessibility". It is available online at: www.wac.ohio-state.edu/pdf/checking.
The WAC has put together an extensive collection of guides and resources on various production methods for accessible PDF. Visit us online at: www.wac.ohio-state.edu/pdf.
Adobe offers a number of excellent resources as well. One we recommend is Acrobat for Educators, which includes a selection of FREE online video tutorials that guide you through how to use Acrobat, from simple Bookmarks and Articles to advanced Document Collections. Check it out at: www.adobe.com/education/acrobat/acrobat_training.html.
Want more? Check out the discussions, tips, and tools offered by Planet PDF, a community of advanced developers ready to help you with quick solutions to your PDF problems. Includes a very useful collection of software titles for all types of PDF creation and conversion. Online at: www.planetpdf.com.
A number of companies offer software that specializes in converting PDF to either accessible (selectable & searchable) PDF or to other, more accessible, formats (Word, Excel, etc.). Here are a few:
ABBYY PDF Transformer ($49.99): Quickly and accurately convert any PDF file into Microsoft® Word, Excel or HTML files without retyping and reformatting. PDF Transformer is an ideal utility for business and home users that need to edit and repurpose a wide variety of PDF files. [http://www.abbyyusa.com/pdftransformer.htm]
Able2Extract Professional ($120): Convert your PDF data into fully formatted Excel spreadsheets and editable Word documents with Able2Extract Professional. Supports scanned documents, offering 10 different conversion options in total. [http://www.investintech.com/prod_a2e_pro.htm]
Adobe Acrobat Capture ($195): Adobe Acrobat® Capture® 3.0 software is the perfect addition to Adobe Acrobat 7.0 for people who want to process high volumes of scanned paper and turn them into searchable tagged Adobe PDF files. [http://store.adobe.com/enterprise/accessibility/acrobatcapture30.html]
ISICopy ($99): ISICopy works with Adobe® Acrobat® software to extract text from image-based PDF files, converting it into valuable editable text. There is no need to OCR an entire page; if you have a paper-based PDF file, you can select the precise amount of text you want to copy and then paste it into any application.
ScanSoft OmniPage Pro ($120): Quickly turn paper and PDF files into editable electronic documents that look just like the original, complete with text, tables and graphics. Robust new tools enable you to turn text documents into audio books and add digital signatures to your electronic documents. [http://www.scansoft.com]
SolidConverter ($50): You do NOT need Adobe® Acrobat® or Reader® to use our converter! Solid Converter PDF can be used as a standalone converter tool or as a plugin for Microsoft Word® and Adobe® Acrobat® (not Reader). Solid Converter PDF is also available through Explorer's right click local menu. A command line interface is available for batch processing. [http://www.solidpdf.com/]
For large jobs:
PrimeOCR ($1500, limited # of pages): includes an "Accessible PDF" module that meets Section 508 guidelines for an accessible document. [http://primerecognition.com/augprime/prime_ocr.htm]
Note: prices are offered for reference only (subject to change) and do not include any educator's or volume-licensing discounts, if applicable. Before ordering Adobe products, find out if they are available through our OSU volume-license agreement with SHI. See "Adobe Ordering Procedures" on the OIT Site Licensed Software page (available to OSU faculty and staff only): [https://cweb1.net.ohiostate.edu/software/lookup.cgi?adobeclp&1.0&win&Adobeorder.pdf]
This document was created with the Illinois Accessible Web Publishing Wizard for Office.
Web Accessibility Center at The Ohio State University, March 2005. www.wac.ohio-state.edu.