Publishing Office Documents to the Web
WAC Workshop. November 2004. Written and presented by: Lori Bailey.
Covers: problems of publishing Office documents, different types
of publication and methods for conversion to HTML, strategies for
accessible design in Office, and cleaning up existing Office documents
converted to HTML.
Contents:
- Why are Office Documents a problem?
- Options for Web Publishing of Office Documents.
- Using Conversion Software.
- Preparing Your Document for Accessible HTML Conversion.
- Cleaning Up Office Documents After Conversion to HTML.
Why are Office documents a problem?
Although there are plenty of competitors out there, Microsoft Office is a dominant software package for word processing, spreadsheets, and presentations; and, with the Buckeye Bundle licensing agreement, it represents a favored program on the OSU campus. Because it is so prevalent around campus, many designers assume that users have easy access to current versions of Office and can easily open any document created in Office. In addition, the current version of the Internet Explorer browser includes a plug-in designed to allow users to open Office documents seamlessly from within the browser. And, if that is not enough, Office includes a handy option to "Save as Web page," allowing designers to quickly convert any Office document into an HTML file, ready for publication to the Web. So, why is it a problem to publish Office documents to the Web?
Unfortunately, all these Web publishing options can have severe limitations for distribution, depending on the user's particular platform, configuration, and browser version. For one thing, Office doesn't play nicely with older versions of itself. Users who haven't upgraded may have a lot of difficulty opening Office files created in the latest, greatest version. The same goes for the latest version of IE. In addition, IE opens the file within the browser window, and this can be confusing for users, especially when trying to print or save the file. And, of course, not all users are browsing with IE: Netscape, Mozilla, and Opera offer no such special support for Office files. Finally, Microsoft's conversion to HTML leaves much to be desired: it creates clunky, tag-heavy versions that can result in odd-looking displays and lost information.
From an accessibility standpoint, no matter which method is used to post the document, the essential step is to create the document in a way that sets up the information for access by assistive technology. This means using the tools and functions of Office correctly, using a layout that can be understood linearly, and adding the necessary information for all elements to be represented as text.
However, even a well-designed document requires careful consideration before deciding how it should be published to the Web. Assistive technology users, more than other users, may not update their software quickly or frequently. New versions are more likely to cause conflicts with assistive technology, and when you rely on your software to complete even the simplest tasks, you can't afford to have conflicts. In addition, distribution strategies that require a particular configuration or additional plug-ins, rely on external programs, or change the focus or navigational structure of the current task can result in real barriers to access by assistive technology users. The best strategy requires the least amount of work by the user and works with a minimum of user-required software. Thus, the ideal solution is to create a clean, well-formed HTML version of your document that can be displayed in just about any browser. When the ideal is not an option, choosing an alternate Web publishing option for your project means weighing the complexity of the document, the purpose of the document, the audience you are trying to reach, and the software and hardware configurations you can reasonably expect that audience to have access to.
Options for Web Publishing of Office Documents
There are probably a myriad of configurations and options you can use to post your Office documents on the Web. We've narrowed it down to a handful of the most common techniques with a brief discussion of the pros and cons of each method.
Posting original document
You can post Word, Excel, and PowerPoint documents directly to your server and link to them from your Web pages.
Benefits:
-
Easiest method -- no conversion necessary.
-
Formatting retained exactly how creator intended.
-
Documents are fully editable in compatible versions of Office.
Drawbacks:
-
Only users with the latest versions of Internet Explorer, a compatible version of the Office program, or an Office document viewer plug-in will be able to read the document. Problems most often arise when a user with an older version of Office tries to open your newer Office document, or when users try to share documents across platforms (Mac to PC or PC to Mac). Also, Linux users on campus frequently cannot open Office documents.
-
Users may have difficulty viewing files in the desired application. If the user is browsing with the latest version of IE, the document frequently opens within the browser and not within an existing copy of Office on the user's computer. This can be confusing and cause some difficulty with saving, printing, and editing.
Requirements:
Best suited for:
-
Known user group with identified compatible software.
-
Files where complete formatting information must be preserved and shared with users (for editing/commenting).
-
Examples -- students sharing documents in a computer classroom, collaborative projects, working drafts of internal training modules.
Posting a Rich Text version
Rich Text Format (RTF) offers a reasonable balance between accessibility and format preservation. One step above the plain text file (TXT), Rich Text Format can preserve multiple fonts and sizes, multiple colors, some styles (underline, bold, italics), backgrounds, and pictures and photographs. [Detailed specifications for the Rich Text Format can be found at: http://www.biblioscape.com/rtf15_spec.htm. A copy of this file is included on the WAC workshop CD.]
Benefits:
-
Simple conversion: most Office documents offer a "Save As" option for Rich Text Format.
-
No additional plug-in required: can be opened in most browsers or word-processing programs.
-
Accessible to screen readers and assistive technology, older versions of Office, other applications, PDAs, etc.
Drawbacks:
-
Images are converted to bitmaps, which are very large and can make the file extremely bulky, requiring a large amount of available memory to open.
-
Complex formatting is lost. Documents using columns, embedded tables, graphs, charts, headers, footers, and other visual elements may convert poorly to RTF.
Requirements:
-
Because RTF is designed to open across platforms and word processing programs, you are not required to include a link to an RTF-compatible viewer. However, you should still clearly identify links to RTF files, as users may want to handle them differently than standard HTML pages (e.g., open them in a word processor program rather than a browser window).
-
Additionally, you should test your RTF files in different browsers to ensure they open smoothly.
Best suited for:
-
Simple documents: single-column layouts with no graphics or images.
Using the "Save as Web Page" Feature
Many designers say, "What's the big deal? You can just save the document as HTML and post it that way." Unfortunately, this seemingly simple solution is not as effortless as it sounds. For one thing, Microsoft Office actually includes three different versions of the "Save as Web Page" feature (single-file Web page, Web page, and Web page filtered), and which one you choose can have a considerable impact on the usability and accessibility of the document. Also, generally speaking, the Microsoft Office HTML converter writes poor, non-standard HTML. Whatever type of HTML conversion you choose, your final HTML version will likely include unclosed tags, poorly nested tags, unnecessary tags, special tags that only work in the latest version of Internet Explorer (and may break other browsers completely), and missing tags or mark-up necessary to make the document fully accessible to users of assistive technology. In no case, and under no circumstances that we found, could a designer reliably use any version of the Microsoft Office "Save as Web Page" feature and not do follow-up editing of the converted document in an HTML editor. "Save as Web Page" can be used as a first step, one that does take care of the vast majority of the work involved in converting your Office document to an HTML file, but it is only a first step, and many more may be needed to create a truly accessible version of your document ready for Web publication and distribution.
Save As: Single File Web Page (MHTML)
Microsoft designed the single file Web page save option to facilitate the portability of Web pages created in Office. When a document is saved as a single file Web page, it is converted into a MIME encapsulated document that contains all the dependent files associated with that page: images, graphics, charts, movies, sound files, etc. This allows designers to distribute a Web page that would normally require multiple files as a single file. This format was not intended to be published directly to the Web for general distribution on a Web server, but as a way to share documents via e-mail or other electronic distribution without needing to create a Zip archive or other method of grouping associated files. The MHTML format will display in the most recent versions of IE, but does not successfully display in other browsers.
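To make "MIME encapsulated document" a little more concrete, here is a minimal sketch using Python's standard email package. It bundles one HTML page and one image into a multipart/related message and then lists the parts, which is the general shape of a single file Web page. The file names and the placeholder image bytes are assumptions for illustration only, and the file Office actually writes differs in its details.

```python
from email import message_from_string
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_single_file_page(html, gif_bytes):
    """Bundle an HTML page and one GIF into a multipart/related message."""
    bundle = MIMEMultipart("related")
    page = MIMEText(html, "html")
    page.add_header("Content-Location", "page.htm")        # hypothetical name
    image = MIMEImage(gif_bytes, _subtype="gif")
    image.add_header("Content-Location", "image001.gif")   # hypothetical name
    bundle.attach(page)
    bundle.attach(image)
    return bundle.as_string()


def list_parts(mhtml_text):
    """Return (content type, Content-Location) for every non-container part."""
    message = message_from_string(mhtml_text)
    return [
        (part.get_content_type(), part.get("Content-Location"))
        for part in message.walk()
        if not part.is_multipart()
    ]


# Placeholder bytes stand in for a real image file.
single_file = build_single_file_page(
    '<html><body><img src="image001.gif"></body></html>', b"GIF89a"
)
for content_type, location in list_parts(single_file):
    print(content_type, location)
```

Because the page and its image travel as parts of one message, a browser has to understand the wrapper before it can show anything, which is consistent with the limited browser support described above.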
Save As: Web Page
Microsoft's "Save as web page" option is a quick and dirty method for converting your Office documents to HTML. It's quick, because it takes about the same amount of time as saving a regular version of your document. It's dirty, because the HTML that gets created is often incomplete, padded with extraneous code, and cluttered with Office-specific tags that only work in the latest version of IE. If you plan to use the "Save as web page" feature, you'll also need to plan on doing some substantive clean-up work within the HTML document afterward. The WAC recommends using clean-up software, such as Dreamweaver's "Clean up Word HTML" or HTML Tidy, to facilitate this process.
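Before cleaning up a converted file, it can help to see how much Office-specific markup it contains. The following is a rough sketch in Python, not a substitute for the clean-up tools named above. The patterns are an illustrative assumption based on markup commonly seen in Word's output (namespaced tags such as o:p, mso- style properties, conditional comments, and Mso class names) and are not an exhaustive inventory.

```python
import re

# Patterns commonly seen in Word's "Web Page" output; this list is an
# illustrative assumption, not an exhaustive inventory.
OFFICE_PATTERNS = {
    "namespaced tags (o:, v:, w:)": re.compile(r"</?(?:o|v|w):[A-Za-z]+", re.I),
    "mso- style properties": re.compile(r"mso-[a-z0-9-]+\s*:", re.I),
    "conditional comments": re.compile(r"<!--\[if[^\]]*\]>", re.I),
    "Mso class names": re.compile(r"class\s*=\s*[\"']?Mso[A-Za-z]+", re.I),
}


def count_office_markup(html):
    """Return a dict mapping each pattern label to its number of matches."""
    return {label: len(p.findall(html)) for label, p in OFFICE_PATTERNS.items()}


# A made-up fragment in the style of Word's output.
sample = (
    '<p class=MsoNormal style="mso-margin-top-alt:auto">'
    'Hello<o:p></o:p></p>'
    '<!--[if gte mso 9]><xml></xml><![endif]-->'
)
report = count_office_markup(sample)
for label, n in report.items():
    print(f"{label}: {n}")
```

Running the same count before and after clean-up gives a quick indication of whether the Office-specific code was actually removed.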
Save As: Web Page, Filtered
The filtered web page option in Office produces essentially the same document as the "Web page" option, but it removes all Office-specific tags. These tags can add a substantial amount of code to your document, greatly increasing the size of the document, and are generally only supported in the latest version of IE. Thus, a filtered web page has the advantage of being both smaller and more browser friendly. It is important to keep in mind, however, that Microsoft has not yet created a conversion process that inserts or prompts you to add all the necessary accessibility tags and mark-up to make your HTML document fully accessible and compliant with the OSU standards. Whether you use the "Save as Web Page" or "Save as Web Page, Filtered" option, you will still have to do additional work to ensure, among other things, that all images have alternative text, all data tables have summaries and headers, and graphs, charts, and other special elements are represented in a way accessible to assistive technology.
About PowerPoint Conversion
Office offers only the "Save as Web Page" option for PowerPoint files, and depending on your version of Office, the converted document may be a highly accessible (although cumbersome) frames-based set of web pages or a highly inaccessible set of large image files. If you are primarily converting PowerPoint presentations, we highly recommend that you either use conversion software beyond the internal tools that Office provides or expect to do extensive post-conversion editing. Other workarounds include saving an "Outline" version of the presentation (if the Outline text fully represents the presentation) and converting this to either Rich Text Format or HTML. Graphic elements can then be added back to this text-only version using accessible HTML.
Using Conversion Software
Rather than using the built-in features of Office, the WAC recommends taking advantage of one of the free or low-cost conversion software alternatives. In particular, we highly recommend the Accessible Web Publishing Wizard ($39.95) from the Illinois Center for Instructional Technology Accessibility (iCita), which works as a plug-in within Office to add an additional "Save as accessible Web page" option. When selected, a wizard walks you through the process of adding necessary accessibility information to your document before converting it to at least two accessible HTML versions: text-only and text-mostly. PowerPoint presentations can also be converted into Slideshow, Outline, and Handout views. You will have the opportunity to work with this conversion tool during our workshop.
Preparing Your Document for Accessible HTML Conversion
As stated above, from an accessibility standpoint, no matter which method is used to post the document, the essential step is to create the document in a way that sets up the information for access by assistive technology. This means using the tools and functions of Office correctly, using a layout that can be understood linearly, and adding the necessary information for all elements to be represented as text. To facilitate the best possible conversion with the least amount of follow-up editing, you will generally need to:
-
Use headings and styles to format text. Create custom styles if necessary, but use standard styles as much as possible.
-
Provide text alternatives for images.
-
Use tables for data, not layout, and use the "Insert Table" or "Draw Table" tool to define your tables.
-
Use bulleted and numbered lists; do not create pseudo-lists using special characters, images, or tabs.
-
Turn off "Smart Quotes" (under the Format menu, choose Auto Format options) to avoid conversion problems.
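If a document was converted before "Smart Quotes" was turned off, the curly quotation marks can still be replaced afterward. The following is a minimal sketch in Python that assumes the text has already been read in as Unicode. The function name and the character map are illustrative assumptions, not part of any Office or WAC tool.

```python
# Map of typographic ("smart") quotation marks to plain ASCII equivalents.
SMART_PUNCTUATION = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
_TABLE = str.maketrans(SMART_PUNCTUATION)


def straighten_quotes(text):
    """Return text with smart quotes replaced by straight quotes."""
    return text.translate(_TABLE)


print(straighten_quotes("\u201cSave as Web Page\u201d isn\u2019t enough"))
```

Turning the feature off before writing remains the simpler route; a replacement pass like this is only a fallback for existing files.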
Cleaning Up Office Documents After Conversion to HTML
Generally speaking, regardless of how you convert your Office document, you will need to do some post-conversion clean-up of the HTML. In particular, you will need to do the following:
For the document:
-
Identify and validate the appropriate HTML DOCTYPE.
-
Ensure each file includes a unique title using the <title> tag and add meta-data where appropriate.
-
Remove invalid HTML from the <head>.
-
Add any Header and Footer information that is necessary for the document. Office typically ignores Headers and Footers when converting to HTML.
-
Restructure document: remove unnecessary header tags, ensure all text is within a paragraph or header tag.
-
Review lists and bullets: ensure each list item has its own <li> tag. Remove or replace bullet images or bullets created by strange characters.
-
Add navigation structure. For longer documents, add in-page navigation that allows the user to quickly move to different sections of the document (can also be prepared before conversion). Also, add appropriate links to other documents on your site (e.g., home page link, link to course information). Keep in mind, if you have a lot of repeated navigation at the top of your document, you will need a Skip Navigation link as well.
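The first two checks in the list above (a DOCTYPE and a non-empty title) are easy to automate. The following is a minimal sketch using Python's standard html.parser module. The class and function names are hypothetical. It only reports whether a DOCTYPE and title text are present; it does not validate the page against that DOCTYPE or check that titles are unique across files.

```python
from html.parser import HTMLParser


class DocumentChecker(HTMLParser):
    """Record whether a DOCTYPE was seen and collect the <title> text."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.title = ""
        self._in_title = False

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE HTML PUBLIC ...>.
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def check_document(html):
    """Return a list of problems found; an empty list means both checks passed."""
    checker = DocumentChecker()
    checker.feed(html)
    checker.close()
    problems = []
    if not checker.has_doctype:
        problems.append("missing DOCTYPE")
    if not checker.title.strip():
        problems.append("missing or empty <title>")
    return problems


print(check_document("<html><head><title> </title></head><body></body></html>"))
```

A full validator is still needed to confirm that the markup matches the declared DOCTYPE.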
For images and graphics:
-
Provide alternative text or designate an empty alt attribute (alt="") for every image or graphic.
-
Add long descriptions as appropriate for graphics, charts, and other detailed graphic information.
-
Place images appropriately within the reading order of the text. Office often pushes floating graphics up or down the page when converting to HTML. Check to ensure graphics are placed where they make sense within the flow of the text.
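Finding images that lack alternative text is a mechanical task. The following is a minimal sketch using Python's standard html.parser module, with hypothetical class and function names and made-up file names in the sample. It separates images with no alt attribute at all from images with an empty one, since an empty alt may be deliberate for decorative graphics. It cannot judge whether the text that is present is meaningful, or whether an image sits in a sensible place in the reading order.

```python
from html.parser import HTMLParser


class AltTextAuditor(HTMLParser):
    """Collect <img> tags that have a missing or empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []  # src values of images with no alt attribute
        self.empty = []    # src values of images with alt=""

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src", "(no src)")
        if "alt" not in attributes:
            self.missing.append(src)
        elif not (attributes["alt"] or "").strip():
            self.empty.append(src)


def audit_images(html):
    """Return (images missing alt, images with empty alt) as two lists."""
    auditor = AltTextAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing, auditor.empty


sample = (
    '<p><img src="image001.gif">'
    '<img src="spacer.gif" alt="">'
    '<img src="chart.gif" alt="Enrollment by year"></p>'
)
missing, empty = audit_images(sample)
print("No alt attribute:", missing)
print("Empty alt (confirm these are decorative):", empty)
```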
For data tables:
-
Designate row and column headers using the <th> tag. Some converters will use the <td> tag with the "scope" attribute. You need to change these to <th>.
-
Provide summaries using the "summary" attribute in the <table> tag.
-
Provide titles using the <caption> tag.
-
If the data table has two or more levels of row or column headers, use the "headers" attribute to associate data cells with the appropriate headers.
-
Convert to proportional rather than absolute sizing (recommended).
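Several of the data table checks above can be automated as a first pass. The following is a minimal sketch using Python's standard html.parser module, with hypothetical class and function names and a made-up sample. For each table it reports whether a non-empty summary attribute, a <caption>, and at least one <th> are present. It does not check the "headers" attribute on complex tables, and it cannot tell a data table from a layout table.

```python
from html.parser import HTMLParser


class TableAuditor(HTMLParser):
    """Report summary, caption, and th usage for each <table>."""

    def __init__(self):
        super().__init__()
        self.reports = []  # one dict per <table>, in document order
        self._stack = []   # currently open tables, to cope with nesting

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            attributes = dict(attrs)
            report = {
                "summary": bool((attributes.get("summary") or "").strip()),
                "caption": False,
                "th": False,
            }
            self.reports.append(report)
            self._stack.append(report)
        elif self._stack and tag in ("caption", "th"):
            # Credit the innermost open table.
            self._stack[-1][tag] = True

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self._stack.pop()


def audit_tables(html):
    """Return a list of per-table reports."""
    auditor = TableAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.reports


sample = (
    '<table summary="Enrollment by term"><caption>Enrollment</caption>'
    '<tr><th scope="col">Term</th><th scope="col">Students</th></tr>'
    '<tr><td>Autumn</td><td>120</td></tr></table>'
    '<table><tr><td>Term</td><td>Students</td></tr></table>'
)
for number, report in enumerate(audit_tables(sample), start=1):
    print(f"table {number}: {report}")
```

Any table reported with all three values false is worth a manual look: it is either a layout table or a data table that still needs its headers, summary, and caption.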
Resources for Office to HTML Conversion
Accessible Web Publishing Wizard ($39.95) from the Illinois Center for Instructional Technology Accessibility (iCita): "offers an alternative to the native web publishing features in Microsoft Office for Word. Powerpoint and Excel. The standard web publishing options often create XML based web content that can only be viewed by Microsoft Internet Explorer. Even if non-XML options are selected, users cannot easily add information that is required for accessibility. The Accessible Web Publishing Wizard simplifies the task of converting PowerPoint presentations, Word documents, and Excel spreadsheets to accessible and vaild HTML 4.01 with CSS through an easy-to-use user interface and automation of many of the details of conversion needed for accessibility. The HTML generated meets or exceeds Section 508 and W3C WCAG 1.0 Double-A requirements for accessibility by people with disabilities and validates to HTML 4.01 and CSS standards." [Description from the "Overview" page: http://cita.disability.uiuc.edu/software/office/]
HTML Tidy: an open source utility for tidying up HTML. Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML. Tidy is W3C open source and available free. It has been successfully compiled on a large number of platforms, and is being integrated into many HTML authoring tools. Recently the maintenance of Tidy has been taken over by a group of dedicated volunteers on SourceForge, see: http://tidy.sourceforge.net/. [Description from the developer's web site: http://www.w3.org/People/Raggett/#tidy.]
HTML Tidy On-Demand: An online version of the tool that lets you clean up an existing HTML page, upload a file, or copy and paste your HTML into the tool; choose the settings you want and get your tidied version instantly. Great for those individual files that you want to post quickly.