WAC Workshop. November 2004. Written and Presented by: Lori Bailey.
Covers: problems of publishing Office documents, different types of publication and methods for conversion to HTML, strategies for accessible design in Office, and cleaning up existing Office documents converted to HTML.
Although there are plenty of competitors out there, Microsoft Office is a dominant software package for word processing, spreadsheets, and presentations; and, with the Buckeye Bundle licensing agreement, it represents a favored program on the OSU campus. Because it is so prevalent around campus, many designers assume that users have easy access to current versions of Office and can easily open any document created in Office. In addition, the current version of the Internet Explorer browser includes a plug-in designed to allow users to open Office documents seamlessly from within the browser. And, if that is not enough, Office includes a handy option to "Save as Web page," allowing designers to quickly convert any Office document into an HTML file, ready for publication to the Web. So, why is it a problem to publish Office documents to the Web?
Unfortunately, all these Web publishing options have limitations for distribution, and those limitations can be severe depending on the user's particular platform, configuration, and browser version. For one thing, Office doesn't play nicely with older versions of itself: users who haven't upgraded may have a lot of difficulty opening Office files created in the latest, greatest version. The same goes for the latest version of IE: users with older versions may not get its in-browser support for Office files. In addition, IE opens the file within the browser window, which can confuse users, especially when they try to print or save the file. And, of course, not all users are browsing with IE: Netscape, Mozilla, and Opera offer no such special support for Office files. Finally, Microsoft's conversion to HTML leaves much to be desired: it creates clunky, tag-heavy versions that can result in odd-looking displays and lost information.
From an accessibility standpoint, no matter which method is used to post the document, the essential step is to create the document in a way that sets up the information for access by assistive technology. This means using the tools and functions of Office correctly, using a layout that can be understood linearly, and adding the information needed for every element to be represented as text.
However, even a well-designed document requires careful consideration before deciding how it should be published to the Web. Assistive technology users may update their software less quickly or less frequently than other users. New versions are more likely to cause conflicts with assistive technology, and when you rely on your software to complete even the simplest tasks, you can't afford conflicts. In addition, distribution strategies that require a particular configuration or additional plug-ins, rely on external programs, or change the focus or navigational structure of the current task can create real barriers for assistive technology users. The best strategy requires the least work from the user and the least user-installed software. Thus, the ideal solution is to create a clean, well-formed HTML version of your document that can be displayed in just about any browser. When the ideal is not an option, choosing the best alternative for your particular project means weighing the complexity of the document, its purpose, the audience you are trying to reach, and the software and hardware configurations you can reasonably expect that audience to have.
There are many configurations and options you can use to post your Office documents on the Web. We've narrowed them down to a handful of the most common techniques, with a brief discussion of the pros and cons of each method.
You can post Word, Excel, and PowerPoint documents directly to your server and link to them from your Web pages.
Rich Text Format (RTF) offers a reasonable balance between accessibility and format preservation. One step above the plain text file (TXT), Rich Text Format can preserve multiple fonts and sizes, multiple colors, some styles (underline, bold, italics), backgrounds, and pictures and photographs. [Detailed specifications for the Rich Text Format can be found at: http://www.biblioscape.com/rtf15_spec.htm. A copy of this file is included on the WAC workshop CD.]
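To show what "one step above plain text" looks like in practice, here is a minimal Python sketch that writes an RTF file by hand. It assumes only core control words from the RTF specification (\rtf1, \fonttbl, \b, \i, \ul, \fs, \par, \u); the helper names rtf_escape, rtf_run, and rtf_document are hypothetical, and real converters such as Word emit far more formatting information than this.

```python
# Minimal RTF writer sketch: one font, 12 pt body text, bold/italic/underline runs.
STYLE_CODES = {"bold": r"\b", "italic": r"\i", "underline": r"\ul"}


def rtf_escape(text: str) -> str:
    """Escape RTF control characters and encode non-ASCII text as \\uN? words."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \u takes a signed 16-bit value; characters outside the Basic
            # Multilingual Plane are written as a UTF-16 surrogate pair.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append(f"\\u{unit}?")  # "?" is the fallback for old readers
    return "".join(out)


def rtf_run(text: str, *styles: str) -> str:
    """Wrap a run of text in a group that turns on the named styles."""
    codes = "".join(STYLE_CODES[s] for s in styles)
    # The space only terminates the last control word; it is not part of the text.
    prefix = codes + " " if codes else ""
    return "{" + prefix + rtf_escape(text) + "}"


def rtf_document(paragraphs: list[list[str]]) -> str:
    """Assemble paragraphs (lists of runs) into a complete RTF document."""
    header = r"{\rtf1\ansi\deff0{\fonttbl{\f0\froman Times New Roman;}}"
    # \fs24 is measured in half-points, i.e. 12 pt text.
    body = "".join(r"\pard\f0\fs24 " + "".join(runs) + "\\par\n" for runs in paragraphs)
    return header + "\n" + body + "}"


if __name__ == "__main__":
    doc = rtf_document([
        [rtf_run("Workshop notes: ", "bold"), rtf_run("plain text and "),
         rtf_run("emphasis", "italic"), rtf_run(".")],
        [rtf_run("Caf\u00e9 accessibility checklist", "underline")],
    ])
    with open("notes.rtf", "w", encoding="ascii") as f:
        f.write(doc)
```

Because every style change is wrapped in its own {...} group, formatting cannot "leak" into the following text, which is part of why RTF stays readable by many programs.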
Many designers say, "What's the big deal? You can just save the document as HTML and post it that way." Unfortunately, this seemingly simple solution is not as effortless as it sounds. For one thing, Microsoft Office actually includes three different versions of the "Save as Web Page" feature: single-file Web page, Web page, and Web page-filtered; which one you choose can have a considerable impact on the usability and accessibility of the document. Also, generally speaking, the Microsoft Office HTML converter writes poor, non-standard HTML. Whatever type of HTML conversion you choose, your final HTML version will likely include unclosed tags, poorly nested tags, unnecessary tags, special tags that only work in the latest version of Internet Explorer (and may break other browsers completely), and missing tags or mark-up necessary to make the document fully accessible to users of assistive technology. In no case that we found could a designer reliably use any version of the Microsoft Office "Save as Web Page" feature without follow-up editing of the converted document in an HTML editor. "Save as Web Page" can be used as a first step (one that does take care of the vast majority of the work involved in converting your Office document to an HTML file), but it is only a first step, and many more may be needed to create a truly accessible version of your document ready for Web publication and distribution.
Microsoft designed the single-file Web page save option to make Web pages created in Office portable. When a document is saved as a single-file Web page, it is converted into a MIME-encapsulated document (MHTML) that contains all the dependent files associated with that page: images, graphics, charts, movies, sound files, etc. This allows designers to distribute, as a single file, a Web page that would normally require multiple files. This format was not intended to be published directly to a Web server for general distribution, but as a way to share documents via e-mail or other electronic distribution without needing to create a Zip archive or some other way of grouping associated files. The MHTML format will display in the most recent versions of IE, but does not display successfully in other browsers.
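The MIME encapsulation can be sketched with Python's standard email package: the page's HTML becomes the root part of a multipart/related message, and each dependent file becomes another part identified by a Content-Location header matching the path the HTML refers to. This is a simplified illustration of that structure, not a byte-for-byte reproduction of what Office writes; build_mhtml and the file names are hypothetical.

```python
import email
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, images: dict[str, tuple[str, bytes]]) -> bytes:
    """Package an HTML page and its images as one multipart/related file.

    `images` maps the path used in the HTML (e.g. "chart.png") to an
    (image subtype, raw bytes) pair, e.g. ("png", b"...").
    """
    # The "type" parameter names the media type of the root part (the page).
    package = MIMEMultipart("related", type="text/html")
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = "index.html"
    package.attach(page)
    for location, (subtype, data) in images.items():
        # Passing the subtype explicitly avoids guessing the image format.
        part = MIMEImage(data, _subtype=subtype)
        # A reader matches the HTML's src value against this header
        # (simplified: real Office output uses its own location URLs).
        part["Content-Location"] = location
        package.attach(part)
    return package.as_bytes()


if __name__ == "__main__":
    page_html = ('<html><body><p>Results</p>'
                 '<img src="chart.png" alt="Bar chart of results"></body></html>')
    with open("chart.png", "rb") as img:  # hypothetical image next to the script
        chart = img.read()
    mhtml = build_mhtml(page_html, {"chart.png": ("png", chart)})
    with open("results.mht", "wb") as out:
        out.write(mhtml)
    # Re-parse the package to show its structure.
    for part in email.message_from_bytes(mhtml).walk():
        print(part.get_content_type(), part["Content-Location"])
```

Seen this way, it is clear why only a browser that understands multipart/related packages can display the file: an ordinary browser receives one opaque MIME message rather than an HTML page.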
Microsoft's "Save as Web Page" option is a quick and dirty method for converting your Office documents to HTML. It's quick because it takes about the same amount of time as saving a regular version of your document. It's dirty because the HTML that gets created is often incomplete, extraneous, and cluttered with Office-specific tags that only work in the latest version of IE. If you plan to use the "Save as Web Page" feature, you'll also need to plan on doing some substantive clean-up work within the HTML document afterward. The WAC recommends using clean-up software, such as Dreamweaver's "Clean Up Word HTML" command or HTML Tidy, to facilitate this process.
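For a sense of what clean-up tools remove, the following Python sketch strips a few of the most common Office-specific constructs: conditional comments (<!--[if ...]>), downlevel markers such as <![if !supportLists]>, namespaced tags such as <o:p>, "Mso..." class names, and "mso-" style declarations. It assumes Word-style markup and relies on regular expressions, which are fragile on arbitrary HTML; treat clean_word_html as a hypothetical illustration, not a substitute for Dreamweaver or HTML Tidy.

```python
import re

# Office-only blocks such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
CONDITIONAL_COMMENT = re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE)
# Markers like <![if !supportLists]> ... <![endif]>; the content between them is kept.
DOWNLEVEL_MARKER = re.compile(r"<!\[(?:if[^\]]*|endif)\]>", re.IGNORECASE)
# Namespaced elements such as <o:p> or </w:WordDocument>; the tags go, any text stays.
NAMESPACED_TAG = re.compile(r"</?[a-z]+:[a-z][^>]*>", re.IGNORECASE)
# class=MsoNormal (Word often leaves it unquoted) or class="MsoTitle".
MSO_CLASS = re.compile(r"""\s+class=(["']?)Mso\w*\1""", re.IGNORECASE)
# style='...' or style="..."; Word usually uses single quotes.
STYLE_ATTR = re.compile(r"""\s+style=(["'])(.*?)\1""", re.DOTALL | re.IGNORECASE)


def _clean_style(match: re.Match) -> str:
    """Drop mso-* declarations from a style attribute, or drop it if empty."""
    quote, body = match.group(1), match.group(2)
    kept = [d.strip() for d in body.split(";")
            if d.strip() and not d.strip().lower().startswith("mso-")]
    if not kept:
        return ""
    return f" style={quote}{';'.join(kept)}{quote}"


def clean_word_html(html: str) -> str:
    """Remove common Word-specific markup; order matters (comments first)."""
    html = CONDITIONAL_COMMENT.sub("", html)
    html = DOWNLEVEL_MARKER.sub("", html)
    html = NAMESPACED_TAG.sub("", html)
    html = MSO_CLASS.sub("", html)
    return STYLE_ATTR.sub(_clean_style, html)
```

Even after this kind of clean-up, nothing has been added for accessibility: alternative text, table headers, and summaries still have to be supplied separately.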
The filtered Web page option in Office produces essentially the same document as the "Web Page" option, but it removes all Office-specific tags. These tags can add a substantial amount of code to your document, greatly increasing its size, and are generally supported only in the latest version of IE. Thus, a filtered Web page has the advantage of being both smaller and more browser-friendly. It is important to keep in mind, however, that Microsoft has not yet created a conversion process that inserts, or prompts you to add, all the accessibility tags and mark-up necessary to make your HTML document fully accessible and compliant with the OSU standards. Whether you use the "Save as Web Page" or "Save as Web Page Filtered" option, you will still have to do additional work to ensure, among other things, that all images have alternative text, all data tables have summaries and headers, and graphs, charts, and other special elements are represented in a way accessible to assistive technology.
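Part of that additional work can be checked automatically. The sketch below uses Python's built-in html.parser to flag <img> elements with no alt attribute and <table> elements with no summary attribute or no <th> header cells (the summary check follows the HTML 4.01 mark-up discussed in this workshop). check_accessibility is a hypothetical helper: it only detects missing mark-up, and whether the alternative text or summary is actually meaningful still needs human review.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collect missing-mark-up problems while parsing an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []
        self.open_tables: list[dict] = []  # a stack, so nested tables are tracked separately

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # html.parser lower-cases tag and attribute names
        line = self.getpos()[0]
        if tag == "img" and "alt" not in attributes:
            # alt="" is accepted (decorative image); a missing alt is not.
            self.problems.append(f"line {line}: <img src={attributes.get('src')!r}> has no alt attribute")
        elif tag == "table":
            self.open_tables.append({"line": line, "summary": bool(attributes.get("summary")),
                                     "has_th": False})
        elif tag == "th" and self.open_tables:
            self.open_tables[-1]["has_th"] = True

    def handle_endtag(self, tag):
        if tag == "table" and self.open_tables:
            self.report_table(self.open_tables.pop())

    def report_table(self, table: dict) -> None:
        if not table["summary"]:
            self.problems.append(f"line {table['line']}: <table> has no summary attribute")
        if not table["has_th"]:
            self.problems.append(f"line {table['line']}: <table> has no <th> header cells")


def check_accessibility(html: str) -> list[str]:
    checker = AccessibilityChecker()
    checker.feed(html)
    checker.close()
    # Converter output often leaves tables unclosed; report those too.
    while checker.open_tables:
        checker.report_table(checker.open_tables.pop())
    return checker.problems
```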
For PowerPoint files, Office offers only the "Save as Web Page" option, and depending on your version of Office, the converted document may be a highly accessible (although cumbersome) frames-based set of Web pages or a highly inaccessible set of large image files. If you are primarily converting PowerPoint presentations, we highly recommend that you either use conversion software beyond the internal tools that Office provides or expect to do extensive post-conversion editing. Other workarounds include saving an "Outline" version of the presentation (if the outline text fully represents the presentation) and converting it to either Rich Text Format or HTML. Graphic elements can then be added back to this text-only version using accessible HTML.
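If the outline text is available as plain text with one tab of indentation per outline level (an assumption; PowerPoint's own outline export is RTF), it can be turned into structured HTML in which slide titles become headings and bullet points become nested lists, mark-up that screen reader users can navigate. A minimal Python sketch, with the hypothetical function outline_to_html:

```python
from html import escape


def outline_to_html(outline: str) -> str:
    """Convert a tab-indented outline to <h2> titles plus nested <ul> lists.

    Lines with no leading tab are slide titles; each extra tab is one
    level deeper in the bullet list.
    """
    out: list[str] = []
    depth = 0  # how many <ul> elements are currently open
    for raw in outline.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        level = len(raw) - len(raw.lstrip("\t"))
        text = escape(raw.strip())
        if level == 0:
            # A slide title: close any open lists, then emit a heading.
            out.append("</li></ul>" * depth)
            depth = 0
            out.append(f"<h2>{text}</h2>")
            continue
        level = min(level, depth + 1)  # never skip a nesting level
        if level > depth:
            out.append("<ul>")  # the nested list opens inside the current <li>
        else:
            out.append("</li></ul>" * (depth - level))  # climb back up
            out.append("</li>")  # close the previous sibling item
        depth = level
        out.append(f"<li>{text}")
    out.append("</li></ul>" * depth)
    return "\n".join(part for part in out if part)
```

Capping each step down to one level (min(level, depth + 1)) keeps the lists validly nested even if the outline skips a level; images and charts would still be added back by hand, with alternative text.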
Rather than using the built-in features of Office, the WAC recommends taking advantage of one of the free or low-cost conversion software alternatives. In particular, we highly recommend the Accessible Web Publishing Wizard ($39.95) from the Illinois Center for Instructional Technology Accessibility (iCita), which works as a plug-in within Office to add a "Save as accessible Web page" option. When it is selected, a wizard walks you through adding the necessary accessibility information to your document before converting it to at least two accessible HTML versions: text-only and text-mostly. PowerPoint presentations can also be converted into Slideshow, Outline, and Handout views. You will have the opportunity to work with this conversion tool during our workshop.
As stated above, from an accessibility standpoint, no matter which method is used to post the document, the essential step is to create the document in a way that sets up the information for access by assistive technology. This means using the tools and functions of Office correctly, using a layout that can be understood linearly, and adding the information needed for every element to be represented as text. To facilitate the best possible conversion with the least amount of follow-up editing, you will generally need to:
Generally speaking, regardless of how you convert your Office document, you will need to do some post-conversion clean-up of the HTML. In particular, you will need to ensure that:
For the document:
For images and graphics:
For data tables:
Accessible Web Publishing Wizard ($39.95) from the Illinois Center for Instructional Technology Accessibility (iCita): "offers an alternative to the native web publishing features in Microsoft Office for Word, PowerPoint and Excel. The standard web publishing options often create XML based web content that can only be viewed by Microsoft Internet Explorer. Even if non-XML options are selected, users cannot easily add information that is required for accessibility. The Accessible Web Publishing Wizard simplifies the task of converting PowerPoint presentations, Word documents, and Excel spreadsheets to accessible and valid HTML 4.01 with CSS through an easy-to-use user interface and automation of many of the details of conversion needed for accessibility. The HTML generated meets or exceeds Section 508 and W3C WCAG 1.0 Double-A requirements for accessibility by people with disabilities and validates to HTML 4.01 and CSS standards."
[Description from the "Overview" page: http://cita.disability.uiuc.edu/software/office/]
HTML Tidy: an open source utility for tidying up HTML. Tidy is composed from an HTML parser and an HTML pretty printer. The parser goes to considerable lengths to correct common markup errors. It also provides advice on how to make your pages more accessible to people with disabilities, and can be used to convert HTML content into XML as XHTML. Tidy is W3C open source and available free. It has been successfully compiled on a large number of platforms, and is being integrated into many HTML authoring tools. Recently the maintenance of Tidy has been taken over by a group of dedicated volunteers on SourceForge, see: http://tidy.sourceforge.net/.
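Tidy can also be run from a script when you have many converted files. The Python sketch below builds a command line using Tidy options intended for Word output (--word-2000, --bare, --clean), optional XHTML output (-asxhtml), and Tidy's accessibility checks (--accessibility-check, priority levels 1 to 3). It assumes the tidy executable from the current HTML Tidy project is on your PATH; older builds may not support every option, so check tidy -help-config for your version. The function names tidy_command and run_tidy are hypothetical.

```python
import shutil
import subprocess


def tidy_command(source: str, output: str, *, xhtml: bool = False,
                 access_level: int = 0) -> list[str]:
    """Build a Tidy command line for cleaning a Word-converted HTML file."""
    cmd = [
        "tidy", "-quiet", "-indent",
        "--word-2000", "yes",  # strip the surplus markup Word adds to "Web pages"
        "--bare", "yes",       # strip Microsoft-specific HTML; plain spaces for &nbsp;
        "--clean", "yes",      # replace legacy presentational tags (e.g. <font>) with CSS
    ]
    if xhtml:
        cmd.append("-asxhtml")
    if access_level:
        # Levels 1-3 select Tidy's priority 1-3 accessibility checks;
        # 0 (the default, omitted here) means no accessibility checks.
        cmd += ["--accessibility-check", str(access_level)]
    cmd += ["-o", output, source]
    return cmd


def run_tidy(source: str, output: str, **options) -> tuple[int, str]:
    """Run Tidy; return its exit status (0 clean, 1 warnings) and its report."""
    if shutil.which("tidy") is None:
        raise FileNotFoundError("HTML Tidy ('tidy') was not found on PATH")
    result = subprocess.run(tidy_command(source, output, **options),
                            capture_output=True, text=True)
    if result.returncode == 2:  # Tidy exits with 2 when it finds errors
        raise RuntimeError(result.stderr)
    return result.returncode, result.stderr
```

For example, run_tidy("report.htm", "report-clean.htm", access_level=2) would write a cleaned copy and return Tidy's warnings, including its priority 2 accessibility messages, for review.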
[Description from the developer's web site: http://www.w3.org/People/Raggett/#tidy.]
HTML Tidy On-Demand: An online version of the tool. Point it at an existing HTML page, upload a file, or copy and paste your HTML into the tool; choose the settings you want and get your tidied version instantly. Great for individual files that you want to post quickly.
This document was created with the Illinois Accessible Web Publishing Wizard for Office.
Web Accessibility Center, The Ohio State University: 2004. www.wac.ohio-state.edu.