Monday, September 21st, 2009
There has long been a myth that “search engines can’t read PDFs,” and that all content should therefore go on an indexable HTML page. This may have been true a few years ago, but nowadays most of the major search engines have no trouble crawling and indexing PDF files. There are several fantastic guides out there on how to optimize a PDF for search engines, such as this one from 2007 on Search Engine Land.
However, even though search engines can crawl and index PDF files, I still recommend putting text-rich content on an HTML page rather than in a PDF file (whenever possible), for a few reasons:
1. No website navigation in PDFs. More often than not, the PDF does not maintain the same look and feel as the website, let alone provide any navigational elements. While it is true that PDFs can include clickable links, the vast majority of them do not have the site’s global navigation, and thus users are left with nowhere to go but back to the search results.
3. Users may not be expecting PDFs. This may be just me, but I hate clicking through a search result and, instead of immediately seeing a web page, waiting for my browser to unfreeze while Adobe Acrobat takes its sweet time launching to load a PDF file. By the time the PDF has finally loaded, I am often already regretting the click and moving my cursor to the Back button.
There are some cases in which PDFs should remain as PDFs, such as brochures and other print material, but articles and technical papers certainly can be converted to HTML pages. I will follow up shortly with another post on how to go about doing so.