Picking the right archive file format for your digital product.

Overall, I wouldn’t worry too much about the differences between this file format and that one. But you should have some knowledge about which files are best for the long term. It’s really rather easy. If the only thing you’ll ever want to do with a book is print it, PDF is probably the best file format for your archive. PDF accurately represents the printed page, so it’s well suited to products that just require facsimile reproduction. Unfortunately, many publishers think PDF is the only format they’ll ever need. This is understandable. After all, they’re in the business of printing books, right? Almost every desktop publishing application exports to PDF, making it very easy to quickly obtain a snapshot of the finished product. PDFs are also easy to transport, since they can be opened on computers running different operating systems.

But what happens when the publisher works with a vendor or a licensee who needs to break apart pieces of the book and tag the text so it can be searched alongside other books on a CD-ROM or in an online database? PDFs can be difficult when it comes to data extraction, because a PDF mainly records how a page looks rather than how its content is structured. To remedy this situation, many publishers are standardizing their file archives on XML.

XML stands for Extensible Markup Language. It is a general-purpose markup language that allows its users to define their own descriptive tags. Its primary purpose is to facilitate the sharing of structured data across different information systems, particularly over the web. It’s very flexible, and many additional file formats can be created from XML. In fact, using the right software, you can create PDF files from XML.
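To make this concrete, here is a minimal sketch in Python of what "defining your own tags" and "creating other formats from XML" can look like. The tag names (`book`, `chapter`, `heading`, `para`), the sample text, and the two helper functions are all invented for illustration; a real archive would follow whatever structure you and your partners agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical archive markup: every tag name here is made up for this example.
BOOK_XML = """
<book>
  <title>Sample Book</title>
  <chapter number="1">
    <heading>Getting Started</heading>
    <para>First paragraph of chapter one.</para>
  </chapter>
  <chapter number="2">
    <heading>Going Further</heading>
    <para>First paragraph of chapter two.</para>
    <para>Second paragraph of chapter two.</para>
  </chapter>
</book>
"""


def chapter_headings(root):
    """Pull out just the chapter headings, e.g. for a search index."""
    return [chapter.findtext("heading") for chapter in root.findall("chapter")]


def to_plain_text(root):
    """Produce a second format (plain text) from the same XML source."""
    lines = [root.findtext("title"), ""]
    for chapter in root.findall("chapter"):
        lines.append(f"{chapter.get('number')}. {chapter.findtext('heading')}")
        lines.extend(para.text for para in chapter.findall("para"))
        lines.append("")  # blank line between chapters
    return "\n".join(lines)


root = ET.fromstring(BOOK_XML.strip())
print(chapter_headings(root))
print(to_plain_text(root))
```

Because the tags describe what each piece of text is, a vendor can pull out just the headings, or just one chapter, without guessing from the page layout. That is the kind of extraction that tends to be hard with a PDF.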

Again, don’t be too concerned with file formats. In most cases, your partner will be the one to worry about this, and will let you know what’s best for their use of your data. If you don’t have the format they need, let them create the data for you. Just be sure to get a copy for your archive!