Solutions to File Format Challenges


Part 2 of this series introduced you to the challenges of digital preservation and discussed solutions to archival storage media challenges. This part explores file format challenges and their solutions in detail.

File Format Challenges and Solutions

Another challenge of digital preservation involves file formats, and obsolescence is of particular concern. To illustrate, in the early and mid-1980s WordStar was the most popular DOS word processing software in the world. Today it is effectively “abandonware” (i.e., no longer developed or maintained). Anyone who saved a WordStar document in 1985 would undoubtedly have a difficult time getting a personal computer to read it today, even if the storage medium used at the time were still readable!

File size is also a concern for digital preservation, because higher resolution produces larger files. Even so, when preserving digital family history records, you should always preserve them at the highest resolution you can afford.

The reason is that a digital record’s resolution cannot be improved once the record is created. And since you don’t know how a record will be used in the future (either by you or by your posterity or extended family), a record archived at low resolution may prove inadequate. For example, if the record is to be printed, print quality will be limited by the resolution at which you archived it.
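To see how resolution limits printing, note that the largest good-quality print you can make is roughly the image’s pixel dimensions divided by the print resolution. The Python sketch below assumes 300 pixels per inch, a commonly cited target for photo prints (an assumption, not a figure from this article), and a hypothetical 3000 × 2400 pixel scan; the function name is illustrative.

```python
def max_print_size_inches(width_px: int, height_px: int, ppi: int = 300) -> tuple[float, float]:
    """Largest print (width, height) in inches at the given pixels per inch.

    The 300 ppi default is an assumed rule of thumb for photo prints.
    """
    if ppi <= 0:
        raise ValueError("ppi must be positive")
    return width_px / ppi, height_px / ppi


# A hypothetical 3000 x 2400 pixel scan.
print(max_print_size_inches(3000, 2400))        # (10.0, 8.0) at 300 ppi
print(max_print_size_inches(3000, 2400, 150))   # (20.0, 16.0) at 150 ppi
```

Lowering the print resolution enlarges the printable dimensions, but with visibly less detail; no setting can add pixels that the original scan never captured.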

For photographs, TIFF (Tagged Image File Format) can store very high-resolution images without loss, but it also creates large files that consume considerable archival storage capacity. Converting a TIFF file to the JPEG format will reduce the size of the file, but the reduction comes at the expense of image quality. That’s because JPEG uses lossy compression, which means that some of the image information is discarded in order to achieve a significant reduction in file size. Decompressing a JPEG image therefore never restores the original content exactly. Such JPEG images may be suitable for viewing on a website, but they may disappoint if you try to print them. (Note: JPEG stands for Joint Photographic Experts Group, the originators of the JPEG standard.)
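The difference between lossy and lossless compression can be illustrated with a toy example. The sketch below is not JPEG: real JPEG quantizes frequency (DCT) coefficients rather than raw pixel values. Instead, it simply rounds hypothetical 8-bit pixel values to coarser steps (the lossy part) and then applies Python’s standard zlib (DEFLATE) compressor. The names `quantize` and `original`, and the step size of 16, are illustrative choices.

```python
import random
import zlib


def quantize(samples: bytes, step: int) -> bytes:
    """Lossy step: snap each 8-bit value down to a multiple of `step`."""
    return bytes((value // step) * step for value in samples)


# Hypothetical 8-bit "pixel" data with fine, noisy detail (values 100..115).
rng = random.Random(42)
original = bytes(100 + rng.randrange(16) for _ in range(10_000))

lossless = zlib.compress(original, 9)              # keeps every value
lossy = zlib.compress(quantize(original, 16), 9)   # discards fine detail first

print("original bytes:", len(original))
print("lossless bytes:", len(lossless))
print("lossy bytes:   ", len(lossy))
# Lossless decompression restores every byte; the lossy path cannot.
print("lossless exact:", zlib.decompress(lossless) == original)  # True
print("lossy exact:   ", zlib.decompress(lossy) == original)     # False
```

The lossy version is much smaller, but the detail removed by `quantize` is gone for good, which is the trade-off described above.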

Fortunately, there are file formats that can help overcome both challenges described above.

The first is the PDF/A format (Portable Document Format for Archiving). Recognizing the impact of file format obsolescence on digital preservation, the International Organization for Standardization (ISO) in 2005 defined an “electronic document file format for long term preservation.” Based on the Adobe PDF 1.4 format, PDF/A provides a self-contained, self-describing format that does not depend on external resources. For example, it embeds the relevant fonts and color information with the content data so that future software will be able to render the document exactly as it can be rendered today. In effect, PDF/A takes a software archiving approach to digital preservation.

Combined with the archive-grade storage media described in Part 2 of this series, PDF/A provides a breakthrough in personal archiving!

PDF/A can be used for most record types, the exceptions being audio and video. Also, PDF/A does not allow encryption and requires the use of standards-based metadata (i.e., descriptive information). Because all fonts used in the document must be embedded with the content data, a PDF/A file is typically larger than the corresponding regular PDF file.

Nevertheless, PDF/A offers the promise of renderability well into the future.

For more information about PDF/A, see the REFERENCES section. Also note that a PDF/A file has the same file extension as a non-archival PDF file (.pdf), so you cannot identify a PDF/A file without examining the metadata that describe it.
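As a rough illustration of what examining the metadata involves, a PDF/A file declares its conformance level in embedded XMP metadata using the `pdfaid:part` and `pdfaid:conformance` properties. The Python sketch below simply searches a file’s bytes for that declaration. It assumes the metadata is stored uncompressed and only reports what the file claims; it does not verify conformance, which calls for a dedicated validator such as veraPDF. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Match the PDF/A identification properties in XMP metadata, in either
# attribute form (pdfaid:part="1") or element form (<pdfaid:part>1</...>).
PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d+)")
CONF_RE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return a label such as 'PDF/A-1b' if the bytes claim PDF/A, else None.

    Heuristic only: reads the claim in uncompressed XMP metadata and does
    not check that the file actually conforms.
    """
    part = PART_RE.search(data)
    if part is None:
        return None
    conf = CONF_RE.search(data)
    level = part.group(1).decode("ascii")
    suffix = conf.group(1).decode("ascii").lower() if conf else ""
    return f"PDF/A-{level}{suffix}"


def looks_encrypted(data: bytes) -> bool:
    """Rough check for an /Encrypt entry, which PDF/A does not allow."""
    return b"/Encrypt" in data


def describe(path: Path) -> str:
    """One-line summary of a file's PDF/A claim (hypothetical helper)."""
    data = path.read_bytes()
    level = claimed_pdfa_level(data)
    if level is None:
        return f"{path.name}: no PDF/A claim found"
    warning = " (but appears encrypted)" if looks_encrypted(data) else ""
    return f"{path.name}: claims {level}{warning}"
```

A file that reports no claim is almost certainly a regular PDF, while a file that reports a claim should still be run through a validator before you rely on it.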

A partial list of PDF/A software for Windows is provided here:

  • Adobe Acrobat (get Version 8.0 or later)
  • soft Xpansion Perfect PDF Master (free for personal use)
  • Nuance PDF Converter
  • Solid PDF Creator
  • Microsoft Office 2007 via its “Save as PDF” plugin (hover your cursor over “Save As,” click “PDF or XPS,” click the “Options…” button, then select “ISO 19005-1 compliant (PDF/A)” under “PDF Options”)

Mac OS PDF/A software includes Microsoft Office 2011, OpenOffice, Nuance PDF Converter, and Adobe Acrobat (get Version 8.0 or later).

Another file format worth noting is JPEG 2000. Like PDF/A, it is an ISO standard, although it applies strictly to images.

As an improvement on the 1992 JPEG standard, JPEG 2000 provides both lossy and lossless compression. Lossless compression allows the exact original data to be reconstructed from the compressed data. Even so, lossless compression typically reduces file size by 50% to 60% compared with the source files, without sacrificing any image quality in the conversion! For this reason, among others, JPEG 2000 is becoming popular in the digital preservation industry. File extensions for JPEG 2000 files are .jp2 and .j2k.
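The claim that lossless compression restores the exact original data can be checked mechanically: decompress, then compare a cryptographic hash of the result with one computed from the source. In the Python sketch below, the standard zlib module stands in for a lossless codec (it is not JPEG 2000), the gradient data is hypothetical, and `reduction_percent` shows how a figure such as “50% to 60% reduction” is calculated.

```python
import hashlib
import zlib


def sha256_hex(data: bytes) -> str:
    """Fingerprint used to confirm two byte sequences are identical."""
    return hashlib.sha256(data).hexdigest()


def reduction_percent(original_size: int, compressed_size: int) -> float:
    """Percentage by which compression shrank the data."""
    return 100.0 * (1 - compressed_size / original_size)


# Hypothetical raw image data: 100 rows of a smooth 0..255 gradient.
raw = bytes(range(256)) * 100

# zlib (DEFLATE) stands in for a lossless codec; it is not JPEG 2000.
packed = zlib.compress(raw, 9)
restored = zlib.decompress(packed)

# Lossless means the restored bytes hash identically to the original.
print(sha256_hex(restored) == sha256_hex(raw))  # True
print(f"{reduction_percent(len(raw), len(packed)):.1f}% smaller")
```

Real photographs compress far less than this artificial gradient, which is why reductions in the 50% to 60% range are more typical for JPEG 2000.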

Combined with the archive-grade storage media described in Part 2 of this series, JPEG 2000 provides a breakthrough in personal archiving of images by simultaneously delivering the benefits of high resolution and reasonable file size!
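Because a file extension is only a label, you may want to confirm that a .jp2 or .j2k file really contains JPEG 2000 data. A JP2 file begins with a fixed 12-byte signature box, while a raw .j2k codestream begins with the SOC marker (FF 4F) followed by the SIZ marker (FF 51). The Python sketch below checks those leading bytes only. The function names are hypothetical, and a match neither proves that the rest of the file is intact nor distinguishes still images from other members of the JPEG 2000 file family.

```python
from pathlib import Path

# JP2 files begin with a 12-byte signature box: length 12, type "jP  ",
# then the bytes 0D 0A 87 0A. Raw .j2k codestreams begin with the SOC
# marker (FF 4F) immediately followed by the SIZ marker (FF 51).
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")
J2K_CODESTREAM_START = bytes.fromhex("FF4FFF51")


def sniff_jpeg2000(header: bytes) -> str:
    """Classify the first bytes of a file as 'jp2', 'j2k', or 'unknown'."""
    if header.startswith(JP2_SIGNATURE):
        return "jp2"
    if header.startswith(J2K_CODESTREAM_START):
        return "j2k"
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read just enough of a file to apply sniff_jpeg2000."""
    with path.open("rb") as f:
        return sniff_jpeg2000(f.read(len(JP2_SIGNATURE)))
```

Running a check like this across an archive can catch files that were renamed or mislabeled before they are written to archive-grade media.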

PNG (Portable Network Graphics) is another ISO standard file format that provides lossless data compression. In some cases, such as images with large areas of a single color, PNG is even more space-efficient than JPEG 2000. However, JPEG 2000 is more error-resilient than PNG and is gaining a foothold in the digital preservation industry; hence the author’s focus on JPEG 2000 for general use.
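Why do uniform areas favor PNG? PNG compresses its pixel data with DEFLATE, the same algorithm behind Python’s zlib module, and DEFLATE shrinks long runs of repeated values dramatically while gaining almost nothing on unpredictable data. The sketch below compares two hypothetical 4,096-pixel grayscale rows. It omits PNG’s per-row filtering and file structure, so it shows the principle rather than real PNG file sizes.

```python
import os
import zlib

# Two hypothetical 8-bit grayscale rows of 4,096 pixels each.
uniform_row = bytes([200]) * 4096   # a large area of a single color
noisy_row = os.urandom(4096)        # every pixel unpredictable

for name, row in [("uniform", uniform_row), ("noisy", noisy_row)]:
    packed = zlib.compress(row, 9)
    print(f"{name:8s} {len(row)} -> {len(packed)} bytes")
```

The uniform row shrinks to a tiny fraction of its size, while the noisy row does not shrink at all; real photographs fall somewhere in between.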

JPEG 2000 software for Windows is identified in the following incomplete list of products:

  • Adobe Acrobat and Adobe Photoshop
  • FastStone Image Viewer (free for personal use)
  • XnView (free for personal use)
  • ACDSee Photo Editor
  • Corel PaintShop Photo Pro

Mac OS JPEG 2000 software includes Apple Preview, GraphicConverter 7, XnView, ACDSee Pro, and the Adobe products mentioned above.

There is also a motion version of JPEG 2000 (Motion JPEG 2000) that is used for digital video. If your software offers this version, be sure to select the still-image version for still images.

The author successfully tested most of the Windows software identified above for PDF/A and JPEG 2000 (the Adobe products were not tested). The following test results are worth noting:

  • Using a 14-megabyte TIFF image as the source file, every JPEG 2000 product tested reduced the file size by 61% with lossless compression.
  • Solid PDF Converter Plus creates PDF/A files that use JPEG 2000 lossless compression, thus combining the best of both formats. However, the author encountered a software bug in one test and reported it to Solid Documents, which committed to fixing the problem but gave no time frame.
  • soft Xpansion Perfect PDF Master (free for personal use) does not allow you to add descriptive information (metadata) when creating PDF/A files. To get this capability, which is discussed in Part 4 of this series, you must purchase the business version of the product from soft Xpansion.
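The 61% figure translates directly into storage savings. The Python sketch below applies it to a hypothetical collection of 1,000 scans of the same 14-megabyte size; real reductions vary from image to image, so treat the result as an estimate. The function name is illustrative.

```python
def archive_size_mb(image_count: int, source_mb: float, reduction: float) -> float:
    """Storage needed after compressing every image by `reduction` (0 to <1)."""
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in [0, 1)")
    return image_count * source_mb * (1 - reduction)


# 14 MB TIFF sources and a 61% lossless reduction, as in the test above.
# The collection size of 1,000 scans is a hypothetical example.
tiff_total = archive_size_mb(1000, 14, 0.0)
jp2_total = archive_size_mb(1000, 14, 0.61)
print(f"TIFF: {tiff_total:,.0f} MB, JPEG 2000: {jp2_total:,.0f} MB")
# TIFF: 14,000 MB, JPEG 2000: 5,460 MB
```

In other words, each 14 MB TIFF becomes roughly a 5.5 MB JPEG 2000 file, so the same archive-grade media can hold about two and a half times as many images.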

To preserve digital audio files, the Waveform Audio File Format (WAV) is recommended. WAV is compatible with both Windows and Mac OS, WAV software is plentiful (search on “free WAV software”), and the format is expected to remain in use for many years to come. MP3 and SP2 should be avoided for preservation. One reason is that MP3 is a lossy format: converting a WAV file to MP3 can shrink the audio data by as much as 91%, which contradicts the digital preservation principle of preserving at the highest resolution you can afford.

Likewise, digital video should be preserved in the QuickTime or Audio Video Interleave (AVI) format. QuickTime works with both Windows and Mac OS, but creating AVI files on Mac OS is not easily done. Flash, MPEG-2, and MPEG-4 should be avoided when preserving digital video (as with digital audio, compression is a factor here).
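The 91% figure for MP3 is consistent with simple bitrate arithmetic. The Python sketch below assumes CD-quality stereo WAV audio (44.1 kHz, 16 bits per sample, two channels) and an MP3 encoded at 128 kbps; both are common settings, but they are assumptions, since the source of the 91% figure is not stated.

```python
def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio, as stored in a typical WAV file."""
    return sample_rate_hz * bits_per_sample * channels


cd_wav = pcm_bitrate(44_100, 16, 2)  # CD-quality stereo (assumed)
mp3 = 128_000                        # assumed MP3 encoding bitrate, bit/s

reduction = 1 - mp3 / cd_wav
print(f"WAV: {cd_wav:,} bit/s, MP3: {mp3:,} bit/s, reduction: {reduction:.0%}")
# WAV: 1,411,200 bit/s, MP3: 128,000 bit/s, reduction: 91%
```

Higher MP3 bitrates discard less, but any MP3 conversion still throws away audio information that a WAV file keeps.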

Addressing the Challenge of Obsolete File Formats

The file format recommendations provided here are intended to maximize the renderable life of your digital family history records. If any of the file formats you use appear to be losing vendor support, you should promptly migrate the affected records by converting them to replacement file formats and writing the transformed files to archive-grade storage media. In the digital preservation industry, this type of migration is called a transformation.
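A transformation migration can be organized as a simple batch job. The Python sketch below is a hypothetical outline, not a recommended tool: `convert` stands in for whatever conversion software the new format’s vendor provides, `dst_dir` represents the new archive-grade media, and the original files are left untouched. It also records SHA-256 checksums in a `manifest.json` file (a hypothetical name) so the migrated copies can be checked for corruption later.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def sha256_file(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large images need little memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(src_dir: Path, dst_dir: Path, old_ext: str, new_ext: str,
            convert: Callable[[bytes], bytes]) -> list[dict]:
    """Convert every `old_ext` file in src_dir and write results to dst_dir.

    `convert` is a placeholder for the vendor's conversion software.
    Originals are not modified; a checksum manifest is written alongside
    the migrated files.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    manifest = []
    for src in sorted(src_dir.glob(f"*{old_ext}")):
        dst = dst_dir / (src.stem + new_ext)
        dst.write_bytes(convert(src.read_bytes()))
        manifest.append({
            "source": src.name,
            "source_sha256": sha256_file(src),
            "migrated": dst.name,
            "migrated_sha256": sha256_file(dst),
        })
    (dst_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest
```

For example, `migrate(Path("old_media"), Path("new_media"), ".jp2", ".new", vendor_convert)` would process every .jp2 file, where `vendor_convert` and the `.new` extension are placeholders. Before discarding anything, you would still open a sample of the migrated files to confirm they render correctly.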

To illustrate, if a new JPEG format that improves on JPEG 2000 is introduced in the future, vendors will very likely provide software to convert JPEG 2000 files to the new format. Such software will typically remain available for a number of years as customers gradually transition to the new format, providing a window of opportunity for you to migrate your affected images.

Because the converted images must be written to storage media anyway, such a migration can also provide a needed media refreshment. This is an example of reducing the overall preservation workload by combining or overlapping tasks.

To ensure that you can migrate digital records before their file formats become obsolete, it is critical that you stay abreast of digital preservation technology.

Once again, this kind of migration work, as well as the technology monitoring it entails, may need to be performed by your posterity or extended family, since you may not outlive the file formats you choose to preserve your digital records. Therefore, it behooves you to prepare them for such transformation migrations and ongoing technology watching.

Now that you have read about solutions to file format challenges, you are ready for Part 4 of this series, which helps you get started preserving your digital family history records.


REFERENCES

Recommended PDF/A (frequently asked questions) (white paper)

This article is part of the Preserving Your Family History Records Digitally series by Gary T. Wright. Each article in the series is part of the white paper, Preserving Your Family History Records Digitally.

About the Author