Getting Started with Digital Preservation


Part 3 of this series introduced you to the challenges of preserving digital file formats. It also provided effective solutions to the challenges. This part will guide you in getting started with preserving your digital family history records.

Getting Started with Digital Preservation

Assuming you already have a personal computer, you can get started with digital preservation by purchasing a digitizing device and an archival storage device, and by obtaining the software you need to create files in archival formats such as PDF/A and JPEG 2000.

The digitizing device can be a scanner or a digital camera. Perhaps you will want both. Scanners are easier to use, but not as versatile as digital cameras.

When using a scanner, always scan at the highest resolution for which you can afford the archival storage capacity. At the very least, you should scan at 300 dpi (dots per inch) if you never intend to print larger than the resulting digital record size. 1200 or greater dpi is recommended if you think you will ever want to print a larger version of the record. The scanning device you purchase will have software that allows you to set the desired dpi.
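The trade-off between resolution, print size, and storage can be worked out with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 300 dpi print target comes from the guidance above, and the storage estimate assumes uncompressed 24-bit color with 1 MB taken as 1,000,000 bytes.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a scan of a width_in x height_in inch original."""
    return round(width_in * dpi), round(height_in * dpi)


def max_print_size(width_px, height_px, print_dpi=300):
    """Largest print, in inches, that still has print_dpi pixels per inch."""
    return width_px / print_dpi, height_px / print_dpi


def uncompressed_size_mb(width_px, height_px, bytes_per_pixel=3):
    """Approximate uncompressed size in MB (assumes 24-bit color, 1 MB = 10**6 bytes)."""
    return width_px * height_px * bytes_per_pixel / 1_000_000


# A 4 x 6 inch photograph scanned at 1200 dpi
w, h = scan_pixels(4, 6, 1200)
print(w, h)                        # 4800 7200
print(max_print_size(w, h))        # (16.0, 24.0)
print(uncompressed_size_mb(w, h))  # 103.68
```

In this example, a 1200 dpi scan of a 4 x 6 inch photograph could be printed at up to 16 x 24 inches at 300 dpi, at the cost of roughly 100 MB before any compression is applied.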

When using a digital camera to digitize a physical record, make sure you have natural, flat, uniform lighting so you can avoid shadows and reflections. Using a tripod is recommended, especially to keep the lens parallel to the record being photographed (camera lenses magnify skew if they are not parallel).

Most digital cameras allow you to choose a resolution (image size and quality) setting, so always choose the highest setting available. Then, when you load your digital pictures onto your computer, they will have maximum resolution when they serve as source files for the archival versions you create. Although these camera pictures will require significant capacity on your computer’s hard disk when you load them, you can delete them once you have created archival files and written them to archive-grade storage media.

If you have analog audio recordings that you want to preserve digitally, you can purchase an audio digitizer. These USB devices can digitize virtually any type of analog audio signal so the recording can be archived in the WAV format.

To digitize your analog family movies, a professional service is recommended to minimize cost and maximize quality. The same applies to 35mm slides, although slide scanning attachments may be available for the scanner you purchase (but they may be pricey). For more information on scanning slides, see reference [7].

If using a service, be sure to specify which archival file format you want for the output (either AVI (.avi) or QuickTime (.mov) for digital video, and lossless JPEG 2000 for digitized slides). If JPEG 2000 is unavailable, you might ask for TIFF and then convert the files to JPEG 2000 yourself.

As explained in this series, M-DISCs are recommended for personal archiving. To acquire Millenniata and associated LG products, you can go to the Millenniata website (millenniata.com). These products will also be available from some popular retail outlets in October 2011. Remember that virtually any DVD or Blu-ray drive can read an M-DISC.

Adding Descriptive Information to Records

Once you have the necessary software, digitizing equipment, and archival storage in place, you are ready to get started with digital preservation. However, before you attempt to preserve any records, it is important that you develop a plan to add descriptive information (called metadata in the digital preservation industry) to the digital records you are planning to preserve.

At a minimum, descriptive information should include both contextual and historical information. Contextual information describes what the record is (for example, a copy of someone’s death certificate or a photograph of a named person). It also relates the record to its environment, throwing more light on the person(s) to whom the record applies. The more complete and descriptive the contextual information you add to a digital record, the more valuable, interesting, and cherished the record will become to you, your posterity, and your extended family.

Historical information provides the source of the record (for example, the county, city, town, or church archive from which a copy of a birth certificate was obtained). It should also identify the creator of the record, if such information can be determined. This is important for copyright reasons, which are discussed below.

Your plan to add descriptive information to digital records should begin with file names. A file name can contain both contextual and historical information. For example, when the author scanned a photograph of a distant relative, the scanning software gave the output file a generated name of:

110237489853.tif

One would never know from this file name what the record actually is (other than a TIFF image). But by changing the file name to:

Photo of Esther Elizabeth Knight on her wedding day 8 May 1917.tif

anyone looking at the file name will immediately know exactly what the record is. When searching the contents of an archival disc, having this much information for all the file names listed will certainly help you zero in on the object of your search very quickly!

A caution is in order here. Many personal computer operating systems limit the number of characters that can identify the location of a file on the computer’s hard disk (called the file path) to about 256; on Windows, the traditional limit is 260. These characters include the file name as well as the names of all folders that must be opened to navigate to the file. Folder names may also be descriptive. Therefore, the more deeply you nest folders, the fewer characters will be left for the file name, and hence the fewer characters will be available for descriptive information in the file name.

In general, it is best to rename files with descriptive information when you first create or load them—otherwise, you may never get around to doing it.
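The renaming step and the path-length caution can be combined in a small script. The following is a minimal sketch, not a prescribed tool: the function names and the scans folder are hypothetical, and the 256-character limit is simply the figure quoted above, so adjust it for your own system.

```python
from pathlib import PurePath

MAX_PATH_CHARS = 256  # limit quoted in the text; an assumption, adjust as needed


def descriptive_path(original, description):
    """Return the original path with its file name replaced by a description.

    The file extension (for example .tif) is kept unchanged.
    """
    p = PurePath(original)
    return p.with_name(description + p.suffix)


def chars_left(path, limit=MAX_PATH_CHARS):
    """Characters still available before the full path reaches the limit."""
    return limit - len(str(path))


new_path = descriptive_path(
    "scans/110237489853.tif",  # hypothetical folder name
    "Photo of Esther Elizabeth Knight on her wedding day 8 May 1917",
)
print(new_path.name)
print(chars_left(new_path))
```

The sketch only computes the new name. To rename a real file, the standard library call `Path(original).rename(new_path)` could be used once the remaining character count has been checked.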

In order to create a full set of descriptive information, you should also add reference information (or tags—another type of metadata) to files when you create them. Reference information allows search software to assist you in locating and accessing records.

When the author scanned file 110237489853.tif as explained above, he also added the following tags by clicking on the appropriate software option buttons:

Title: Esther Elizabeth Knight on her wedding day 8 May 1917
Subject: Esther Elizabeth Knight wedding photo
Author: in the public domain
Keywords: Esther, Elizabeth, Knight, wedding, 1917, bride, photo, public domain

If tags are to be used effectively, both file creation software and search software must support such tags. It has already been pointed out that soft Xpansion Perfect PDF Master (which is free for personal use) does not allow the addition of tags when creating PDF/A files—you must purchase soft Xpansion’s business version of this product to get this capability.

Any time you deal with records, make sure you adhere to copyright law with regard to copying, printing, and distribution. This applies whether you are working with digital records or physical records.

To avoid violations, track down the source or owner of each record (if possible), then apply the applicable copyright law. A wonderfully clear and concise summary of copyright law as it pertains to genealogy has been written by Michael Patrick Goad [8]. Please take time to study his short, well-written article. Some key points from it are reproduced here:

  • If an original work of authorship was created after 1977, it’s copyrighted and it’s going to be for a very long time. The earliest that any work created after that will lose its copyright will be about 2049 – that’s assuming that the author died right after he authored the work.
  • If it was created before 1923, there is no copyright on it anymore, so long as it was published. If it wasn’t published, it may still be protected by copyright.
  • Works published before March 1, 1989 without proper copyright notice are almost always in the public domain because, under the law that existed before that, a proper copyright notice was required for copyright protection.
  • Works published from 1923 to 1963 had to be renewed after an initial copyright term for protection to continue. The U.S. Copyright Office estimates that over 90% of works eligible for renewal were never renewed.

A second article, written by Gary Hoffman, augments Goad’s article with further insight [9]. Please review this article as well.

Archiving Records

Before writing any records to an archive-grade optical disc, organize them so that writing is as efficient as possible. An archive-grade optical disc is designed to be permanent; therefore, you cannot change anything after it is written. You can write the entire disc at one time, or you may write just a portion of it and add files later. In general, writing one record at a time is not practical.

The number of records (files) you can store on an optical disc depends on the disc type and the average size of the records you want to write, as shown here.

Storage Media Type    Number of 2.5 MB records that can be written    Number of 1 MB records that can be written
CD                    260                                             650
M-DISC                1,880                                           4,700
Blu-ray               18,800                                          47,000

(MB means megabyte)
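The figures in the table follow from dividing each disc's capacity by the average record size. As a hedged sketch, the capacities below (650 MB, 4700 MB, and 47,000 MB) are the values implied by the table rather than manufacturer specifications, and the function name is hypothetical.

```python
# Capacities in MB implied by the table (assumed; check your own media).
CAPACITY_MB = {"CD": 650, "M-DISC": 4700, "Blu-ray": 47000}


def records_per_disc(capacity_mb, record_mb):
    """Whole number of records of record_mb megabytes that fit in capacity_mb."""
    return int(capacity_mb // record_mb)


for media, capacity in CAPACITY_MB.items():
    print(media, records_per_disc(capacity, 2.5), records_per_disc(capacity, 1))
# CD 260 650
# M-DISC 1880 4700
# Blu-ray 18800 47000
```

The same function can be called with your own average record size to estimate how many discs a collection will need.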

Please note that there are no archive-grade Blu-ray discs available currently.

To simplify writing, it is recommended that you first copy the target files to a temporary folder and monitor the size of the folder as you proceed. In Windows, this can be done by hovering your cursor over the folder name; a pop-up will display the total size of the folder. In general, you should not exceed a folder size of 650 MB if writing to a CD, or 4700 MB (4.7 GB) if writing to an M-DISC (but only 4200 MB for other types of DVDs, since their outer tracks are easily damaged by physical handling).
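The folder size can also be checked with a short script instead of the pop-up. This is a minimal sketch under stated assumptions: the function names are hypothetical, the limits are the ones given in the paragraph above, and 1 MB is taken as 1,000,000 bytes. Disc file systems add some overhead of their own, so leaving a small margin is prudent.

```python
import os

# Folder size limits in MB, taken from the guidance above.
LIMITS_MB = {"CD": 650, "M-DISC": 4700, "DVD": 4200}


def folder_size_mb(folder):
    """Total size of all files under folder, in MB (1 MB = 10**6 bytes)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_bytes / 1_000_000


def fits(folder, media):
    """True if the folder's contents are within the limit for the given media."""
    return folder_size_mb(folder) <= LIMITS_MB[media]
```

For example, `fits("to_burn", "M-DISC")` would report whether a hypothetical folder named to_burn is still within the 4700 MB limit.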

Once the temporary folder is populated with the target files, you can start the writing (i.e., etching or burning) process. If the folder size exceeds the disc’s capacity, writing will stop when the disc is full, leaving all remaining files unwritten. Of course, maximizing the number of files written to each disc minimizes the number of discs required.
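When a collection is too large for one disc, deciding which files go on which disc is a packing problem. The sketch below uses the first-fit decreasing heuristic, which is a common approximation and not a method described in this series; the function name and the sample file sizes are hypothetical.

```python
def plan_discs(file_sizes_mb, capacity_mb):
    """Group files into discs using the first-fit decreasing heuristic.

    file_sizes_mb maps a file name to its size in MB. Returns a list of
    discs, each a list of file names whose sizes total at most capacity_mb.
    """
    discs = []  # each entry is [remaining_mb, [file names]]
    largest_first = sorted(file_sizes_mb.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in largest_first:
        if size > capacity_mb:
            raise ValueError(f"{name} is larger than one disc")
        for disc in discs:
            if size <= disc[0]:
                disc[0] -= size
                disc[1].append(name)
                break
        else:
            # No existing disc has room, so start a new one.
            discs.append([capacity_mb - size, [name]])
    return [names for _remaining, names in discs]


sizes = {"a.tif": 400, "b.tif": 300, "c.tif": 250, "d.tif": 200, "e.tif": 100}
print(plan_discs(sizes, 650))
# [['a.tif', 'c.tif'], ['b.tif', 'd.tif', 'e.tif']]
```

Here 1250 MB of files fit on two CDs. The heuristic does not always find the smallest possible number of discs, but it is usually close and is easy to verify by hand.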

An important preservation principle developed at Stanford University is LOCKSS (Lots Of Copies Keep Stuff Safe). The basic concept is this—the more copies you archive in different locations, the safer your records will be.

To apply LOCKSS to your archive, you should write a minimum of two discs per set of files and store them in two different locations as far apart as practical. Writing three discs and storing them in three different locations is even better. Perhaps you can exchange archival discs with friends and/or family to enhance the safety of your archived data.

It’s a good idea to periodically test your archival storage media by opening files randomly and examining the contents to detect errors. This should be done at least annually.

If errors are found on a disc, retrieve a copy of the disc (which is why you need to apply LOCKSS!) and determine if it is error free. If so, then you can replicate the copy and dispose of the flawed disc. If the copy is also flawed and you have no more copies to examine, then you have no choice but to test each file and copy the error-free files to new archival storage media. For those files with errors, you can recreate them if you still have the original physical records and can redigitize them.
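Opening files by hand works for a spot check, but it can miss subtle damage. One complementary approach, offered here as an assumption rather than something prescribed in this series, is to record a checksum for every file when the disc is written and to compare against it at each test. The function names below are hypothetical; the hashing uses Python's standard hashlib module.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        str(p.relative_to(folder)): sha256_of(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def find_damaged(folder, manifest):
    """Return the files that are missing, unreadable, or changed."""
    folder = Path(folder)
    damaged = []
    for relative_name, expected in manifest.items():
        p = folder / relative_name
        try:
            ok = p.is_file() and sha256_of(p) == expected
        except OSError:
            ok = False  # a read error also counts as damage
        if not ok:
            damaged.append(relative_name)
    return damaged
```

A manifest built from the temporary folder before writing can be saved alongside your records. At each annual test, running `find_damaged` against the disc lists exactly which files need to be recovered from another copy.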

Of course, applying LOCKSS to your archive requires that you get organized and develop a process to track (i) locations of the archival storage media, (ii) media age, (iii) when the media should be tested next, and (iv) when a media refresh migration should be performed. Fortunately, there is an abundance of software available to help you do this, such as Microsoft Access or Intuit QuickBase (an online database).
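If you prefer not to use a database product, the same tracking can be sketched in a few lines of code. The example below is illustrative only: the class and field names are hypothetical, and the 365-day interval reflects the advice above to test at least annually. It covers location, media age, and the next test date, but not refresh migration scheduling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ArchiveDisc:
    label: str         # e.g. "Set 1, copy A"
    location: str      # where this copy is stored
    written_on: date   # when the disc was written
    last_tested: date  # when the disc was last checked for errors

    def age_days(self, today):
        """Media age in days as of today."""
        return (today - self.written_on).days

    def next_test_due(self, interval_days=365):
        """Date by which the disc should be tested again."""
        return self.last_tested + timedelta(days=interval_days)

    def test_overdue(self, today, interval_days=365):
        """True if the next test date has arrived or passed."""
        return today >= self.next_test_due(interval_days)


disc = ArchiveDisc("Set 1, copy A", "home safe", date(2011, 10, 1), date(2012, 10, 1))
print(disc.next_test_due())                 # 2013-10-01
print(disc.test_overdue(date(2013, 6, 1)))  # False
```

A list of such records, one per disc copy, is enough to answer where each copy is stored and which copies are due for testing.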

Sharing Your Digital Records

As mentioned in Part 1 of this series, sharing a digital record with others is fast and easy—as long as you have an Internet connection and email services. The author uses Yahoo email (mail.yahoo.com) because it is free and offers unlimited storage capacity. Also, it allows you to attach a file as large as 20 MB to an email. However, whether or not someone can receive such a large file depends on his or her email capabilities.

Should you want to send someone a file that is larger than the person’s email software will accept, you can use a free transfer service instead. TransferBigFiles.com is a website that allows you to transfer large files over the Internet at no charge. YouSendIt.com will also do this for a fee. Once you upload a file that you want to transfer, a link is provided which you can then email to your intended recipient. That person need only click on the link in your email to download the file to his or her computer.

Backing Up Your Archived Records

A side benefit of an email service that provides unlimited storage capacity is that it provides a means to extend the LOCKSS principle for your personal archive. By sending yourself emails with attached preservation files, you can create a collection of such emails that will be stored on the email service provider’s computer infrastructure. In effect, you can back up your archive on this infrastructure.

You should never rely on this approach to be your primary or even secondary archive, however, since the email service provider could start limiting storage capacity at any time or could even go out of business. And organizing so many emails to function as your primary archive might be difficult. Also, you may have difficulty accessing your email inbox when you urgently need to retrieve a record from your digital archive.

Online (cloud) backup is also becoming a popular way to back up family history records because of its convenience. But newcomers to cloud backup have much to learn and consider. The Library of Congress has published a blog post [10] that explains these considerations. You should review this post if you are interested in exploring cloud backup.

However, you should never rely on cloud backup as your primary or even secondary archive. There is no guarantee that your data will be saved indefinitely. Some cloud backup services (including Amazon Web Services) have already crashed, resulting in lost data for some customers. Also, information in the cloud can be hacked. Bottom line: you should not count on cloud backup services alone to protect your important family history records!

As Time Goes By…

. . . it is important that you, your posterity, and your extended family monitor technology changes and take appropriate actions as needed. These actions, which comprise the ongoing aspects of digital preservation, include:

  • Transforming file formats that are becoming obsolete to their replacement formats.
  • Copying files to newer archival storage media to prevent data loss (unless you are using M-DISCs).
  • Migrating files to newer archival storage media so they can continue to be read if existing storage technology is becoming obsolete.

Clearly, digital preservation is not a one-time activity, nor is it a single-generation project. Your responsibility in the digital preservation chain is to gather, digitize, and preserve records the very best you can, then pass them on to the next generation of your posterity and/or extended family that has been prepared to carry on the work.

In many respects, digital preservation is like a relay race: you carry the baton for a period of time and then pass it on to the next runner. To prevent the baton from being dropped during the handoff, you and the next runner must work together in perfect synchronization. This means preparing and motivating the next runner to carry on the race without missing a step.

As this process is carried on from one generation to the next, your digital family history records can be preserved in perpetuity. Yes, it takes work—but the payback cannot be measured.

After reading the guidelines above, you should find Part 5 of the series helpful. It provides a step-by-step summary of preserving your family history records digitally.

References

[7] www.flatbed-scanner-review.org/35mm_slide_film_scanners/scanning_35mm_slides.html
Also see: www.abstractconcreteworks.com/essays/scanning/Backlighter.html

[8] Copyright Fundamentals for Genealogy (by Michael Patrick Goad, 29 July 2003)
www.pddoc.com/copyright/genealogy_copyright_fundamentals.htm
Also see: www.pddoc.com/copyright

[9] Who Owns Genealogy? Cousins and Copyright (by Gary B. Hoffman)
www.genealogy.com/14_cpyrt.html

[10] Personal Archiving in the Cloud (by Mike Ashenfelder)
http://blogs.loc.gov/digitalpreservation/2011/06/personal-archiving-in-the-cloud/

This article is part of the Preserving Your Family History Records Digitally series by Gary T. Wright. Each article in the series is part of the white paper, Preserving Your Family History Records Digitally.

About the Author