Issue number 19-20, Spring 2002

 

"'Why Do We Need to Keep This in Print? It's on the Web ...':
a Review of Electronic Archiving Issues and Problems"

by Dorothy Warner

 

Indeed! It may be on the web today, but is there a plan in place to ensure that it will be there in twenty or more years? Probably not. In the haste to make information available electronically there are few agreed-upon plans for the preservation of digital information and much has already been lost. The particular concern of preserving electronic state government documents recently became an issue for our State Documents Interest Group of the Documents Association of New Jersey (DANJ) when we recognized that not only are fewer documents produced in print format but there is not a state plan to preserve the electronic documents being produced. For several years the Division of Elections in New Jersey eliminated the web page that gave the previous year's election lists and results. Fortunately, the concern from those using the information prompted the Division of Elections to begin to retain this information. But the earlier information is gone. Recently, Public Utilities created a new web page and eliminated virtually all of the documents that had existed on the earlier page. At least one agency replaces its old annual report with the new one. The predicament in New Jersey is not an isolated one. Our response was to research the issue of digital preservation and to present a report of recommendations to the State Librarian. The report, edited by Sue Lyons (2001, available at the DANJ website, http://www.danj.org/DANJ), provides a thoughtful overview of the concerns and problems of digital archiving, offering recommendations for a cooperative process and plan by the state. In the report, Lyons cites several examples of lost digital information, including data from the Viking mission to Mars and all computerized data from a New York study mapping land use and environmental data throughout the state.

At the federal government level the situation is the same. There is no overall plan for archiving federal government documents that exist only in digital format. Instead each agency determines its own preservation policy. Recently, at an annual government documents conference, this author listened to a representative from the Bureau of Labor Statistics (BLS) promise the audience of government documents librarians that all digital information at the BLS would be preserved forever. But will Congress adequately fund BLS to be able to follow through on this guarantee? That remains to be seen. The Government Printing Office (GPO) has had significant budget cuts at the same time that Congress has given GPO the mandate to cut printing costs by making information available digitally. This, of course, does offer wider access to the information today, but what about tomorrow? Clearly, the rush to make information available quickly and widely, often for "future planning" purposes, has overshadowed the need to ensure that the very same information will continue to be available for planners and historians of the future. The cart is again before the horse. While reading David McCullough's recent book on John Adams, I was struck by how heavily the book relies on Adams' personal writings, many of which are still available in their original form. One wonders about the resources that will be available to historians of the future.

The Federal Depository Library Program (FDLP) regulates those libraries committed to being official depositories of federal government documents. One of the responsibilities of each member library is to ensure access to government information. Because of concern about potential technological obsolescence, libraries and end-users alike are printing a substantial number of electronic documents (both state and federal), some as long as 500 pages. There is great concern among FDLP members about our ability to permanently ensure access to electronic government documents. Cowell, Jacobs and Peterson (2001, October) have brought to light some of those concerns. Their answers to frequently asked questions follow.

a. Since my patrons can access government documents on the Internet, why do I need a copy of those documents?

Ensuring access implies being able to control the existence, integrity, and location of an item. If someone other than you can move, replace, alter, or remove the copy you want to provide for your users, then you cannot ensure access. The presence today of a document on the web is no guarantee of its presence tomorrow.

b. But doesn't GPO guarantee "permanent public access"?

This is GPO's goal, but there is a serious potential problem with it: GPO is not funded to do this. The ability of GPO to provide access to everything is only as assured as the next funding cycle and the whim of Congress. If Congress can, as it has, effectively remove vital materials from the depository system and allow GPO to change a depository system to a system without deposits, how certain can we be that Congress will not change GPO's mandate, or abolish GPO, or privatize all or part of the "permanent public access" collection, or fund GPO at levels that make it impossible for GPO to keep everything online always? Given these circumstances, libraries need their own copies of digital documents in order to speak with any confidence of permanent public access.

c. Can't we rely on the government (individual agencies, NARA, etc.) to preserve for the long term? My library isn't an archive after all!

It is unlikely that agencies will preserve their materials in either the library sense or the archival sense. Agencies have balked when told to do so by NARA [National Archives and Records Administration] and have rarely shown an understanding of the need for older materials.

d. What do I do if my library director won't let me have a digital collection or won't fund anything new?

Emphasize the importance of the role of libraries in society. If libraries do not take responsibility for selecting, organizing and preserving digital information one of two things will certainly happen. Either information will be lost because no one will take on this role, or the private sector will do it for those items that are profitable.

Richard Wiggins (2001, Spring) illustrates the challenge of archiving digital information by revealing that with the inauguration of George W. Bush, the White House web site (http://www.whitehouse.gov) was completely changed and all of the Clinton administration's web collection disappeared overnight. Fortunately, the National Archives and Records Administration (NARA) had begun to preserve the content of the Clinton administration's contributions to the White House web site. Some suspect that information has been lost anyway, since agencies in the Executive Branch reportedly were not all successful in complying with NARA preservation requests. Wiggins also mentions the accompanying problem of 170,000 links to the site, many of them "deep links," that were suddenly broken, as reported by AltaVista, not to mention the possibly millions of personal bookmarks that were instantly dead.

The preservation of electronic journals is also a concern for libraries. Wiggins (2001, Spring, inset article) notes the irony of the demise of CICNet Journal Archive (Committee on Institutional Cooperation) due to lack of funding. For six years, from 1991 to 1997, the group attempted to archive electronic journals. The archive has "vanished". "Ironic, indeed, to lose not a mere collection but an archive whose purpose was to prevent loss of electronic content. How many pioneering e-journals, many of them hosted on now defunct Gopher servers, were lost for eternity?" In a related issue, a message posted on October 5, 2001, to the Collection Development Listserv (colldv-l@usc.edu) notes an attempt to obtain an article beginning on page 415 of vol. 28, no. 4 of Catalysis Today. The online version, available via Science Direct, only shows articles in that volume up to page 389. The response to a query to Science Direct was that at least 2% of its electronic journal content is missing.

The problem stated by O'Mahony (1998, p. 114) continues. His specific concern was electronic government information, but it certainly applies to other forms of digital information as well.

Each day that the problems of electronic preservation and permanent public access go unresolved, alarming amounts of government information continue to be lost as databases come and go from agency websites, files are deleted from government computer servers, digital storage media deteriorate, and hardware and software become obsolete. The continuous and cumulative effects of this ongoing catastrophe are to deny taxpayers access to information they already paid for, to impair the public's ability to use government information already collected and compiled, to waste public and private resources in having to duplicate efforts to retrieve information previously available but now lost, and to allow the historical record of the nation to literally vanish before our eyes. Moreover, it severely undermines the potential promise and usefulness of new electronic technologies when the long-term consequence of their use is an ever-widening breach in our collected knowledge and information bank.

AN OVERVIEW OF THE PROBLEMS

Although there are groups working at the state, national and global levels to determine the best practices for digital archiving, the problems are complex and the stakeholders are many. Understanding the issue of digital archiving is important for librarians at all levels as local collection development and preservation decisions are being made. There are no standards and no agreed-upon solutions. One must know who the stakeholders are, the technological problems involved in archiving and retrieving digital information, the current recommendations for archiving digital information, the costs involved, and some of the groups working for a solution. Although the concern of this author is primarily the preservation of government information, references to non-governmental collaborations will also be given.

The problems are not all technological, though the technological problems are many. Feeney (1999, p. 108) thoroughly describes the stakeholders as authors, publishers, libraries, archive centers, distributors, networked information service providers, IT suppliers, legal depositories, consortia, universities and research funders. Feeney also suggests considering the relationship of the stakeholder to the digital material: "initiators, who are involved in collection development; regulators, such as those bodies involved in the legal deposit system or copyright legislation; creators of digital records; rights owners; fund holders, who manage the funds available for preservation activity; providers of electronic publications and new or repackaged editions; readers, who require access to digital material; and archivists, who are concerned with conserving digital material and maintaining its integrity." Each stakeholder is involved at a different stage of the "life-cycle" of the digital resource and may not be considering the effect on a stakeholder at another stage, thus requiring a more coordinated effort on the part of the stakeholders. Feeney (p. 112-113) has summarized the main stages in the life-cycle concept developed by the Arts and Humanities Data Service (AHDS) as: 1. Data creation; 2. Data management and preservation (including Acquisition, retention or disposal; Data structure; Data description and documentation; Data Storage; Data preservation); 3. Data use; and 4. Rights management.

At the national level, the U.S. National Commission on Libraries and Information Science (NCLIS) has recognized problems regarding the preservation of government information that have been compounded by recent technological developments. Included in the problems is the lack of an overall organized plan for preserving digital government resources. A narrative description of the legislative proposal by NCLIS, The Public Information Resources Reform Act of 2001, appears in Appendix 11 of volume 2 of A Comprehensive Assessment of Public Information Dissemination (http://www.nclis.gov/govt/assess/assess.vol2.pdf). The proposed legislation advocates reforming the federal government's public information structure to bring together "in a systematic fashion all of the key elements necessary for a comprehensive public information resources management program and to elevate the importance of federal government public information resources to the status of a strategic national asset (NCLIS, 2001, p. 2-2)." NCLIS recommends harmonizing the information resources management policies, programs and practices at each stage of the information life cycle due to the inseparable inter-relationship of government agencies. This harmonization or collaboration directly relates to the technological issues to be discussed ahead because of the need for government agencies to provide uniform effectiveness across agencies within all of the digital resource life-cycle stages. Included in this detailed report are recommendations for surveying preferred user formats for data (including bibliographic, graphical, numerical, sound, spatial, textual, video and multimedia data types); for surveying patterns of user preference for format types (including database, spreadsheet, tagged markup, image, audio, video, text and word processing formats); and for tracking online approaches to information (user interfaces supported, web design approaches, bulletin board systems). 
NCLIS suggests that both the opportunities and the challenges of technological developments need to be approached from an inter-branch, intergovernmental, and interagency direction in order to ensure future interconnectivity.

Standards

There is quite a standards debate since "no computer technical standards have yet shown any likelihood of lasting forever" (Bearman, 1999). However, those recommending an adherence to standards use the rationale that standards "can assist by facilitating the transfer of information between hardware and software platforms as technologies evolve" and "resources which are encoded using open standards have a greater chance of remaining accessible after an extended period than resources encoded with proprietary standards" (PADI, Standards, 2001). A thorough discussion of standards, including a lengthy bibliography, can be found at PADI: Preserving Access to Digital Information, Standards, http://www.nla.gov.au/padi/topics/43.html.

Descriptive metadata has no agreed-upon standard. The Colorado Digitization Project (http://coloradodigital.coalliance.org/glossary.html#M) defines metadata as: "data about data or information known about the image in order to provide access to the image. Usually includes information about the intellectual content of the image, digital representation data, and security or rights management information." The OCLC/RLG Working Group on Preservation Metadata (http://www.oclc.org/digitalpreservation/presmeta_up.pdf) reviews the concerns that accompany transferring more traditional bibliographic cataloging practices into the electronic world. Typical metadata standards are US MARC and the emerging scheme, Dublin Core. Research is being conducted to attempt to develop a uniform standard (see OCLC/RLG Working Group on Preservation Metadata), which Bearman (1999) states must exist for any of the electronic preservation models to succeed. "Serious proposals for metadata encapsulation strategies need to address how the required metadata will be identified, created or captured at the time of the creation of the records; by what means it will be stored in inviolable conjunction with the record contents; how it will support the use of the record by authorized users over time; and by whom, where, and at what costs the infrastructure for record keeping will be constructed and maintained" (Bearman, p. 4).
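To make the idea of descriptive metadata concrete, the following is a minimal sketch of a record built from a few Dublin Core elements. It is an illustration only, not the scheme recommended by any of the groups cited above: the wrapping "record" element, the helper name dublin_core_record, and all field values are assumptions invented for the example. Only the element names and the namespace come from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple XML record from a dict of Dublin Core element names.

    The wrapping <record> element is a hypothetical container chosen for
    this sketch; Dublin Core itself does not prescribe one.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return root


# Hypothetical values describing an illustrative state document
record = dublin_core_record({
    "title": "Annual Report 2001",
    "creator": "Example State Agency",
    "date": "2001-12-31",
    "format": "application/pdf",
    "identifier": "example-agency-annual-report-2001",
    "rights": "Public domain",
})

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this addresses only description. Bearman's further questions (how the metadata is captured at creation, and how it stays bound to the record contents) are not answered by the format alone.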

Costs

Feeney (1999, p. 116-120) gives a thorough breakdown of cost considerations based on one of the studies commissioned by the Digital Archiving Working Group (DAWG) and a summary will be given here. "One clear message that has emerged is that a great deal of money can be wasted if digitization projects are undertaken without due regard to long-term preservation. It is now relatively easy to produce digital versions of texts or images. However, if there is no plan in place for archiving the digital files, long-term preservation will be expensive, or may even result in the work having to be repeated" (Feeney, p. 120). As it is difficult to isolate preservation costs within the life-cycle of a digital resource, costs associated with all elements in the life-cycle of the digital resource are considered. The following cost model summary defines seven key areas: data creation; data selection and evaluation; data management, including data documentation, validation, structure and storage; resource disclosure; data use; data preservation; and rights management.

Cost model summary:

Data creation costs: A key to this stage is providing adequate documentation of the digital resource.

Data selection and evaluation: Acquisition decisions include how easily a digital resource can be managed, catalogued, accessed and preserved.

Data management including data documentation, validation, structure and storage: Documentation, the description of the "structure, contents, provenance and history" (Feeney, p. 117), must be checked, edited, added to if necessary, made available to users and kept up to date. Validation involves the periodic assessment of the resource and the copying and refreshing necessary for preservation. The structure refers to the original format of the resource and will determine the costs involved for providing future storage and access. Available resources determine storage, by data volume and by the choice of preservation and use. (Feeney, p. 118-120, also describes in detail the high costs of rescuing data, or "digital archaeology.")

Resource disclosure: These costs, though not necessarily involved with preservation, involve "discovering, extracting and preparing the object for use" (p. 117).

Data use: The structure of the digital resource will determine the costs of delivering the resource to end users and could involve online or CD-ROM access.

Data preservation: The main costs include "agreeing on the preferred standard formats; testing the conversion for a specific category of resource; running the conversion as a batch process; testing a sample of converted resources; deleting the old versions if required; copying the resulting files" (p. 118).

Rights management: Consideration must be given to intellectual property rights and the legal issues of data protection and confidentiality, which determine issues of access, use, and legal preservation. These potentially substantial costs can actually be the highest cost of digital archiving.

One cost model is the Yale University Libraries Project Open Book, designed to study the costs of converting the printed text and accompanying materials in 10,000 brittle books to digital image (Butler, 1997, p. 73-74). One of the "realities" that became clear following this analysis of digital storage costs is that "the digital world not only makes collaboration possible, it may make it economically imperative ... [forcing us] to think about the economics of digital libraries not as single institutions, each trying to build the digital mega collection, but as a system of digital libraries and archives that works collaboratively to acquire, describe, disseminate, preserve and store information resources which may be individually or jointly owned" (p. 74).

Project Open Book investigators expected to find that both digital storage and access costs would be cheaper than the costs of storage and access in a traditional paper-based library. However, the results of the study showed that unit costs for storage were more than 12 times higher, and for access 50% higher in the digital archive than in the traditional library. These results were true in the first year of operation and continued to be true for storage costs, though to a lesser degree projected over ten years, even when staff and overhead costs for the traditional library were taken into consideration. Clearly this economic analysis favors the traditional library. On the other hand, if we think about the digital library as a fundamentally different kind of organization which needs to be structured, organized, and managed in a different way, a different picture begins to appear.

When Yale modeled the costs for a distributed network-based system of archives rather than for a single institutional model, the cost comparisons began to improve significantly. Access costs per volume evened out in the 4th year and favored the digital archive by 57% in year 10. Even then, however, the digital archive began to be less expensive than the traditional library for storage costs only in the 7th year.

Additional cost discussions that include actual monetary figures are found in Wiggins (2001, Spring) and Lee (2001, May).

RUSHING AHEAD BEFORE WE'RE READY

Karen Hunter, Senior Vice President, Elsevier Science, Inc., and an original member of the Task Force on Archiving of Digital Information convened by the Research Libraries Group and the Commission on Preservation and Access in 1994 (see Appendix), states that "there is no magic bullet in electronic archiving. Those of us who are spending large chunks of our professional time on the topic know that it will require a lot of trust and good-faith effort to continue to move things forward. It is too important and too expensive to be left to chance" (Hunter, 2000, final paragraph). David Bearman of Archives & Museum Informatics (1999, p. 1) is troubled by the suggestion that a magic bullet solution ("a simple, universally applicable, one-time fix") has even been proposed.

Digital Preservation Strategies

The U.S. National Archives and Records Administration

Deserving of special attention in the discussion of standards and digital preservation strategies is the particular challenge faced by the U.S. National Archives and Records Administration (NARA) because of its responsibility to "preserve and deliver authentic records to subsequent generations of users" (Thibodeau, 2001).

What differentiates records from documentary materials in general is not their form, but their connection to the activities in which they are made and received. If this link is broken, corrupted, or even obscured, the information in the record may be preserved, but the record itself is lost. This fundamental difference between records and documents can be readily illustrated empirically. For example, a map of Sarajevo is a document, but a map of Sarajevo known to have been used in making a targeting decision that led to the bombing of the Chinese Embassy is an essential record of that action. The key difference between the document and the record is the specification of the context of action in which the record was involved. To preserve authentic records entails preserving the documents themselves and also their connections to the activities in which they were used. ... To preserve records means to preserve them in their original order. To extend the National Archives of the United States into the digital era, then, entails being able to preserve the content, structure and context of the records. When any of these elements can only be expressed in digital form, the records must be preserved in that form. For NARA, as for other archival institutions, the difficulty of doing so is compounded by the commitment to preserve records permanently. ... The wholesale absence of proven methods for digital preservation presses acutely on NARA. But NARA is not only responsible for preserving unique historical materials, but also for guiding all other federal agencies in creating and managing all of the records they need in performing their functions. The requirements for managing active records in support of the specific needs of ongoing activities are significantly different from those entailed by the objective of preserving and delivering authentic records to future users whose interests, objectives, methods and tools are essentially unknowable. (Thibodeau, 2001, p. 1-2)

A discussion of the cooperative research into Persistent Object Preservation, being considered by NARA to address its unique situation, appears in the Appendix.

Migration

Hunter (2000) notes that in international discussions regarding archiving issues there is a presumption that for online journals, migration will be the methodology of choice. However, a great number of questions still need to be answered and she suggests that "until those questions are resolved, libraries will be understandably reluctant to make a permanent switch from paper to electronic collections. What should be archived? In what format? How many copies of the archive are needed? Who holds those copies? What is the access to the archive and who controls that access? How does licensing affect archive building? What can the scholarly community afford?" (Hunter, par. 3)

Keeping those questions in mind, migration is defined as the "periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation" (PADI, 2001, Migration). For example, the information on a floppy disk may be transferred to a CD-ROM format, offering only temporary preservation, since the CD-ROM format must in turn be migrated when the technology changes again. The digital information must be refreshed without being changed, yet in a new operating environment the copy is not exactly the same as the original, so decisions must be made about which aspects need to be preserved. Metadata can assist here by providing information about migrations and their effect on the digital object. In some cases, software that is 'backwards compatible' can simplify the migration process (the most recent version of the software having the capability of decoding the files created in the earlier version). Systems that are interoperable will also help. However, there is no guarantee of compatibility over time, as technological developments become increasingly complex and it may no longer be financially worthwhile for a software manufacturer to support such compatibility. Some question the practicality of migration, while others point out that each new format will require a unique solution.
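The simplest step described above, refreshing a file onto new storage without changing it, can be sketched in a few lines. This is a hedged illustration under stated assumptions rather than a recommended workflow: the function names sha256_of and refresh_copy, the file names, and the shape of the event record are all invented for the example. It covers only a bit-for-bit copy with a checksum comparison; converting a file to a new format, the harder part of migration, is not shown.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source, destination):
    """Copy a file to new storage and confirm the bit stream is unchanged."""
    source, destination = Path(source), Path(destination)
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"fixity check failed for {destination}")
    # Hypothetical event record that could be kept with the object's metadata
    return {
        "event": "refresh",
        "source": str(source),
        "destination": str(destination),
        "sha256": after,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Usage with throwaway files standing in for old and new storage media
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "report-on-old-medium.txt"
    original.write_bytes(b"Election results for 1999")
    copy = Path(workdir) / "report-on-new-medium.txt"
    event = refresh_copy(original, copy)
    copied_bytes = copy.read_bytes()

print(event["event"], event["sha256"])
```

Keeping the returned event record alongside the object is one way metadata can document each migration, as the paragraph above suggests.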

Migration discussions include the most basic strategy of changing media and transferring from the digital mode to a more stable, controlled environment, the most extreme version being preservation on paper or preservation-quality microfilm. Although an archival-quality paper or microfilm record can last up to 500 years (Lyons, 2001), the drawback is that the print or microfilm record may not adequately represent the original object, because the digital functionality of the resource can be destroyed. Feeney (1999, p. 114) mentions the computation capabilities, graphic display or indexing that can be lost, citing the equations embedded in a spreadsheet, and the impossibility of printing out an interactive full motion video or preserving a multimedia document as a "flat file". Data loss and the loss of functionality or of the 'look and feel' of the original platform remain concerns with the migration method.

Emulation

Those concerned about the drawbacks to migration view emulation as the alternative, superior method. "The essential idea behind emulation is to be able to access or run original data/software on a new/current platform by running software on a new/current platform that emulates the original platform" (Granger, 2000, para. 2). Granger and Bearman (1999) provide thorough reviews of the emulation option, which is championed by Jeff Rothenberg (1998).

Encapsulation

This technique has been proposed as a strategy to be used in conjunction with other methods in order to interpret content using new systems over time. "Encapsulation can be achieved by using physical or logical structures called 'containers' or 'wrappers' to provide a relationship between all information components, such as the digital object and other supporting information such as a persistent identifier, metadata, software specifications for emulation" (PADI, 2001, Encapsulation).
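As a rough illustration of the "container" or "wrapper" idea, the sketch below bundles a digital object with a manifest holding an identifier, descriptive metadata, and a note on the software needed to read it. Everything specific here is an assumption made for the example: the use of a ZIP file as the container, the manifest.json layout, the function name encapsulate, and the sample identifier and values. PADI's definition does not prescribe any of these.

```python
import io
import json
import zipfile


def encapsulate(content, filename, identifier, metadata, software_spec):
    """Wrap a digital object and its supporting information in one ZIP container."""
    manifest = {
        "identifier": identifier,        # persistent identifier (hypothetical)
        "object": filename,              # name of the wrapped object
        "metadata": metadata,            # descriptive metadata
        "software_spec": software_spec,  # what is needed to render or emulate
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as container:
        container.writestr(filename, content)
        container.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


# Hypothetical object and supporting information
package = encapsulate(
    content=b"Election results for 1999",
    filename="results-1999.txt",
    identifier="example:nj-elections-1999",
    metadata={"title": "Election Results 1999", "format": "text/plain"},
    software_spec={"encoding": "ASCII", "viewer": "any plain-text reader"},
)

# Reading the container back shows the object and its context travel together
with zipfile.ZipFile(io.BytesIO(package)) as container:
    names = sorted(container.namelist())
    manifest = json.loads(container.read("manifest.json"))

print(names)
print(manifest["identifier"])
```

The container itself is of course another format that may need migration later, which is why encapsulation is described as working in conjunction with other methods.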

Conclusion

Those of us on the DANJ State Documents Interest Group became concerned enough to educate ourselves more thoroughly about digital preservation issues and to produce a report of concern for the State Librarian. From around this country, others have responded to the usefulness of this report to assist them in their own digital preservation discussions. We continue to discuss the problem. This is an essential first step for any organization. In an effort to recognize all of the stakeholders and to successfully address all of the stages of the life-cycle of a digital resource, all members of the organization need to embark on the discussion. The Online Computer Library Center (OCLC) and Research Libraries Group (RLG) have developed the Digital Preservation Commons (DPC), intended to promote discussion about digital preservation and archiving issues. The goal of DPC is to identify best practices for preserving digital objects. The URL for DPC is http://www.oclc.org/digitalpreservation.

Become aware of regional, state, national and global partnerships that can be a model for working collaboratively on the issue of digital archiving. Soete (1997, p. 15) suggests stepping back, considering the organizational mission and "how it might best be served through digital preservation programs; to recall and understand what is known already in the organization about digital preservation -- the experiences everyone has had, for example, with migrating digital data from one system to another; to consider analogous experiences with preserving the print heritage -- what are the lessons to be learned?" Solutions to the problems of digital archiving are years away. Beware of the promises of electronic publishers and, yes, keep the print version; that is, if it's still available.

Appendix (reviews some of the partnerships underway)

Partnerships

State-level

An example of a partnership that attempts to recognize all of the stakeholders is the Find-It! Illinois Program, one of the state-level Government Information Locator Service (GILS) programs around the United States. A strength of the Illinois program is its uniform metadata tagging, which was accomplished by establishing liaisons between the state agency librarians and their own agency webmasters. Through this liaison relationship the web content creators were informed of access issues from the users' viewpoint, the need for standardized metadata and the need for future information retrieval considerations in addition to immediate ones. A tool provided by the Illinois State Library, the Metadata Generator (http://www.finditillinois.org/metadata/webmasters.htm), facilitates the process. Thus, the webmasters or authors embed the content description at the point of origin and have recognized the benefit of enabling users to find their content (Craig, 2001). Because established controlled language such as the Library of Congress Subject Headings (LCSH) did not accurately describe the content or structure of state government web documents, the Illinois State Library also provides controlled language via the "Jessica Tree" (see the above website), a common subject hierarchy created for the proposed purpose of enabling interstate government information access. At least the first three life-cycle stages summarized by Feeney (1999) have been addressed within this partnership.

National level

There are many national digital preservation efforts on a variety of scales. Butler (1997, pp. 64-75) provides the background for the following partnerships: The Commission on Preservation and Access in partnership with the Council on Library Resources; the Digital Preservation Consortium, a partnership of eight of the nation's largest research libraries; the Online Computer Library Center, Inc. (OCLC); the Research Libraries Group; the National Digital Library Federation, now the Digital Library Federation (DLF); the Library of Congress National Digital Library Program (NDLP); JSTOR; and a joint task force on digital archiving formed by the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG). The CPA/RLG task force produced a report, Preserving Digital Information: Report of the Task Force on Archiving of Digital Information (1996), which heightened awareness of the seriousness of digital preservation issues and encouraged international discussion of the issues. The Association of Research Libraries (ARL) Preservation Program website (http://www.arl.org/preserv) describes several collaborative efforts.

The list of national and international partnerships is lengthy. Soete (1997) describes the background of several, including Cornell University and the National Agricultural Library (NAL). Partnerships of the Federal Depository Library Program of the U.S. Government Printing Office (GPO) include the GILS programs mentioned above and partnerships with OCLC, e.g., the OCLC Electronic Archiving Pilot Project: FDLP/ERIC Digital Library Pilot Project.

NARA and Persistent Object Preservation

NARA has pursued collaborative relationships with six key partnerships forming the core of its Electronic Records Archives (ERA) Program. The Open Archival Information System (OAIS) Reference Model, an international effort spearheaded by NASA, lays the foundation; it originated to address data requirements in the space science community but now applies to any system responsible for long-term preservation of information. The second foundation is the International Research on Permanent Authentic Records in Electronic Systems (InterPARES) project, which involves representatives of ten national archives. The core of the ERA program is the Distributed Object Computation Testbed (DOCT), originally an interagency collaboration between the Department of Defense's Advanced Research Projects Agency and the U.S. Patent and Trademark Office, with NARA joining the collaboration in 1998. NARA's specific concern is the long-term retention of records "created, communicated and managed in advanced, high-performance computing environments." In response to NARA's concern, the San Diego Supercomputer Center (SDSC), one of the primary research centers involved in DOCT, has invented a preservation method that is actually "an information management architecture built around the objective of preservation of arbitrarily structured sets of virtually any type of electronic record" (Thibodeau, 2001, p. 3). The architecture and preservation method were originally referred to as "Collection-Based Persistent Object Preservation," then "Persistent Object Preservation," described in D-Lib Magazine in 2000 (see Moore), and the now-enriched architecture has come to be known as "Knowledge-based Persistent Object Preservation." "The 'objects' that can be preserved under this approach can be any digital information that needs to be preserved. For archives, this ranges from individual records, to files of records, entire series of files, and ultimately to entire archival fonds; that is, the totality of records created by a person or organization. Key to the ability of the persistent object approach is that it handles in a consistent fashion any arbitrarily complex object at any level of an arbitrary structure. The essential process is to transform the object to a persistent form. This entails identifying and characterizing all significant properties of the objects that are to be preserved. These properties are expressed in formal models. For example, individual records are modeled according to XML Document Type Definitions (DTDs)" (Thibodeau, 2001, pp. 8-9). Based on successful repeated and consistent demonstrations, NARA regards the approach as "the most promising one ever suggested for preserving digital information in general, and electronic records in particular... Persistent Object Preservation offers substantial promise for the survival of information assets, whether they are needed for twenty-five years, seventy-five, or forever" (Thibodeau, 2001, p. 4). NARA has now joined the National Science Foundation as a cosponsor of its National Partnership for Advanced Computational Infrastructure (NPACI) program and is supporting additional research through NPACI into the development of Persistent Object Preservation. Other collaborative research projects are described by Thibodeau, including an approach for applicability in smaller institutions, such as state and university archives. Thibodeau stresses that the demonstrations and prototypes developed by the Electronic Records Archives collaborations are still under development and predicts that it will be several years before the vision of the archives of the future becomes operational.
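The idea of transforming a record to a persistent form can be illustrated with a small sketch. It assumes that a record's significant properties are simple name/value pairs and writes them out as software-independent XML text. The element names (record, property) and both function names are hypothetical and are not taken from the NARA or SDSC models. The sketch also does not validate against a DTD, which the Python standard library does not support.

```python
import xml.etree.ElementTree as ET


def to_persistent_form(record: dict) -> str:
    """Serialize a record's significant properties as plain XML text.

    Element and attribute names are illustrative assumptions only.
    """
    root = ET.Element("record")
    for name in sorted(record):  # stable ordering regardless of input order
        prop = ET.SubElement(root, "property", {"name": name})
        prop.text = str(record[name])
    return ET.tostring(root, encoding="unicode")


def from_persistent_form(xml_text: str) -> dict:
    """Recover the properties from the XML text, independent of the
    software that originally created the record."""
    root = ET.fromstring(xml_text)
    return {p.get("name"): p.text for p in root.findall("property")}


# Hypothetical record with three significant properties
sample = {
    "creator": "Division of Elections",
    "title": "General election results",
    "date": "2001-11-06",
}
xml_text = to_persistent_form(sample)
restored = from_persistent_form(xml_text)
print(xml_text)
```

In this toy case the round trip loses nothing, which is the property a persistent form is meant to guarantee. A real archive would also have to model structure, context and presentation, which this sketch ignores.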

Global level

Globally, both the United Kingdom and Australia appear to be in the forefront of the preservation debate. In the UK, the Management Committee of the National Preservation Office (NPO) responded to the action points elicited from a 1995 national conference of the Joint Information Systems Committee of the Higher Education Funding Councils (JISC) and the British Library. The result was a program of studies on digital archiving administered by the British Library Research and Innovation Centre and funded by JISC. The working group for the program of studies is the Digital Archiving Working Group (DAWG). The first study by DAWG was the analysis of the CPA/RLG task force report mentioned above to determine its applicability in the UK. The aim of the committee is "to persuade those who are about to embark on digitization projects to consider the long-term archiving of the files they are about to create, and to encourage those bodies that are about to fund digitization programs to ensure that all proposals include a workable archiving strategy" (Feeney, 1999, p. 108). JISC is also funding the CEDARS (CURL Exemplars in Digital ARchiveS) project, which is expected to produce recommendations, guidelines, and models for establishing digital archives.

The National Library of Australia provides current articles, excellent bibliographies and links to projects and case studies at the Preserving Access to Digital Information (PADI) website (http://www.nla.gov.au/padi). PADI receives advice and guidance from an international advisory group, whose members include representatives from RLG, the Library of Congress and the Andrew W. Mellon Foundation in the United States. In addition to providing a gateway to digital preservation resources, PADI hosts an associated discussion list for exchanging ideas. One of the objectives of PADI is "to facilitate the development of strategies and guidelines for the preservation of access to digital information" (http://www.nla.gov.au/padi/about.html).

WORKS CITED

Bearman, D. (1999). Reality and chimeras in the preservation of electronic records [Online]. D-Lib Magazine 5 (4). Available: http://www.dlib.org/dlib/april99/bearman/04bearman.html [2001].

Butler, M. (1997). Issues and challenges of archiving and storing digital information: Preserving the past for future scholars. Journal of Library Administration 24 (4), 61-79.

Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG) (1996). Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. Washington, DC: Commission on Preservation and Access.

Cowell, Jacobs and Peterson (2001, October). Managing Digital Content in the FDLP: Frequently Asked Questions. (request by e-mail from either: ecowell@ucsd.edu, jajacobs@ucsd.edu or kapeterson@ucsd.edu)

Craig, A. (2001). Bridging the digital divide: State government as content provider, The Illinois experience. Institute of Museum and Library Services [Online]. Available: http://www.imls.gov/pubs/wbws01cp7.htm [2001].

Feeney, M. (1999). Towards a national strategy for archiving digital materials. Alexandria 11 (2), 107-122.

Graham, P. (date n.a.). Intellectual preservation and electronic intellectual property [Online]. Available: http://www.ifla.org/documents/infopol/copyright/graham.txt [2001].

Granger, S. (2000, October). Emulation as a digital preservation strategy [Online]. D-Lib Magazine 6 (10). Available: http://www.dlib.org/dlib/october00/granger/10granger.html [2001].

Hunter, K. (2000). Digital archiving [Online]. Serials Review 26 (3), 3 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 13 June 2001.

Lee, S. D. (2001, May). Digitization: Is it worth it? [Online]. Computers in Libraries 21 (5), p. 28, 4 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 24 October 2001.

Lyons, S., ed. (2001). Staying digital: Recommendations on preserving New Jersey government information in the digital age. Report of the State Documents Interest Group of the Documents Association of New Jersey. Available: http://www.danj.org/DANJ.

Moore, R. et al. (2000, March). Collection-based persistent digital archives [Online]. D-Lib Magazine 6 (3) [Part 1]. Available: http://www.dlib.org/dlib/march00/moore/03moore-pt1.html; (4) [Part 2]. Available: http://www.dlib.org/dlib/april00/moore/04moore-pt2.html [2000].

NCLIS, U.S. National Commission on Libraries and Information Science (2001). A Comprehensive Assessment of Public Information Dissemination. vol. 1, containing the executive summary, the report and Appendices 1-10: Available in print and in electronic form: http://www.nclis.gov/govt/assess/assess.vol1.pdf; vol. 2, containing Appendices 11 and 12, the Legislative and Regulatory Proposals: Available in print and in electronic form: http://www.nclis.gov/govt/assess/assess.vol2.pdf; vol. 3, containing Appendices 13-34, the Supplementary Reference Materials: Available only in electronic form: http://www.nclis.gov/govt/assess/assess.vol3.pdf; vol. 4, containing Appendix 35, Compilation of Recent Statutes Relating to Public Information Dissemination: Available only in electronic form: http://www.nclis.gov/govt/assess/assess.vol4.pdf; other related documents: Available in electronic form: http://www.nclis.gov/govt/assess/assess.html; the Executive Summary: Available in print and in electronic form: http://www.nclis.gov/govt/assess/assess.execsum.pdf. Washington, DC: U.S. National Commission on Libraries and Information Science.

OCLC/RLG Working Group on Preservation Metadata (date n.a.). Preservation Metadata for Digital Objects: A Review of the State of the Art. [Online]. Available: http://www.oclc.org/digitalpreservation/presmeta_wp.pdf [2001].

O'Mahony, D. P. (1998). Here today, gone tomorrow: What can be done to assure permanent public access to electronic information? Advances in Librarianship 22, 107-21.

PADI (Preserving Access to Digital Information) (2001). About PADI [Online]. Available: http://www.nla.gov.au/padi/about.html [2001].

PADI (Preserving Access to Digital Information) (2001). Digital preservation strategies [Online]. Available: http://www.nla.gov.au/padi/topics/18.html [2001].

PADI (Preserving Access to Digital Information) (2001). Encapsulation [Online]. Available: http://www.nla.gov.au/padi/topics/20.html [2001].

PADI (Preserving Access to Digital Information) (2001). Migration [Online]. Available: http://www.nla.gov.au/padi/topics/21.html [2001].

PADI (Preserving Access to Digital Information) (2001). Standards [Online]. Available: http://www.nla.gov.au/padi/topics/43.html [2001].

Rothenberg, J. (1998). Avoiding technological quicksand: Finding a viable technical foundation for digital preservation [Online]. Council on Library and Information Resources. Available: http://www.clir.org/pubs/reports/rothenberg/contents.html [2001].

Soete, G. (1997). Transforming libraries: Issues and innovations in preserving digital information. Systems and Procedures Exchange Center SPEC Kit 228.

Thibodeau, K. (2001, February). Building the archives of the future: Advances in preserving electronic records at the National Archives and Records Administration [Online]. D-Lib Magazine 7 (2). Available: http://www.dlib.org/dlib/february01/thibodeau/02thibodeau.html [2001].

Wiggins, R. (2001, Spring). Digital preservation paradox & promise [Online]. Library Journal 126 (7), p. 12, 4 pp. Available: Academic Search Premier. EBSCOhost. Rider University Libraries. 18 October 2001.

 


 


Copyright Progressive Librarian, 2002