Digital preservation is an emerging field. It concerns itself with the long-term preservation of digital data beyond the lifespan of current data structures, formats and storage media. The UK’s Digital Curation Centre organised an international conference in this field, which provided me with a steep learning curve.
The term ‘ingest’ was totally new to me. It denotes all the [human] processes that go into preservation, front-loaded before the data enters a digital archive or repository. It is a summary term for all the activities such as checking integrity, duplicating, describing, applying metadata, cataloguing and so on – the whole lot.
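To make this more concrete for myself, here is a minimal sketch (in Python, my own illustration rather than anything presented at the conference) of what one small ingest step might look like: computing a fixity checksum and capturing some basic technical metadata. The filename is hypothetical.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path

def ingest_file(path: str) -> dict:
    """Sketch of a single ingest step: read the file, compute a fixity
    checksum and capture basic technical metadata for the catalogue."""
    p = Path(path)
    sha256 = hashlib.sha256(p.read_bytes()).hexdigest()
    return {
        "filename": p.name,
        "size_bytes": p.stat().st_size,
        "mime_type": mimetypes.guess_type(p.name)[0] or "application/octet-stream",
        "sha256": sha256,  # integrity / fixity value, checked again later
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }

# The record would then be catalogued alongside the stored copy of the file.
# "observation_001.fits" is a made-up example filename.
print(json.dumps(ingest_file("observation_001.fits"), indent=2))
```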
There is a cost even to deciding whether to ingest or discard something. In the current environment, discarding is always cheaper than ingesting. But what if it were cheaper to ingest than to make a decision? And what if there were analysis tools that made huge collections easy to search? The example that came to my mind was Google’s Gmail approach of giving users a powerful search tool instead of folder structures.
Maybe manually applied metadata will eventually become a thing of the past, as I outlined in an earlier posting. There are certainly efforts under way to produce as much metadata as possible in an automated way. New laboratory equipment has metadata creation functionality built in, and advanced picture searches are beginning to analyse the semantic content of images rather than seeing only pixels. So maybe – just maybe – in future we will need to supply less of this descriptive data ourselves.
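As a small illustration of machine-generated metadata (again my own sketch, not a tool anyone mentioned), the following pulls technical details straight out of a PNG file’s header with no human description involved; the filename is made up.

```python
import struct
from pathlib import Path

def auto_describe_png(path: str) -> dict:
    """Derive technical metadata from the file itself: PNG width and
    height read from the IHDR chunk, with no human-supplied description."""
    data = Path(path).read_bytes()
    # PNG layout: 8-byte signature, then the IHDR chunk
    # (4-byte length, 4-byte type, then width and height as big-endian uint32).
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return {"format": "PNG", "width": width, "height": height,
            "size_bytes": len(data)}

print(auto_describe_png("telescope_frame.png"))  # hypothetical filename
```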
The key standard in the area of preservation is the OAIS (Open Archival Information System) reference model for long-term preservation. It covers the most vital parts of the process and defines key concepts, for example that ‘information’ is not only a string of bits but must remain usable. Basically, the OAIS model bundles the digital object with representation information that contains all the explanations necessary to understand the object.
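The way I picture this bundling is roughly as follows; the field names are my own simplification, not anything defined by the standard itself.

```python
from dataclasses import dataclass

@dataclass
class RepresentationInformation:
    """Everything a future user would need to interpret the bits."""
    format_description: str   # e.g. a pointer to the file-format specification
    software_needed: str      # e.g. a renderer or library known to read the format
    semantics: str            # what the content actually means

@dataclass
class InformationObject:
    """The raw bit stream paired with its representation information."""
    data_object: bytes
    representation_info: RepresentationInformation
```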
The OAIS also describes the management process required for preservation, starting with the creator submitting a SIP (Submission Information Package). This undergoes the ingest process into the archive, where it is stored and managed. At the other end of the OAIS, the user retrieves a DIP (Dissemination Information Package), which is not necessarily identical to the SIP. What sits in the archive is an AIP (Archival Information Package), together with the PDI (Preservation Description Information) that documents how the object was preserved.
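Purely as my own sketch of that flow (the class and field names are illustrative, not taken from the standard), the packages and the ingest and dissemination steps might be modelled like this:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SIP:   # Submission Information Package, as delivered by the creator
    content: bytes
    descriptive_metadata: dict

@dataclass
class PDI:   # Preservation Description Information
    provenance: list
    fixity_sha256: str

@dataclass
class AIP:   # Archival Information Package, what the archive actually keeps
    content: bytes
    descriptive_metadata: dict
    pdi: PDI

@dataclass
class DIP:   # Dissemination Information Package, what the user gets back
    content: bytes
    descriptive_metadata: dict

def ingest(sip: SIP) -> AIP:
    """Turn a submission into an archival package, recording how it was preserved."""
    pdi = PDI(provenance=["received from creator", "checksum verified on ingest"],
              fixity_sha256=hashlib.sha256(sip.content).hexdigest())
    return AIP(content=sip.content,
               descriptive_metadata=sip.descriptive_metadata,
               pdi=pdi)

def disseminate(aip: AIP) -> DIP:
    # The DIP need not be byte-identical to the SIP; here it simply omits the PDI.
    return DIP(content=aip.content, descriptive_metadata=aip.descriptive_metadata)
```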
One question I put forward was how the model links different digital objects. Sometimes an object does not make sense unless it is part of a complete set. Take, for example, an astronomical photograph of one quadrant of the sky: only together with the shot taken 10 minutes later does it become evident that there is a moving comet in the photograph. However, the answer to my question was that the OAIS does not link objects – perhaps a weakness?! The OAIS also does not specify a taxonomy to facilitate retrieval; it leaves this to the implementation.
However, I found out a little later that some data models mapped onto OAIS, such as PREMIS (Preservation Metadata Implementation Strategies), do reference Intellectual Entities that consist of multiple objects, e.g. a movie made up of an audio track and a video track that can be treated as individual objects. PREMIS is a specification with widespread acceptance and many implementations, and many projects aim to be PREMIS conformant.
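My rough mental model of an Intellectual Entity looks something like the sketch below; the real PREMIS data dictionary is far richer, and these class and field names are my own, not PREMIS terms.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PreservedObject:
    """A single preserved object, e.g. one audio or one video track."""
    identifier: str
    description: str

@dataclass
class IntellectualEntity:
    """A coherent whole made up of several individually preserved objects."""
    title: str
    objects: List[PreservedObject] = field(default_factory=list)

movie = IntellectualEntity(
    title="Example movie",
    objects=[
        PreservedObject(identifier="movie-video-track", description="video stream"),
        PreservedObject(identifier="movie-audio-track", description="audio stream"),
    ],
)
```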
An interesting concept, in contrast to other repository ideas, is that a file under OAIS is not modifiable. Modification leads to a new file, i.e. a new object. So there are no versioning issues, but it does require a separate ingest from scratch.
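A toy sketch of that idea, using content-addressed, append-only storage as my own analogy rather than anything the OAIS prescribes:

```python
import hashlib

class ImmutableStore:
    """Toy append-only store: objects are keyed by content hash and never overwritten."""
    def __init__(self):
        self._objects = {}

    def ingest(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Never modify in place; an already-stored object is simply left as it is.
        self._objects.setdefault(key, data)
        return key

store = ImmutableStore()
v1 = store.ingest(b"original report")
v2 = store.ingest(b"original report, corrected")  # a change means a fresh ingest
assert v1 != v2   # two distinct objects, no in-place versioning
```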
I also found it interesting that curators look at their work from the perspective of a string of bits: if you were to unearth a string of bits in 20 years’ time, what tools would you need to be able to understand this ‘information’?