Formats for long-term preservation

In science, data should be archived in a format that will remain readable in the future.

Why is the format important for long-term preservation?

Digital objects are easy to create, and doing so requires almost no background knowledge about file formats. As a result, countless objects are generated every day. In science especially, it's important to preserve these objects for as long as possible: scientific data has to be reusable in the future, and if an object is lost, it's often impossible to re-create it.

Because hardware and software change constantly, there is always a risk that data will become unreadable.

Choosing a suitable format, one that can be converted and migrated if necessary, is therefore essential for archiving data over a long period of time.

What is important for long-term preservation regarding file formats?

It's not possible to predict which formats will be used in the future, but there are common guidelines that help when choosing a file format.


Open formats

It's recommended to avoid proprietary formats. A format is proprietary if it is owned by a company or an individual, or if it is protected by a trademark. In that case, how long the format remains usable depends on its developer or the developing company. A format is open if its documentation is freely available, so that it can still be used in the future.

Sometimes companies extend open formats with proprietary features. PDF is an example: a PDF generated with Adobe software may contain functions that require Adobe Acrobat Reader.


Transparency

Transparency here has nothing to do with transparent backgrounds; it means that an object can be analyzed directly, for example opened and read in a text editor. Textual content should be encoded in Unicode and archived in a human-readable form.
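The text-editor criterion can be roughly approximated in code. The Python sketch below is an illustrative heuristic, not an established preservation tool: it checks whether a file's bytes decode as UTF-8 (one common Unicode encoding) and contain no control characters other than line breaks and tabs. The function names are hypothetical.

```python
import unicodedata
from pathlib import Path

# Control characters that ordinary text files commonly contain.
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: True if the bytes are valid UTF-8 without unusual control characters."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: probably binary data or a legacy encoding.
        return False
    # Reject control characters (category "Cc", e.g. NUL bytes typical of
    # binary files) except ordinary whitespace.
    return all(ch in ALLOWED_CONTROLS or unicodedata.category(ch) != "Cc" for ch in text)


def check_file(path: str) -> bool:
    """Read a file as raw bytes and apply the heuristic."""
    return looks_like_plain_text(Path(path).read_bytes())


if __name__ == "__main__":
    print(looks_like_plain_text("Temperatur: 21,5 °C\n".encode("utf-8")))  # True
    print(looks_like_plain_text(b"\x89PNG\r\n\x1a\n\x00\x00"))  # False (PNG header)
    print(looks_like_plain_text("café".encode("latin-1")))  # False (not UTF-8)
```

A check like this only says that a file *can* be read as Unicode text; whether the content is understandable to humans still depends on how it is structured.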


Widespread use

The goal is to make the content available to as many users as possible. It's therefore desirable that a format is used worldwide. It helps to find out which formats other scientific organizations are using.


Lossless formats

A lossless format retains all of the object's original data, while a lossy format discards some of it during compression (to make files smaller and faster to download). Even if the difference is not visible to the human eye, that data is lost. For long-term preservation, however, the data should remain available for a long time. When the technical environment changes, it can become necessary to migrate data, and every migration of lossy data may cause further loss.

Storing data in the best possible condition is important for keeping it usable for a long time.
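The difference between lossless and lossy storage, and why repeated migration matters, can be illustrated with a small Python sketch. `zlib` (DEFLATE) stands in for a lossless method; the "lossy codec" is a deliberately crude assumption that rounds each value down to a multiple of a step size, and is not how real formats such as JPEG or MP3 work.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a lossless method."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_roundtrip(samples: list[int], step: int) -> list[int]:
    """Toy lossy 'codec' (illustrative assumption): round each sample down to a
    multiple of `step`. The discarded remainder cannot be recovered."""
    return [(s // step) * step for s in samples]


def total_error(original: list[int], copy: list[int]) -> int:
    """Sum of absolute differences between original and copied samples."""
    return sum(abs(a - b) for a, b in zip(original, copy))


if __name__ == "__main__":
    original = [10, 13, 17, 22, 101, 250]

    # Lossless: the bytes come back exactly.
    raw = bytes(original)
    print(lossless_roundtrip(raw) == raw)  # True

    # Lossy: the first encoding loses detail ...
    first = lossy_roundtrip(original, step=4)
    # ... and a later "migration" to a different lossy setting loses more.
    second = lossy_roundtrip(first, step=3)
    print(total_error(original, first), total_error(original, second))  # 9 17
```

Because this toy codec only ever rounds down, each further lossy generation can only keep or increase the deviation from the original; with the sample values above, the total error grows from 9 to 17.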


Documentation and standards

A format should be documented or based on standards. Ideally, those standards are thoroughly documented and widely used.

Support of Metadata

Metadata describe the data. They are important for finding and understanding objects. Some formats allow metadata to be embedded directly in the object.
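PNG is one example of a format that can carry metadata inside the file: its specification defines `tEXt` chunks that hold keyword/text pairs. The Python sketch below uses only the standard library to write a minimal 1x1 grayscale PNG with embedded metadata and read it back. The helper names and metadata values are hypothetical, and for real archives an established imaging library or tool is preferable to hand-built files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def tiny_png_with_metadata(metadata: dict[str, str]) -> bytes:
    """Create a 1x1 grayscale PNG with one tEXt chunk per metadata entry."""
    # IHDR: width 1, height 1, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    # tEXt: Latin-1 keyword, NUL separator, Latin-1 text.
    text_chunks = b"".join(
        png_chunk(b"tEXt", key.encode("latin-1") + b"\x00" + value.encode("latin-1"))
        for key, value in metadata.items()
    )
    # Image data: one scanline = filter byte 0 + one gray pixel, zlib-compressed.
    idat = zlib.compress(b"\x00\x80")
    return (PNG_SIGNATURE + png_chunk(b"IHDR", ihdr) + text_chunks
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))


def read_text_chunks(png: bytes) -> dict[str, str]:
    """Return all tEXt keyword/text pairs embedded in a PNG file's bytes."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    result: dict[str, str] = {}
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            result[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # 4 bytes length + 4 bytes type + data + 4 bytes CRC
    return result


if __name__ == "__main__":
    png = tiny_png_with_metadata({"Author": "A. Researcher", "Description": "Soil sample, site A"})
    print(read_text_chunks(png))
```

Because the metadata is stored inside the image file itself, it travels with the file when it is copied. The PNG specification also predefines keywords such as `Title`, `Author` and `Description`.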