

Lars Wirzenius: New format to replace tar: some notes

Posted: September 3, 2012 / in: Linux

During lunch the other day, I discussed the shortcomings of the
tar file format with friend and co-worker Daniel. The tar file
format has a lot of legacy by now, and it’s not quite up to date
with the latest developments in file systems, such as extended
attributes. This makes tar badly suited for things such as backups
and other situations where precise reproduction of the input
data matters.

There are several variants of the tar file format, and various
more or less standard extensions to it. GNU tar, for example,
added support for pathnames longer than 100 bytes many years
ago, and it is now commonly supported.
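
To make the 100-byte limit concrete, here is a small sketch using
Python's standard tarfile module (my choice of tool for illustration,
not something we used over lunch). The helper name can_store is made
up. It tries to write an empty member with a given name in a given
tar variant and reports whether that worked. A 150-character name
with no slash in it cannot be split across the ustar name and prefix
fields, so the plain ustar format refuses it, while the GNU format
accepts it.

```python
import io
import tarfile


def can_store(name: str, fmt: int) -> bool:
    """Return True if an empty member called `name` can be written in format `fmt`.

    `can_store` is a hypothetical helper for this illustration only.
    """
    info = tarfile.TarInfo(name=name)
    info.size = 0
    buf = io.BytesIO()  # write to memory, no files are touched
    try:
        with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
            tar.addfile(info)
    except ValueError:
        # tarfile raises ValueError when the name does not fit the header
        return False
    return True


long_name = "a" * 150  # longer than the 100-byte name field, no "/" to split at

print(can_store(long_name, tarfile.USTAR_FORMAT))  # False
print(can_store(long_name, tarfile.GNU_FORMAT))    # True
```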

Other problems in the tar file format:

  • It has no native support for compression. The Unix Way is to
    use an external compressor, which is nice, but it makes it
    necessary to decompress the entire file to get a list of its
    contents. For large archives, this is very time consuming.
  • Even when uncompressed, the file format works badly for some
    kinds of operations, such as deleting files from the archive,
    or updating them with new versions.
  • The file format is entirely linear. When creating a tar file, it
    would sometimes be possible to write data from multiple sources
    at the same time, perhaps compressing them separately, maybe with
    file type specific compressors. With a linear format, this is
    not possible without spooling some files into temporary files.
    An interleaved format, similar to multimedia files, which
    mix audio and video data into a single stream, would make it
    possible to be more efficient at writing.
  • The supported metadata for files is limited, and it’s hard to
    extend the support without breaking the file format.
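
The linearity behind the first three points can be shown with a
rough sketch in Python. It assumes a plain ustar archive with short
names and regular files only. The helper names build_sample and
list_members are made up for this example. A tar file is a chain of
512-byte headers, each followed by the file data padded to a multiple
of 512 bytes, and there is no index. To list the contents you have to
hop from header to header, and if the stream is compressed you cannot
hop at all without decompressing everything in between.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data padding use 512-byte blocks


def build_sample() -> bytes:
    """Build a small uncompressed ustar archive in memory (illustrative data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in [("a.txt", b"hello"), ("b.txt", b"x" * 1000)]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(raw: bytes) -> list[tuple[str, int]]:
    """List (name, size) pairs by walking the headers one after another."""
    members = []
    offset = 0
    while offset + BLOCK <= len(raw):
        header = raw[offset:offset + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end
            break
        # Name: bytes 0-99, NUL-terminated. Size: bytes 124-135, octal text.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        size_field = header[124:136].split(b"\0", 1)[0].strip()
        size = int(size_field.decode("ascii") or "0", 8)
        members.append((name, size))
        # The next header follows the data, rounded up to a full block.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
    return members


print(list_members(build_sample()))  # [('a.txt', 5), ('b.txt', 1000)]
```

The same structure explains why deleting or replacing a member is
awkward: everything after it in the file has to be moved.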

This led us to discuss the possibility of a new file format. We
had a bit of fun exploring the solution space for a while.

However, almost all use of tar these days is for distributing
sets of files, where the filename and a basic set of file permissions
are enough. In other words, for things such as source code, tar is
just fine. The archives are small enough, and the other limitations
are rarely a problem, but the pain of switching to a new format would
be great. Thus, with some reluctance, we concluded that a new format
would be a waste of time.

But I thought I’d write this up anyway, in case one of my readers
wants to start working on this.

Feed source: http://planet.debian.org/rss20.xml
License: The original licenses are retained
