The /pkg Hierarchy

Introduction

This document describes a filesystem organizational technique that solves several problems associated with software package management and distribution under a Unix-like operating system. Though the document uses examples from development in a GNU/Linux (hereafter referred to simply as "Linux") environment, it is straightforward to mimic the process on other Unix systems.

Motivation

The original motivation for the /pkg hierarchy was to find a generic solution for situations such as this:

To install package A, I needed library L version n (L.n), but I only had version m (L.m) installed. So I downloaded and installed L.n, but this overwrote L.m, which broke package B. In order to upgrade package B to work with library L.n, I had to perform a system-wide (distribution) upgrade, which left package C in an unusable state. So I downloaded the source to package C, but when I tried to compile it against library L.n, it reported the following errors... [etc]

A brief search through the Web or Usenet reveals that this is hardly an uncommon situation, and that no Linux distribution is entirely immune to this problem of "dependency management". The approach Linux distributors have generally taken in solving this problem is to find a collection of software packages that more-or-less work together, and then version the collection (i.e. give a version number to the distribution). However, this approach has two prominent problems: (1) it is often difficult to integrate new software packages that were not in the original distribution, and (2) third-party library version upgrades can potentially put the entire system into an unstable state.

Problems Addressed

The /pkg hierarchy has its roots in being a solution to dependency management; however, it turns out to be an adequate solution for several common problems:

While many of these problems have already been solved independently, the advantage of the /pkg hierarchy is that it simultaneously addresses all of these problems in an elegant and comprehensive manner.

Technical Overview

The /pkg hierarchy derives its name from the way packages are installed on the system. Every time a package is compiled from source, it is installed in a unique location similar to the following:

/pkg/glibc/2.2.5/.karmaki686/.000

These path elements will be referred to in this document as:

It is beneath a path like this that all files related to a given package are confined. The traditional root-level directories are re-created as subdirectories here, giving something like:

/pkg/glibc/2.2.5/.karmaki686/.000/
                                 |-bin/
                                 |-etc/
                                 |-include/
                                 |-lib/
                                 |-var/
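As a rough illustration of how such a prefix is assembled, the following Python sketch builds the example path from its four elements and lists the subdirectories beneath it. The function names (build_prefix, package_dirs) are hypothetical, and the zero-padded, dot-prefixed build number is an assumption read off the example path, not a documented rule.

```python
from pathlib import PurePosixPath

# Subdirectories shown in the listing above.
SUBDIRS = ("bin", "etc", "include", "lib", "var")


def build_prefix(name, version, dist, build, root="/pkg"):
    """Return the install prefix for one build of a package.

    Assumption: the distribution and build elements are dot-prefixed and
    the build number is zero-padded to three digits, as in the example.
    """
    return PurePosixPath(root) / name / version / f".{dist}" / f".{build:03d}"


def package_dirs(prefix):
    """List the traditional root-level directories re-created under prefix."""
    return [prefix / sub for sub in SUBDIRS]


prefix = build_prefix("glibc", "2.2.5", "karmaki686", 0)
print(prefix)
for directory in package_dirs(prefix):
    print(directory)
```

Because every file of a build lives under one prefix, a second build of the same package only needs a different final element and never touches the first.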

Once a package is installed using this technique, symlinks are created to the package subdirectories all the way up the hierarchy. The resulting structure looks like the following:

/pkg/glibc/
          |-bin -> 2.2.5/bin/
          |-etc -> 2.2.5/etc/
          |-lib -> 2.2.5/lib/
          |-2.2.5/
                 |-bin -> .karmaki686/bin/
                 |-etc -> .karmaki686/etc/
                 |-lib -> .karmaki686/lib/
                 |-.karmaki686/
                 |            |-bin -> .002/bin/
                 |            |-etc -> .002/etc/
                 |            |-lib -> .002/lib/
                 |            |-.001/
                 |            |-.002/
                 |
                 |-.johndoei386/
                               |-.000/
                               |-.001/
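The symlink chain in the listing above could be created along the following lines. This is a minimal sketch that works in a temporary directory rather than a real /pkg; the helper names (link_level, activate) are hypothetical, and the use of relative link targets is an assumption based on how the listing is drawn.

```python
import os
import tempfile

SUBDIRS = ("bin", "etc", "lib")


def link_level(parent, child):
    """In `parent`, point each subdirectory name at the same name in `child`.

    `child` is a directory name relative to `parent`, so the links stay
    valid if the whole tree is moved.
    """
    for sub in SUBDIRS:
        link = os.path.join(parent, sub)
        if os.path.islink(link):
            os.remove(link)  # re-point an existing link, e.g. after a rebuild
        os.symlink(os.path.join(child, sub), link)


def activate(root, name, version, dist, build):
    """Create the chain name -> version -> distribution -> build."""
    name_dir = os.path.join(root, name)
    version_dir = os.path.join(name_dir, version)
    dist_dir = os.path.join(version_dir, dist)
    link_level(dist_dir, build)
    link_level(version_dir, dist)
    link_level(name_dir, version)


# Demonstration in a scratch directory standing in for /pkg.
root = os.path.realpath(tempfile.mkdtemp())
for sub in SUBDIRS:
    os.makedirs(os.path.join(root, "glibc", "2.2.5", ".karmaki686", ".002", sub))
activate(root, "glibc", "2.2.5", ".karmaki686", ".002")

resolved = os.path.realpath(os.path.join(root, "glibc", "lib"))
print(os.path.relpath(resolved, root))
```

Switching to a newer build then amounts to calling activate again with a different build directory; the files of the older build are left in place.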

Directory Explanations

Symlinks

Consider the ldd output from the ping binary:

karmak@ariel$ ldd /bin/ping
    libm.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libm.so.6 (0x40016000)
    libreadline.so.4.1 => /pkg/readline/4.3/.karmaki686/lib/libreadline.so.4.1 (0x40033000)
    libresolv.so.2 => /pkg/glibc/2.2.5/.karmaki686/lib/libresolv.so.2 (0x40059000)
    libnsl.so.1 => /pkg/glibc/2.2.5/.karmaki686/lib/libnsl.so.1 (0x40068000)
    libncurses.so.5 => /pkg/ncurses/5.2/.karmaki686/lib/libncurses.so.5 (0x4007f000)
    libc.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libc.so.6 (0x400c1000)
    /pkg/glibc/2.2.5/.karmaki686/lib/ld.so => /pkg/glibc/2.2.5/.karmaki686/lib/ld.so (0x40000000)

What we see here is that packages in the /pkg hierarchy are not linked against the standard locations (/lib and /usr/lib), but instead are linked against the distribution directories. Thus it is possible to have different applications linked against different library versions, even when those libraries share the same name. By taking the linking as far as the distribution directory, we can support multiple distributions under the same hierarchy, and cross compilation becomes simply a matter of a few changes to the standard build scripts. Furthermore, by not linking against the build directories, we are free to rebuild a package as many times as necessary, and freely experiment with cross-distributor package compatibility.
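One way to check this property for a given binary is to parse ldd output and see which package, version, and distribution directory each library resolves to. The sketch below is illustrative only: the function name classify is hypothetical, the regular expressions assume the output format shown above, and the last sample line is invented to show a library found outside /pkg.

```python
import re

# Matches lines such as:
#   libm.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libm.so.6 (0x40016000)
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")

# name / version / distribution directory, as in the paths above.
PKG_PATH = re.compile(r"^/pkg/([^/]+)/([^/]+)/(\.[^/]+)/lib/")


def classify(ldd_output):
    """Map each library to (package, version, distribution) or None."""
    result = {}
    for line in ldd_output.splitlines():
        line_match = LDD_LINE.match(line)
        if not line_match:
            continue  # skip lines that are not in "name => path (addr)" form
        soname, path = line_match.groups()
        pkg_match = PKG_PATH.match(path)
        result[soname] = pkg_match.groups() if pkg_match else None
    return result


# The first two lines are taken from the output above; the third is a
# made-up example of a library resolved from a standard location.
sample = """\
    libm.so.6 => /pkg/glibc/2.2.5/.karmaki686/lib/libm.so.6 (0x40016000)
    libreadline.so.4.1 => /pkg/readline/4.3/.karmaki686/lib/libreadline.so.4.1 (0x40033000)
    libc.so.6 => /lib/libc.so.6 (0x400c1000)
"""

for soname, where in classify(sample).items():
    print(soname, where if where else "outside /pkg")
```

Note that none of the matched paths contains a build directory, which is what leaves a package free to be rebuilt without relinking its dependents.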

The symlinks may appear to be a point of vulnerability in the system, but this is not the case. As the ldd output shows, almost all of the symlinks are there for the user's convenience. The only exceptions are the symlinks to the build directories, which require only a statically linked version of 'ln' or 'sash' to repair. The alternative, overwriting files during an upgrade, is no less error-prone and much harder to fix when things go wrong.
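To give a sense of how small such a repair is, the following sketch finds a dangling symlink in a scratch tree and re-points it at an existing build directory. It is an illustration under assumptions, not the author's tooling: dangling_links is a hypothetical helper, and the scenario (a link to a missing .002 build) is invented for the demonstration.

```python
import os
import tempfile


def dangling_links(root):
    """Yield symlinks under root whose targets do not exist."""
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            path = os.path.join(dirpath, entry)
            if os.path.islink(path) and not os.path.exists(path):
                yield path


# Scratch tree standing in for /pkg, with a link to a missing build.
root = os.path.realpath(tempfile.mkdtemp())
dist = os.path.join(root, "glibc", "2.2.5", ".karmaki686")
os.makedirs(os.path.join(dist, ".001", "lib"))
os.symlink(os.path.join(".002", "lib"), os.path.join(dist, "lib"))  # .002 is absent

broken = list(dangling_links(root))
print([os.path.relpath(path, root) for path in broken])

# Repair: re-point each broken link at the build that does exist.
for path in broken:
    os.remove(path)
    os.symlink(os.path.join(".001", "lib"), path)

print(list(dangling_links(root)))
```

The repair touches only the link itself; no installed file is modified or overwritten.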

Benefits

Because of the highly structured layout, it is easy to write scripts that automate everything from the build procedure to nightly backups. In the long run, this structure is much more efficient than the traditional filesystem hierarchy. Some examples of the efficiency and power:

  1. Going from the author's source code on a remote server to a ready-to-redistribute binary build typically takes less than ten commands. Subsequent builds of the same source are fully automated. All build information is automatically embedded in the binary distribution. Recipients of the binary package can repeat the entire build with three commands.
  2. It is virtually impossible for anything on the system to break as a result of installing a package. So, no more dependency problems. Need three different versions of glibc installed? No problem.
  3. Assigning every package (at the name level) a unique user ID can be fully automated and dynamically managed. Thus ends the nobody/nogroup fiasco.

Michael Carmack
karmak@karmak.org