Russell Coker[HREF1], Coker Software russell@coker.com.au
This paper describes the problems related to shared directories such as /tmp and /var/tmp as well as problems related to having multiple SE Linux security contexts used for accessing a single home directory. It then provides detailed information on the solution to this problem that has been implemented with polyinstantiated directories by using the pam_namespace module.
It is a long-standing Unix tradition that the directories /tmp and /var/tmp are used for temporary storage by all programs and on behalf of all users. This used to not be considered a problem, however in recent times it has been recognised that the use of such a shared directory is vulnerable to race-condition attacks with symbolic links.
Another problem is that in some situations a file name may convey secret information. If the file in question is in a public directory such as /tmp or /var/tmp (which may be an unintended result of a command by the user) then this will represent an information leak if there are any less privileged processes running on the machine.
Past attempts to deal with these problems have included restrictions on creating sym-links and hiding file names, which have both been inadequate. The solution chosen for use with SE Linux (which is also designed to work without SE Linux) is to have polyinstantiated directories based on Unix account name and/or SE Linux context. This means that every user will see a different version of the directory in question based on their context.
In the past this feature has been implemented as part of Multi-Level Security (Dr. Rick Smith [HREF2]) systems under the name multi-level directories. I believe that the multi-level directory variant of this solution was based on file system support, while the Linux support for this type of operation that I will describe is based in the VFS layer and thus does not require modification to any of the file systems that may be used.
Each of the above four attack scenarios may occur with one of the following three attacks:
Another partial attempt at dealing with this problem is controlling the
ability to create hard-links and/or sym-links to try and prevent race
conditions. A well-known implementation of this is in the OpenWall kernel
patch [HREF3]
which prevents
the user from creating hard-links to files to which they have no write access
and from creating sym-links in a +t directory (a directory such as /tmp or
/var/tmp) which point to a file that they don't own. It also prevents writing
to named pipes in +t directories which are owned by a different user. This
deals with some of the issues related to race-condition attacks but there are
potential issues that it does not address, such as a hostile user creating
sym-links to their own files to divert output or creating a file with no write
permissions as a denial of service against a program that uses a fixed file
name.
But this only deals with the case of race conditions used to attack system
integrity. It does not prevent DOS attacks or protect secret data when it
is used in a file name.
The SE Linux strict and mls policies provide good protection against most race condition attacks. Most domains are not permitted to create hard links to privileged files (types such as etc_t). Daemons are all protected from sym-link attacks by each other due to being denied access to sym-links created by other daemons and by users, and users are given similar protection against attack by daemons (both root and non-root). The main benefit for PI directories in strict and mls SE Linux systems is for protection against users attacking other users, in most cases large numbers of users will have the same SE Linux domain and therefore there will not be any effective protection against such attacks in the domain-type model (the integrity protection part of SE Linux).
When a Unix account is associated with more than one SE Linux context it is necessary to have multiple instances of the home directory to match the SE Linux context. If there is only one instance of the home directory and different SE Linux contexts are used for user logins then one of the contexts may be denied access to shared files such as .bashrc and .bash_history, or they may serve as information leaks. This use creates a requirement for PI home directories in SE Linux that does not exist for non-SE systems.
The problem of multiple logins with different contexts can occur in the older version of SE Linux (known as the example policy) that was used in Red Hat Enterprise Linux 4 and Fedora Core versions 2 to 4 when running the strict policy that permits multiple roles to be allocated to a user. But this is more of an issue with the newer versions of SE Linux policy that have functional support for MLS labels and the new MCS policy that permits different sets of categories to be assigned to a user session.
A non-SE system has none of the above protections and only has the Unix UID to protect both system integrity and confidentiality of data.
The initial support for PI directories was via the CLONE_NEWNS flag to the clone() system call. This flag causes the child process to be allocated a separate name space. That process and each child process that it launched would have a separate name space to the process which called clone(), and to any process that resulted from another call to clone() with the CLONE_NEWNS flag. The problem with this was the requirement that applications be modified to use clone() with this flag instead of using fork().
To solve this problem a new system call sys_unshare [HREF4] was added to the Linux kernel. The unshare system call can create a separate name-space for mounted file systems among other things (the set of kernel datra structures that can be unshared has been steadily increased since the introduction of unshare).
The unshare system call requires the SYS_ADMIN capability but does not require a fork, exec, or other operation. So it can be called from a PAM module and thus work with unmodified login programs. Also it is possible for multiple PAM modules to unshare different kernel data structures.
The solution to this is a development known as Shared Subtrees [HREF5]. This gives the option of specifying that certain subtrees will not be shared. For example if the directories /tmp and /var/tmp are being instantiated then the following commands could be run from a system boot script to cause all other mount operations to propagate to all users:
mount --make-shared / mount --bind /tmp /tmp mount --make-private /tmp mount --bind /var/tmp /var/tmp mount --make-private /var/tmpThe above commands make the root of the name-space shared and then make /tmp and /var/tmp private. Note that the --make-private option to the mount command only applies to mount points. As on my test system both /tmp and /var/tmp are on the root file system I have to bind mount them to themselves to have a mount point that can be made private. Be aware that if you don't correctly exclude the PI directories from the shared name space then each user who logs in may get PI directories under another user's directories, and things generally won't work.
mount --bind /tmp/tmp.inst-rjc-rjc /tmpThe directory that was created was given the Unix permission mode 1777 (all users can create files and directories, but it is only permitted to remove files or directories that you own). This solved many of the problems related to users attacking users and users attacking daemons. But it does not solve the problem of a daemon attacking a user as the daemon has access to the parent of the PI directories. Also there is a configuration option to have a user excluded from the PI directory system, a user who is granted such access (either deliberately or accidentally) would also be able to attack other users. As all directories were created under /tmp with mode 1777 there was no protection of secret file names from daemons and users who were outside the PI system (for most systems I expect that there will be some users who will be excluded from the PI configuration).
Another problem with the initial implementation was that the directories were all created at login time, therefore a hostile process could guess the names and pre-allocate directories to allow taking over ownership and potentially allowing other race condition attacks. For example any privileged process which relies on files not being unlinked or renamed for correct operation would operate incorrectly (and possibly be subject to attack) if run in a situation where the /tmp directory did not have the mode 1777 to prevent such rename and unlink operations.
Finally the initial implementation did not have a fall-back case for when the desired name for a PI directory had been taken by a file and would cause the login process to abort, this could be used as a DOS attack against user login sessions.
I have identified two possible solutions to the problem of DOS attacks against the pam_namespace module. One solution is to have it check whether the PI directory already exists, if it exists but has the wrong permissions (either Unix or SE Linux) or if there is an object other than a directory using that name then it would try creating the directory under a different name (maybe the original name with ".1" appended) and keep trying different names until it finds one that is available. This solution does not solve the problem of protecting secret file names.
The other solution I have identified solves the problems of DOS attacks and race conditions as well as the leaks of secret data in file names. This requires that a directory be pre-allocated on the system to contain all PI directories. So instead of a PI directory having the name /tmp/tmp.inst-rjc-rjc it might have the name /tmp/.inst/tmp.inst-rjc-rjc. The /tmp/.inst directory would be created and/or verified at system boot time and would have Unix permission mode 000 (the capability dac_override which every login program posesses would be requred to access it) and would also have a SE Linux context that permits only very restrictive access. Therefore non-root daemons will be denied access to /tmp/.inst and therefore would not be able to launch attacks on users via the /tmp directory. On SE Linux systems root daemons will also be denied such access. If a user session is launched with a shared system name space (through misconfiguration or unusual requirements) then they would also be denied access to the instance of /tmp used by other users.
In the initial design of PI directories the aim was to confine users to prevent them from attacking the rest of the system, and such a confined user was still vulnerable to attack from outside. The second of the two solutions that I propose above is the one that I believe to be the best, it will protect users who have a PI version of a shared directory from attack by non-root daemons on a non-SE system and from attack by root daemons as well on a SE Linux system. It would be a viable option for the sys-admin to give a single user a PI version of /tmp to protect the files for that user while allowing all other users access to the system shared name space.
At the time of writing we have agreement on the concept of using a naming system somewhat like /tmp/.inst/tmp.inst-something where the directory /tmp/.inst will have Unix mode 000 and restrictive SE Linux access controls. This will prevent daemons and users that are not included in the PI configuration from attacking daemons and users that have it enabled. This makes PI protect the user who has such a PI directory as well as protecting the rest of the system from that user. Note that at the time of writing there was no final agreement on the directory names, while the concept of a two-level directory is agreed the actual name of the directory in the default configuration is still to be resolved.
A feature that has been discussed and agreed in concept is to have the pam_namespace.so module check the permissions of the /tmp/.inst directory and abort the login process if the directory does not have Unix permission mode 000, root ownership, and a suitable SE Linux label (if SE Linux is enabled). There will be a configuration option to disable this functionality as not all systems will need this level of protection (and not all administrators will want a system to fail-closed on such a minor security issue).
session required pam_namespace.soThe system will work if the pam_namespace object is not the last in the list, but the creation of the namespace may interfere with some other PAM modules (for example if a PAM module wanted to access files in the /tmp directory) and in general it is safest to have it last. The only situation in which you might not want to have pam_namespace as the last session module is if you are using pam_mkhomedir and also using pam_namespace to provide PI home directories. But currently pam_mkhomedir does not work correctly in situations where PI home directories are desired so this should not be an issue.
The most noteworthy parameter for the pam_namespace module is the optional parameter unmnt_remnt. This is used by programs that run from an unshared namespace and need to create another unshared namespace. The primary example of this is su, all other programs that perform actions which are similar in concept (IE they are run from a user session and launch a new session on behalf of another user) will have the same requirement.
The pam_namespace module uses the configuration file /etc/security/namespace.conf. This file currently has four parameters, the first gives a directory that should be instantiated (there is an option of $HOME for instantiating the user home directory). The second is the name of the real directory to be used for the instance which has variables $USER and $HOME to represent the user-name and the home directory of the user. The third parameter may have as it's value user, context, or both to indicate whether the instantiation should be based on user-name, SE Linux context, or both. The final parameter is a list of comma-separated user-names for accounts that are exempt from poly-instantiation of the directory in question. I believe that it will be standard practice to include root in this list of accounts (usually there will be no need for other users to be excluded).
In Fedora there is a new program called runuser that will start a daemon as a user other than root. It is linked against PAM and can be configured to call the pam_namespace module. When I finish the debugging then every time it launches a daemon as non-root it will be able to create a new unshared namespace. Non-root daemons that require the system shared name space will need to have their user-names specified in the namespace.conf file.
In Debian daemons are started via a program named start-stop-daemon. I plan to modify this program to have the necessary name-space functionality.
/tmp /tmp/.inst/tmp.inst-$USER- both rjc,root /var/tmp /tmp/.inst/var-tmp.inst-$USER- both rjc,root
Most operations that are described in this paper are usable in Fedora Rawhide as of the 31st of May 2006. The only operation that is not usable at this time is PI support for daemons. I hope to have PI working in Debian and have PI support for daemons in both Debian and Fedora Rawhide by the time this paper is published, I will describe my success in these efforts when I present this paper.
The System Administrators Guild of Australia© 2006. The authors assign to The System Administrator's Guild of Australia and other educational and non-profit institutions a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced. The authors also grant a non-exclusive licence to The System Administrators Guild of Australia to publish this document in full on the World Wide Web and on CD-ROM and in printed form with the conference papers and for the document to be published on mirrors on the World Wide Web.