2. UNIX
Part of
the 22C:169 Lecture Notes for Spring 2006
Unix and its clones, such as Linux, illustrate many operating system features that are of critical importance to system security. Here, we summarize these features, looking particularly at the kernel interface, that is, the set of services that an application process may call. Most users never call these kernel services directly; rather, they call various middleware routines that then call the kernel services for them.
To look up the documentation for a kernel service on a Unix or Linux system, use the man command. For example, to get documentation on the kernel service for reading from a file, the read service, type man 2 read. The numeral 2 is essential here! It names the section of the programmer's reference manual you want to look in. Section 2 is the kernel interface, which is what we care about here, while section 1 is shell commands and section 3 is the standard library.
Note, the man pages for the services mentioned here each end with a see also section that references other services. The services listed here only scratch the surface of the full Unix/Linux system, and you can follow chains of references from the see-also lists to get the big picture.
A Unix (or Linux) process has direct access to the following resources:
memory resources | |
code segment | read only |
stack segment | read write |
data segment | read write |
other segments | ... |
open file resources | |
standard input | read only |
standard output | write only |
standard error | write only |
other files | ... |
Note that the definition of a segment under Unix has no necessary relationship to the way the designers of the system's memory management unit use the term. Unix defines a segment as a range of virtual addresses that the application program sees as consecutive, irrespective of how the memory management unit manages this job. Of course, whoever writes the virtual memory software for a particular memory management unit must figure out how to implement the Unix memory model using whatever tools the host memory management unit provides. They may well opt to use one hardware-defined segment for each Unix segment, but many Unix implementations don't do this.
In any case, all memory resources of the process are accessed through the memory management unit, while all file resources are accessed through the file system software in the kernel.
A Unix process has a user ID, by default the ID of the user who ran the process, and a group ID, by default the ID of the group with which that user is associated.
A process can modify its memory resources using the following kernel calls
Note that most users never call sbrk(); rather, users use some kind of heap manager in the standard library for whatever programming language they are using. This manager is expected to call sbrk() when it needs to enlarge the heap. For example, C and C++ programmers may call malloc() and free() to allocate and deallocate objects on the heap.
If the open file is executable and begins with the "magic" characters #!, the file is interpreted as a file that is supposed to be submitted to an interpreter. The bytes immediately following the #! characters, up until the next blank or end of line, are taken as the name of an interpreter, and this interpreter is executed. The interpreter gets the name of the file passed to execve as its first argument, so it may open that file and execute it. Other arguments provided by the caller to execve are shifted over to allow for this.
If the executed file has the SUID or SGID bits set in its mode, the process's effective user ID and/or effective group ID are set to the user ID and group ID of the file.
execve() only returns to the caller if there was an error, for example, if the indicated file was not an executable object file, the indicated interpreter could not be found, or the file did not begin with the necessary magic characters that signified the start of an interpreter or loadable object file.
In fact, all of the above services can boil down to sequences of primitive executable instructions plus calls to the following two services:
The access rights will be set to prot, which can be some combination of PROT_READ, PROT_WRITE or PROT_EXEC, using the or operator to combine the desired rights.
The flags option controls how the region is shared. MAP_FILE and MAP_ANON are alternatives, depending on whether you want to map pages of a file or just pages with no connection to a file. MAP_PRIVATE and MAP_SHARED are alternatives, depending on whether you want your changes to the mapped region to be seen by other users or you want those changes to be private to you. Obviously, this is irrelevant unless you have PROT_WRITE access to the segment. MAP_FIXED forces the segment to start at addr, which had better be the address of the first byte of some memory page.
Just as the memory addresses in a program are essentially integers in the range from 0 to some maximum (usually 2^32 - 1), open files in a program are referenced by integer file descriptors in the range 0 up to some maximum (usually 31). The files numbered 0, 1 and 2 correspond to standard input, standard output and standard error, respectively.
The mode argument is used if the open command has to create the file. This gives the access rights for the file, as it is allocated on disk. Once a file is opened, the flags argument determines what operations may actually be performed on the file. An attempt to open a file for which the user has insufficient access will result in failure.
If the process's effective user ID matches the user ID of the file, the maximum rights the process may have are set by the owner rights. If this is not the case, but the process's effective group ID matches the group ID of the file, then the process's maximal rights are determined by the group rights. Finally, if neither is the case, the process is entitled to the rights determined by the file's other rights.
On success, the open() call returns fd; this may be passed to the read() or write() system calls to directly read or write the file, or it may be used as an argument to mmap to map the file into the memory address space of the process.
The mode is a 12 bit string, arranged as follows:
special | owner | group | other |
SUID SGID SVTX | R W X | R W X | R W X |
In the above, the R, W and X access rights correspond to the rights to read, write or execute the file, on behalf of the file owner, the users in the group associated with the file, or others who are neither the owner nor members of the group. Each file has, associated with it, an owner (by default, the user ID of the user who created the file) and a group (by default, the group ID taken from the directory in which the file was created).
If successful, dup() returns fd.
The two lowest numbered unused file descriptors are allocated as readfd and writefd, referring respectively to the read and write ends of the new pipe object. These are returned in the array filedes so that filedes[0]=readfd and filedes[1]=writefd.
All files that were open in the parent process at the time of the fork will be open in the child, and these files will all be shared between the parent and child.
All memory that was mapped into the address space of the parent will be mapped into the address space of the child. The read-only program segment will be shared. The read-write data and the stack segments will be copied, so that the parent and child have separate copies of all variables. Any files mapped into the address space by mmap will be mapped into both address spaces. If the file was inserted into the address space with the MAP_SHARED attribute, then the segment mapped to that file will be fully shared by the parent and child.
This little chunk of C code allows the running program to execute a program called myprogram and wait for it to terminate before the calling program continues.
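    {
        pid_t pid;
        char *argv[] = { "myprogram", NULL };  /* argv[0] is the program name */
        char *envp[] = { NULL };               /* empty environment */

        if ((pid = fork()) != 0) {
            /* parent process with nonzero pid */
            /* wait for child to terminate */
            while (pid != wait( NULL )) /* do nothing */;
        } else {
            /* child process with returned pid set to zero */
            (void) execve( "myprogram", argv, envp );
            /* execve returned, so it failed; do not run on as a second parent */
            _exit( 127 );
        }
    }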