The Unix/Linux System Interface
Part of
22C:169, Computer Security Notes
On any Unix or Linux system, the command man xxx will give you the "man page" for the xxx command, subroutine or system call. This is a page from the Unix Programmer's Reference Manual as customized for that system.
The manual is divided into sections. You can restrict your search to one section of the manual using commands such as the following:
Here, we're concerned with section 2 of the manual, documenting the system calls.
A process can modify its memory resources using the following kernel calls
Note that most users never call sbrk() directly; rather, they use a heap manager from the standard library of whatever programming language they are using. This manager is expected to call sbrk() when it needs to enlarge the heap. For example, C and C++ programmers may call malloc() and free() to allocate and deallocate space for objects on the heap. Usually, C++ programmers don't even do this, because the C++ new operator allocates the storage for an object (typically by calling malloc()) before it runs the constructor for the object's class.
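To make the role of sbrk() concrete, here is a minimal sketch of what a heap manager does when it runs out of space. The function name grow_heap is made up for this illustration, and the sketch assumes a system where sbrk() is still available (it is a legacy call on some modern systems).

```c
#define _DEFAULT_SOURCE /* asks glibc to declare sbrk() even under strict -std modes */
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: move the program break up by nbytes, the way a
   heap manager would when its free space is exhausted.
   Returns how far the break moved, or -1 if the kernel refused. */
long grow_heap(long nbytes)
{
    char *before = sbrk(0);            /* sbrk(0) only reports the current break */
    if (sbrk((intptr_t) nbytes) == (void *) -1)
        return -1;                     /* no more memory available */
    char *after = sbrk(0);
    return (long) (after - before);
}
```

A call such as grow_heap(4096) should report that the break moved by at least 4096 bytes. A real malloc() would then carve that new space into blocks.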
If the open file is executable and begins with the "magic" characters #!, the file is treated as a script to be submitted to an interpreter. The bytes immediately following the #! characters, up until the next blank or end of line, are taken as the name of an interpreter, and this interpreter is executed. The interpreter gets the name of the file passed to execve() as its first argument, so it may open that file and execute it. Other arguments provided by the caller to execve() are shifted over to allow for this.
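The rule for finding the interpreter name can be sketched in a few lines of C. This is only an illustration of the parsing rule described above, not kernel code. The name parse_shebang is made up, and skipping blanks between #! and the name is an assumption based on how Linux behaves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if line starts with "#!", copy the interpreter
   name into out and return 1; otherwise return 0. */
int parse_shebang(const char *line, char *out, size_t outsz)
{
    if (outsz == 0 || strncmp(line, "#!", 2) != 0)
        return 0;                          /* not an interpreter file */
    const char *p = line + 2;
    while (*p == ' ' || *p == '\t')        /* assumption: blanks after #! are skipped */
        p++;
    size_t n = strcspn(p, " \t\n");        /* name ends at next blank or end of line */
    if (n == 0 || n >= outsz)
        return 0;                          /* empty name, or name too long for out */
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

Given the line "#!/bin/sh -e", this yields the interpreter name /bin/sh. A file that starts with anything other than #! is rejected.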
If the executed file has the SUID or SGID bits set in its mode, the process's effective user ID and/or effective group ID are set to the user ID and/or group ID of the file, respectively.
execve() only returns to the caller if there was an error, for example, if the indicated file was not an executable object file, the indicated interpreter could not be found, or the file did not begin with the necessary magic characters that signified the start of an interpreter or loadable object file.
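The fact that execve() returns only on failure shapes how it is used: any code after the call is, by definition, error handling. Here is a minimal sketch; the function name exec_or_errno is made up for the example.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: try to replace the current program with path.
   If this function returns at all, execve() failed, and the return
   value is the errno code explaining why. */
int exec_or_errno(const char *path)
{
    char *const argv[] = { (char *) path, NULL };  /* argv[0] is the program name */
    char *const envp[] = { NULL };                 /* empty environment */
    execve(path, argv, envp);
    return errno;                                  /* reached only on failure */
}
```

Calling this with a path that does not exist, such as the made-up /no/such/program, returns ENOENT. Calling it with a valid executable never returns, because the calling program is replaced.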
Note that, originally, Unix supported a simple exec command that didn't deal with parameters or environment variables. As the system evolved, such early interfaces were eventually superseded and abandoned. In some cases, standard library routines that call the new system interface routines have been added to support the old interfaces, but this was only done where user programs were found to need them. Few user programs directly call any of the exec services.
In fact, all of the above services can boil down to sequences of primitive executable instructions plus calls to the following two services. Note, however, that these services are relatively late additions to the Unix system interface. The original interface did not include these general mechanisms, and the original interface did not assume the availability of a memory management unit sufficiently flexible to implement these.
The access rights will be set to prot, which can be some combination of PROT_READ, PROT_WRITE or PROT_EXEC, using the or operator to combine the desired rights.
The flags option controls how the region is shared. MAP_FILE and MAP_ANON are alternatives, depending on whether you want to map pages of a file or just pages with no connection to a file. MAP_PRIVATE and MAP_SHARED are alternatives, depending on whether you want your changes to the mapped region to be seen by other users or you want those changes to be private to you. Obviously, this is irrelevant unless you have PROT_WRITE access to the segment. MAP_FIXED forces the segment to start at addr, which had better be the address of the first byte of some memory page.
On many systems, the code segment is implemented by giving access to the executable file as PROT_READ+PROT_EXEC, MAP_SHARED. The stack and static segments, in contrast, are PROT_READ+PROT_WRITE, MAP_PRIVATE.
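As an illustration of the prot and flags arguments, here is a minimal sketch that maps one private, anonymous, read-write page, uses it, and unmaps it. The function name anon_page_roundtrip is made up, and the sketch assumes the common spelling MAP_ANON (some systems also call it MAP_ANONYMOUS).

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANON under strict -std modes */
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map one page with no connection to a file,
   store a value in it, and return what is read back (or -1 on error). */
int anon_page_roundtrip(void)
{
    size_t len = (size_t) sysconf(_SC_PAGESIZE);
    int *page = mmap(NULL, len,
                     PROT_READ | PROT_WRITE,     /* rights combined with or */
                     MAP_ANON | MAP_PRIVATE,     /* no file, changes private */
                     -1, 0);                     /* fd and offset are unused */
    if (page == MAP_FAILED)
        return -1;
    int fresh = page[0];                         /* anonymous pages start zero-filled */
    page[0] = 42;
    int result = (fresh == 0) ? page[0] : -1;
    munmap(page, len);
    return result;
}
```

Leaving addr as NULL and omitting MAP_FIXED lets the system pick the address, which is almost always what you want.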
Just as the memory addresses in a program are essentially integers in the range from 0 to some maximum (usually 2^32 - 1), open files in a program are referenced by integer file descriptors in the range 0 up to some maximum (usually 31). The files numbered 0, 1 and 2 correspond to standard input, standard output and standard error, respectively.
The mode argument is used if the open command has to create the file. This gives the access rights for the file, as it is allocated on disk. Once a file is opened, the flags argument determines what operations may actually be performed on the file. An attempt to open a file for which the user has insufficient access will result in failure.
If the process's effective user ID matches the user ID of the file, the maximum rights the process may have are set by the owner rights. If this is not the case, but the process's effective group ID matches the group ID of the file, then the process's maximal rights are determined by the group rights. Finally, if neither is the case, the process is entitled to the rights determined by the file's other rights.
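The selection rule above is easy to get subtly wrong, so here it is as a small C function. This is a sketch of the rule as stated in these notes only: the name max_rights is made up, and a real kernel also consults things the sketch ignores, such as superuser status and supplementary groups.

```c
#include <sys/types.h>

/* Hypothetical helper: return the maximal rights as a 3-bit value
   (4 = read, 2 = write, 1 = execute), choosing exactly one of the
   owner, group or other fields of the mode. */
unsigned max_rights(mode_t mode, uid_t file_uid, gid_t file_gid,
                    uid_t euid, gid_t egid)
{
    if (euid == file_uid)
        return (mode >> 6) & 7;   /* owner field, even if it grants less */
    if (egid == file_gid)
        return (mode >> 3) & 7;   /* group field */
    return mode & 7;              /* other field */
}
```

Note that only one field is ever consulted. With mode 0077, the owner gets no rights at all, even though everyone else has full access.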
On success, the open() call returns fd; this may be passed to the read() or write() system calls to directly read or write the file, or it may be used as an argument to mmap to map the file into the memory address space of the process.
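Here is a minimal sketch of the open(), write(), read() sequence. The function name write_then_read is made up, the caller supplies the path of a scratch file, and lseek() is used to rewind the file between the write and the read.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create path, write a short message, and read it
   back through the same descriptor. Returns 1 if the bytes match,
   0 if they do not, and -1 if any system call fails. */
int write_then_read(const char *path)
{
    const char msg[] = "hello";
    char buf[sizeof msg] = { 0 };

    /* flags say what we may do with fd; mode 0600 is used only if the
       file has to be created */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int ok = -1;
    if (write(fd, msg, sizeof msg) == (ssize_t) sizeof msg
        && lseek(fd, 0, SEEK_SET) == 0                  /* rewind to the start */
        && read(fd, buf, sizeof buf) == (ssize_t) sizeof buf)
        ok = (memcmp(msg, buf, sizeof msg) == 0);

    close(fd);
    unlink(path);                                       /* remove the scratch file */
    return ok;
}
```

Any writable location will do for the scratch file, for example a made-up name under /tmp.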
The mode is a 12-bit string, arranged as follows:
|      special       |   owner   |   group   |   other   |
| SUID | SGID | SVTX | R | W | X | R | W | X | R | W | X |
In the above, the R, W and X access rights correspond to the rights to read, write or execute the file, on behalf of the file owner, the users in the group associated with the file, or others who are neither the owner nor members of the group. Each file has, associated with it, an owner (by default, the user ID of the user who created the file) and a group (by default, the group ID taken from the directory in which the file was created).
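The layout of the mode is easiest to see by decoding one. The sketch below assumes the traditional layout shown above, with SUID as the most significant of the 12 bits. The function name mode_to_string and the letters u, g and t for the three special bits are inventions for this example, not the format used by ls.

```c
/* Hypothetical helper: write a 12-character picture of the mode into
   out, one character per bit, most significant bit first. A set bit
   shows its letter; a clear bit shows '-'. */
void mode_to_string(unsigned mode, char out[13])
{
    static const char letters[] = "ugtrwxrwxrwx";  /* SUID SGID SVTX, then 3 x RWX */
    for (int i = 0; i < 12; i++) {
        unsigned bit = 1u << (11 - i);             /* i = 0 selects SUID (04000) */
        out[i] = (mode & bit) ? letters[i] : '-';
    }
    out[12] = '\0';
}
```

For example, the octal mode 04755 decodes to u--rwxr-xr-x: SUID is set, the owner has all rights, and group and others may read and execute.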
If successful, dup() returns fd.
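A short sketch shows what the descriptor returned by dup() is good for: it is a second number for the same open file. The function name dup_is_alias is made up, and /dev/null is used only because it is a file that can always be written.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: duplicate a descriptor and check that the copy
   is a different number that can be used just like the original.
   Returns 1 on success, 0 if the copy misbehaves, -1 on error. */
int dup_is_alias(void)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    int copy = dup(fd);                     /* new descriptor, same open file */
    if (copy < 0) {
        close(fd);
        return -1;
    }
    int ok = (copy != fd) && (write(copy, "x", 1) == 1);
    close(copy);                            /* each descriptor is closed separately */
    close(fd);
    return ok;
}
```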
The two lowest numbered unused file descriptors are allocated as readfd and writefd, referring respectively to the read and write ends of the new pipe object. These are returned in the array filedes so that filedes[0]=readfd and filedes[1]=writefd.
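The two ends of a pipe can be exercised within a single process, which makes for a compact sketch. The function name pipe_roundtrip is made up, and the message must be short enough to fit in the pipe's buffer, or the write would block with nobody reading.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: push msg into a new pipe and pull it back out.
   Returns the number of bytes read into out, or -1 on error. */
long pipe_roundtrip(const char *msg, char *out, size_t outsz)
{
    int filedes[2];
    if (pipe(filedes) != 0)
        return -1;

    long n = -1;
    size_t len = strlen(msg);
    if (write(filedes[1], msg, len) == (ssize_t) len)   /* filedes[1] is the write end */
        n = (long) read(filedes[0], out, outsz);        /* filedes[0] is the read end */

    close(filedes[0]);
    close(filedes[1]);
    return n;
}
```

In real use, the two ends usually belong to different processes, with the pipe created before a fork so that both parent and child inherit it.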
All files that were open in the parent process at the time of the fork will be open in the child, and these files will all be shared between the parent and child.
All memory that was mapped into the address space of the parent will be mapped into the address space of the child. The read-only program segment will be shared. The read-write data and the stack segments will be copied, so that the parent and child have separate copies of all variables. Any files mapped into the address space by mmap will be mapped into both address spaces. If the file was inserted into the address space with the MAP_SHARED attribute, then the segment mapped to that file will be fully shared by the parent and child.
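The claim that parent and child have separate copies of all variables can be checked with a small experiment. In this sketch, the function name fork_copies_data and the encoding of the result are made up; waitpid() is used to collect the child's exit status.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child changes x and reports its value as its
   exit status; the parent then looks at its own x.
   Returns (parent's x) * 10 + (child's x), or -1 on error. */
int fork_copies_data(void)
{
    int x = 1;
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        x = 2;                      /* changes only the child's copy */
        _exit(x);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return x * 10 + WEXITSTATUS(status);
}
```

The result is 12: the child saw x as 2, while the parent's x is still 1.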
This little chunk of C code allows the running program to execute a program called myprogram and wait for it to terminate before the calling program continues.
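{
    pid_t pid;
    if ((pid = fork()) != 0) {
        /* parent process with nonzero pid */
        /* wait for child to terminate */
        while (pid != wait( NULL )) /* do nothing */;
    } else {
        /* child process with returned pid set to zero */
        char *argv[] = { "myprogram", NULL }; /* argv[0] is the program name */
        char *envp[] = { NULL };              /* empty environment */
        (void) execve( "myprogram", argv, envp );
        _exit( 127 ); /* reached only if execve failed */
    }
}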
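The same pattern can be packaged as a function that also reports how the child finished. This is a sketch under a few assumptions: the name run_and_wait is made up, waitpid() is used in place of the wait() loop so that the exit status can be collected, and 127 is merely a conventional code for a failed exec.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run the program at path with the given argument
   vector and an empty environment, and wait for it to terminate.
   Returns the child's exit status, or -1 if it could not be collected. */
int run_and_wait(const char *path, char *const argv[])
{
    char *const envp[] = { NULL };
    int status = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed, no child exists */
    if (pid == 0) {
        execve(path, argv, envp);       /* returns only on failure */
        _exit(127);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, running /bin/sh with the arguments sh, -c and "exit 7" makes run_and_wait return 7.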