4 -- Shell Scripts

22C:112 Notes, Spring 2010

Part of the 22C:112, Operating Systems Notes
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

tcsh

Our lab systems run the tcsh shell, one of many Unix shells. The original, known as the Bourne shell (after its developer), was just sh. After the Bourne shell had developed into a rather ugly programming language, the C-shell csh was developed, trying to rationalize things while maintaining a degree of compatibility. Other shells have emerged in the years since, such as the Bourne Again shell, bash (a really bad pun -- the Bourne shell born again). All of them have significant user communities.

This proliferation of shells was possible because, unlike most early operating systems and many later ones, the shell is not an integral part of the Unix system. Rather, under Unix, the shells are user programs, and as such, users who don't like the shells that come with the system are free to write their own. These days, some users even configure their systems to use Python as a command-line interpreter.

Because all of the shells are incompatible with each other to varying extents, every shell script should begin by designating its interpreter. This is done with an initial line that names the file to be executed as an interpreter for whatever follows:

        #!/bin/tcsh

The loader, on seeing this line at the head of an executable file, loads the named interpreter instead of loading the remainder of the file, and it then passes the name of the file to the interpreter, so that the interpreter reads the entire file as its input. This mechanism was designed for shells, but it can be used for any interpreted language.
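To see that the named interpreter need not be a shell, here is a small sketch that names cat as the interpreter. It assumes that cat is installed as /bin/cat, which is common but not guaranteed, and the file name selfprint is made up for this demonstration. The here-document is just a convenient way to create the file; an editor works equally well. Because cat does nothing but copy the file it is given to its output, running the file prints the file's own text, first line included.

```shell
# Create a two-line file whose "interpreter" is cat (assumed to be /bin/cat).
# selfprint is a hypothetical file name used only for this demonstration.
cat > selfprint << EOF
#!/bin/cat
This line is data, not a command.
EOF

chmod +x selfprint   # make the file executable
./selfprint          # cat is loaded and handed the file, so both lines appear
```

Nothing in the second line is ever treated as a command; what happens to the rest of the file is entirely up to the interpreter named on the first line.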

This mechanism is a kluge. File attributes really ought to be encoded as some kind of file type, with type attributes of a file quite distinct from the data in that file. The MacOS notion of having files that have a resource fork (type information) and a data fork (the file itself) is an attempt to find a more rational way of encoding file types. Some Unix applications also use extensions on file names. Consider, for example, the file name image.jpg, where the extension .jpg indicates that the file is in JPG image format. This is another kluge, dating back into the 1960's, and it is not used by any of the core components of Unix, but only by applications. Thus, it is perfectly legal to attempt to edit a JPG file with a text editor.

For a strong example to illustrate the foolishness of using the extension on a file name to indicate the type of the file, consider how you would react to a programming language that required you to indicate the type of each variable with an extension on the variable name. Where C lets you write i = i + 1, you would have to write i.int = i.int + 1 or even i.int = i.int + 1.int. This would get old very fast.

Making a file executable

Having created a file, perhaps named shellscript, that begins as required, we need to make that file executable. We do this with the command:

         chmod +x shellscript

The chmod command changes the access rights (the mode, hence the name) of the named file. The string +x means add the execute (x) right. The string -x would remove the execute right, and -w would remove the right to write the file, making it read-only.
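The following sketch tries each of these mode strings on a scratch file. The file name demo is made up for the illustration, and the exact letters that ls -l shows depend on your umask setting, so no particular output is promised here.

```shell
echo "scratch data" > demo   # create a small scratch file (hypothetical name)
chmod +x demo                # add the execute right
ls -l demo                   # the mode column now includes x
chmod -x demo                # take the execute right away again
chmod -w demo                # remove the write right, making the file read-only
ls -l demo                   # the mode column now has no w and no x
```

Use chmod +w demo to restore the write right before editing the file again.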

To check the access rights on a file, do this:

         ls -l shellscript

The -l string indicates a desire for the long-form directory listing, in this case, listing only the directory entry for shellscript. The output will be something like this:

         -rwxr-xr-x 1 jones faculty 223 Feb  4 11:29 shellscript

This indicates that the access rights are rwxr-xr-x for this file. The first rwx applies to access by the owner, named jones, while the middle r-x applies to members of the group named faculty. The final r-x applies to all others. The 223 in the output is the size of the file, in bytes, which is followed by the date of last modification, followed by the file name itself.

For more information about the ls command (or any shell command) type man ls. (You can also use a web search engine to search for the keywords ls and "unix command".) There are two problems with the manual pages. First, the result is sometimes huge; you may get the feeling you are drinking from a fire hose. Second, if the command is a built-in command of some shell, you'll have to type, for example, man tcsh and then hunt around in that document for the command you want.

A Very Simple Example

Consider this very simple shell script:

        #!/bin/tcsh
        # hello -- a hello world shell script

        echo Hello world!

Note that shell comments occur on lines starting with the # character. It is good form to document shell scripts just like any program, with a header indicating how to call it and, if it does anything of any complexity, what it does.

Create a file called hello in the current directory containing the above text and then make that file executable. Having done so, if you type the command:

        ./hello

you will see the output Hello world! on the screen. In the above command, the leading dot means "in the current directory," since that's where the file is.

If you had typed the command echo Hello world! you would have gotten exactly the same output.

Shell Parameters

The Unix system allows parameters to be passed to any Unix command. The echo command simply outputs its parameters, with spaces delimiting them. When you typed echo Hello world!, you actually passed two parameters to the echo command, Hello followed by world!. You can test this by typing the command with extra spaces:

        echo          Hello            world!

The output will be the same as it was without the spaces. The shell lets you include explicit spaces within a parameter if you put the parameter in quotes.

        echo "        Hello            world! "

(The space after the exclamation point is there because exclamation points have special meaning to the shell in some contexts, and including this space avoids such a context.)
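Because echo joins its parameters with single spaces, it hides where one parameter ends and the next begins. One way to make the boundaries visible, offered here as a sketch, is the printf command, which reuses its format string once for each remaining parameter. This assumes a printf command is available, as it is on any POSIX system; the square brackets are there only to mark the edges of each parameter.

```shell
# Two parameters: the run of spaces between them is discarded by the shell.
printf '[%s]\n' Hello            world!

# One parameter: the quotes keep the spaces, including the trailing one.
printf '[%s]\n' "Hello            world! "
```

The first command prints two bracketed lines, one for Hello and one for world!, while the second prints a single bracketed line with all of the spaces preserved.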

Unix shells use an awful notation for referencing parameters. This is forced on the shell by the lack of anything resembling a subroutine heading on shell scripts. Had shell scripts required a subroutine heading of some kind, that could have provided formal parameter names, but lacking these, the shell forces formal parameters to begin with a dollar sign followed by the parameter number. Consider this shell script stored in the file parameters:

        #!/bin/tcsh
        # parameters $1 $2 $3

        echo parameter 1 is $1
        echo parameter 2 is $2
        echo parameter 3 is $3

If you run this with the command ./parameters this "is a" test you will see the output:

        parameter 1 is this
        parameter 2 is is a
        parameter 3 is test

Dollar signs mean quite a bit to the shell. $1 is equivalent to $argv[1], for example, where argv is logically the name of an array, the argument vector, containing all of the arguments. If you write $argv with no subscript, the entire argument vector will be substituted into the text. $$ is replaced with the current process number. In all cases, the dollar sign and the name that follows it are replaced by the text of the indicated value.

The short notation $1 is the original Bourne shell notation. As the shell matured, named shell variables were added, and then these were divided into arrays of components.

Shell Variables

The Unix shells allow variable creation. The mechanisms for this differ from shell to shell, so here, we will focus on the C-shell (and tcsh) variable mechanism.

To set a variable to a value that is a string, use the set command.

        set myvariable = "Hello world! "
        echo $myvariable

To create a numeric variable, or rather, a variable with a value that is the textual representation of an integer, use the @ shell command. The @ shell command requires that the right-hand side of the equals sign be an (integer) algebraic expression.

        @ myvariable = 1 + 1
        echo $myvariable

Quotes

The standard Unix shells support a number of different types of quotation marks. Consider these examples:

        set myvariable = "Hello    world! "
        echo $myvariable

Here, the output is Hello world!, with the extra spaces removed because the value of $myvariable is substituted first, and the resulting text is then parsed into parameters for the echo command. Add quotes, and this changes:

        set myvariable = "Hello    world! "
        echo "$myvariable"

Now, the output is Hello    world! , with the extra spaces included. This is because the quotes prevented the division of the variable, while still permitting the dollar sign to be recognised. Single quotes suppress interpretation of the dollar sign:

        set myvariable = "Hello    world! "
        echo '$myvariable'

The result is $myvariable, which is to say, the literal text, without any attempt to substitute the variable's value into place. On the other extreme, we can ask the shell to evaluate a command and substitute the output of that command for the text of that command. Consider:

        set myvariable = "Hello    world! "
        echo "Today is" `date "+%A %B %d, %Y."`

Here, the remaining arguments to the echo command are taken from the output of the date command with the argument "+%A %B %d, %Y.", an argument that specifies the date format. Note the inclusion of the double-quoted string within the back-quoted string.
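The different kinds of quotes interact with back quotes in the same way that they interact with dollar signs. The short sketch below, which uses only the standard date command, puts the same back-quoted command inside double quotes and then inside single quotes.

```shell
echo "The year is `date +%Y`"   # double quotes: the back-quoted command still runs
echo 'The year is `date +%Y`'   # single quotes: the back quotes are literal text
```

The first line prints the current year in place of the back-quoted command, while the second prints the back-quoted text exactly as written.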

Control Structures

The C shell and tcsh support a number of control structures, including the usual if and while statements. Any language including an assignment statement, arithmetic or string operators, and while loops conditional on arithmetic or string comparison is a general purpose programming language, although it should be immediately obvious that the Unix shells are very ugly programming languages, as illustrated in the following useless script:

        #!/bin/tcsh
        # shellscript <number> <text>
        #   outputs many copies of <text> 
        #   the number of copies is controlled by <number> 

        echo entering $$ $1 $2

        @ myvariable = $argv[1]
        while ( $myvariable > 0 )
                echo $argv[2]
                @ myvariable = $myvariable - 1
                ./shellscript $myvariable $argv[2]
        end

        echo exiting $$ $1 $2

This example illustrates both iteration with a while loop and recursion. Note that the shell variables are local. Each invocation of a shell script launches a new shell with a completely new environment, based on the environment of the shell that launched it.

As a result, the number of times this script echoes $argv[2] is large because of the combination of recursion and iteration controlled by $argv[1]: if $argv[1] is a positive integer n, the echo in the loop body runs 2^n - 1 times in total. The script includes debug output (the echo commands on entry and exit) to help you understand the pattern of the recursion.

A Security Disaster -- Shell Injection Attacks

The common Unix shells show plenty of evidence of evolution, and little evidence of intelligent design. It is true that the designers were intelligent, but features were added without thinking things through, and their evolution into general purpose programming languages was more of an accident than an intentional effort. The result is not only a language that is ugly, but a language that has some very dangerous properties. One of the worst of these is its vulnerability to what is known as an injection attack. Consider this script:

        #!/bin/tcsh
        # check arg
        # outputs "it matches" if the argument is 1

        if ($argv == 1) echo it matches

This appears to be an uninteresting shell script. Call this script with an argument other than 1 and it will produce no output. Call it with an argument that evaluates to 1, and the output will be the string it matches. There is trouble, though. Consider the following call:

        ./check "1 ) echo"

The output here will be == 1 ) echo it matches. The reason is that our parameter included an end-paren, and this closed the condition of the if statement, so that everything after the paren within the argument was interpreted as the object of the if statement. So, the line that the shell saw, after it substituted the argument into the text, was this:

        if (1 ) echo == 1) echo it matches

This is called a shell injection attack, because we have injected a shell command into a shell script through a parameter that was not intended to accept commands.

Shell injection attacks can be far more dangerous than mere injection of an echo command. Consider:

        ./check "1 ) rm -r * "

This deletes all the files in the current directory before it tries to parse the rest of the line as names of files to be deleted. Yes, there are lots of error messages, but they are output after the damage is done.
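The effect of the substitution can be examined safely, without executing anything dangerous, by asking sed to perform the same textual replacement on the if line of the script. This is only a sketch: sed stands in for the shell's own variable substitution, and the if line is handled purely as text, so the rm command shown is printed but never run.

```shell
# The if line from the check script, held as plain text. The single quotes
# keep the shell from touching $argv, and nothing here is run as a command.
echo 'if ($argv == 1) echo it matches' | sed 's/\$argv/1 ) echo/'

# The same replacement with the destructive argument; this only prints
# the line that the shell would have seen.
echo 'if ($argv == 1) echo it matches' | sed 's/\$argv/1 ) rm -r * /'
```

In each printed line, everything after the first end-paren is what the shell would treat as the command to run when the condition is true.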

Defense against shell injection attacks in all of the standard Unix shells ranges from extremely difficult to impossible. The problem is that the central element of the defense involves parsing the parameter and checking to see that it is safe, and you cannot write a parser in a typical Unix shell that is not, itself, vulnerable to injection attacks.

So, while the Unix shells are general purpose programming languages, secure applications typically avoid reliance on complex shell scripts, or if they do use them, they wrap protective code around the shell script in order to guarantee that the parameters to the shell script are safe.