6 -- Shell Scripts
22C:112 Notes, Spring 2008
Our lab systems run the tcsh shell, one of many Unix shells. The classic Unix shell, known as the Bourne shell (after its developer), was just sh. After the Bourne shell had developed into a rather ugly programming language, the C-shell csh was developed, trying to rationalize things while maintaining a degree of compatibility. Other shells have emerged in the years since, such as the Bourne Again shell, bash. All of them have significant user communities.
This proliferation of shells was possible because, unlike most early operating systems and many later ones, the shell is not an integral part of the Unix system. Rather, under Unix, the shells are user programs, and as such, users who don't like the shells that come with the system are free to write their own.
Because all of the shells are incompatible with each other to varying extents, every shell script should begin by designating its interpreter. This is done with an initial line that names the file to be executed:
#!/bin/tcsh
The loader, on seeing this line at the head of an executable file, goes and loads the named interpreter instead of loading the remainder of the file, and it then passes the entire file to the interpreter as input. This mechanism was designed for shells, but it can be used for any interpreted language.
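As a minimal sketch of this mechanism, the following commands write a two-line script into a scratch directory, mark it executable, and run it by name. The sketch uses Bourne-style sh rather than tcsh, an assumption made only so that it runs on most systems; the file name greet and the scratch directory are hypothetical.

```shell
#!/bin/sh
# Sketch: the first line of an executable file names its interpreter.
dir=$(mktemp -d)                  # scratch directory (hypothetical location)

# Write a tiny script whose first line designates /bin/sh as the interpreter.
cat > "$dir/greet" <<'EOF'
#!/bin/sh
echo "run by the interpreter named on line 1"
EOF

chmod +x "$dir/greet"             # the file must be executable to be run by name
greeting=$("$dir/greet")          # the loader starts /bin/sh on the file
echo "$greeting"

rm -r "$dir"                      # clean up the scratch directory
```

Replacing /bin/sh on the first line of greet with the path of another interpreter would hand the file to that interpreter instead.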
This mechanism is a kluge. File attributes really ought to be encoded as some kind of file type, with type attributes of a file quite distinct from the data in that file. The MacOS notion of having files that have a resource fork (type information) and a data fork (the file itself) is an attempt to find a more rational way of encoding file types. Some Unix applications also use extensions on file names. Consider, for example, the file name image.jpg, where the extension .jpg indicates that the file is in JPG image format. This is another kluge, dating back into the 1960's, and it is not used by any of the core components of Unix, but only by applications.
Having created a file, perhaps named shellscript, that begins as required, we need to make that file executable. We do this with the command:
chmod +x shellscript
The chmod command changes the access rights for the named file. The string +x means add the execute x right. The string -x would have removed execute rights.
To check the access rights on a file, do this:
ls -l shellscript
The -l string indicates a desire for the long-form directory listing, in this case, listing only the directory entry for shellscript. The output will be something like this:
-rwxr-xr-x 1 jones faculty 223 Feb 4 11:29 shellscript
This indicates that the access rights are rwxr-xr-x for this file. The first rwx applies to access by the owner, named jones, while the middle r-x applies to members of the group named faculty. The final r-x applies to all others.
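The effect of chmod on the rights string can be sketched as follows. This is written in sh rather than tcsh, it assumes a umask of 022 (set explicitly in the sketch), and the helper name perms is hypothetical.

```shell
#!/bin/sh
# Sketch: watch the rights string change as execute rights are added and removed.
umask 022                          # assumed mask, so +x applies to owner, group, and others
dir=$(mktemp -d)                   # scratch directory
f="$dir/shellscript"
: > "$f"                           # create an empty file
chmod 644 "$f"                     # start from rw-r--r--

# Hypothetical helper: keep only the file type and the nine rights characters.
perms() { ls -l "$1" | cut -c1-10; }

before=$(perms "$f")
chmod +x "$f"                      # add execute rights
after_add=$(perms "$f")
chmod -x "$f"                      # remove them again
after_remove=$(perms "$f")

echo "$before -> $after_add -> $after_remove"
rm -r "$dir"                       # clean up
```

Under these assumptions the rights go from -rw-r--r-- to -rwxr-xr-x and back.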
Consider this very simple shell script:
#!/bin/tcsh
# hello
echo Hello world!
Note that shell comments occur on lines starting with the # character. It is good form to document shell scripts just like any program, with a header indicating how to call it and, if it does anything of any complexity, what it does.
Create a file called hello in the current directory containing the above text and then make that file executable. Having done so, if you type the command:
./hello
You will see the output Hello world! on the screen. In the above command, the leading dot means "in the current directory," since that's where the file is.
If you had typed the command echo Hello world! you would have gotten exactly the same output.
The Unix system allows parameters to be passed to any Unix command. The echo command simply outputs its parameters, with spaces delimiting them. When you typed echo Hello world!, you actually passed two parameters to the echo command, Hello followed by world!. You can test this by typing the commands with extra spaces:
echo Hello      world!
The output will be the same as it was without the spaces. The shell lets you include explicit spaces within a parameter if you put the parameter in quotes.
echo "Hello      world! "
(The space after the exclamation point is there because exclamation points have special meaning to the shell in some contexts, and including this space avoids such a context.)
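One way to see how many parameters a command actually receives is to count them. The sketch below does this in sh rather than tcsh (an assumption made so that it runs almost anywhere); count_args is a hypothetical helper, not a standard command.

```shell
#!/bin/sh
# Sketch: count the parameters a command actually receives.
count_args() { echo "$#"; }        # hypothetical helper: prints how many arguments it got

unquoted=$(count_args Hello      world!)       # extra spaces are only separators
quoted=$(count_args "Hello      world! ")      # quotes keep the spaces inside one argument

echo "unquoted: $unquoted, quoted: $quoted"    # prints: unquoted: 2, quoted: 1
```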
Unix shells use an awful notation for referencing parameters. This is forced on the shell by the lack of anything resembling a subroutine heading on shell scripts. Had shell scripts required a subroutine heading of some kind, that could have provided formal parameter names, but lacking these, the shell forces formal parameters to begin with a dollar sign followed by the parameter number. Consider this shell script:
#!/bin/tcsh
# parameters $1 $2 $3
echo parameter 1 is $1
echo parameter 2 is $2
echo parameter 3 is $3
If you run this with the command
./parameters this "is a" test
you will see the output:
parameter 1 is this
parameter 2 is is a
parameter 3 is test
Dollar signs mean quite a bit to the shell. $1 is equivalent to $argv[1], for example, where $argv is logically the name of an array, the argument vector, containing all of the arguments. If you put $argv with no subscript, the entire argument vector will be substituted into the text. $$ is replaced with the current process number. In all cases, the dollar-sign expression is replaced by the text of the indicated value.
The short notation $1 is the original Bourne shell notation. As the shell matured, named shell variables were added, and then these were divided into arrays of components.
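The short Bourne notation can be tried out in sh itself. In the sketch below, a function stands in for a separate script, since an sh function receives its own $1, $2, and so on; the function name show is hypothetical, and the array form $argv[n] is deliberately left out because it belongs to csh and tcsh.

```shell
#!/bin/sh
# Sketch: positional parameters in sh, using a function in place of a script.
show() {
    echo "count: $#"               # $# is the number of parameters
    echo "parameter 1 is $1"
    echo "parameter 2 is $2"
    echo "parameter 3 is $3"
}

report=$(show this "is a" test)    # the quoted "is a" arrives as one parameter
echo "$report"
```

The second parameter is reported as is a, matching the output of the tcsh script above.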
The Unix shells allow variable creation. The mechanisms for this differ from shell to shell, so here, we will focus on the C-shell (and tcsh) variable mechanism.
To set a variable to a value that is a string, use the set command.
set myvariable = "Hello world! "
echo $myvariable
To create a numeric variable, or rather, a variable with a value that is the textual representation of an integer, use the @ shell command. The @ shell command requires that the right-hand side of the equals sign be an (integer) algebraic expression.
@ myvariable = 1 + 1
echo $myvariable
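For comparison, here is a rough sh counterpart of the @ command. This is a sketch in a different shell, not tcsh syntax: sh writes integer arithmetic as $(( ... )), and, as in tcsh, the result is stored as text.

```shell
#!/bin/sh
# Sketch: integer arithmetic in sh; the value is still stored as text.
myvariable=$((1 + 1))
echo "$myvariable"                      # prints 2
myvariable=$((myvariable * 10 + 3))     # the old value can be used in the expression
echo "$myvariable"                      # prints 23
```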
The standard Unix shells support a number of different types of quotation marks. Consider these examples:
set myvariable = "Hello      world! "
echo $myvariable
Here, the output is Hello world!, with the extra spaces excluded because the text of $myvariable is substituted first, and then parsed into parameters for the echo command. Add quotes, and this changes:
set myvariable = "Hello      world! "
echo "$myvariable"
Now, the output is Hello world! , with the extra spaces included. This is because the quotes prevented the division of the variable, while still permitting the dollar sign to be recognised. Single quotes suppress interpretation of the dollar sign:
set myvariable = "Hello      world! "
echo '$myvariable'
The result is $myvariable, which is to say, the literal text, without any attempt to substitute the variable's value into place. On the other extreme, we can ask the shell to evaluate a command and substitute the output of that command for the text of that command. Consider:
set myvariable = "Hello world! "
echo "Today is" `date "+%A %B %d, %Y."`
Here, the arguments that follow "Today is" on the echo command come from the output of the date command with the argument "+%A %B %d, %Y.", an argument that specifies the date format. Note the inclusion of the double-quoted string within the back-quoted string.
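The same quoting forms can be compared side by side. The sketch below uses sh rather than tcsh, so the assignment syntax differs from the set command above; the behavior of the quotation marks shown here is, as far as this sketch goes, the same in both shells.

```shell
#!/bin/sh
# Sketch: the kinds of quotation marks, in sh rather than tcsh.
myvariable="Hello      world! "

echo $myvariable                   # unquoted: split into words, extra spaces lost
echo "$myvariable"                 # double quotes: one argument, $ still substituted
echo '$myvariable'                 # single quotes: literal text, no substitution
echo "The year is" `date "+%Y"`    # backquotes: replaced by the command's output
```

The first three lines print Hello world! with single spacing, the text with its original spacing, and the literal text $myvariable.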
The C shell and tcsh support a number of control structures, including the usual if and while statements. Any language including an assignment statement, arithmetic or string operators, and while loops conditional on arithmetic or string comparison is a general purpose programming language, although it should be immediately obvious that the Unix shells are very ugly programming languages, as illustrated in the following useless script:
#!/bin/tcsh
# shellscript <number> text
echo entering $$ $1 $2
@ myvariable = $argv[1]
while ( $myvariable > 0 )
    echo $argv[2]
    @ myvariable = $myvariable - 1
    ./shellscript $myvariable $argv[2]
end
echo exiting $$ $1 $2
This example illustrates both iteration with a while loop and recursion. Note that the shell variables are local. Each invocation of a shell script launches a new shell with a completely new environment, based on the environment of the shell that launched it.
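As a result, the number of times this script echoes $argv[2] grows exponentially with the initial value of $argv[1]: an invocation with the value n launches one further invocation for each smaller value from n - 1 down to 0, giving 2^n - 1 echoes in all.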
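The growth can be checked for small starting values with a sketch that mirrors the loop of the script above. It is written in sh rather than tcsh, it counts only the lines echoed inside the loop (not the entering and exiting lines), and it uses a subshell as a stand-in for launching a new shell with its own variables. The function name run is hypothetical.

```shell
#!/bin/sh
# Sketch: count how many lines the recursive script would echo inside its loop.
# run <number> <text> mirrors the loop; the subshell ( ... ) stands in for
# launching a new shell with its own copy of the variables.
run() {
    n=$1
    while [ "$n" -gt 0 ]; do
        echo "$2"
        n=$((n - 1))
        ( run "$n" "$2" )          # the callee changes only its own copy of n
    done
}

for start in 1 2 3 4; do
    lines=$(( $(run "$start" x | wc -l) ))   # arithmetic strips any padding from wc
    echo "start=$start echoes=$lines"
done
```

Under these assumptions the counts are 1, 3, 7, and 15, that is, 2^n - 1 for a starting value of n.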
The shell language evolved; it was not designed. Features were added, one by one, with incomplete understanding of how they interacted. The result is not only a language that is ugly, but a language that has some very dangerous properties. One of the worst of these is its vulnerability to what is known as an injection attack. Consider this script:
#!/bin/tcsh
# check arg
#   outputs "it matches" if the argument is 1
if ($argv == 1) echo it matches
This appears to be an uninteresting shell script. Call this script with an argument other than 1 and it will produce no output. Call it with an argument that evaluates to 1, and the output will be the string it matches. There is trouble, though. Consider the following call:
./check "1 ) echo"
The output here will be == 1 ) echo it matches. The reason is that our parameter included a closing parenthesis, which closed the condition of the if statement, so that everything after the parenthesis within the argument was interpreted as the command controlled by the if statement. This is called a shell injection attack, because we have injected a shell command into a shell script through a parameter that was not intended to carry commands.
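A harmless version of the same idea can be sketched in sh. Plain sh does not re-read a parameter's text as commands after substituting it, so the sketch uses eval to model a script whose text is parsed only after the parameter has been pasted in; that use of eval is an assumption of the sketch, not what the tcsh script above does internally.

```shell
#!/bin/sh
# Sketch: a harmless injection, modelled in sh with eval.
# UNSAFE ON PURPOSE: the argument is pasted into the command text before parsing.
check() {
    eval "if [ $1 = 1 ]; then echo it matches; fi"
}

check 1                            # prints: it matches
check 2                            # prints nothing
# The argument closes the test early and smuggles in a command of its own.
check '1 = 1 ]; then echo injected; fi; if [ 1'
```

The last call prints injected before it matches, because the text of the argument has become part of the program.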
Shell injection attacks can be far more dangerous than mere injection of an echo command. Consider:
./check "1 ) rm -r * ; echo"
This deletes all the files in the current directory, along with any subdirectories, before it echoes the rest of the line.
Defense against shell injection attacks in all of the standard Unix shells varies from extremely difficult to impossible. The problem is, the central element of the defense involves parsing the parameter and checking to see that it is safe, and you cannot write a parser in a typical Unix shell that is not, itself, vulnerable to injection attacks.
So, while the Unix shells are general purpose programming languages, secure applications typically avoid reliance on complex shell scripts, or if they do use them, they wrap protective code around the shell script in order to guarantee that the parameters to the shell script are safe.
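As a rough sketch of what such protective code might check, the following wrapper accepts an argument only if it consists entirely of digits. It is written in sh purely for consistency with the other sketches; given the warning above about parsers written in shell, a production wrapper would more plausibly be written in another language. The names is_plain_number and safe_check are hypothetical.

```shell
#!/bin/sh
# Sketch: a protective wrapper that rejects any argument that is not a plain number.
is_plain_number() {                # hypothetical helper name
    case "$1" in
        '' | *[!0-9]*) return 1 ;; # empty, or contains a non-digit: reject
        *)             return 0 ;; # digits only: accept
    esac
}

safe_check() {                     # hypothetical wrapper name
    if is_plain_number "$1"; then
        echo "accepted: $1"        # here the wrapper would invoke the real script
    else
        echo "rejected"            # the suspicious text is never passed on
    fi
}

safe_check 1                       # prints: accepted: 1
safe_check "1 ) rm -r * ; echo"    # prints: rejected
```

The argument is always written as "$1" inside the wrapper, so its text is compared as a single string and never parsed as a command.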