There are some simple software libraries to facilitate writing CGI scripts. cgi-lib.rxx is a REXX library (available at SLAC by using the REXX statement to include the library at execution time), and cgi-lib.pl is a similar library in Perl written by Steve Brenner. NCSA has a very useful set of Perl CGI handler subroutines that are available via anonymous FTP. Another set of Perl CGI Server Side Scripts written by Brigitte Jellinek is available under the GNU public license. There is also the source code for www-leland scripts and programs, and a project, CGI.pm, to create a Perl5 CGI Library. Finally there is the index to Perl programs and libraries associated with the Web by Eric Hood.
For more on Perl and CGI scripts see also the CGI and Perl Tutorial by Alan Richmond. Also see the WWW Virtual Library for more on Server Side CGI information. Carl Cordova has a page with Mac Web Development Resources. Finally there is Yahoo's Common Gateway Interface Information page where you can also find links for support for writing CGI scripts in C and for Macs, Amigas and other platforms.
Since there are security and other risks associated with executing user scripts in a WWW server, the reader may wish to first view a document providing information on a SLAC Security Wrapper for users' CGI scripts. Besides improving security, this wrapper also simplifies the task of writing a CGI script for a beginner.
Before embarking on writing a script, you may also want to check out some rough notes on SLAC Web Utilities Provided by CGI Scripts.
The CGI is an interface for running external programs, or gateways, under an information server. Currently, the supported information servers are HTTP (the HyperText Transfer Protocol used by the WWW) servers.
Gateway programs are executable programs (e.g. UNIX scripts) which can be run by themselves (but you wouldn't want to except for debugging purposes). They have been made executable to allow them to run under various (possibly very different) information servers interchangeably. Gateway programs conforming to this specification can be written in any language, including REXX or Perl, that produces an executable file.
QUERY_STRING is defined as anything which follows the first ? in
the URL used to access your gateway. This information could be
added by an HTML
ISINDEX document, or by an HTML
Form (with the GET action). It could also be manually embedded
in an HTML hypertext link, or anchor,
which references your gateway. This string will
usually be an information query, e.g. what the user wants to
search for in databases, or perhaps the encoded results
of your feedback Form. It can be accessed in REXX or in Perl by
reading the QUERY_STRING environment variable.
This string is encoded in the standard URL format, which changes spaces to + and encodes special characters with %xx hexadecimal encoding. You will need to decode it in order to use it. You can review the REXX or Perl code fragments giving an example of how to decode the special characters.
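
As an illustration of the decoding step described above, here is a minimal sketch in Python (not the REXX or Perl fragments referred to in the text). The function name decode_query and the sample string are assumptions made up for this example; unquote_plus is the standard library routine that turns + into a space and %xx into the corresponding character.

```python
from urllib.parse import unquote_plus

def decode_query(raw):
    """Undo URL encoding: '+' becomes a space, %xx becomes the character with hex code xx."""
    return unquote_plus(raw)

# Illustrative input only; in a gateway the raw string would come from
# the QUERY_STRING environment variable.
print(decode_query("hello+world%21"))  # prints: hello world!
```

Note that a literal plus sign arrives encoded as %2B, so it survives the decoding as "+".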
If your server is not decoding results from a Form, you will also
get the query string decoded for you onto the command line. This
means that the query string will be available in REXX via the
PARSE ARG command, or in Perl via the @ARGV array.
For example, if you have a URL
and you use the REXX command
PARSE ARG Arg1 Arg2
Arg1 will contain
"world" (i.e. the + sign is replaced with a space).
If you choose to use the command line to access the input, you need to do
less processing on the data before using it.
Much of the time, you will want to send data to your gateways which the client shouldn't muck with. Such information could be the name of the Form which generated the results they are sending.
CGI allows for extra information to be embedded in the URL for
your gateway which can be used to transmit extra context-specific
information to the scripts. This information is usually made
available as "extra" information after the path of your gateway in
the URL. This information is not encoded by the server in any
way. It can be accessed in REXX or in Perl by reading the
PATH_INFO environment variable.
To illustrate this, let's say I have a CGI script which is
accessible to my server with the name
When I access foo from a
particular document, I want to tell foo that I'm currently in
the English language directory, not the Pig Latin directory. In
this case, I could access my script in an HTML document as:
When the server executes foo, it will give me a PATH_INFO of
/language=english, and my
program can decode this and act accordingly.
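
A minimal sketch, in Python, of how a script such as foo might decode this extra path information. The function name parse_path_info and the name=value convention for each path segment are assumptions for illustration; CGI itself does not prescribe how the extra path is structured.

```python
def parse_path_info(environ):
    """Collect name=value settings from PATH_INFO, e.g. '/language=english'."""
    extra = environ.get("PATH_INFO", "")
    settings = {}
    for segment in extra.split("/"):
        if "=" in segment:
            name, value = segment.split("=", 1)
            settings[name] = value
    return settings

# A real gateway would pass os.environ; a plain dict stands in for it here.
print(parse_path_info({"PATH_INFO": "/language=english"}))  # prints: {'language': 'english'}
```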
The PATH_INFO and the QUERY_STRING may be combined. For example, a
single URL that carries both will cause the server to run the script called
htimage. It would pass the remaining path information
"/usr/www/img/map" to htimage in the PATH_INFO
environment variable, and pass "405,451" in the
QUERY_STRING variable. In this case,
htimage is a
script for implementing active maps supplied with the CERN HTTPD.
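
The split between the two variables can be sketched in Python as follows. The host example.org and the /htbin/ prefix are hypothetical, chosen only so the example has a complete URL; the function split_gateway_url is likewise an assumption, and a real server performs this split itself before running the script.

```python
from urllib.parse import urlsplit

def split_gateway_url(url, script_path):
    """Return (PATH_INFO, QUERY_STRING) for a URL, given the path of the script."""
    parts = urlsplit(url)
    if not parts.path.startswith(script_path):
        raise ValueError("URL does not reference the script")
    path_info = parts.path[len(script_path):]  # whatever follows the script name
    return path_info, parts.query              # whatever follows the first '?'

# Hypothetical host and script location.
url = "http://example.org/htbin/htimage/usr/www/img/map?405,451"
print(split_gateway_url(url, "/htbin/htimage"))  # prints: ('/usr/www/img/map', '405,451')
```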
If your Form has METHOD="POST" in its FORM tag, your CGI program
will receive the encoded Form input on standard input
(stdin in Unix). The server will NOT
send you an EOF at the end of the data; instead you should use the
environment variable CONTENT_LENGTH to determine how much data you
should read from stdin. You can accomplish this in REXX by using
In=CHARIN(,1,GETENV('CONTENT_LENGTH')), or in Perl by reading
CONTENT_LENGTH bytes from STDIN.
If you wish to pass the standard input on to another script that you will call later, then you may wish to review this REXX Code Fragment.
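
A minimal Python sketch of reading POST input by length rather than waiting for an EOF. The function name read_post_body is an assumption for this example, and an in-memory stream stands in for standard input so the sketch can be run outside a server; a real gateway would pass sys.stdin.buffer and os.environ.

```python
import io

def read_post_body(stream, environ):
    """Read exactly CONTENT_LENGTH bytes; the server sends no EOF after the data."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return stream.read(length)

# The stream deliberately holds more bytes than CONTENT_LENGTH announces,
# to show that only the announced amount is consumed.
fake_stdin = io.BytesIO(b"a=1&b=2" + b"extra")
print(read_post_body(fake_stdin, {"CONTENT_LENGTH": "7"}))  # prints: b'a=1&b=2'
```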
Form data is a stream of name=value pairs separated by the & character. Each name=value pair is URL encoded, i.e. spaces are changed into plusses and some characters are encoded into hexadecimal.
You can review the REXX or the Perl code fragment giving examples of decoding the Form input.
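
For comparison with those fragments, here is a minimal Python sketch of splitting Form data into name=value pairs and decoding each part. The function name parse_form and the sample input are assumptions for illustration; values are kept in lists because a Form may send the same name more than once.

```python
from urllib.parse import unquote_plus

def parse_form(data):
    """Split 'name=value&name=value' and URL-decode each name and value."""
    fields = {}
    for pair in data.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields

print(parse_form("name=Jane+Doe&comment=50%25+done"))
# prints: {'name': ['Jane Doe'], 'comment': ['50% done']}
```

The Python standard library offers urllib.parse.parse_qs for the same job; the hand-written loop is shown only to make the two steps (split on &, then decode) visible.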
In order to tell the server what kind of document you are sending back, CGI requires you to place a short header on your output. This header is ASCII text, consisting of lines separated by either linefeeds or carriage returns followed by linefeeds. Your script must output at least two such lines before its data will be sent directly back to the client. These lines are used to indicate the MIME type of the following document.
Some common MIME types relevant to WWW are:
"text" Content-Type, which is used to represent textual information in a number of character sets and formatted text description languages in a standardised manner. The two most likely subtypes are:
  text/plain: text with no special formatting requirements.
  text/html: text with embedded HTML commands.
"application" Content-Type, which is used to transmit application data or binary data. Two frequently used subtypes are:
  application/postscript: the data is in PostScript, and should be fed to a PostScript interpreter.
  application/binary: the data is in some unknown binary format, such as the results of a file transfer.
"image" Content-Type, for transmitting still image (picture) data. There are many possible subtypes, but the ones most often used on WWW are:
  image/gif: an image in the GIF format.
  image/xbm: an image in the X Bitmap format.
  image/jpeg: an image in the JPEG format.
The first line of the header should read Content-type: type/subtype, where type/subtype is the MIME type and subtype for your output.
Next, you have to send the second line. With the current specification, THE SECOND LINE SHOULD BE BLANK. This means that it should have nothing on it except a linefeed. Once the server retrieves this line, it knows that you're finished telling the server about your output and will now begin the actual output. If you skip this line, the server will attempt to parse your output trying to find further information about your request and you will become very unhappy.
You can review a REXX Code Fragment giving an example of handling the Content-type header.
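
The two header lines and the document that follows can be sketched in Python like this. The function name write_response and the sample HTML are assumptions for illustration; the sketch writes into an in-memory buffer so it can be run anywhere, whereas a real gateway would write to standard output.

```python
import io

def write_response(out, mime_type, body):
    """Write the CGI header, the blank line that ends it, then the document."""
    out.write("Content-type: " + mime_type + "\n")  # first line: the MIME type
    out.write("\n")                                  # second line: blank
    out.write(body)                                  # everything after is the document

# Demonstration into a buffer; a real gateway would pass sys.stdout instead.
buffer = io.StringIO()
write_response(buffer, "text/html", "<h1>Hello</h1>\n")
print(buffer.getvalue(), end="")
```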
After these two lines have been outputted, any output to
stdout (e.g. a
REXX SAY command) will be included in the document sent to the client.
Since stdout is included in the document sent to the client, diagnostics outputted with the SAY command will appear in the document. This output will need to be consistent with the
Content-type: type/subtype mentioned above.
You can review a REXX Code Fragment giving an example of diagnostic reporting.
If errors are encountered (e.g. no input provided, invalid characters found, too many arguments specified, requested an invalid command to be executed, invalid syntax in the REXX exec) the script should provide detailed information on what is wrong etc. It may be very useful to provide information on the settings of various WWW Environment Variables that are set.
You can review a REXX Code Fragment giving an example of error reporting and Typical Output Generated from such a code fragment.
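
A minimal Python sketch of such an error report, assuming a plain-text reply. The function name error_report, the wording of the message, and the choice of which variables to list are all assumptions for this example; it is not the REXX fragment referred to above.

```python
def error_report(message, environ,
                 names=("QUERY_STRING", "PATH_INFO", "CONTENT_LENGTH")):
    """Build a plain-text error document that also lists selected CGI variables."""
    lines = ["Content-type: text/plain", "", "Error: " + message, ""]
    for name in names:
        lines.append(name + " = " + environ.get(name, "(not set)"))
    return "\n".join(lines) + "\n"

# A real gateway would pass os.environ; a plain dict stands in for it here.
print(error_report("no input provided", {"PATH_INFO": "/language=english"}), end="")
```

Listing the variables that were not set is as useful as listing those that were, since a missing QUERY_STRING or CONTENT_LENGTH is often the cause of the error.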
cgi1.rxx and save it in our home bin directory (e.g. in
chmod o+x /u/sf/cottrell/bin/cgi1.rxx
chmod u+x /u/sf/cottrell/bin/cgi1.rxx
Also tune in to the newsgroup comp.infosystems.www.authoring.cgi, which covers discussion of the development of Common Gateway Interface (CGI) scripts as they relate to Web page authoring. Possible subjects include discussion of how to handle the results of forms, how to generate images on the fly, and how to put together other interactive Web offerings.
Marc Hedlund keeps a CGI Frequently Asked Questions list: READ THIS FIRST.
The World Wide Web (Frequently Asked Questions, with Answers) answers many, many questions about the World Wide Web in general.
There is also the Yahoo Forms Collection which shows many entries for forms information.
If you are using Perl and you have a general Perl question that isn't really a CGI-specific question, check out the Perl FAQ.