Programming options in Unix
Last revision August 5, 2004
If you do a lot of data analysis or computation in your research or classes, eventually you may need to write some kind of program of your own.
- You may need to do things that cannot be done with existing programs.
- You may find that existing programs can do what you want when combined in a pipeline, but the pipeline is too long or cumbersome to type on the command line.
- You may find yourself needing to do a repetitive task that involves putting together existing programs to operate on different files. Here, you do not want to have to retype the pipeline or set of commands every time.
There are three basic alternatives for writing your own programs.
Shell scripts
- Simply a set of normal shell commands collected together into a file, or script, that can be executed by merely typing the name of the script as a command.
- The ability to specify arguments to the script and to substitute their values within the script makes it possible to generalize a sequence of commands.
- Shell scripts also have testing and flow-control statements (if-then-else; loops) that make them able to conditionally or repetitively execute programs.
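As a minimal sketch of these three points, the following script takes file names as arguments, loops over them, and tests each one before acting on it. The name countlines and the message wording are made up for illustration; they are not an existing command on any system.

```shell
#!/bin/sh
# Hypothetical example script: report the number of lines in each file
# named on the command line. The name "countlines" is illustrative only.

countlines() {
    # "$@" expands to all of the arguments, one word per argument.
    for file in "$@"; do
        if [ -f "$file" ]; then
            # Reading from standard input makes wc print only the count;
            # tr removes the padding that some versions of wc add.
            n=$(wc -l < "$file" | tr -d ' ')
            echo "$file: $n lines"
        else
            # Report problems on standard error and keep going.
            echo "$file: not a regular file" >&2
        fi
    done
}

# Pass the script's own arguments on to the function.
countlines "$@"
```

If this were saved in a file named countlines and made executable with chmod +x countlines, it could be run as ./countlines file1 file2 like any other command.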
Interpreted languages, such as awk or perl
- awk is a pattern-scanning and processing language. It is very good for writing small programs to transform a data file from one format to another. awk scripts are often executed from shell scripts that have correctly set up the file arguments first.
- The current (1985) version of awk has extensions that allow it to process arguments and call other programs, like a shell script, and use functions like a compiled programming language.
- awk can also be used to prototype programs that can then be re-written in a compiled language (like C) to execute more efficiently.
- Another good interpreted language that is becoming very popular is perl. This language combines the functions of shell scripts and awk and many functions of the C language. It is very powerful, but correspondingly complex.
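The division of labor described above, where a shell script sets up the file arguments and awk does the format conversion, might look like the sketch below. It converts whitespace-separated columns to comma-separated values. The function name to_csv and the choice of output format are assumptions made for this example only.

```shell
#!/bin/sh
# Hypothetical example: convert whitespace-separated columns into
# comma-separated values. The shell checks the file arguments;
# awk performs the transformation.

to_csv() {
    for file in "$@"; do
        if [ -r "$file" ]; then
            # OFS is awk's output field separator. Assigning $1 to itself
            # makes awk rebuild the line with OFS between the fields.
            awk 'BEGIN { OFS = "," } { $1 = $1; print }' "$file"
        else
            echo "cannot read $file" >&2
        fi
    done
}

# Pass the script's own arguments on to the function.
to_csv "$@"
```

The awk program here is a single pattern-free rule, so it applies to every input line. A longer awk program would normally be kept in its own file and run with awk -f.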
Compiled languages
- Use a high-level compiled language. The basic choices are C, C++, and Fortran. We also have Pascal and Java compilers on pangea. We do not support other languages, such as Basic or Modula-2, on pangea. Translator programs are sometimes available to convert the less commonly used languages to C, which can then be compiled. One example is p2c, which translates Pascal programs into C programs.
- These languages offer more power than the scripting languages. Complicated programs also execute much faster because they have been compiled, or translated, into the native machine language of the computer.
- There are well-developed tools for maintaining large programming projects in these languages (e.g., make) and for debugging running programs (e.g., dbx).
- The C language was developed in conjunction with Unix, by many of the same people. It is the language of choice for Unix because of its combination of power, flexibility, and portability.
- The C Programming Language, by Brian W. Kernighan and Dennis M. Ritchie, is the standard reference book for the language; Ritchie was its principal designer. The latest edition of this book describes ANSI standard C. Many readers find this book difficult. There is a wide selection of other books available in the bookstore.
- The 1977 ANSI standard Fortran (called Fortran 77) is also implemented on Unix in a way that produces object code compatible with C language routines.
- Fortran is known and preferred by many because much existing software has been written in it and because it was the first high-level language implemented on many machines.
- Any good book that describes Fortran 77 can be used as a reference for syntax and techniques. A standard reference used by many here is Fortran 77, by Harry Katzan. This is more of a reference than a tutorial. Another reference that looks good is the Professional Programmer's Guide to Fortran 77, by Clive Page. Its author has made it freely available on the web, along with links to many other Fortran references and resources, at http://www.star.le.ac.uk/~cgp/fortran.html