Systems-level frequently asked questions
How can I securely connect to other computers or transfer files using secure shell (ssh), secure copy (scp) or secure ftp (sftp)?
You should always try to avoid using telnet or ftp for non-anonymous connections. When you use telnet or ftp, your password is transmitted in clear text (unencrypted form) and is thus more vulnerable to being observed by unsavory characters :-). Secure shell/copy/ftp are a set of protocols that encrypt your entire session so that both your passwords and your other data are protected.
Secure shell (ssh) allows you to run terminal sessions on other machines. The secure shell FAQ provides an extensive resource for learning about ssh. Secure copy/ftp (scp/sftp) permit you to transfer files to or from the client machine. The sftp protocol is perhaps slightly preferred over scp as it will permit you to restart a failed file transfer, although it has been reported to be slightly slower.
Command line versions of the secure shell suites come with UNIX/Linux and Mac OS distributions. Manuals for the traditional command line interfaces can be found on the respective systems or in OpenSSH's FAQ. You can easily add command line versions to Windows with Cygwin, a free POSIX environment for Windows which has lots of nice things available, such as an X Window server.
Many of the Windows and Mac OS clients include GUI support. For Windows, many of my students have had good success with PuTTY and WinSCP (both of which support sftp). Another very nice Windows client with both ssh and GUI sftp support is an older version of the ssh client from SSH Communications Security, which may be used free of charge by academic users.
How do I compile a program under UNIX?
The following compilers exist on most UNIX systems:
- cc - system provided C compiler
- CC - system provided C++ compiler
- gcc, g++ - GNU C and C++ compilers
To compile a C program stored in foo.c:
rohan> cc foo.c
This compiles the program and creates an executable file called a.out. To run the program:
rohan> a.out
Note that on some systems, you may have to type "./a.out". If we want to call the program foo, we can name the output file with the -o option:
rohan> cc -o foo foo.c
My program compiled correctly, but when I try to execute it I receive a "file not found" error.
On UNIX systems, shells search for program names which do not have fully qualified path names (e.g. date versus /usr/bin/date) by using the PATH variable (path for csh and tcsh). The special directory "." is an alias for your current directory. If . is not in your path, the shell cannot find an executable that lives in your current directory. You can either qualify the program name by placing ./ in front of it, e.g. "./a.out", or set the PATH. You can add . to your search path, but this is considered slightly dangerous: someone could place a Trojan horse program with the same name as a common command in a directory, and if you ran that command while in that directory, you might execute the Trojan horse instead of the desired command.
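To make the search concrete, here is a minimal C sketch of the lookup a shell performs for a command name that contains no slash. The function name find_in_path and its interface are hypothetical, and a real shell does more work (for example, it caches results and checks that the match is a regular file rather than a directory). The sketch only shows why a program in the current directory is found only when "." (or an empty entry) appears in the path, and why the order of the entries matters.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch of a shell's PATH search: try each directory listed
 * in path (colon separated) in order, and stop at the first one holding a
 * file named name that we may execute.  An empty entry, like ".", means the
 * current directory.  Returns 1 and fills result on success, 0 otherwise. */
int find_in_path(const char *path, const char *name, char *result, size_t size)
{
    const char *dir = path;

    for (;;) {
        const char *end = strchr(dir, ':');
        size_t len = end != NULL ? (size_t) (end - dir) : strlen(dir);
        int n;

        if (len == 0)   /* empty entry: current directory */
            n = snprintf(result, size, "./%s", name);
        else
            n = snprintf(result, size, "%.*s/%s", (int) len, dir, name);

        /* Skip candidates that did not fit in result.  access() only checks
         * execute permission; a real shell also rejects directories. */
        if (n >= 0 && (size_t) n < size && access(result, X_OK) == 0)
            return 1;

        if (end == NULL)        /* that was the last entry */
            return 0;
        dir = end + 1;          /* move past the ':' */
    }
}
```

In a real shell, path would come from getenv("PATH"). With a path of "/usr/bin:.", a.out in the current directory is reported as ./a.out only if /usr/bin has no executable a.out, which is exactly the ordering issue behind the Trojan horse concern.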
What is the PATH and how do I set it?
The path is a list of directories that the shell searches when a user types a program name without a fully qualified path name, e.g. "mvregexp" versus "~mroch/bin/mvregexp". How you set your path depends upon the shell that you use. You can identify your shell with "echo $SHELL".
For shells which are descended from the Bourne shell (sh), which include the Korn shell (ksh) and the Bourne Again Shell (bash), you can add to your path with the following:
PATH=$PATH:~mroch/bin
If you use the C shell (csh) or the TENEX C shell (tcsh), the path is set as follows:
rohan% set path = ( $path ~mroch/bin )
Both of these add the directory ~mroch/bin to the end of your search path for the rest of the login session. To make the change permanent, add the command to the shell-dependent startup file listed below, which your shell reads when you log in or start a new shell:
| Shell | File |
|---|---|
| sh | .profile |
| ksh | .kshrc |
| bash | .bashrc |
| csh, tcsh | .cshrc |
What is separate compilation and how do I use it?
Separate compilation is a technique for compiling multiple source files independently and then combining them into a single program. It makes rebuilding large programs much faster, since only the files that have changed need to be recompiled, and the process can be automated with a makefile.
Suppose that we have written three files, huey.C, louie.C, and
dewie.C. (These are the names of the cartoon character Donald Duck's
nephews and have no particular significance.) If we wanted to compile
them together, we could just type:
rohan> cc -o duckprogram huey.C louie.C dewie.C
Alternatively, we can use the -c option to compile each file separately:
rohan> cc -c huey.C
rohan> cc -c louie.C
rohan> cc -c dewie.C
This creates three object files (machine code), huey.o, louie.o, and dewie.o, none of which can be executed on its own. To run the program, we must link them into a single executable with the UNIX linker, ld (not ln, which creates file links). The command below is simplified; a real ld command line would also have to list the system start-up files and libraries:
rohan> ld -o duckprogram huey.o louie.o dewie.o
Suppose that I am working on the program with my pair programming partner and we have been making modifications to the file louie.C. If the other files have been compiled previously, we don't need to recompile them. We just compile louie.C and relink:
rohan> cc -c louie.C
rohan> ld -o duckprogram huey.o louie.o dewie.o
Note that cc/CC/gcc/g++ are smart enough to invoke the linker for us (supplying the start-up files and libraries), so in practice we would do:
rohan> cc -c louie.C
rohan> cc -o duckprogram huey.o louie.o dewie.o
This may seem like a lot of hassle, but fortunately it can be automated with a makefile. To tantalize you, once we have a makefile, all we'll have to do is type "make" and the make program will automatically determine what needs to be recompiled and link for us.
How can I process command line arguments?
#include <iostream>
// Small C++ program to show the command line arguments
// A C program would be similar except it would use:
//   printf()/<stdio.h> as opposed to std::cout/<iostream>
//   /* */ style comments only (in C89; C99 also allows //)
int main(int argc, char *argv[]) {
    /* argc contains the number of entries in argv: 0 to argc-1
     * argv[] contains pointers to the strings
     * Note that the program name is argv[0]
     */
    int i; // loop variable
    // Print the program name and arguments
    for (i = 0; i < argc; i++) {
        std::cout << argv[i] << '\n';
    }
    return 0;
}
Note: if you would like to handle command line switches, you can use the C getopt function to avoid writing everything from scratch. You can read about it with "man -s 3c getopt" (on Linux, "man 3 getopt"). The '3c' selects section 3C of the manual pages, the standard C library functions. Normally you do not need to do this, but there is also a shell command called getopt used for writing shell scripts, so specifying the section tells the man command which version you want. Many C++ class libraries provide a getopt-like option-parsing object as well.
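A minimal sketch of getopt in use, assuming a hypothetical program that accepts -v (verbose) and -n count options followed by file names. The struct options type and the parse_options function are illustrative names, not part of any library; getopt, optarg, and optind are the standard POSIX interfaces declared in unistd.h.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical options for a program invoked as: prog [-v] [-n count] file... */
struct options {
    int verbose;        /* set by -v */
    int count;          /* value given with -n, default 1 */
    int first_operand;  /* index in argv of the first non-option argument */
};

/* Parses argv with getopt.  Returns 0 on success, -1 on an unknown option
 * or a missing option argument (getopt prints a message for those). */
int parse_options(int argc, char *argv[], struct options *opts)
{
    int c;

    opts->verbose = 0;
    opts->count = 1;

    /* "vn:" means -v takes no argument and -n requires one. */
    while ((c = getopt(argc, argv, "vn:")) != -1) {
        switch (c) {
        case 'v':
            opts->verbose = 1;
            break;
        case 'n':
            opts->count = atoi(optarg);   /* optarg points at -n's argument */
            break;
        default:                          /* '?': bad option or missing argument */
            return -1;
        }
    }
    opts->first_operand = optind;         /* first argument getopt did not consume */
    return 0;
}
```

In main you would call parse_options(argc, argv, &opts) and then treat argv[opts.first_operand] through argv[argc - 1] as the file names.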
What is an appropriate level of commenting?
/*
* factorial - A recursive function to compute factorials of integer n.
* if n < 0, the result is undefined.
*/
int factorial(int n)
{
int result;
if (n <= 1) {
/* base case: n! = 1 */
result = 1;
} else {
/* recursive case: n! = n * (n-1)! */
result = n * factorial(n-1);
}
return result;
}
Sample C++ code fragment from the memory allocation routine of a fictional operating system. Assume that all variables, functions, objects, and enumerated types have been defined earlier. (Potential pitfall for Java/C++ programmers: C89 has no boolean type; C99 added bool through <stdbool.h>.)
...
// Check to see if a large enough block exists by stepping through
// free memory segments
Found = false;
Segment = FreeList.First();
// Loop until we find large enough block or Segment set to NULL
while (! Found && Segment) {
if (Segment->Size() >= RequestedSize) {
Found = true;
} else {
Segment = Segment->Next();
}
}
if (! Found) {
// No free segments of the appropriate size available.
// See if consecutive segments can be merged to satisfy
// request.
/* more code... */
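For comparison, here is the same first-fit search written as a small self-contained C function, commented at about the same level. The struct segment type and the find_first_fit name are hypothetical stand-ins for the fictional operating system's objects, and the sketch assumes a C99 compiler so that bool can come from <stdbool.h>.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical free-list node standing in for the Segment objects above. */
struct segment {
    size_t size;            /* bytes available in this free segment */
    struct segment *next;   /* next free segment, or NULL at the end */
};

/* Returns the first free segment of at least requested_size bytes,
 * or NULL if no single segment is large enough. */
struct segment *find_first_fit(struct segment *free_list, size_t requested_size)
{
    bool found = false;
    struct segment *seg = free_list;

    /* Step through the free list until we find a large enough block
     * or run off the end of the list. */
    while (!found && seg != NULL) {
        if (seg->size >= requested_size)
            found = true;
        else
            seg = seg->next;
    }
    return found ? seg : NULL;
}
```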
What is a makefile and how do I use it?
Makefiles are parameter files that let you automate the compilation of programs made up of many source files. The make program examines the timestamps on your files and recompiles only the source files that have been modified since they were last compiled. For people working with multiple files outside an integrated development environment, it is a must-have tool. You can also look at the man page for make or the O'Reilly Nutshell book Managing Projects with make by Andrew Oram. There are numerous tutorials online; one excellent make tutorial was created by Ben Voshino at the College of Engineering, University of Hawaii.
One caveat: make is picky about whitespace in a makefile. Each command line under a target must begin with a tab character, not spaces (targets are explained in the tutorial). Leaving a blank line between targets also makes the makefile easier to read.
How can I understand UNIX system & library calls?
Note that if you are an emacs user, you can type M-x man<Return> fork<Return> (don't forget that in emacs-speak M-x is Escape followed by x) and get a nicely formatted manual page that you can treat like any other emacs buffer.
Each manual page is divided into labeled sections. The first is simply the name of the command and what it does. Next a synopsis is presented which gives very basic information on how to use the command. For instance, with fork you will see:
System Calls fork(2)
NAME
fork, fork1 - create a new process
SYNOPSIS
#include <sys/types.h>
#include <unistd.h>
pid_t fork(void);
pid_t fork1(void);
DESCRIPTION
The fork() and fork1() functions create a new process. The
new process (child process) is an exact copy of the calling
process (parent process). The child process inherits the
following attributes from the parent process:
...
This indicates that you will need to include the header files sys/types.h and unistd.h. In addition, fork(void) means that no arguments are required for this call. pid_t indicates that it returns a value of type "pid_t". For now, just think of pid_t as an integer. If you wanted to see exactly what pid_t is, you could trace through the include files, which on UNIX usually reside in /usr/include/, but reading through the description would tell you that the returned integer is a process ID that tells the caller whether the fork succeeded and whether it is running in the parent or the child. Read the description section to find out what the return values mean. There are other sections of the manual page which discuss standards conformance, caveats (things you can get into trouble with), and other related manual pages.
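A minimal sketch of checking fork's return value, under the assumption that the parent simply waits for its child. The fork_and_wait name is hypothetical; fork, waitpid, _exit, and the WIFEXITED/WEXITSTATUS macros are standard POSIX. The child calls _exit rather than exit so that it does not flush a copy of the parent's stdio buffers (the duplicate-output problem described in a later question).

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits with child_status; the parent waits for it and
 * returns the child's exit status, or -1 if fork or waitpid fails. */
int fork_and_wait(int child_status)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)              /* fork failed: no child was created */
        return -1;

    if (pid == 0)               /* fork returned 0: we are the child */
        _exit(child_status);

    /* fork returned the child's process ID: we are the parent */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```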
The answer to the next question discusses in greater detail how to interpret what functions require and return.
How can I interpret function signatures (prototypes)?
void *memcpy(void *s1, const void *s2, size_t n);
The ANSI C memcpy function is used to copy a block of memory from one place to another. Recall that the type (void *) is ANSI C/C++ notation for a pointer to an object of arbitrary type. Thus, memcpy expects pointers to two objects, s1 and s2, as well as a value of type size_t. size_t is an unsigned integer type used to hold sizes and counts (it is the type of the value produced by sizeof). We can also tell that memcpy returns a pointer to some type of object, as its return type is (void *). By reading the man page for memcpy, we can learn that a block of memory n bytes long starting at address s2 will be copied into a similar block starting at s1, and that the value returned is s1. Here's a small example of how we might use it (memcpy is declared in string.h):
#include <string.h>

#define MAXSIZE 1024
double SourceAry[MAXSIZE];
double DestinationAry[MAXSIZE];
void *ResultAddr;

/*
 * code to populate SourceAry omitted...
 */

/* copy source to destination */
/* The previous line is a good enough comment, but
 * if you are new to C/C++, there are several things
 * going on here:
 * 1) When you have an array, typing the name of
 *    the array without an index returns a
 *    pointer to the beginning of the array.
 * 2) We could have done the same thing by taking
 *    the address of the first element:
 *    &SourceAry[0]
 * 3) The sizeof operator returns the number of bytes
 *    in a type or structure.
 */
ResultAddr = memcpy(DestinationAry, SourceAry, MAXSIZE * sizeof(double));
What editors can I use to write my program?
If you don't want to spend time learning an editor, pico is a very easy-to-use editor (it is the editor used in the e-mail client pine). Many branded UNIXes also provide their own custom editors.
On a personal note, having used a number of editors over the years, I believe that emacs is worth learning. Once you get past the learning curve, it has a number of useful features which do not exist in other editors. If you want to learn to use emacs, there is a built-in tutorial. Look for "Emacs Tutorial" on the help menu, or, if you are working in a text-only session such as a telnet client, type C-h t (emacs-speak for control-h followed by t).
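Why do duplicate lines appear in forked stdout which has been redirected to a file?
When output goes to a terminal, SunOS UNIX flushes the stdio buffer at the end of each line, so output from printf/cout has already been written when fork is called. When output goes to a file, however, it is buffered, and the unwritten data is still sitting in the buffer when the process forks. The child gets its own copy of that buffer, so when the parent and the child each flush their buffers, the data is written twice, resulting in two copies of the output. You can avoid this by flushing the output before executing the fork: call fflush(stdout) in C (see man fflush) or cout.flush() in C++.
Alternatively, you can use script to capture the output rather than redirecting it.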
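The buffering problem can be reproduced with a short C sketch. It writes to a pipe rather than a file, on the assumption that both behave the same way here: neither is a terminal, so stdio fully buffers the output. The copies_after_fork function is hypothetical; it returns how many copies of a single line reach the pipe, with or without an fflush before the fork.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Writes "hello\n" to a fully buffered stream, forks, and returns the number
 * of copies of that line that came out of the pipe (-1 on setup failure). */
int copies_after_fork(int flush_before_fork)
{
    int fds[2];
    FILE *out;
    pid_t pid;
    int lines = 0;
    char ch;

    if (pipe(fds) == -1)
        return -1;
    out = fdopen(fds[1], "w");   /* a pipe is not a terminal: fully buffered */
    if (out == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    fprintf(out, "hello\n");     /* stays in the stdio buffer for now */
    if (flush_before_fork)
        fflush(out);             /* the fix: empty the buffer before forking */

    pid = fork();
    if (pid == -1) {
        fclose(out);
        close(fds[0]);
        return -1;
    }
    if (pid == 0)
        exit(0);                 /* child: exit() flushes its copy of the buffer */

    waitpid(pid, NULL, 0);
    fclose(out);                 /* parent flushes its own copy, closes the pipe */

    /* Count the lines that actually came through the pipe. */
    while (read(fds[0], &ch, 1) == 1)
        if (ch == '\n')
            lines++;
    close(fds[0]);
    return lines;
}
```

We would expect copies_after_fork(0) to report two copies and copies_after_fork(1) to report one.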
How can I capture all output to a file (including output from more than one process)?
If you would like to capture all output to a terminal (i.e. stdout, stderr from any process that writes to your terminal), you can use a nifty tool called script. Read the script manual page for details on how to use this.
How can I have a thread sleep for a specified amount of time?
Sometimes, you would like to delay for less than a second. The POSIX nanosleep function lets you specify a delay in nanoseconds, up to a limit (see "man nanosleep" for details). To use nanosleep you must link with the realtime library "-lrt", and it should be specified after the thread library.
Sample invocation:
#include <time.h>

/* One million nanoseconds per millisecond */
#define NSPERMS 1000000
/* One thousand milliseconds per second */
#define MSPERSEC 1000

{
  struct timespec SleepTime;

  ... initialize DelayMS to desired time to sleep ...

  /* Set up delay */
  SleepTime.tv_sec = DelayMS / MSPERSEC;              /* # secs */
  SleepTime.tv_nsec = (DelayMS % MSPERSEC) * NSPERMS; /* # nanosecs */

  /* In this case, we don't care whether or not we fail, but be
   * sure to check the return code if you do */
  nanosleep(&SleepTime, NULL);

  ... rest of code
}
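If you do care whether the sleep completed, check nanosleep's return value. The following sketch, with the hypothetical name sleep_ms_checked, resumes the sleep when a signal handler interrupts it: in that case nanosleep returns -1 with errno set to EINTR and stores the time still remaining in its second argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSPERMS 1000000L   /* nanoseconds per millisecond */
#define MSPERSEC 1000L     /* milliseconds per second */

/* Sleeps for DelayMS milliseconds, resuming if a signal handler interrupts
 * the sleep.  Returns 0 on success, -1 on any other error. */
int sleep_ms_checked(long DelayMS)
{
    struct timespec Request, Remaining;

    Request.tv_sec = DelayMS / MSPERSEC;               /* # secs */
    Request.tv_nsec = (DelayMS % MSPERSEC) * NSPERMS;  /* # nanosecs */

    while (nanosleep(&Request, &Remaining) == -1) {
        if (errno != EINTR)
            return -1;          /* e.g. EINVAL for a negative delay */
        Request = Remaining;    /* sleep only for the time that is left */
    }
    return 0;
}
```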
How can I use a symbolic debugger in UNIX?
There are a number of debuggers for UNIX and Linux. One of the most widely available is gdb, the GNU debugger. When using gdb, you will need to compile with gcc or g++ and provide an extra flag, "-g". If you compile and link as separate steps, you will need to use -g for both steps. Gdb lets you set breakpoints and conditional breakpoints, modify variables, and even attach to a running program or examine a core dump.
The gdb user manual is available by typing "info gdb" from most UNIX/Linux/Cygwin prompts. Before doing this, you may want to type "info info" to learn how to use the GNU info browser. Alternatively, you can browse the documentation on the web. There are several good gdb tutorials on the Internet. Ryan Michael Schmidt has a very nice basic introduction. Peter Jay Salzman has a more detailed introduction that will teach you more about technical aspects such as how memory is laid out, which is actually quite helpful for understanding some of your errors.
If you are an emacs user, gdb has a very nice interface to emacs where you can see your gdb session in one half of the window and the source code in the other. Alternatively, if you have an X Window display, you can use ddd, a well-done GUI front end to gdb and several other debuggers.
How can I use POSIX threads (pthreads)?
You should start by reading materials from the chapter on POSIX threads that has been put on electronic course reserve. (Access is limited to the class; use the password that is posted on the course Blackboard site.) If you would like to look at the complete text, it is UNIX Systems Programming: Communication, Concurrency, and Threads by Robbins & Robbins and is available at the reserve desk. If you plan on continuing to program with POSIX, I would highly recommend purchasing a copy of this text. Keep the following in mind when writing a pthreads program; a short sample program follows the list:
- You must include the header file pthread.h.
- You must link the library (-lpthread) at the end of the compilation line. If you do not do this, your program will not work although it will compile without any warnings.
- If you use sched_yield, you must include sched.h and link the real time library (-lrt). The real time library should be linked after the thread library: -lpthread -lrt
- If you use C++, you should provide a C linkage for the function that you plan on starting in the thread. This should be in a header file or someplace before the call to pthread_create(). In the following example, it would be: extern "C" void * print_message_function(void *);
#include <stdio.h>
#include <stdlib.h>  /* exit */
#include <unistd.h>  /* sleep */
#include <pthread.h>
#include <sched.h>   /* Only necessary for sched_yield */
void * print_message_function( void * VoidPtr )
{
char *message;
message = (char *) VoidPtr;
printf("%s", message);
fflush(stdout); /* Make sure we see it right away */
/* If we wanted to return something, we would return a pointer
* to the data that we wanted to return.
*
* Instead of simply using return, we could also call
* pthread_exit.
*/
return NULL;
}
int main()
{
pthread_attr_t pthread_attributes;
pthread_t thread1, thread2;
char *message1 = "Hello";
char *message2 = " World";
/* Populate attributes with defaults
* If we wanted to customize this, we would probably
* first set the defaults and then change them as needed.
*/
pthread_attr_init(&pthread_attributes);
/* Start threads to write out messages
* Please note that we are not checking the return,
* this is something that you should figure out how
* to do on your own.
*/
pthread_create( &thread1, &pthread_attributes,
&print_message_function, (void *) message1);
/* Provide N secs for completion (very bad idea to rely on this)
* Use pthread_join instead, but you should learn to do this
* on your own.
*/
sleep(5);
/* Note that this time we don't pass the pthread_attributes
* structure. Instead we pass NULL which tells POSIX threads
* to just use the defaults.
*/
pthread_create(&thread2, NULL,
&print_message_function, (void *) message2);
/* Provide N secs for completion (very bad idea) */
sleep(5);
exit(0);
}
Assuming that this program was saved as threadtest.c, we could compile it with:
rohan> gcc -o threadtest threadtest.c -lpthread
How can I use POSIX unnamed semaphores?
POSIX unnamed semaphores are designed to provide coordination between threads in the same process. Named semaphores are used to coordinate between processes and are beyond the scope of this assignment. If you wish to learn to use named semaphores, consider looking at the Robbins and Robbins text on reserve in the library. As it is understood that we are discussing unnamed semaphores, we will drop the "unnamed" and simply write semaphores.
To use a POSIX semaphore, you must include semaphore.h and declare a variable of type sem_t. We will consider three operations on semaphores, although POSIX defines a few others: sem_init, sem_wait, and sem_post. In addition, you must link with the POSIX real time library (-lrt) when compiling. It should go after the POSIX thread library.
As an example, suppose that you are doing separate compilation and have object files for demofiles 1 through N that you want to link:
rohan> gcc -o semaphore_demo demofile1.o ... demofileN.o -lpthread -lrt
The following is a subset of the operations POSIX supports for semaphores:
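- int sem_init(sem_t *SemaphorePtr, int pshared, unsigned int Value) - Initializes the semaphore pointed to by SemaphorePtr to the count contained in Value. Parameter pshared indicates whether the semaphore is shared between processes; since we are only coordinating threads within one process, always set it to zero.
- int sem_wait(sem_t *SemaphorePtr) - Performs a down on the semaphore pointed to by SemaphorePtr.
- int sem_post(sem_t *SemaphorePtr) - Performs an up on the semaphore pointed to by SemaphorePtr (post is yet another pseudonym for up/V/signal).
The following outline shows how these calls fit together; launching and waiting for the threads is left as comments: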
#include <stdio.h>       /* perror, NULL */
#include <semaphore.h>

/* POSIX threads pass only one argument to a thread, so everything
 * a thread needs is packaged into a parameter structure. */
struct bar_params {
  sem_t *SemaphorePtr;   /* points to the MutualExclusion semaphore */
  /* ... any other parameters the threads need ... */
};

void foo(void) {
  sem_t MutualExclusion;
  struct bar_params Params;
  /*
   * Initialize semaphore.
   * Conceptually, this is: Semaphore MutualExclusion = 1
   */
  if (sem_init(&MutualExclusion, 0, 1) == -1) {
    perror("sem_init");  /* unable to initialize semaphore, report failure */
    return;
  }
  Params.SemaphorePtr = &MutualExclusion;
  /* launch threads bar_none, bar_chocolate with pthread_create,
   * passing each of them &Params as its single argument.
   */
  /* wait for threads to exit (MutualExclusion must outlive them) */
  sem_destroy(&MutualExclusion);
}

void *bar_none(void *Arg) {
  struct bar_params *Params = (struct bar_params *) Arg;
  sem_t *SemaphorePtr = Params->SemaphorePtr;

  for (;;) {  /* control structure (i.e. loop); break out when done */
    /* remainder */
    sem_wait(SemaphorePtr); /* entry */
    /* mutual exclusion - do what we need to do */
    sem_post(SemaphorePtr); /* exit */
    /* remainder */
  }
  return NULL;
}
It is possible for sem_wait and sem_post to fail; for example, sem_wait returns -1 with errno set to EINTR if it is interrupted by a signal handler. As we will not be using signal handlers, you may ignore the sem_wait and sem_post return codes for this assignment, but you should remember that failure is a possibility. The man pages may be helpful for all three of these operations.
In addition, I have written a toy program which implements a critical region for the x++ and x-- example that we have done in class. You may find it useful to look at this program. Note that as the time slices per thread are reasonably large, you will need to use large values for the number of times that you increment or decrement before you start to see interleaving between the threads.
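This is not the toy program mentioned above, but a minimal sketch in the same spirit: one thread increments a shared x and another decrements it, each inside a critical region guarded by an unnamed semaphore initialized to 1. All names (incrementer, decrementer, run_counter_demo) are hypothetical. Without the sem_wait/sem_post pairs, the final value of x could end up nonzero for large iteration counts; with them, it should always be 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Shared state for a hypothetical version of the x++ / x-- example. */
static long x = 0;
static long Iterations;
static sem_t MutualExclusion;

static void *incrementer(void *unused)
{
    (void) unused;
    for (long i = 0; i < Iterations; i++) {
        sem_wait(&MutualExclusion);   /* entry */
        x++;                          /* critical region */
        sem_post(&MutualExclusion);   /* exit */
    }
    return NULL;
}

static void *decrementer(void *unused)
{
    (void) unused;
    for (long i = 0; i < Iterations; i++) {
        sem_wait(&MutualExclusion);
        x--;
        sem_post(&MutualExclusion);
    }
    return NULL;
}

/* Runs one incrementing and one decrementing thread, n times each, and
 * stores the final value of x in *final_x.  Returns 0 on success. */
int run_counter_demo(long n, long *final_x)
{
    pthread_t inc, dec;

    x = 0;
    Iterations = n;
    if (sem_init(&MutualExclusion, 0, 1) == -1)   /* pshared 0: threads only */
        return -1;
    if (pthread_create(&inc, NULL, incrementer, NULL) != 0) {
        sem_destroy(&MutualExclusion);
        return -1;
    }
    if (pthread_create(&dec, NULL, decrementer, NULL) != 0) {
        pthread_join(inc, NULL);
        sem_destroy(&MutualExclusion);
        return -1;
    }
    pthread_join(inc, NULL);     /* wait for both threads to finish */
    pthread_join(dec, NULL);
    sem_destroy(&MutualExclusion);
    *final_x = x;
    return 0;
}
```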
I have two structures/classes that I would like to point to one another. How can I do this?
In order to declare this, the compiler must know that the class/structure you are pointing to exists before you declare the pointer. There are two ways to let the compiler know that the entity to which we wish to point is a class that will be defined later:
class foo {
private:
class bar *BarPtr; // Compiler will not complain as it knows how many
// bytes a pointer is and has been reassured that
// you will define bar someplace else.
// It is very important that we used the word "class"
// in the class bar declaration as bar does
// not exist yet.
// As an alternative, we could have placed the
// line "class bar;" before the class foo
// declaration, and the compiler would have
// been satisfied with a "bar *BarPtr;" declaration.
// Note that the declaration:
// class bar Bar;
// would be illegal as the compiler would not know
// how much memory to allocate (since bar has not
// been defined yet).
// other things to make this class useful
};
class bar {
private:
class foo *FooPtr; // Pointer to class foo.
// The word class is optional.
// other things to make this class useful
};
When dealing with structures, the situation is similar:
struct foo {
struct bar *BarPtr; /* Pointer to a bar structure */
/* other information... */
};
struct bar {
struct foo *FooPtr; /* Pointer to a foo structure
* foo has already been declared, so in C++
* we could simply have typed foo here. In C,
* the word struct is required (unless a
* typedef has been declared), and most C
* programmers would write it anyway to make
* this obvious.
*/
/* other information... */
};
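A small C sketch that links an instance of each structure to the other. The Value members and the link_pair helper are hypothetical additions for illustration. The forward declaration struct bar; is not strictly required in C, because naming struct bar inside struct foo already declares the tag, but it makes the intent explicit.

```c
#include <stddef.h>

struct bar;                 /* forward (incomplete) declaration */

struct foo {
    struct bar *BarPtr;     /* fine: only a pointer to the incomplete type */
    int Value;
};

struct bar {
    struct foo *FooPtr;     /* struct foo is complete by now */
    int Value;
};

/* Hypothetical helper: make f and b point at each other. */
void link_pair(struct foo *f, struct bar *b)
{
    f->BarPtr = b;
    b->FooPtr = f;
}
```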
How can I have an array of unspecified size be part of a structure/class?
Your structure or class should contain a pointer to the array, which you will need to initialize to memory that you allocate at run time. As an example, consider the following structure:
struct xyzzy {
struct great_underground_empire *zork;
};
To allocate an array of N items, we would do the following:
#include <stdlib.h>  /* malloc, calloc, free */

struct xyzzy a_struct;

/* we could also use calloc, which has slightly
 * different syntax; see the man pages for details */
a_struct.zork = malloc(sizeof(struct great_underground_empire) * N);
/* ... check if pointer allocated okay ... */

/* use item x. We'll assume name is a valid field */
a_struct.zork[x].name = "Flood Control Dam";

/* ... and when we are finished with the array: */
free(a_struct.zork);
In C++, you would use new great_underground_empire[N] instead of malloc, and delete [] instead of free.
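A self-contained C sketch of the same idea. Because the fragment above leaves struct great_underground_empire undefined, this version assumes it holds only a name, and it adds a hypothetical count member so that the structure remembers how many elements it owns. The helper names xyzzy_init and xyzzy_free are illustrative.

```c
#include <stdlib.h>

/* Assumed element type: the original only relies on a name field. */
struct great_underground_empire {
    const char *name;
};

struct xyzzy {
    struct great_underground_empire *zork;  /* array allocated at run time */
    size_t count;                           /* hypothetical: number of elements */
};

/* Allocates room for n elements; returns 0 on success, -1 if out of memory. */
int xyzzy_init(struct xyzzy *x, size_t n)
{
    /* calloc zero-fills the memory (on common platforms this makes every
     * name a null pointer) */
    x->zork = calloc(n, sizeof *x->zork);
    if (x->zork == NULL && n > 0)
        return -1;
    x->count = n;
    return 0;
}

/* Releases the array when the structure is no longer needed. */
void xyzzy_free(struct xyzzy *x)
{
    free(x->zork);
    x->zork = NULL;
    x->count = 0;
}
```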
Just what exactly goes in a header file?
Header files should only include type definitions, function prototypes (declarations), and macros. Never define a function in a header file (with the exception of an inline C++ function, and then only if you know what you are doing). Many students have a tendency to put function definitions in a header file, e.g. "kitchen_ops.h", and then include those functions into another file. This is a bad programming habit.
Header files are frequently included by more than one file. Suppose leonidas.c and neuhaus.c are both part of the same program and both need to use the functions declared by the prototypes in kitchen_ops.h. If we were to put actual code in kitchen_ops.h, the compiler would generate two copies of each function: one in the object code for leonidas.c and another in the neuhaus.c object code. When the linker tries to combine these two object files, it will report errors because the functions are defined more than once.
The correct way to do this is to put the prototypes in the header file and the functions in a .c file such as kitchen_ops.c. While our example was for the C language (or so one would assume by the .c file extension), the same is true of C++.
How do I determine the number of bytes (size) of various data structures?
The size of built-in and user-defined types can be determined using the sizeof operator. Try compiling a small C/C++ program that prints the sizes of several types; looking at the program along with its output shows a few of the subtler issues.
How can I use pointers to functions?
Sometimes, we would like to pass a handle to a function which is to be called inside another function. As an example, suppose that you implemented quicksort in C. You could implement a quicksort for each data type that you wanted to sort, or you could code it once and pass a function which does comparisons to the generic quicksort routine. (This is what the Standard C Library function qsort does; see man qsort for details.) In order to do this, we need to declare pointers to functions. In C and C++, the type of a function is defined by its signature.
Suppose that we wished to compare two integers and return -1 if the first is smaller, 0 if they are the same, and 1 if the first is larger. We might write a function like this:
/* Avoid sprinkling our code with magic numbers */
#define X_LESS_THAN_Y (-1)
#define X_EQUALS_Y 0
#define X_GREATER_THAN_Y 1
int compare_ints(int x, int y) {
if (x > y)
return X_GREATER_THAN_Y;
else if (x < y)
return X_LESS_THAN_Y;
else
return X_EQUALS_Y;
}
It has a function signature of: int compare_ints(int, int).
Let us declare a pointer to this function:
int (* Pointer)(int, int);
We now have a variable named Pointer which can point to a function which returns an int and takes two ints as arguments. We can set it to the address of any function which matches the signature. Since compare_ints matches this signature, we can have Pointer point to compare_ints:
/* Set Pointer to the address of compare_ints using the
 * address operator & */
Pointer = &compare_ints;
Because of Pointer's type (a pointer to a function) and the fact that you can only call functions or take their addresses, the designers of C and C++ decided to allow the following syntax as well:
/* Also sets Pointer to the address of compare_ints */
Pointer = compare_ints;
We can call compare_ints through Pointer as follows:
/* Assume other variables of the appropriate type... */
/* call compare_ints(Array[i], Array[j]) */
Result = (* Pointer) (Array[i], Array[j]);
/* For similar reasons as to why we permitted dropping the
 * address operator (&), we can also omit the dereferencing
 * operator (*). */
/* Another way to do it... */
Result = Pointer(Array[i], Array[j]);
This is pretty neat, but it does not solve the problem that we originally had, how to pass in a generic comparison routine. C/C++'s solution to this is the pointer to void type. A variable of type pointer to void (void *) points to something without knowing the type.
So in our quicksort routine, we could declare the following as one of the arguments:
int (*ComparisonFn)(void *, void *)
meaning that the comparison function must be passed pointers to something, but we don't know what. We could then rewrite our integer comparison routine as follows:
int compare_ints(void * xVoidPtr, void * yVoidPtr) {
int *xPtr, *yPtr;
/* type cast the void pointers to int pointers */
xPtr = (int *) xVoidPtr;
yPtr = (int *) yVoidPtr;
/* dereference and compare */
if (*xPtr > *yPtr)
return X_GREATER_THAN_Y;
else if (*xPtr < *yPtr)
return X_LESS_THAN_Y;
else
return X_EQUALS_Y;
}
Now we can use the same function signature to compare any two items, and pass the function name to our quicksort routine. Note that for this example, we should have declared the arguments to be const as we are not modifying them, but this was omitted for pedagogical reasons.
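Putting the pieces together, here is a sketch that hands an integer comparison function to the Standard C Library's qsort. Unlike the version above, the arguments are declared const void *, which is the type qsort actually expects, and the -1/0/1 results fit qsort's negative/zero/positive convention. The sort_ints wrapper is a hypothetical name.

```c
#include <stdlib.h>

/* qsort expects int (*)(const void *, const void *): negative if the first
 * item sorts first, zero if the items are equal, positive otherwise. */
static int compare_ints(const void *xVoidPtr, const void *yVoidPtr)
{
    const int *xPtr = xVoidPtr;   /* no cast needed from void * in C */
    const int *yPtr = yVoidPtr;

    if (*xPtr > *yPtr)
        return 1;
    else if (*xPtr < *yPtr)
        return -1;
    else
        return 0;
}

/* Sorts n ints in place by passing compare_ints to qsort. */
void sort_ints(int *array, size_t n)
{
    qsort(array, n, sizeof array[0], compare_ints);
}
```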
