Thursday, February 9, 2012

Unix Tools - Introduction To Unix Signals Programming


Table Of Contents


  1. What Are Signals?
  2. What Are Signals Used For?
  3. Sending Signals To Processes
  4. Catching Signals - Signal Handlers
  5. Installing Signal Handlers
  6. Avoiding Signal Races - Masking Signals
  7. Implementing Timers Using Signals
  8. Summary - "Do" and "Don't" inside A Signal Handler

What Are Signals?

Signals, to be short, are various notifications sent to a process in order to notify it of various "important" events. By their nature, they interrupt whatever the process is doing at this minute, and force it to handle them immediately. Each signal has an integer number that represents it (1, 2 and so on), as well as a symbolic name that is usually defined in the file /usr/include/signal.h or one of the files included by it directly or indirectly (HUP, INT and so on. Use the command 'kill -l' to see a list of signals supported by your system).
Each signal may have a signal handler, which is a function that gets called when the process receives that signal. The function is called in "asynchronous mode", meaning that no where in your program you have code that calls this function directly. Instead, when the signal is sent to the process, the operating system stops the execution of the process, and "forces" it to call the signal handler function. When that signal handler function returns, the process continues execution from wherever it happened to be before the signal was received, as if this interruption never occurred.
Note for "hardwarists": If you are familiar with interrupts (you are, right?), signals are very similar in their behavior. The difference is that while interrupts are sent to the operating system by the hardware, signals are sent to the process by the operating system, or by other processes. Note that signals have nothing to do with software interrupts, which are still sent by the hardware (the CPU itself, in this case).

What Are Signals Used For?

Signals are usually used by the operating system to notify processes that some event occurred, without these processes needing to poll for the event. Signals should then be handled, rather then used to create an event notification mechanism for a specific application.
When we say that "Signals are being handled", we mean that our program is ready to handle such signals that the operating system might be sending it (such as signals notifying that the user asked to terminate it, or that a network connection we tried writing into, was closed, etc). Failing to properly handle various signals, would likely cause our application to terminate, when it receives such signals.

Sending Signals To Processes


Sending Signals Using The Keyboard

The most common way of sending signals to processes is using the keyboard. There are certain key presses that are interpreted by the system as requests to send signals to the process with which we are interacting:
Ctrl-C
Pressing this key causes the system to send an INT signal (SIGINT) to the running process. By default, this signal causes the process to immediately terminate.
Ctrl-Z
Pressing this key causes the system to send a TSTP signal (SIGTSTP) to the running process. By default, this signal causes the process to suspend execution.
Ctrl-\
Pressing this key causes the system to send a ABRT signal (SIGABRT) to the running process. By default, this signal causes the process to immediately terminate. Note that this redundancy (i.e. Ctrl-\ doing the same as Ctrl-C) gives us some better flexibility. We'll explain that later on.

Sending Signals From The Command Line

Another way of sending signals to processes is done using various commands, usually internal to the shell:

kill
The kill command accepts two parameters: a signal name (or number), and a process ID. Usually the syntax for using it goes something like:

 kill -<signal> <PID>
 
For example, in order to send the INT signal to process with PID 5342, type: kill -INT 5342 This has the same affect as pressing Ctrl-C in the shell that runs that process. If no signal name or number is specified, the default is to send a TERM signal to the process, which normally causes its termination, and hence the name of the kill command.
fg
On most shells, using the 'fg' command will resume execution of the process (that was suspended with Ctrl-Z), by sending it a CONT signal.

Sending Signals Using System Calls

A third way of sending signals to processes is by using the kill system call. This is the normal way of sending a signal from one process to another. This system call is also used by the 'kill' command or by the 'fg' command. Here is an example code that causes a process to suspend its own execution by sending itself the STOP signal:

#include <unistd.h>     /* standard unix functions, like getpid()       */
#include <sys/types.h>  /* various type definitions, like pid_t         */
#include <signal.h>     /* signal name macros, and the kill() prototype */

/* first, find my own process ID */
pid_t my_pid = getpid();

/* now that i got my PID, send myself the STOP signal. */
kill(my_pid, SIGSTOP);
An example of a situation when this code might prove useful, is inside a signal handler that catches the TSTP signal (Ctrl-Z, remember?) in order to do various tasks before actually suspending the process. We will see an example of this later on.

Catching Signals - Signal Handlers


Catchable And Non-Catchable Signals

Most signals may be caught by the process, but there are a few signals that the process cannot catch, and cause the process to terminate. For example, the KILL signal (-9 on all unices I've met so far) is such a signal. This is why you usually see a process being shut down using this signal if it gets "wild". One process that uses this signal is a system shutdown process. It first sends a TERM signal to all processes, waits a while, and after allowing them a "grace period" to shut down cleanly, it kills whichever are left using the KILL signal.
STOP is also a signal that a process cannot catch, and forces the process's suspension immediately. This is useful when debugging programs whose behavior depends on timing. Suppose that process A needs to send some data to process B, and you want to check some system parameters after the message is sent, but before it is received and processed by process B. One way to do that would be to send a STOP signal to process B, thus causing its suspension, and then running process A and waiting until it sends its oh-so important message to process B. Now you can check whatever you want to, and later on you can use the CONT signal to continue process B's execution, which will then receive and process the message sent from process A.
Now, many other signals are catchable, and this includes the famous SEGV and BUS signals. You probably have seen numerous occasions when a program has exited with a message such as 'Segmentation Violation - Core Dumped', or 'Bus Error - core dumped'. In the first occasion, a SEGV signal was sent to your program due to accessing an illegal memory address. In the second case, a BUS signal was sent to your program, due to accessing a memory address with invalid alignment. In both cases, it is possible to catch these signals in order to do some cleanup - kill child processes, perhaps remove temporary files, etc. Although in both cases, the memory used by your process is most likely corrupt, it's probable that only a small part of it was corrupt, so cleanup is still usually possible.

Default Signal Handlers

If you install no signal handlers of your own (remember what a signal handler is? yes, that function handling a signal?), the runtime environment sets up a set of default signal handlers for your program. For example, the default signal handler for the TERM signal calls the exit() system call. The default handler for the ABRT is to dump the process's memory image into a file named 'core' in the process's current directory, and then exit. Note that the 'core' file might actually have some suffix (such as 'core.567') on some systems.

Installing Signal Handlers

There are several ways to install signal handlers. We'll use the most basic form here, and refer you to your manual pages for further reading.

The signal() System Call

The signal() system call is used to set a signal handler for a single signal type. signal() accepts a signal number and a pointer to a signal handler function, and sets that handler to accept the given signal. As an example, here is a code snippest that causes the program to print the string "Don't do that" when a user presses Ctrl-C:


#include <stdio.h>     /* standard I/O functions                         */
#include <unistd.h>    /* standard unix functions, like getpid()         */
#include <sys/types.h> /* various type definitions, like pid_t           */
#include <signal.h>    /* signal name macros, and the signal() prototype */

/* first, here is the signal handler */
void catch_int(int sig_num)
{
    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* and print the message */
    printf("Don't do that");
    fflush(stdout);
}

.
.
.
/* and somewhere later in the code.... */
.
.

/* set the INT (Ctrl-C) signal handler to 'catch_int' */
signal(SIGINT, catch_int);

/* now, lets get into an infinite loop of doing nothing. */
for ( ;; )
    pause();

The complete source code for this program is found in the catch-ctrl-c.c file.
Notes:
  • the pause() system call causes the process to halt execution, until a signal is received. it is surely better than a 'busy wait' infinite loop.
  • the name of a function in C/C++ is actually a pointer to the function, so when you're asked to supply a pointer to a function, you may simply specify its name instead.
  • On some systems (such as Linux), when a signal handler is called, the system automatically resets the signal handler for that signal to the default handler. Thus, we re-assign the signal handler immediately when entering the handler function. Otherwise, the next time this signal is received, the process will exit (default behavior for INT signals). Even on systems that do not behave in this way, it still won't hurt, so adding this line always is a good idea.

Pre-defined Signal Handlers

For our convenience, there are two pre-defined signal handler functions that we can use, instead of writing our own: SIG_IGN and SIG_DFL.
SIG_IGN:
Causes the process to ignore the specified signal. For example, in order to ignore Ctrl-C completely (useful for programs that must NOT be interrupted in the middle, or in critical sections), write this: signal(SIGINT, SIG_IGN);
SIG_DFL:
Causes the system to set the default signal handler for the given signal (i.e. the same handler the system would have assigned for the signal when the process started running): signal(SIGTSTP, SIG_DFL);

Avoiding Signal Races - Masking Signals

One of the nasty problems that might occur when handling a signal, is the occurrence of a second signal while the signal handler function executes. Such a signal might be of a different type than the one being handled, or even of the same type. Thus, we should take some precautions inside the signal handler function, to avoid races.
Luckily, the system also contains some features that will allow us to block signals from being processed. These can be used in two 'contexts' - a global context which affects all signal handlers, or a per-signal type context - that only affects the signal handler for a specific signal type.

Masking signals with sigprocmask()

the (modern) "POSIX" function used to mask signals in the global context, is the sigprocmask() system call. It allows us to specify a set of signals to block, and returns the list of signals that were previously blocked. This is useful when we'll want to restore the previous masking state once we're done with our critical section.
Note: each process on a unix system has its own signals mask, which is used by the operating system to specify which signals should be delivered to the proces, and which should be blocked. The sigprocmask system call is used to take a signals mask we created in user space, and update the one in stored in the kernel, using this user-space mask. The mask stored in the kernel is the one later considered by the operating system, when deciding whether to deliver a signal to the process, or block it.
sigprocmask() accepts 3 parameters:


int how
defines if we want to add signals to the current process's mask (SIG_BLOCK), remove them from the current mask (SIG_UNBLOCK), or completely replace the current mask with the new mask (SIG_SETMASK).
const sigset_t *set
The set of signals to be blocked, or to be added to the current mask, or removed from the current mask (depending on the 'how' parameter).
sigset_t *oldset
If this parameter is not NULL, then it'll contain the previous mask. We can later use this set to restore the situation back to how it was before we called sigprocmask().


Note: Older systems do not support the sigprocmask() system call. Instead, one should use the sigmask() and sigsetmask() system calls. If you have such an operating system handy, please read the manual pages for these system calls. They are simpler to use than sigprocmask, so it shouldn't be too hard understanding them once you've read this section. You probably wonder what are these sigset_t variables, and how they are manipulated. Well, i wondered too, so i went to the manual page of sigsetops, and found the answer. There are several functions to handle these sets. Lets learn them using some example code:


/* define a new mask set */
sigset_t mask_set;
/* first clear the set (i.e. make it contain no signal numbers) */
sigemptyset(&mask_set);
/* lets add the TSTP and INT signals to our mask set */
sigaddset(&mask_set, SIGTSTP);
sigaddset(&mask_set, SIGINT);
/* and just for fun, lets remove the TSTP signal from the set. */
sigdelset(&mask_set, SIGTSTP);
/* finally, lets check if the INT signal is defined in our set */
if (sigismember(&mask_set, SIGINT)
    printf("signal INT is in our set\n");
else
    printf("signal INT is not in our set - how strange...\n");
/* finally, lets make the set contain ALL signals available on our system */
sigfillset(&mask_set)

Now that we know all these little secrets, lets see a short code example that counts the number of Ctrl-C signals a user has hit, and on the 5th time (note - this number was "Stolen" from some quite famous Unix program) asks the user if they really want to exit. Further more, if the user hits Ctrl-Z, the number of Ctrl-C presses is printed on the screen.


/* first, define the Ctrl-C counter, initialize it with zero. */
int ctrl_c_count = 0;
#define CTRL_C_THRESHOLD 5

/* the Ctrl-C signal handler */
void catch_int(int sig_num)
{
    sigset_t mask_set; /* used to set a signal masking set. */
    sigset_t old_set; /* used to store the old mask set.   */

    /* re-set the signal handler again to catch_int, for next time */
    signal(SIGINT, catch_int);
    /* mask any further signals while we're inside the handler. */
    sigfillset(&mask_set);
    sigprocmask(SIG_SETMASK, &mask_set, &old_set);
    
    /* increase count, and check if threshold was reached */
    ctrl_c_count++;
    if (ctrl_c_count >= CTRL_C_THRESHOLD) {
 char answer[30];

 /* prompt the user to tell us if to really exit or not */
 printf("\nRealy Exit? [y/N]: ");
 fflush(stdout);
 gets(answer);
 if (answer[0] == 'y' || answer[0] == 'Y') {
     printf("\nExiting...\n");
     fflush(stdout);
     exit(0);
 }
 else {
     printf("\nContinuing\n");
     fflush(stdout);
     /* reset Ctrl-C counter */
     ctrl_c_count = 0;
 }
    }
    /* no need to restore the old signal mask - this is done automatically, */{{/COMMENT_FONT}*/
    /* by the operating system, when a signal handler returns.              */{{/COMMENT_FONT}*/
}

/* the Ctrl-Z signal handler */
void catch_suspend(int sig_num)
{
    sigset_t mask_set; /* used to set a signal masking set. */
    sigset_t old_set; /* used to store the old mask set.   */

    /* re-set the signal handler again to catch_suspend, for next time */
    signal(SIGTSTP, catch_suspend);
    /* mask any further signals while we're inside the handler. */
    sigfillset(&mask_set);
    sigprocmask(SIG_SETMASK, &mask_set, &old_set);

    /* print the current Ctrl-C counter */
    printf("\n\nSo far, '%d' Ctrl-C presses were counted\n\n", ctrl_c_count);
    fflush(stdout);

    /* no need to restore the old signal mask - this is done automatically, */{{/COMMENT_FONT}*/
    /* by the operating system, when a signal handler returns.              */{{/COMMENT_FONT}*/
}

.
.
/* and somewhere inside the main function... */
.
.

/* set the Ctrl-C and Ctrl-Z signal handlers */
signal(SIGINT, catch_int);
signal(SIGTSTP, catch_suspend);
.
.
/* and then the rest of the program */
.
.

The complete source code for this program is found in the count-ctrl-c.c file.
You should note that using sigprocmask() the way we did now does not resolve all possible race conditions. For example, It is possible that after we entered the signal handler, but before we managed to call the sigprocmask() system call, we receive another signal, which WILL be called. Thus, if the user is VERY quick (or the system is very slow), it is possible to get into races. In our current functions, this will probably not disturb the flow, but there might be cases where this kind of race could cause problems.
The way to guarantee no races at all, is to let the system set the signal masking for us before it calls the signal handler. This can be done if we use the sigaction() system call to define both the signal handler function AND the signal mask to be used when the handler is executed. You would probably be able to read the manual page for sigaction() on your own, now that you're familiar with the various concepts of signal handling. On old systems, however, you won't find this system call, but you still might find the sigvec() call, that enables a similar functionality.
One final note - signal handling for processes is done quite differently then it is done for threads. with threads, the situation is more complicated (e.g. signals masked by one thread could still be sent to another thread). How signals are handled for threads, is out of the scope of this tutorial.

Implementing Timers Using Signals

One of the weak aspects of Unix-like operating systems is their lack of proper support for timers. Timers are important to allow one to check timeouts (e.g. wait for user input up to 30 seconds, or exit), check some conditions on a regular basis (e.g. check every 30 seconds that a server we're talking to is still active, or close the connection and notify the user about the problem), and so on. There are various ways to get around the problem for programs that use an "event loop" based on the select() system call (or its new replacement, the poll() system call), but not all programs work that way, and this method is too complex for short and simple programs.
Yet, the operating system gives us a simple way of setting up timers that don't require too much hassles, by using special alarm signals. They are generally limited to one timer active at a time, but that will suffice in simple cases.

The alarm() System Call

The alarm() system call is used to ask the system to send our process a special signal, named ALRM, after a given number of seconds. Since Unix-like systems don't operate as real-time systems, your process might receive this signal after a longer time than requested. Combining this system call with a proper signal handler enables us to do some simple tricks. Lets see an example of a program that waits for user input, but exits if none was given after a certain timeout.


#include <unistd.h>    /* standard unix functions, like alarm()          */
#include <signal.h>    /* signal name macros, and the signal() prototype */

char user[40];  /* buffer to read user name from the user */

/* define an alarm signal handler. */
void catch_alarm(int sig_num)
{
    printf("Operation timed out. Exiting...\n\n");
    exit(0);
}

.
.
/* and inside the main program... */
.
.

/* set a signal handler for ALRM signals */
signal(SIGALRM, catch_alarm);

/* prompt the user for input */
printf("Username: ");
fflush(stdout);
/* start a 30 seconds alarm */
alarm(30);
/* wait for user input */
gets(user);
/* remove the timer, now that we've got the user's input */
alarm(0);

.
.
/* do something with the received user name */
.
.

The complete source code for this program is found in the use-alarms.c file.
As you can see, we start the timer right before waiting for user input. If we started it earlier, we'll be giving the user less time than promised to perform the operation. We also stop the timer right after getting the user's input, before doing any tests or processing of the input, to avoid a random timer from setting off due to slow processing of the input. Many bugs that occur using the alarm() system call occur due to forgetting to set off the timer at various places when the code gets complicated. If you need to make some tests during the input phase, put the whole piece of code in a function, so it'll be easier to make sure that the timer is always set off after calling the function. Here is an example of how NOT to do this:


int read_user_name(char user[])
{
    int ok_len;
    int result = 0; /* assume failure */

    /* set an alarm to timeout the input operation */
    alarm(3); /* Mistake 1 */
    printf("Enter user name: "); /* Mistake 2 */
    fflush(stdout);
    if (gets(user) == NULL) {
 printf("End of input received.\n");
 return 0;  /* Mistake 3 */
    }
    /* count the size of prefix of 'user' made only of characters */
    /* in the given set. if all characters in 'user' are in the   */
    /* set, then ok_len with be equal to the length of 'user'.    */
    ok_len = strspn(user, "abcdefghijklmnopqrstuvwxyz0123456789");
    if (ok_len == strlen(user)) {
 /* check if the user exists in our database */
 result = find_user_in_database(user); /* Mistake 4 */
    }
    alarm(0);

    return result;
}

Lets count the mistakes and bad programming practices in the above function:
  1. * Too short timeout: 3 seconds might be enough for superman to type his user name, but a normal user obviously needs more time. Such a timeout should actually be tunable, because not all people type at the same pace, or should be long enough for even the slowest of users.
  2. * Printing while timer is ticking: Norty Norty. This printing should have been done before starting up the timer. Printing may be a time consuming operation, and thus will leave less time than expected for the user to type in the expected input.
  3. * Exiting function without turning off timer: This kind of mistake is hard to catch. It will cause the program to randomally exit somewhere LATER during the execution. If we'll trace it with a debugger, we'll see that the signal was received while we were already executing a completely different part of the program, leaving us scratching our head and looking up to the sky, hoping somehow inspiration will fall from there to guide us to the location of the problem.
  4. * Making long checks before turning off timer: As if it's not enough that we gave the poor user a short time to check for the input, we're also inserting some database checking operation while the timer is still ticking. Even if we also want to timeout the database operation, we should probably set up a different timer (and a different ALRM signal handler), so as not to confuse a slow user with a slow database server. It will also allow the user to know why we are timing out. Without this information, a person trying to figure out why the program suddenly exits, will have hard time finding where the fault lies.

Summary - "Do" and "Don't" inside A Signal Handler

We have seen a few "thumb rules" all over this tutorial, and there are quite many of those, as the area of signals handling is rather tricky. Lets try to summarize these rules of thumb here:
  • Make it short - the signal handler should be a short function that returns quickly. Instead of doing complex operations inside the signal handler, it is better that the function will raise a flag (e.g. a global variable, although these are evil by themselves) and have the main program check that flag occasionally.
  • Proper Signal Masking - don't be too lazy to define proper signal masking for a signal handler, preferably using the sigaction() system call. It takes a little more effort than just using the signal() system call, but it'll help you sleep better at night, knowing that you haven't left an extra place for race conditions to occur. Remember - if some bug has a probability of 1/10,000 to occur, it WILL occur when many people use that program many times, as tends to be the case with good programs (you write only good programs, no?).
  • Careful with "fault" signals - If you catch signals that indicate a program bug (SIGBUS, SIGSEGV, SIGFPE), don't try to be too smart and let the program continue, unless you know exactly what you are doing (which is a very rare case) - just do the minimal required cleanup, and exit, preferably with a core dump (using the abort() function). Such signals usually indicate a bug in the program, that if ignored will most likely cause it to crush sooner or later, making you think the problem is somewhere else in the code.
  • Careful with timers - when you use timers, remember that you can only use one timer at a time, unless you also (ab)use the VTALRM signal. If you need to have more than one timer active at a time, don't use signals, or devise a set of functions that will allow you to have several virtual timers using a delta list of some sort. If you've no idea what I'm talking about, you probably don't need several simultaneous timers in the first place.
  • Signals are NOT an event driven framework - it is easy to get carried away and try turning the signals system into an event-driven driver for a program, but signal handling functions were not meant for that. If you need such a thing, use some framework that is more suitable for the application (e.g. use an event loop of a windowing program, use a select-based loop inside a network server, etc.).

No comments:

Post a Comment