SICS Trouble Shooting


Check Server Status

One of the first things to do is to check the server status with the command: monit status


Inspecting Log Files

Suppose something went wrong over the weekend or during the night and you are not absolutely sure what the problem was. In such a case it is helpful to look at the SICS log files. They live in the log directory of the instrument account. For each day (or after each restart of the SICS server) a new log file is created. They are named according to the following convention:

autoYYYY-mm-dd@hh-MM-ss.log
with YYYY denoting the year, mm the month, dd the day, hh the hour of creation, MM the minute of creation and ss the seconds of creation.

The most recent log file can be viewed with the sicstail command: sicstail num shows the last num lines of the log file. Within SICS, and especially in the SICS command line client, the last 1000 lines of the log are accessible through the commandlog tail num command. The command log is also accessible through the WWW at lns00.

The log file carries hourly time stamps, which make it possible to find out exactly when a problem began to appear.
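Because of this naming convention, the log file names sort chronologically when sorted as plain text. The following is a minimal sketch of how the newest log file could be located by hand, for instance when sicstail is not at hand. The function name newest_sics_log is hypothetical and not part of SICS, and the sketch assumes that all date and time fields in the file names are zero-padded.

```shell
#!/bin/sh
# Hypothetical helper, not part of SICS: print the newest log file in a
# log directory, relying on the autoYYYY-mm-dd@hh-MM-ss.log convention.
# Assumption: all date and time fields are zero-padded, so plain text
# order equals chronological order.
newest_sics_log() {
    logdir="${1:-log}"
    newest=""
    # The shell expands the glob in sorted order; keep the last match.
    for f in "$logdir"/auto*.log; do
        [ -e "$f" ] && newest="$f"
    done
    # Print nothing and return non-zero if no log file was found.
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

A possible use from the instrument account's home directory would be: tail -n 50 "$(newest_sics_log log)"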

There is also another log file, log/monit.log, which logs messages from the monit daemon. This can be used to determine when server processes were restarted or when hardware failed.
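As a rough sketch, the monit log can be filtered for the events of interest with standard tools. The function name monit_events is hypothetical, and the default search word restart is an assumption about the wording of the monit messages; adjust the pattern to whatever actually appears in your log/monit.log.

```shell
#!/bin/sh
# Hypothetical helper, not part of SICS or monit: show the most recent
# lines of the monit log that match a search word (case-insensitive).
# Assumption: restart events contain the word "restart"; change the
# pattern if your monit.log uses different wording.
monit_events() {
    logfile="${1:-log/monit.log}"
    pattern="${2:-restart}"
    count="${3:-20}"
    grep -i -- "$pattern" "$logfile" | tail -n "$count"
}
```

Called without arguments from the instrument account's home directory, this prints up to 20 matching lines from log/monit.log.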

Quite often the inspection of the log files will indicate problems which are not software related such as:

Restarting SICS


monit restart sicsserver


Restart Everything

If nothing seems to work any more, no connections can be obtained, and so on, then the next step is to restart everything. This is especially necessary if mechanics or electronics people were closer to the instrument than a nautical mile.


Starting SICS Manually

In order to find out if some hardware is broken or if the SICS server initializes badly, it is useful to look at the SICS server's startup messages. The following steps are required:

Replace inst by the name of the instrument, as usual. Look at the screen output in order to find out why SICS does not initialize things or where the initialization hangs. When you are done, do not forget to kill the SICServer thus started and to issue the command: monit start sicsserver in order to place the SICS server back under monit's control.


Test the SerPortServer Program

Sometimes the SerPortServer program hangs and blocks communication with the RS-232 hardware. This can be diagnosed with the following procedure. Find out at which port either an EL734 motor controller or an E737 counter box lives. Then type:

asyncom localhost 4000 portnumber

This yields a new prompt at which you type ID. If all is well, a string identifying the device will be printed. If not, a large stack dump will come up. The asyncom program can be exited by typing quit. If there is a problem with the SerPortServer program, type:

monit restart SerPortServer

in order to restart it.


Trouble with Environment Devices

The first stop for trouble with temperature or other environment devices is Markus Zolliker. A common problem is that old environment controllers have not been deconfigured from the system and still reserve terminal server ports. So take care to deconfigure your old devices when swapping.


HELP debugging!!!!

The SICS server hanging or crashing should not happen. In order to sort such problems out, it is very helpful if any available debugging information is saved and presented to the programmers. The information available consists of the log files, as written continuously by the SICS server, and possible core files lying around. They have just this name: core. In order to save them, create a new directory (for example dump2077) and copy the stuff in there. This looks like:

/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
The /home/DMC> is just the command prompt. Please note that core files are only available after crashes of the server. These few commands will help to analyse the cause of the problem and eventually to resolve it.
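The same steps can be wrapped in a small shell function so that nothing is forgotten in the heat of the moment. This is only a sketch: the function name collect_sics_dump and the time-stamped directory name are assumptions made for illustration, not an existing SICS tool.

```shell
#!/bin/sh
# Hypothetical helper, not part of SICS: copy the log files and a core
# file (if one exists) into a fresh dump directory below the instrument
# account's home directory, then print the name of that directory.
collect_sics_dump() {
    home="${1:-.}"
    # Assumed naming scheme: dump plus a time stamp, to avoid clashes.
    dump="$home/dump$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dump" || return 1
    # Copy all log files; stay quiet if there are none.
    cp "$home"/log/*.log "$dump"/ 2>/dev/null
    # Core files only exist after a crash of the server.
    if [ -f "$home/core" ]; then
        cp "$home/core" "$dump"/
    fi
    printf '%s\n' "$dump"
}
```

Run from the instrument account's home directory, collect_sics_dump prints the name of the directory to hand over to the programmers.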