One of the first things to do is to check the server status with: monit status.
Suppose something went wrong over the weekend or during the night and you are not absolutely sure what the problem was. In such a case it is helpful to look at the SICS log files. They live in the log directory of the instrument account. For each day (or after each restart of the SICS server) a new log file is created. They are named according to the following convention:
autoYYYY-mm-dd@hh-MM-ss.log
with YYYY denoting the year, mm the month, dd the day, hh the hour, MM the minute and ss the second of creation. The most recent log file can be viewed with the sicstail command: sicstail num shows the last num lines of the log file. Within SICS, and especially in the SICS command line client, the last 1000 lines of the log are accessible through the commandlog tail num command. The command log is also accessible through the WWW at lns00. The log file carries hourly time stamps, which make it possible to find out exactly when a problem first appeared.
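If sicstail is not at hand, the naming convention itself can be used to locate the newest log file. The following is only a minimal sketch: the function name latest_sics_log is made up for illustration, and it assumes that all fields in the file name are zero-padded, so that alphabetical order equals chronological order.

```shell
# Hypothetical helper, not part of SICS: print the newest auto*.log file
# in a directory (default: ./log).
# Assumption: file names follow autoYYYY-mm-dd@hh-MM-ss.log with zero-padded
# fields, so the alphabetically last name is also the most recent one.
latest_sics_log() {
    dir="${1:-log}"
    newest=""
    for f in "$dir"/auto*.log; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        if [ -e "$f" ]; then
            # The shell expands globs in sorted order; the last match wins.
            newest="$f"
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    else
        return 1
    fi
}
```

From the instrument account this could be used as: tail -n 50 "$(latest_sics_log log)"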
There is also another log file, log/monit.log, which logs messages from the monit daemon. This can be used to determine when server processes were restarted or when hardware failed.
Quite often the inspection of the log files will indicate problems which are not software related such as:
If nothing seems to work any more, no connections can be obtained, etc., then the next guess is to restart everything. This is especially necessary if mechanics or electronics people were closer to the instrument than a nautical mile.
In order to find out whether some hardware is broken or whether the SICS server initializes badly, it is useful to look at the SICS server's startup messages. The following steps are required:
Sometimes the SerPortServer program hangs and inhibits the communication with the RS-232 hardware. This can be diagnosed by the following procedure: find out at which port either an EL734 motor controller or an E737 counter box lives. Then type:
asyncom localhost 4000 portnumber
This yields a new prompt at which you type ID. If all is well, a string identifying the device will be printed. If not, a large stack dump will come up. The asyncom program can be exited by typing quit. If there is a problem with the SerPortServer program, type:
monit restart SerPortServer
in order to restart it.
The first stop for trouble with temperature or other environment devices is Markus Zolliker. A common problem is that old environment controllers have not been deconfigured from the system and still reserve terminal server ports. So take care to deconfigure your old devices when swapping.
The SICS server hanging or crashing should not happen. In order to sort such problems out, it is very helpful if any available debugging information is saved and presented to the programmers. The available information consists of the log files, as written continuously by the SICS server, and possibly core files lying around. They have just this name: core. In order to save them, create a new directory (for example dump2077) and copy the stuff in there. This looks like:
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
The /home/DMC> is just the command prompt. Please note that core files are only available after crashes of the server. These few commands will help to analyse the cause of the problem and to eventually resolve it.
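The three commands above can also be wrapped into a small shell function, so that a missing core file does not produce an error. This is only a sketch under stated assumptions: save_sics_dump is a hypothetical name, not an existing SICS tool, and the layout (a log directory and an optional core file directly in the instrument account) is taken from the description above.

```shell
# Hypothetical helper, not part of SICS.
# Usage: save_sics_dump ACCOUNT_DIR DUMP_NAME
# Copies ACCOUNT_DIR/log/*.log and, if present, ACCOUNT_DIR/core
# into ACCOUNT_DIR/DUMP_NAME.
save_sics_dump() {
    account_dir="$1"
    dump_dir="$account_dir/$2"
    mkdir -p "$dump_dir" || return 1
    for f in "$account_dir"/log/*.log; do
        # Skip the literal pattern if no log file matched.
        if [ -e "$f" ]; then
            cp "$f" "$dump_dir"/ || return 1
        fi
    done
    # A core file only exists after a crash of the server.
    if [ -e "$account_dir/core" ]; then
        cp "$account_dir/core" "$dump_dir"/ || return 1
    fi
    return 0
}
```

On DMC, for example, the call would be: save_sics_dump /home/DMC dump2077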