SICS Troubleshooting


There is no such thing as bug-free software. There are always bugs, unexpected behaviour and the like. This document is meant to help solve such problems. The usual symptom is that a client cannot connect to the server or that the server does not respond.

An essential prerequisite of SICS is that the server is up and running. The system is configured to restart the SICServer whenever it fails. The SICServer must be restarted by hand only after a reboot or after the keepalive processes have been killed (see below). On all instruments this is done by typing:

startsics 
at the command prompt. startsics actually starts two programs: one is the replicator application which is responsible for the automatic copying of data files to the laboratory server. The other is the SICS server. Both programs are started by means of a shell script called keepalive. keepalive is basically an endless loop which calls the program again and again and thus ensures that the program will never stop running.
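
The restart loop described above can be pictured roughly as follows. This is only an illustrative sketch, not the real keepalive script: the function name keepalive_sketch and the KEEPALIVE_MAX variable are assumptions introduced here so that the loop can be tried out without running forever.

```shell
#!/bin/sh
# Hypothetical sketch of a keepalive-style wrapper; NOT the real keepalive script.
# Usage: keepalive_sketch program [args...]
# KEEPALIVE_MAX is an assumption added for experimenting: it limits the number
# of starts. Leave it unset to get the endless loop described in the text.
keepalive_sketch() {
    starts=0
    while :; do
        # Run the program and wait until it exits; the exit status is
        # ignored because the program is restarted in either case.
        "$@" || true
        starts=$((starts + 1))
        if [ -n "${KEEPALIVE_MAX:-}" ] && [ "$starts" -ge "$KEEPALIVE_MAX" ]; then
            break
        fi
        # Short pause so that a program which crashes at once does not spin.
        sleep 1
    done
}
```

This also shows why killing only the server is not enough to stop it for good: as long as the surrounding loop is alive, the program is simply started again.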

When the SICS server hangs, or you want to force a reinitialization of everything, the server process must be killed. This can be done either manually or through a shell script.

Stopping SICS

All SICS processes can be stopped through the command:

killsics
given at the unix command line. You must be the instrument user (for example DMC) on the instrument computer for this to work properly.

Finding the SICS server

The first step in killing the SICS server manually is to find the server process. Log in as the instrument user on the instrument computer (for instance DMC on lnsa05). Type the command:

/home/DMC> ps -A
Note the capital A given as parameter. The result will be a listing like this:
  PID TTY      S           TIME CMD
    0 ??       R       01:56:28 [kernel idle]
    1 ??       I        1:24.44 /sbin/init -a
    3 ??       IW       0:00.20 /sbin/kloadsrv
   24 ??       S       40:39.58 /sbin/update
   97 ??       S        0:04.87 /usr/sbin/syslogd
   99 ??       IW       0:00.03 /usr/sbin/binlogd
  159 ??       S        1:43.70 /usr/sbin/routed -q
  285 ??       S        1:00.45 /usr/sbin/portmap
  293 ??       S        6:03.45 /usr/sbin/ypserv
  299 ??       I        0:00.37 /usr/sbin/ypbind -s -S psunix,lnsa05.psi.ch
  307 ??       I        0:00.52 /usr/sbin/mountd -i
  309 ??       I        0:00.07 /usr/sbin/nfsd -t8 -u8
  311 ??       I        0:00.09 /usr/sbin/nfsiod 7
  317 ??       S        5:51.54 /usr/sbin/automount -f /etc/auto.master -M /psi
  370 ??       I        0:28.58 -accepting connections (sendmail)
  389 ??       S        1:41.15 /usr/sbin/xntpd -g -c /etc/ntp.conf
  419 ??       S        6:00.16 /usr/sbin/snmpd
  422 ??       S        1:00.91 /usr/sbin/os_mibs
  438 ??       S       34:29.67 /usr/sbin/advfsd
  449 ??       I        3:16.29 /usr/sbin/inetd
  482 ??       IW       0:11.53 /usr/sbin/cron
  510 ??       IW       0:00.02 /usr/lbin/lpd
  525 ??       I        5:31.67 /usr/opt/psw/psw_agent -x/dev/null -f/usr/opt/psw/psw_agent.conf
  532 ??       I        0:00.74 /usr/opt/psw/psw_sensor_syswd 1 -x/dev/null
  555 ??       I        0:00.58 /usr/bin/nsrexecd
  571 ??       I        0:20.27 /usr/dt/bin/dtlogin -daemon
  583 ??       S        1:38.27 lpsbootd -F /etc/lpsodb -l 0 -x 1
  585 ??       IW       0:00.04 /usr/sbin/getty /dev/lat/620 console vt100
  586 ??       IW       0:00.03 /usr/sbin/getty /dev/lat/621 console vt100
  587 ??       I       35:59.85 /usr/bin/X11/X :0 -auth /var/dt/authdir/authfiles/A:0-aaarBa
  657 ??       I        0:01.46 rpc.ttdbserverd
 4705 ??       IW       0:00.05 dtlogin   -daemon
 9127 ??       I        0:00.37 /usr/bin/X11/dxconsole -geometry 480x150-0-0 -daemon -nobuttons -verbose -notify -exitOnFail -nostdin -bg gray
 9317 ??       IW       0:00.73 dtgreet -display :0
14412 ??       S        0:39.71 netscape
15524 ??       I        0:00.57 rpc.cmsd
21678 ??       S        0:00.11 telnetd
31912 ??       S        0:10.65 /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl
  584 console  IW +     0:00.21 /usr/sbin/getty console console vt100
21978 ttyp1    S        0:00.63 -tcsh (tcsh)
22269 ttyp1    R  +     0:00.10 ps -A
This is a listing of all processes running on the machine where the command was typed. Near the bottom, the line starting with 31912 ?? is the entry for the SICS server, so in this example the server is running. If the server were down, no such entry would be present.
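
Scanning a long listing by eye is error-prone, so the search can be narrowed with a small filter. The following is a minimal sketch; the function name sics_pids is made up for this example, and it assumes that the server shows up with SICServer somewhere in its line, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: print the PID (first column) of every SICServer
# process found in a "ps -A" listing read from standard input.
# Lines belonging to the keepalive wrapper, and to this awk filter itself,
# are skipped.
sics_pids() {
    awk '/SICServer/ && !/keepalive/ && !/awk/ { print $1 }'
}

# Usage:
#   ps -A | sics_pids
```

If the command prints nothing, the server is not running.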

Killing a hanging SICS server

Suppose the SICS server no longer responds and has to be forced to exit. Please note that it is always better to close the server via the Sics_Exitus command, typed with manager privilege in one of the command clients. In order to kill the server, first find it using the scheme given above. The information needed is the process ID, the number given as the first item on the line where the server is listed. In this case: 31912. Please note that this number will be different every time. The command to force the server to stop is:

/home/DMC> kill -9 31912
Note that the second parameter is the number found with ps -A. The SICServer will be restarted automatically by the system. Occasionally you may not be able to connect to the SICS server after such an operation. This is due to network buffering problems. Killing the server once more usually solves the problem.

Shutting The SICS Server Down Completely

This is done for you by the killsics shell script. Just type

killsics
at the unix command line. Here is what killsics does for you: in order to shut down the SICS server completely, two processes must be killed: the actual SICS server and the process which automatically restarts the SICServer. The latter must be killed first. It can be found in the ps -A listing as a line reading keepalive SICServer. Kill that one as described above, then kill the SICServer. To restart SICS after this, use the startsics command.
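
The order of the two kills matters: if the server is killed first, the keepalive process starts it again at once. The sketch below only illustrates that ordering. It is not the killsics script itself, the function name sics_kill_order is invented for this example, and it assumes that the two processes appear in the listing as described above.

```shell
#!/bin/sh
# Hypothetical helper: read a "ps -A" listing from standard input and print
# the PIDs to kill, in the order in which they should be killed.
sics_kill_order() {
    listing=$(cat)
    # 1. The keepalive wrapper, which would otherwise restart the server.
    printf '%s\n' "$listing" | awk '/keepalive SICServer/ && !/awk/ { print $1 }'
    # 2. The SICS server itself.
    printf '%s\n' "$listing" | awk '/SICServer/ && !/keepalive/ && !/awk/ { print $1 }'
}

# Usage, as the instrument user:
#   for pid in $(ps -A | sics_kill_order); do kill -9 "$pid"; done
```

In normal operation killsics remains the command to use; the loop above is only meant to make the sequence explicit.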

Restart Everything

If nothing seems to work any more, no connections can be obtained etc., then the next step is to restart everything. This is especially necessary if mechanics or electronics people have been closer to the instrument than 400 meters.

  1. Reboot the Macintosh PC by switching it off at the silver button on the left. Press the button firmly and hold it for a few seconds to achieve an effect. The LED to the right of the button should be off before you press again to boot the Macintosh.
  2. Reboot the histogram memory. It has a tiny button labelled RST. That's the one. It can be operated with a hairpin, a ballpoint pen or the like.
  3. Wait 5 minutes. The Macintosh may take that time to come up again.
  4. Restart the SICServer. Watch for any messages about things not being connected or configured.
  5. Restart and reconnect the client programs.
If this fails (even after a second time), there may be a network problem which cannot be resolved by simple means.

Getting New SICS Software

Sometimes you might want to be sure that you have the latest SICS software. This is how to get it:

  1. Login to the instrument account.
  2. If you are not there already, type cd to get into the home directory.
  3. Type killsics at the unix prompt in order to stop the SICS server.
  4. Type sicsinstall exe at the unix prompt for copying new SICS software from the general distribution area.
  5. Type startsics to restart the SICS software.
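
For reference, steps 2 to 5 could be strung together as sketched below. The function name update_sics is invented for this example, and the exit statuses of killsics, sicsinstall and startsics are not documented here, so the handling of failures is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 2 to 5; run it as the instrument user.
update_sics() {
    status=0
    cd "$HOME" || return 1            # step 2: go to the home directory
    killsics || true                  # step 3: stop SICS (status not relied upon)
    sicsinstall exe || status=$?      # step 4: copy the new SICS software
    if [ "$status" -ne 0 ]; then
        echo "sicsinstall exe failed with status $status" >&2
    fi
    # Step 5: restart in either case, so that the instrument is not left
    # without a server if the installation went wrong.
    startsics || status=$?
    return "$status"
}
```

Typing the commands one by one, as in the list above, has the advantage that any message printed by sicsinstall is seen before the server is restarted.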

Hot Fixes

When there is trouble with SICS, you may be asked by one of the SICS programmers to copy the most recent development version of the SICS server to your machine. This is done as follows:

  1. Login to the instrument account.
  2. cd into the bin directory, for example: /home/DMC/bin.
  3. Type killsics at the unix prompt in order to stop the SICS server.
  4. Type cp /data/koenneck/src/sics/SICServer . at the unix prompt.
  5. Type startsics to restart the SICS software.
!!!!!! WARNING !!!!!! Do this only when advised to do so by a competent SICS programmer. Otherwise you might be copying a SICS server in an unstable experimental state!

HELP debugging!!!!

The SICS server should not hang or crash. When it does, it is very helpful for sorting out the problem if any available debugging information is saved and presented to the programmers. The information available consists of the log files written continuously by the SICS server and possibly core files lying around. Core files have just this name: core. In order to save them, create a new directory (for example dump2077) and copy the files into it. This looks like:

/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
The /home/DMC> is just the command prompt. Please note that core files are only available after crashes of the server. These few commands will help to analyse the cause of the problem and eventually to resolve it.
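
Since a core file is not always present, a plain cp core fails with an error message when there was no crash. The following sketch does the same copying but skips whatever is missing. The function name save_sics_debug is made up for this example; it is meant to be run from the instrument account's home directory, where log and core are found in the example above.

```shell
#!/bin/sh
# Hypothetical helper: collect the log files and, if present, the core file
# into a dump directory given as the first argument.
save_sics_debug() {
    dump_dir=$1
    mkdir -p "$dump_dir" || return 1
    # Copy the log files. If there are none, the pattern stays unexpanded
    # and the existence test skips it.
    for f in log/*.log; do
        if [ -e "$f" ]; then
            cp "$f" "$dump_dir"/
        fi
    done
    # A core file exists only after a crash of the server.
    if [ -e core ]; then
        cp core "$dump_dir"/
    fi
    return 0
}

# Usage:
#   cd /home/DMC
#   save_sics_debug dump2077
```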