<html>
<head>
<title>SICS Trouble Shooting</title>
</head>
<body>
<h1>SICS Trouble Shooting</h1>
<hr size=4 width="66%">
<h2>Check Server Status</h2>
<p>
One of the first things to do is to check the server status with:
<b>monit status</b>
</p>
<hr size=4 width="66%">
<h2>Inspecting Log Files</h2>
<p>
Suppose something went wrong over the weekend or during the night and
you are not sure what the problem was. In such a case it is
helpful to look at the SICS log files. They live in the log directory
of the instrument account. A new log file is created each day and after
each restart of the SICS server. The files are named according to the
following convention:
<pre>
autoYYYY-mm-dd@hh-MM-ss.log
</pre>
with YYYY denoting the year, mm the month, dd the day, hh the hour,
MM the minute and ss the second of creation. The most recent log file
can be viewed with the <b>sicstail</b> command; <b>sicstail num</b> shows
the last num lines of the log file. Within SICS, and especially in the
SICS command line client, the last 1000 lines of the log are accessible
through the <b>commandlog tail num</b> command. The command log is also
accessible through the WWW at lns00. The log file carries hourly time
stamps, which make it possible to find out exactly when a problem began
to appear.
</p>
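<p>
Because the time stamp in the file name runs from the year down to the
second, sorting the file names also sorts the files by creation time. The
shell function below is a minimal sketch of that idea and not a replacement
for <b>sicstail</b>. The name <tt>latest_sics_log</tt> is made up for this
example, and the sketch assumes that all fields of the file name are
zero-padded.
</p>
<pre>
# Hypothetical helper: print the newest SICS log file in a directory.
# Assumes zero-padded names of the form autoYYYY-mm-dd@hh-MM-ss.log,
# so that sorting by name is the same as sorting by creation time.
latest_sics_log() {
    # $1 is the log directory, for example "log" in the instrument account
    ls "$1"/auto*.log | LC_ALL=C sort | tail -n 1
}
</pre>
<p>
Called from the home directory of the instrument account, for example as
<tt>tail -n 50 "$(latest_sics_log log)"</tt>, this would show the last 50
lines of the most recent log file.
</p>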
<p>
There is also another log file, log/monit.log, which holds messages from
the monit daemon. It can be used to determine when server processes
were restarted or when hardware failed.
</p>
<p>
Quite often the inspection of the log files will indicate problems
which are not software related, such as:
</p>
<ul>
<li>Communication problems (usually network).
<li>Positioning problems of motors.
<li>BAD_EMERG_STOP: the motor emergency stop was engaged. It must be
released before the motors move again.
<li>BAD_STP: a motor had been switched off.
</ul>
<hr size=4 width="66%">
<h2>Restarting SICS</h2>
<p>
To restart the SICS server, type: <b>monit restart sicsserver</b>
</p>
<hr size=4 width="66%">
<h2>Restart Everything</h2>
<p>
If nothing seems to work any more, no connections can be obtained etc., then
the next step is to restart everything. This is especially necessary if
mechanics or electronics people were closer to the instrument than a
nautical mile.
</p>
<ul>
<li>Reboot the histogram memory. It has a tiny button labelled RST. That's
the one. It can be operated with a hairpin, a ball point pen or the like.
<li>Restart all of SICS with the sequence: <b>monit stop all; monit quit; monit</b>
<li>Wait for a couple of minutes for the system to come up.
</ul>
<hr size=4 width="66%">
<h2>Starting SICS Manually</h2>
<p>
In order to find out if some hardware is broken or if the SICS server
initializes badly, it is useful to look at the SICS server's startup messages.
The following steps are required:
</p>
<ul>
<li>monit stop sicsserver
<li>cd ~/inst_sics
<li>./SICServer inst.tcl | more
</ul>
<p>
Replace inst by the name of the instrument, as usual. Look at the screen
output in order to find out why SICS does not initialize things or where the
initialization hangs. Do not forget to kill the SICServer thus started when
you are done and to issue the command <b>monit start sicsserver</b> in order
to place the SICS server back under monit's control.
</p>
<hr size=4 width="66%">
<h2>Test the SerPortServer Program</h2>
<p>
Sometimes the SerPortServer program hangs and blocks the communication with
the RS-232 hardware. This can be diagnosed by the following procedure: Find
out at which port either an EL734 motor controller or an EL737 counter box
lives. Then type: <b>asyncom localhost 4000 portnumber</b>. This yields a
new prompt at which you type <b>ID</b>. If all is well, a string identifying
the device will be printed. If not, a large stack dump will come up.
The asyncom program can be exited by typing <b>quit</b>. If there is
a problem with the SerPortServer program, type
<b>monit restart SerPortServer</b> in order to restart it.
</p>
<hr size=4 width="66%">
<h2>Trouble with Environment Devices</h2>
<p>
The first stop for trouble with temperature or other environment devices
is Markus Zolliker. A common problem is that old environment controllers
have not been deconfigured from the system and still reserve terminal server
ports. Thus take care to deconfigure your old devices when swapping.
</p>
<hr size=4 width="66%">
<h2>HELP debugging!!!!</h2>
<p>
The SICS server should not hang or crash. In order to sort such
problems out it is very helpful if any available debugging information is
saved and presented to the programmers. The available information consists
of the log files written continuously by the SICS server and possible core
files lying around. They have just this name: core. In order to save them,
create a new directory (for example dump2077) and copy the files there. This
looks like:
<pre>
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
</pre>
The <tt>/home/DMC> </tt> is just the command prompt. Please note that core
files are only available after crashes of the server. These few commands
will help to analyse the cause of the problem and eventually to resolve it.
</p>
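<p>
The same steps can be wrapped in a small shell function so that nothing is
forgotten in the heat of the moment. This is only a sketch: the name
<tt>save_sics_dump</tt> and the date-based directory name are assumptions
made for this example and are not part of SICS.
</p>
<pre>
# Hypothetical helper: save the SICS logs and a core file, if any,
# into a new dump directory named after the current date and time.
save_sics_dump() {
    # $1 is the home directory of the instrument account, e.g. /home/DMC
    dump="$1/dump$(date +%Y%m%d-%H%M%S)"
    mkdir "$dump" || return 1
    cp "$1"/log/*.log "$dump"
    # A core file only exists after a crash of the server.
    if [ -f "$1/core" ]; then
        cp "$1/core" "$dump"
    fi
    # Print the directory name so it can be passed on to the programmers.
    echo "$dump"
}
</pre>
<p>
Calling <tt>save_sics_dump /home/DMC</tt> would then create a directory such
as dump20240301-081500 below /home/DMC and print its name.
</p>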
</body>
</html>