PSI sics-cvs-psi_pre-ansto
209
doc/user/trouble.htm
Normal file
@@ -0,0 +1,209 @@
<html>
<head>
<title>SICS Trouble Shooting</title>
</head>
<body>

<h1>SICS Trouble Shooting</h1>
<hr size=4 width="66%">
<p>
There is no such thing as bug-free software. There are always bugs, unexpected
behaviour and the like, and this document is meant to help resolve them. The usual
symptoms are that a client cannot connect to the server, that the server is
not responding, or that error messages show up.
</p>
<h2>Looking at Log Files</h2>
<p>
The first thing to do, especially when confronted with confusing statements
from either users or instrument scientists, is to look at the SICS server's
log files. The last 1000 lines of the instrument log are accessible from
any SICS client or through the WWW interface. The SICS commands:
<dl>
<dt>commandlog tail
<dd>shows the last 20 lines of the log.
<dt>commandlog tail n
<dd>shows the last n lines of the log.
</dl>
show you the information available. To see more, log in to the
instrument account. There the following unix commands might help:
<ul>
<li><b>sicstail</b> shows the last 20 lines of the current log file and its
name.
<li><b>sicstail n</b> shows the last n lines of the current log file.
</ul>
To see even more, cd into the log directory of the instrument
account. It contains files with names like:
<pre>
auto2001-08-08@00-01-01.log
</pre>
This name means that the log file was started on August 8, 2001 at 00:01:01.
A new log file is started daily. Load the appropriate files into an editor and
look at what really happened.
</p>

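<p>
Because the start time is encoded in the file name, the right log file can be
picked without opening it. The following shell function is a minimal sketch of
that idea. <tt>log_start_time</tt> is a hypothetical helper, not part of SICS,
and it assumes that every file follows the naming pattern shown above.
</p>

```shell
# Hypothetical helper, not part of SICS: print the start time encoded in a
# log file name. Assumes the pattern auto<YYYY-MM-DD>@<HH-MM-SS>.log.
log_start_time() {
    name=$(basename "$1" .log)                        # auto2001-08-08@00-01-01
    stamp=${name#auto}                                # 2001-08-08@00-01-01
    day=${stamp%@*}                                   # 2001-08-08
    clock=$(printf '%s' "${stamp#*@}" | tr '-' ':')   # 00:01:01
    printf '%s %s\n' "$day" "$clock"
}
```

<p>
Since the names sort chronologically, something like
<tt>log_start_time "$(ls log/auto*.log | tail -n 1)"</tt> should report when the
newest log file was started.
</p>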
<p>
The log files show all commands given and all responses of the system.
Additionally, there are hourly time stamps in the file which help to narrow
down when the problem started. Things to watch out for are:
<dl>
<dt>MOTOR ALARM
<dd>This message means that the motor repeatedly failed to reach its position.
This is caused by a concrete shielding element
blocking the movement of the instrument, badly adjusted motor parameters,
mechanical failures or the air cushions not operating properly.
<dt>EL734__BAD_EMERG_STOP
<dd>Somebody has pushed the emergency stop button. It must be released
before the instrument can move again. Moreover, the motor controller does
not respond to further commands in this mode. Thus restarting SICS in response
to this error message will make SICS fail to initialize the affected motors!
<dt>EL***__BAD_PIPE, BAD_RECV, BAD_ILLG, BAD_TMO, BAD_SEND
<dd>Network communication problems. These can generally be solved by restarting
SICS.
<dt>EL737__BAD_BSY
<dd>A counting operation was aborted while the beam was off. Unfortunately,
the counter box does not respond to commands in this state and ignores the
stop command sent to it during the abort operation. This can be resolved by
the command:
<pre>
counter stop
</pre>
when the beam is on again.
</dl>
</p>
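<p>
When several daily log files have to be searched for the messages listed above,
a grep over the log directory saves paging through them in an editor. The
function below is only a sketch: <tt>scan_sics_log</tt> is a hypothetical name,
and the pattern assumes that the messages appear in the log spelled as in the
list above.
</p>

```shell
# Hypothetical helper, not part of SICS: print every log line that carries
# MOTOR ALARM or one of the EL<nnn>__BAD_<something> codes, with line numbers.
# The exact layout of these messages in real log files is an assumption.
scan_sics_log() {
    grep -n -E 'MOTOR ALARM|EL[0-9]+__BAD_[A-Z_]+' "$@"
}
```

<p>
Called as <tt>scan_sics_log log/*.log</tt> from the home directory of the
instrument account, it prefixes each hit with the file name and line number.
</p>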
<h2>Starting SICS</h2>
<p>
An essential prerequisite of SICS is that the server is up
and running. The system is configured to restart the SICServer whenever it
fails. Only after a reboot, or when the keepalive processes were killed (see
below), must the SICServer be restarted by hand. This is done for all instruments by
typing:
<pre>
startsics
</pre>
at the command prompt. startsics actually starts two programs: one is
the replicator application, which is responsible for the automatic
copying of data files to the laboratory server. The other is the SICS
server. Both programs are started by means of a shell script called
<b>keepalive</b>. keepalive is basically an endless loop which calls
the program again and again and thus ensures that the program is
restarted whenever it exits.
</p>
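<p>
The endless-loop idea behind keepalive can be illustrated in a few lines of
shell. This is not the keepalive script shipped with SICS, whose contents are
not reproduced here. The function name, the run limit in the first argument
and the one second pause are all assumptions made for this sketch.
</p>

```shell
# Illustrative sketch only, NOT the real keepalive script.
# Usage: keepalive_sketch <max_runs> <program> [args...]
# max_runs = 0 loops forever; a positive value exists only so that the
# sketch can be tried out without leaving a loop behind.
keepalive_sketch() {
    max_runs=$1; shift
    runs=0
    while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
        "$@"                  # run the program and wait until it exits or crashes
        runs=$((runs + 1))
        sleep 1               # assumed pause so a failing program does not spin
    done
}
```

<p>
A loop of this kind also explains why killing only the server process is not
enough to stop it: the surrounding loop simply starts it again.
</p>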
<p>
When the SICS server hangs, or when you want to enforce a reinitialization of
everything, the server process must be killed. This can be accomplished either manually or through a shell script.
</p>
<h2>Stopping SICS</h2>
<p>
All SICS processes can be stopped through the command:
<pre>
killsics
</pre>
given at the unix command line. You must be the instrument user
(for example DMC) on the instrument computer for this to work properly.
</p>

<h2>Restart Everything</h2>
<p>
If nothing seems to work any more, no connections can be obtained etc., then
the next step is to restart everything. This is especially necessary if
mechanics or electronics people were closer to the instrument than 400 meters.
<OL>
<LI> Reboot the histogram memory. It has a tiny button labelled RST. That's
the one. It can be operated with a hairpin, a ballpoint pen or the like.
<LI> Wait 5 minutes.
<LI> Restart the SICServer. Watch for any messages about things not being
connected or configured.
<LI> Restart and reconnect the client programs.
</OL>
If this fails (even after a second time), there may be a network problem which
cannot be resolved by simple means.
</p>
<h2>Checking SICS Startup</h2>
<p>
Sometimes the SICServer hangs while starting up, or hardware
components are not properly initialized. In such cases it is useful to
look at the SICS server's startup messages. In order to do so, both the
SICServer and its keepalive process must be killed first. On the instrument
account issue the command:
<pre>
ps -A | grep SICS
</pre>
A message like this will be printed:
<pre>
23644 ?? I 0:00.00 ksh keepalive SICServer focus.tcl
23672 ?? R 59:24.05 SICServer focus.tcl
7119 ttyp6 S + 0:00.00 grep SICS
</pre>
Remember the numbers in the first column (the PIDs) and kill both
programs by issuing the command:
<pre>
kill -9 pid pid
</pre>
Example:
<pre>
kill -9 23644 23672
</pre>
Note that the numbers are those displayed by the ps -A command.
Then cd into the bin directory of the instrument account and issue
the unix command:
<pre>
SICServer inst.tcl | more
</pre>
Replace inst.tcl with the name of the appropriate instrument initialisation
file. This lets you page through the SICS startup messages and will help to
identify the troublesome component. Then proceed to check the component and
the connections to it.
</p>

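<p>
Picking the PIDs out of the ps listing by eye is easy to get wrong. The filter
below is a sketch of how the first column could be extracted automatically.
<tt>sics_pids</tt> is a hypothetical name, and the filter assumes ps output in
the layout of the sample above, with the PID in the first column and the full
command line visible. Other ps implementations may need different options.
</p>

```shell
# Hypothetical helper, not part of SICS: read a ps listing on stdin and print
# the PIDs of the SICServer and of the keepalive wrapper around it.
# Assumes the PID is the first column and the command line contains "SICServer".
sics_pids() {
    awk '/SICServer/ && !/grep/ { print $1 }'
}
```

<p>
Check the result of <tt>ps -A | sics_pids</tt> against the full listing before
passing the numbers to <tt>kill -9</tt>.
</p>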
<h2>Getting New SICS Software</h2>
<p>
Sometimes you might want to be sure that you have the latest SICS software.
This is how to get it:
<ol>
<li>Log in to the instrument account.
<li>If you are not there already, type cd to get into the home directory.
<li>Type <b>killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>sicsinstall exe</b> at the unix prompt to copy new
SICS software from the general distribution area.
<li>Type <b>startsics</b> to restart the SICS software.
</ol>
</p>
<h2>Hot Fixes</h2>
<p>
When there is trouble with SICS you may be asked by one of the SICS
programmers to copy the most recent development version of the SICS server
to your machine. This is done as follows:
<ol>
<li>Log in to the instrument account.
<li>cd into the bin directory, for example: /home/DMC/bin.
<li>Type <b>killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>cp /data/koenneck/src/sics/SICServer .</b> at the unix prompt.
<li>Type <b>startsics</b> to restart the SICS software.
</ol>
<b>!!!!!! WARNING !!!!!!! Do this only when advised to do so by a competent
SICS programmer. Otherwise you might be copying a SICS server in an
unstable experimental state!</b>
</p>
<h2> HELP debugging!!!!</h2>
<p>
The SICS server should not hang or crash. When it does, it is very helpful
for sorting out the problem if any available debugging information is
saved and presented to the programmers. The available information consists of the log
files written continuously by the SICS server and possible core files
lying around. Core files have just this name: core. In order to save them, create a
new directory (for example dump2077) and copy the files in there. This looks
like:
<pre>
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
</pre>
The <tt>/home/DMC> </tt> is just the command prompt. Please note that core
files are only available after crashes of the server. These few commands
will help to analyse the cause of the problem and eventually to resolve it.
</p>
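<p>
The three commands above can be wrapped into one function so that nothing is
forgotten in the heat of the moment. This is a sketch under assumptions:
<tt>save_sics_dump</tt> is a hypothetical name, and the function expects the
log directory and a possible core file directly below the directory given as
first argument, as in the example above.
</p>

```shell
# Hypothetical helper, not part of SICS.
# Usage: save_sics_dump <home-dir> <dump-name>
# Copies <home-dir>/log/*.log and, if present, <home-dir>/core into
# <home-dir>/<dump-name> and lists what was saved.
save_sics_dump() {
    home=$1
    dump=$home/$2
    mkdir -p "$dump" || return 1
    cp "$home"/log/*.log "$dump"/ 2>/dev/null    # log files, if there are any
    if [ -f "$home/core" ]; then                 # a core file exists only after a crash
        cp "$home/core" "$dump"/
    fi
    ls "$dump"
}
```

<p>
For the example above the call would be <tt>save_sics_dump /home/DMC dump2077</tt>.
</p>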
</body>
</html>