ASCA Guest Observer Facility

GIS3 CPU trouble

An e-mail from Dr. Maxima and the GIS team on Apr 20, 1994


From maxima@miranda.phys.s.u-tokyo.ac.jp Wed Apr 20 06:02:18 1994
Received: by legacy.gsfc.nasa.gov (5.65/DEC-Ultrix/4.3)
	id AA22983; Wed, 20 Apr 1994 06:09:42 -0400
Date: Wed, 20 Apr 94 19:05:34 +0900
From: maxima@miranda.phys.s.u-tokyo.ac.jp (Kazuo Makishima)
Message-Id: <9404201005.AA25436@miranda.phys.s.u-tokyo.ac.jp>
To: astrodteam@astro.isas.jaxa.jp
Status: RO

Dear ASCA Colleagues:

************************************************************************  
* We regret that we must inform you of a problem in one of the two GIS *
* CPU memories, of which you may have already heard from Nagase-san    *
* via e-mail. The trouble caused the pulse height data of GIS-S3 to be *
* edited in a somewhat wrong shape. This lasted from February 10 till  *
* April 8, for a length of two months. Below we describe the nature of *
* the trouble, its solution, its cause, its impact upon the GO data    *
* acquired in this period, and proposed action for prevention of       *
* similar problems in future.                                          *
************************************************************************

1. The Trouble

    On April 9, we were notified by the KSC duty scientists that the
routine software for monitoring the GIS gain was failing to fit the
GIS-S3 isotope line. A quick investigation at U. Tokyo revealed that
the S3 pulse height spectrum had the wrong shape: normally the spectrum
is output in 1024 (10 bits) channels covering up to 12 keV, but the
spectrum obtained exhibited events only every 8 channels. Apparently
the lower three bits of the pulse height information were fixed at
[101] (or 5 in decimal) for all the S3 events.
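
As a quick check of the arithmetic above, the following Python sketch
(not part of the original message; the names STUCK_BITS, corrupt and
populated are chosen here purely for illustration) forces the lower
three bits of every 10-bit channel number to 101 and lists which
channels can still receive events.

```python
STUCK_BITS = 0b101   # lower three PH bits, as reported for GIS-S3
N_CHANNELS = 1024    # 10-bit pulse height

def corrupt(channel):
    """Replace the lower 3 bits of a 10-bit channel with the stuck pattern."""
    return (channel & ~0b111 & 0x3FF) | STUCK_BITS

# Every possible true channel maps onto one of these output channels.
populated = sorted({corrupt(ch) for ch in range(N_CHANNELS)})
print(populated[:4], len(populated))
```

Only 128 of the 1024 channels remain reachable, spaced 8 channels
apart, which matches the comb-like spectrum described above.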

    A backward search indicated that the problem had persisted for a
considerable length of time, without being noticed by anybody.
We now understand that the trouble happened on February 10,
between 22:05 and 22:35 UT, after passing through the SAA.
The target being observed was AWM7.

   The reason why the problem was not noticed for two months is
twofold. One is that the event rate is rather low, so that the
spectrum obtained in the KSC quick look usually contains too
few counts to reveal the problem. The other is of course that
the trouble happened in the GO phase, when the data could not be
accessed by the hardware team.


2. Solution

   Such a trouble can be caused either by a hardware failure of
the ADC (analog-to-digital converter) or related circuit elements
in GIS-E, or by a software error.

   To discriminate between these two possibilities, on April 8
(first pass occurred at JST 21:59) we performed a GIS memory check
(MEMCHK), and telemetered the contents of the GIS memory
(two 32kB RAMs) down to the ground. Note that GIS MEMCHK
can be done by a single discrete command, and takes 32 seconds.
We found one erroneous word in the CPU3 memory.  The word
was immediately corrected from KSC by a block command (called
RAM-patch action), and the S3 pulse height spectrum returned to normal.
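
For readers who want to picture the ground-side check, here is a
minimal Python sketch of comparing a telemetered memory dump with a
reference image. It is an illustration only: the message does not say
how the comparison was actually done, and the function name
find_bad_words, the address 0x0100 and the two-byte grouping are all
assumptions made here. The values C201 and C230 are the ones quoted
later in this message.

```python
def find_bad_words(dump: bytes, reference: bytes, word_size: int = 2):
    """Return (address, expected_hex, found_hex) for each differing word."""
    if len(dump) != len(reference):
        raise ValueError("dump and reference image differ in length")
    bad = []
    for addr in range(0, len(dump), word_size):
        expected = reference[addr:addr + word_size]
        found = dump[addr:addr + word_size]
        if expected != found:
            bad.append((addr, expected.hex().upper(), found.hex().upper()))
    return bad

# Hypothetical 32 kB image with one damaged word at a made-up address.
reference = bytearray(32 * 1024)
reference[0x0100:0x0102] = bytes.fromhex("C201")
dump = bytearray(reference)
dump[0x0100:0x0102] = bytes.fromhex("C230")

print(find_bad_words(bytes(dump), bytes(reference)))
```

Each reported tuple gives what a RAM patch would need: where the word
sits and what value to restore.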

   We admit that another problem happened during these operations,
namely the rise time (RT) spectrum of S3 became strange. However
this has nothing to do with the hard-wired processing, and was solved
by the 5th contact (in the early morning of April 9). For simplicity
we will skip this subject today.


3. Cause of the Trouble

   On analyzing the memory-check results, we found that the faulty
8-bit word was in the program area of the CPU3 memory. It exactly
corresponds to the address pointer, pointing to the address where
the lower 5 bits of the hard-wire processed 12-bit PH information
and the lower 2 bits of the S3 event timing are stored. (Note that the
GIS uses 12-bit ADCs, but we always discard the lower 2 bits to get
10-bit PH information.) As this word was destroyed, the CPU was
reading in some irrelevant information from some wrong (but harmless
in view of the CPU operation) address. This completely explains
what had happened for the two months.

   The GIS memory has a 4-bit error correction code (Hamming code),
and 1-bit errors must be corrected by the CPU. However if a 2-bit error
occurs in a single word, the CPU will no longer operate normally.
Actually the faulty word discovered in the memory check was [C230]
in hex, while it should normally be [C201]. Therefore this is a 3-bit
error in a word. This is very likely to have been caused initially by
a 2-bit error, i.e. [C201]-->[C231]. Then after an automatic
Hamming-code correction, [C231] was modified into [C230].
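
The bit counts quoted above can be verified with a few lines of
Python. This sketch is an editorial illustration, and flipped_bits is
a name invented here; the three hex values are those given in the
message, with [C231] being the intermediate value the GIS team infers.

```python
def flipped_bits(expected, found):
    """Return the positions (bit 0 = least significant) that differ."""
    diff = expected ^ found
    return [b for b in range(diff.bit_length()) if (diff >> b) & 1]

initial_hit = flipped_bits(0xC201, 0xC231)   # inferred original upset
final_state = flipped_bits(0xC201, 0xC230)   # word found by MEMCHK
print(initial_hit, final_state)
```

The inferred upset touches bits 4 and 5, which sit next to each other
in the word, and the word actually found differs from the correct one
in bits 0, 4 and 5, i.e. three bits in total.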

   In summary, a 2-bit error happened in a word in the CPU3 memory
on February 10, probably within the SAA. This error did not cause
a CPU hang-up, but instead made CPU3 read wrong information
as to the lower 5 (effectively lower 3) bits of the S3 PH information,
as well as the lower 2 bits of the S3 timing information.
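
How a single-error-correcting code can turn a 2-bit error into a 3-bit
error may be easier to see in code. The sketch below is an assumption-
laden illustration, not the GIS implementation: the message does not
describe the code layout, so a textbook Hamming(12,8) arrangement is
assumed (4 check bits at positions 1, 2, 4 and 8 protecting one 8-bit
word), and only the low byte 01 of the quoted word is encoded. All
function names are invented here.

```python
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # 1-based positions of data bits 0..7
PARITY_POS = [1, 2, 4, 8]                # positions of the 4 check bits

def syndrome(code):
    """XOR of the positions of all set bits; 0 means a valid codeword."""
    s = 0
    for pos in range(1, 13):
        if code[pos]:
            s ^= pos
    return s

def encode(byte):
    code = [0] * 13                      # index 0 is unused
    for i, pos in enumerate(DATA_POS):
        code[pos] = (byte >> i) & 1
    s = syndrome(code)
    for k, pos in enumerate(PARITY_POS):
        code[pos] = (s >> k) & 1         # choose check bits so syndrome is 0
    return code

def correct(code):
    """Single-error correction: flip the bit the syndrome points at."""
    s = syndrome(code)
    if 1 <= s <= 12:
        code[s] ^= 1
    return code

def decode(code):
    return sum(code[pos] << i for i, pos in enumerate(DATA_POS))

word = encode(0x01)
word[9] ^= 1                             # upset data bit 4
word[10] ^= 1                            # upset data bit 5
hit = decode(word)                       # value after the 2-bit upset
fixed = decode(correct(word))            # value after "correction"
print(f"{hit:02X} -> {fixed:02X}")
```

With two bits flipped the syndrome is non-zero but points at a third,
innocent bit, so the "correction" damages the word further and leaves
a valid-looking codeword behind. In this assumed layout the low byte
goes 01, 31, 30, the same sequence as in the message; that agreement
follows from the layout chosen here and should not be read as
evidence about the flight hardware.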


4. Impact upon the GO Data

   The impact of the trouble on the GO data acquired between February 10
and April 9 is twofold. First, the lower 3 bits of the PH output information
for all the S3 events are wrong. However if we bin the 1024-ch (10 bit)
PH data into 128 ch (7 bit), the resulting spectrum is completely normal.
We therefore believe that the S3 data can still be utilized, especially
for faint sources or sources without significant spectral features.
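
The rebinning argument can be checked with a short Python sketch. It
is an editorial illustration using made-up counts, not real GIS data,
and the names rebin_1024_to_128, true_spec and corrupted are chosen
here for the example.

```python
def rebin_1024_to_128(spectrum):
    """Sum each group of 8 adjacent channels (drop the lower 3 PH bits)."""
    if len(spectrum) != 1024:
        raise ValueError("expected a 1024-channel spectrum")
    binned = [0] * 128
    for channel, counts in enumerate(spectrum):
        binned[channel >> 3] += counts
    return binned

# Arbitrary made-up counts standing in for a real 1024-channel spectrum.
true_spec = [ch % 7 for ch in range(1024)]

# Apply the fault: the lower 3 bits of every event's channel become 101.
corrupted = [0] * 1024
for channel, counts in enumerate(true_spec):
    corrupted[(channel & ~0b111) | 0b101] += counts

print(rebin_1024_to_128(true_spec) == rebin_1024_to_128(corrupted))
```

Because the fault only changes the lower three bits, every event stays
inside its own group of 8 channels, so the 128-channel spectra agree
exactly.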

   Another impact is on the timing information.  The GIS events can be
tagged with up to 10 bits of timing information in non-standard operation,
and the lower 2 bits are lost due to the trouble. This does not affect any
time information in normal-mode operation, or high-time-resolution data
with timing bits equal to or less than 8 bits. As far as we are aware,
a few observations during the period were conducted with 10-bit
time information. It has yet to be examined whether the loss of the
lower 2 bits is fatal or not to these observations.


5. Future Prevention of Similar Troubles

  It seems that 2-bit errors occur more frequently than is inferred
from the occurrence rate of the 1-bit errors, although we do not yet
know whether this is specific to GIS or not. Since the 2-bit error
occurred on logically adjacent bits, we suspect that it is not
due to a chance coincidence of two 1-bit errors, but rather due to
a single impact by an energetic particle. For the prevention of
similar problems in future, we will carry out the following two
actions.

  Firstly, we have asked the operation team to conduct the GIS
memory check typically once every day. If errors are found, the
KSC duty scientists and operators will immediately perform the
RAM-patch action to correct the errors. We believe that this has
already started at KSC.

  The other is to conduct a quick-look at the University of Tokyo,
typically once or twice a week, to see any strange behavior in
the data. This means that a very small fraction of the GO data will
be stored at Univ. Tokyo, but of course, we will not do any
scientific research.

   Finally we, the GIS team, apologize to all our colleagues
for this problem and any inconvenience that might be caused
by it. We would appreciate your thoughtful understanding.

Regards,                               April 20

K. Makishima and M. Tashiro : University of Tokyo
  (maxima@miranda.phys.s.u-tokyo.ac.jp, tashiro@miranda.phys.s.u-tokyo.ac.jp)
T. Ohashi : Tokyo Metropolitan University (ohashi@phys.metro-u.ac.jp)
M. Ishida : ISAS (ishida@astro.isas.jaxa.jp)
and the GIS team

This file was last modified on Monday, 27-Sep-2004 16:14:12 EDT

    ASCA Project Scientist: Dr. Nicholas E. White

    Responsible NASA Official: Phil Newman
