
Investigate & Fix Cause of Astrid Hang-ups

Modification Request #9 (C7 2006)



1. Introduction

Astrid sessions "lock up" on users, requiring a restart of Astrid, the turtle daemon, or both.

2. Background

Astrid makes a high number of interactions through MySQL queries, GUI events, and Pyro events. This is likely a major factor in the problems seen to date; however, the hang-up condition has proven very difficult to replicate. Because of the complexity of the Astrid application and its large number of interactions with other parts of the system, a procedural confirmation that the problem is resolved would be very costly. Instead of 48-hour runs of interactive testing, our approach is to fix every problem that may cause deadlock, whether it relates to the user reports or was discovered during the investigation. A final testing session will confirm that our changes have not degraded the system.

3. Requirements

Reduce turtle and Astrid restarts to fewer than one per week on average.

4. Design

We plan to reproduce hang-ups by any means available and walk through all suspect code. All problems and fixes will be tracked in this MR.

Summary of Problems Reported by Users:

  1. ToneyMinter reported a problem where a break statement put up a dialog box, and he answered it. The dialog box disappeared, and afterward Astrid hung and was unresponsive.
  2. Eric was not able to stop his script by using the stop or abort button in Astrid. The operator took control and then did an abort. This stopped the script but hung the operator's Astrid session. The operator then restarted the turtle daemons.

Problems Found During Investigation:

  1. EditPanel.py's OnSBItemActivated() method unconditionally creates a dialog, then conditionally destroys it. (Refactored the code to only create the dialog when needed and destroy it thereafter.) DONE
  2. The same method had a condition variable which was being held too long and may have been the source of deadlock. (Refactored to release the condition mutex earlier.) DONE
  3. Investigate source of 'double' events when an SB is picked. (Fixed PopulateSBList() method by clearing out selection on project change.) DONE
  4. MySQL query of the list of projects is repeated over and over for each scheduling block. DONE
  5. Unnecessary query of script contents. DONE
  6. The LD_ASSUME_KERNEL environment variable is still in the daemon scripts. It should be removed. DONE
  7. "Unknown" dates on run tab. Caused by transition to accepting invalid scheduling blocks in the database. DONE
  8. Fix the long list of symbolic links which traverse what we thought was an unused installation. DONE
  9. Removed an unconditional 'return True' from the GFMPanel.CanAdd() method. DONE
  10. Fixed update of subsystem select in ScanCoordinatorDevice. DONE
  11. Added GetName() method to BackEndDevice. DONE
  12. Fixed situations where stdout was redirected and exception prevented restoration of stdout. DONE
  13. Test operation using a VNC display. No significant differences were found while using VNC, with one exception: if a user kills his vncserver with an Astrid session running, Astrid does not properly reset the security keys. There is no proposed fix except for the operator to reset control for the next user. DONE
  14. Make code exception proof in cases where a threading lock is set. DONE
  15. Fix the authorization key check to allow previously authorized users to reconnect if Astrid exits uncleanly. DONE
  16. A single global port is used for callbacks from Grail to Astrid. The port should be dynamically generated. (Confirmed code is correct. DONE)

Items In Work:

Proposed Fixes:

  1. When tabbing between the Edit and Run panels, the project appears blank, yet the correct list of SBs appears in the SBlist.
  2. Change some dialogs to be non-modal.

Proposed Cleanups, currently considered to be out of scope:

  1. Fix printing.
  2. Upgrade/duplicate python installation.
  3. Copy a scheduling block between projects via a right-click menu.
  4. It is possible to create a projectId (in the Ygor sense) which is longer than 14 characters (the current limit in Ygor; this will be fixed in the next Ygor release).

Final steps:

  1. Remove debugging statements.
  2. Change 'monctrl' strings back to 'gbtops'.
  3. Test inter-astrid interactions and control.

5. Deployment Checklist

Normal build and installation procedures should be sufficient. We will patch this into the system as soon as it has been tested in-system. A summary of user-interface changes will be compiled and sent to gbtlocal when the patch is released.

6. Test Plan

6.1 Internal Testing

We will attempt to reproduce the targeted behavior on the simulator by mimicking any anecdotal evidence collected.

6.2 Sponsor Testing

Seven hours of test time are scheduled for the end of November and will consist of observing using VNC. The purpose of this testing is not to verify that the requirement has been met (not possible in 7 hours), but rather to confirm that our changes have not broken anything and that the revised code may become part of the production telescope system. Only normal use of Astrid and turtle can test the requirement. Of course, the requirement can fail during the test session if a hang-up occurs.

6.3 Integration/Regression Tests

See Integration test report.


Signatures

APPROVED: I acknowledge that my request is fully contained in this MR, and if the SDD delivers exactly what I specified, I will be happy.

ACCEPTED: I acknowledge that I have validated the completed code according to the acceptance tests, and I am happy with the results.

Written DONE MarkClark - 03 Nov 2006
Checked DONE JoeBrandt - 03 Nov 2006
Approved by Sponsor DONE JimBraatz - 14 Nov 2006
Approved by CCC DONE AmyShelton - 16 Nov 2006
Accepted/Delivered by Sponsor DONE JimBraatz - 8 Dec 2006


CCC Discussion Area

Revision r1.9 - 08 Dec 2006 - 18:09 GMT - JimBraatz