
Data Quality Notes


Nicole's thoughts are now on a separate page.




NOTE: Most updates in relation to this topic now take place on other wiki pages.

24 Jun 2004:

After initial review of pertinent data quality issues, I can begin to elaborate on the above items.

My first goal is to combine current ideas into a complete abstract model from which we can implement an all-in-one data quality monitor. Ideally, this model focuses only on what the final result is, not on how it is to be implemented. The abstract model will then serve as a guideline for implementing the data quality monitor.

Overall, this will require an investigation of which data quality issues are most important to our monitor. This project places strong emphasis on covering all data quality issues in one place, whereas current processes require an operator to access multiple screens to do a complete assessment of the system. The investigation will be both qualitative and quantitative in nature:

Further additions to this document will be produced as the project continues to develop.

25 Jun 2004:

The overall goal of this project is to create a quality diagnostics application that monitors the device health and analyzes the quality of the input and output data. The application will present the user with these results on a graphical display that summarizes the health status and provides a quality assessment of the data (raw and interpreted). The application will allow the user to browse all of the results, but the user will be alerted specifically to items that may require attention.

The data quality application could be built into the Astrid (Astronomer's Integrated Desktop) prototype under a Quality Diagnostics window, with nested tabs:

Each tab will clearly display alerts to the user that indicate attention may be needed for specific areas within the operation. Where alerts are present, the display will include suggestions for correcting the situation and links to any fixit scripts that may be available on the network.

The vision of this application is to provide information from the top down: the user is first given the summary of the quality assessment and any resulting alerts. From there, the user is furnished with the most pertinent information regarding these alerts, and can then view lower-level information by accessing the complete results of the application, which are linked to each tab.

Establishing terms used in this project (for consistency):

| term | definition/use |
| Quality Diagnostics | Possible title of the application, under Astrid interface. |
| alert | Signal to the user that attention may be necessary in order to produce high quality data from the operation. When an alert is present, it is recommended that the system is corrected before output data is analyzed. |

28 Jun 2004:

I have created an image of what the quality diagnostics application might look like; refer to '/home/scratch/rduplain/QDabstract1_1.jpg'.

29 Jun 2004:

The Device Health portion of the quality diagnostics application will produce an assessment according to the following categories:

This classification is based on my conversation today with Nicole, where we reviewed a list of managers and she matched each device to a category, with the exception of the Scan Coordinator, which will be placed in the summary of this portion.

I finished a new image model of the application (rev 1.2), implementing a status color scheme, adding simple graphics, and using the categories outlined above. Refer to '/home/scratch/rduplain/QDabstract1_2.jpg' or '/home/users/nradziwi/share/QDabstract1_2.jpg'. This graphic has also been attached to the end of this document.

30 Jun 2004:

At this point, I have a very strong idea of what the final application will accomplish, and I have created an initial screenshot of how the application may look, showing the high-level content that it may include. Currently, I am focused on compiling notes to bring the application to reality, first probing the Device Health section. My short-term goals are to complete the following:

Tonight, I will sit in on integration testing, where I will continue to familiarize myself with various aspects of the telescope operations and take note of key quality checks and opportunities.

06 Jul 2004:

I had a discussion with Amy Shelton today about a variety of data quality measures, but we mostly touched on the Spectrometer. She explained that the Spectrometer is able to run a simple self test that will determine whether it is working in general, but smaller issues can arise even when the self test passes. She pointed out that the self test is not a complete diagnostic and that it would be difficult to fully test this device (because of its complexity), but duty cycles will provide a good means to test it. Many errors that may occur with the Spectrometer can be pinpointed during the raw data check, so it is still possible to provide accurate alerts for this device.

There are three main symptoms that occur with the Spectrometer:

Another topic within our discussion covered the Scan Coordinator, which can provide a great deal of information to guide quality checks of the entire device. The Scan Coordinator lists a subset of managers that are used in a particular observation; these managers should be queried and checked for a READY state. This will help to save time in running the final application and will help in providing a valid system check.
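The READY-state check described above could be sketched roughly as follows. This is only an illustration: `managers_not_ready`, the `get_state` callable, and the state strings other than READY are hypothetical stand-ins, not the Scan Coordinator's or the managers' actual interface.

```python
from typing import Callable, Iterable, List

READY = "READY"


def managers_not_ready(active_managers: Iterable[str],
                       get_state: Callable[[str], str]) -> List[str]:
    """Return the managers used in this observation whose state is not READY.

    active_managers: names of the subset listed by the Scan Coordinator.
    get_state: hypothetical callable that returns a manager's state string.
    """
    not_ready = []
    for name in active_managers:
        try:
            state = get_state(name)
        except Exception:
            # A manager that cannot be queried is treated as not ready.
            state = "UNREACHABLE"
        if state != READY:
            not_ready.append(name)
    return not_ready


# Stand-in data for illustration; a real monitor would query the managers.
example_states = {"Antenna": "READY", "Spectrometer": "RUNNING", "IFRack": "READY"}
example_alerts = managers_not_ready(example_states, example_states.__getitem__)
```

Only the managers named for the current observation are queried, which is where the time saving mentioned above would come from.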

An additional topic of significance concerned the network health within the entire device. Some simple checks would be to ping the machines involved, especially those running on VxWorks, and ensure that the systems are running on a synchronous clock. Amy points out that the issue of network security falls in the “same category as power outages.” Although the system is highly dependent on the network, it is difficult to pinpoint all relevant network issues that may occur.
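As a rough sketch of the clock check only (reachability checks such as ping depend on the site and are left out), the comparison could look like this. The function name, the way each host's time is obtained, and the one-second tolerance are all assumptions made for illustration.

```python
from typing import Dict, List


def hosts_out_of_sync(host_clocks: Dict[str, float],
                      reference_time: float,
                      tolerance_s: float = 1.0) -> List[str]:
    """Return hosts whose reported clock differs from the reference by more
    than tolerance_s seconds.

    host_clocks maps a host name to the time it reported, in seconds.
    How that time is collected from each machine is outside this sketch.
    """
    return sorted(
        host for host, reported in host_clocks.items()
        if abs(reported - reference_time) > tolerance_s
    )
```

For example, `hosts_out_of_sync({"a": 100.0, "b": 103.5}, 100.2)` flags only host "b".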

Further notes will be documented regarding device quality checks as I talk to more experts...

08 Jul 2004:

I talked to Ray Creager today about Grail and its relationship to SOAP, RPC, and the managers. SOAP (and RPC) are protocols for data transfer that are used over the network, and Grail provides an interface for them. Grail is more or less a pipeline; it does not know specifically what each manager does. Instead, it relies on the managers to follow the current model of abstraction. Grail treats each manager independently, on the assumption that the managers have similar interfaces.

Grail is frequently mistaken for the cause of symptoms that occur in the telescope operations, but the root cause of such symptoms often lies with the managers.

Ray explained that there are two categories of problems that occur in relation to Grail:

  1. Abstraction of managers is not followed. <---- (Most problems occur here)
  2. Problems occur with Grail itself.

Ray went on to say that most problems fall under #1 above, and that the most significant issues in this category occur when:

Grail operates on the assumption that all managers will use a given set of parameters, in line with the managers' model of abstraction described above. If Grail requests a parameter that a manager does not use, then Grail may momentarily hang (for about 5 seconds), because a parameter request currently causes a “wait” for a return. (Joe is currently working on fixing this bug.)

If a manager hangs, or is unresponsive, then an error message results in Grail. It seems that nearly all diagnostic checks for Grail will result from diagnostic checks for the managers, which are under continuing investigation for this project.

Ray suggested that the quality diagnostics application should be more graphically oriented on the top-level page (summary tab) of information; text is often ignored by the user.

09 Jul 2004:

Joe Brandt provided me with some great information about the Antenna health and trends, Antenna FITS, and network health.

There are many broad symptoms that occur with the antenna, and it may be difficult to fully check the various aspects of the telescope. However, my investigation currently is focused on symptoms and their root causes, not upon how we would check for these symptoms. (Though, it is helpful to collect notes for how checks could be done.)

Problems and issues with the antenna can be described as go / no go situations, antenna status, or antenna health trends.

The problems above are often related, and any of them can prevent successful operation of the telescope. Hence, they will affect collected data, even though they may not be specifically aligned to our definition of data quality. It is possible to check for some of these symptoms with a software application; for example, the gateway file can be checked to ensure that a user has permission to control the antenna.

Currently, position Az/El vs commanded Az/El is “not checked rigorously,” and if it were, the check could use too many resources because current procedures use high sample rates for data. Additionally, the check would report a large error whenever the position is compared to a command that is still moving the antenna. The procedure would need to represent the data well at a low sample rate in order to be easy on resources, and would need to account for when the antenna is “allowed” (by intention) to be moving. The position vs commanded check could be implemented for: subreflector, prime focus, active surface, and any device that operates on actuators or servos, in addition to Az/El.
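A minimal sketch of such a position vs commanded check, assuming samples are already available in memory. The `Sample` fields, the `slewing` flag (marking when the antenna is intentionally moving), the tolerance, and the stride used to lower the effective sample rate are all hypothetical choices, not existing telescope parameters.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    time_s: float
    commanded_deg: float
    actual_deg: float
    slewing: bool  # True while the antenna is intentionally moving


def angular_error_deg(commanded: float, actual: float) -> float:
    """Smallest absolute difference between two angles, handling 0/360 wrap."""
    diff = (actual - commanded + 180.0) % 360.0 - 180.0
    return abs(diff)


def position_alerts(samples: Sequence[Sample],
                    tolerance_deg: float = 0.01,
                    stride: int = 10) -> List[Sample]:
    """Return the samples where position and command disagree.

    Only every stride-th sample is examined to keep the check cheap, and
    samples taken while the antenna is allowed to be moving are skipped.
    """
    alerts = []
    for sample in samples[::stride]:
        if sample.slewing:
            continue
        if angular_error_deg(sample.commanded_deg, sample.actual_deg) > tolerance_deg:
            alerts.append(sample)
    return alerts
```

The wrap handling matters for azimuth; for elevation or actuator positions it is harmless, and the same comparison could be reused with a tolerance suited to each device.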

Checking whether the antenna follows the state diagram procedure is perhaps outside the realm of this application. However, problems in following the state diagram may be caused by a user not having gateway permission.

A procedure to report ID feedback that will show who has been doing what in regards to the device may eventually be implemented; this may be helpful in data quality diagnostics.

The network health is another important concern in the overall operation of the telescope.

It seems to me that Joe backs stronger network monitoring procedures in order to clear up some performance issues and improve operations.

19 Jul 2004:

A few project notes collected during my trip to Charlottesville this past weekend:

Health of the IF Path:

I talked to Melinda Mello about checking the IF path and its components to ensure proper functionality. First, we need to know:

Melinda suggests that the backend port be checked first for each path, because if it seems to be working properly, then it is likely (and possibly assumed) that the IF path up to the backend is fully functional. Each device has a parameter (spectrum,exist) that will indicate if the path exists through the device. To pinpoint a problem, it is best to work your way back to determine which path is down; this is currently checked manually.
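The "backend first, then work your way back" procedure could be automated along these lines. This is a sketch under assumptions: the path is given as an ordered list from receiver to backend, each entry carries a boolean standing in for the device's path-exists parameter, and the device names used in the example are illustrative.

```python
from typing import List, Optional, Tuple


def locate_if_path_fault(path: List[Tuple[str, bool]]) -> Optional[str]:
    """Locate the likely break in an IF path.

    path is ordered from receiver to backend; each entry is
    (device_name, path_exists). Returns None when the backend reports a
    path. Otherwise walks back from the backend and returns the last
    device, moving upstream, that still reports no path.
    """
    if not path:
        raise ValueError("path must contain at least one device")
    if path[-1][1]:
        # Backend port looks fine, so the path up to it is assumed good.
        return None
    suspect = path[-1][0]
    for device, path_exists in reversed(path[:-1]):
        if path_exists:
            break
        suspect = device
    return suspect


example_path = [
    ("receiver", True),
    ("if_rack", True),
    ("converter", False),
    ("backend", False),
]
example_fault = locate_if_path_fault(example_path)
```

In this example the walk stops at "converter", the first device downstream of the last one that still reports a path.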

The config tool sets the path for many devices (the IF rack sets some others), and the config tool can run a check to see if things are set as commanded, i.e. “Are you what I told you to be?” -- this does not always determine a problem. Problems may arise where a user doesn't know about feedback or doesn't look at feedback or where a user sets the config tool by hand, selecting the first available paths (which can result in the wrong paths).

Other things to check in the IF area:

There are numerous devices, items, and parameters to check within the IF path, but these mentioned above are the most pertinent.

21 Jul 2004:

A new wiki page is up and running for the Quality Diagnostics application.

QualityDiagnostics

27 Jul 2004:

I talked with Mark Clark yesterday about symptoms experienced with telescope operations. Our discussion focused on the CLEO message window and the alarms that are produced. Messages are logged in: '/home/gbtlogs/Messages/'. I accessed the message log and have created a list of unique messages from January 2004 to present. These will help to identify problems that occur during telescope use and maintenance.

Each message has a unique ID number, which resembles an IP address: # . # . # . #. The ID number is set according to: device . submanager . library . messageID. The library value is typically 0, and the device numbering standard can be found in various header files. Some of the device numbering is telescope-dependent and/or site-dependent, so there are three locations for the headers.

/home/gbt1/ygor/libraries/Headers/MsgDef.h
/home/gbt1/gb/headers/GbMsg.h
/home/gbt1/gbt/headers/GbtMsg.h
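For working with the message log, the four-part ID can be split into its fields. A small sketch follows; the `MessageId` type and `parse_message_id` function are hypothetical names, and the ID used in the usage note is made up. Mapping the numbers to device names would still require the header files listed above.

```python
from typing import NamedTuple


class MessageId(NamedTuple):
    device: int
    submanager: int
    library: int
    message_id: int


def parse_message_id(text: str) -> MessageId:
    """Split an ID of the form device.submanager.library.messageID."""
    parts = [part.strip() for part in text.split(".")]
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four dot-separated numbers, got {text!r}")
    return MessageId(*(int(part) for part in parts))
```

For instance, `parse_message_id("12.3.0.45")` yields device 12, submanager 3, library 0, and message ID 45.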

Message logs are stored with twelve columns of information:


Another topic of discussion with Mark covered the Spectral Processor quick test, which is a hardware test that takes about 3 to 5 minutes, checking to see that the Spectral Processor is working as it should.
Data Flow of Spectral Processor: Analog to Digital (AtoD) ----> Boards ----> Accumulator ----> Disk

Test data is run through the Spectral Processor and the result is compared to the expected output. According to Mark, there has only been one time when the quick test passed while there was actually a problem, so it is a very safe test.

The Spectral Processor also has a RAM test to check the memory and a Board test to check the hardware boards individually. For instance, if there are 10 boards on the Spectral Processor, the test will run 10 times, each time skipping a different board. This test can be used to pinpoint if a specific board is having problems, but can take somewhere around 30 minutes.
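The skip-one-board idea can be illustrated with a short sketch. The `run_passes` callable is a hypothetical stand-in for running the hardware test with a given set of boards, and the interpretation of the results (a board is suspect if the test passes only when that board is skipped) is my assumption about how the outcome would be read.

```python
from typing import Callable, FrozenSet, List, Sequence


def find_suspect_boards(boards: Sequence[str],
                        run_passes: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Run the test once per board, each time skipping a different board.

    Returns the boards whose removal makes the test pass. If every run
    passes, no board is implicated and an empty list is returned.
    """
    results = {}
    for skipped in boards:
        active = frozenset(board for board in boards if board != skipped)
        results[skipped] = run_passes(active)
    if all(results.values()):
        return []
    return [board for board in boards if results[board]]
```

With N boards this needs N runs, which is consistent with the test taking much longer than the quick test.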

05 Aug 2004:

A definitive sequence of devices has been established, where there is emphasis on how each device is dependent upon others; Nicole and I have organized a logical succession to check the system that is important to the observer (found here). Furthermore, the Device Health section has been expanded to Observing System Health. This section covers all of the devices that are important to an observation, in addition to the application and observing constructs that are essential to the procedure. Essentially, once the Observing System Health diagnostics have been run, then the system used in the observation can be deemed as being in "a good state." From here, the Raw Data check can be conducted.

In summary:
Two dimensions on a diagnostics matrix have been clearly defined:

Device Health has been renamed Observing System Health for the purposes of this project.

09 Sep 2004:

Recent Highlights:

Here is a summary of FITS files provided by Bob Garwood:

"The actual FITS file written during an observation are in /home/gbtdata under each project ID usually with trailing _## (i.e. _01, _02, etc) to indicate an specific session.

Within that directory is one FITS file that indicates what FITS files were produced during each scan (and the start and stop times, I think). Thats called ScanLog.fits.

Underneath that directory there are a number of sub-directories. There is one for each backend used (DCR, SpectralProcessor, Spectrometer). There is an Antenna directory that contains the specific pointing information and beam information for each scan. There is an IF and LO1A (and possibly LO1B) that contain FITS files that describe the IF and LO1 settings during an observation. Those are the most important FITS files. There is also Rcvr_* directory for each receiver used that contains primarily lab measurements of the TCAL values for that receiver (the Ka receiver will contain additional files after this development cycle). Finally, there are any number of additional device FITS files depending on what was selected for that session. Mostly these are informative (e.g. the weather station information or, I think, settings on the active surface, perhaps) and not necessary for reducing the data.

All of the files associated with a given scan (spread out over the several directories) will have a name following this pattern:

YYYY_MM_DD_HH:MM:SS.fits

i.e. the timestamp with a trailing ".fits". All of the files for a given scan should have the same timestamp. The two exceptions are the Rcvr_* files - where there is one file produced from lab measurements that is appropriate to many scans (usually all of the scans in that session for that receiver) and the Spectrometer backend files, which have an additional character for the specific bank that that data came from - A, B, C, or D. That character comes right after the timestamp, e.g. 2003_11_16_07:12:56B.fits.

Periodically, the data are moved off of /home/gbtdata to the archive: /home/archive/science-data/tape-000*/"
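Based on the naming pattern Bob describes, scan file names could be recognized and grouped by timestamp with something like the sketch below. The function and type names are hypothetical, and files that do not follow the timestamp pattern (such as ScanLog.fits, or receiver files with other names) are simply reported as not matching.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Timestamp, optional Spectrometer bank letter (A-D), then ".fits".
_SCAN_FILE = re.compile(r"^(\d{4}_\d{2}_\d{2}_\d{2}:\d{2}:\d{2})([A-D])?\.fits$")


class ScanFile(NamedTuple):
    timestamp: datetime
    bank: Optional[str]  # None unless the name carries a bank letter


def parse_scan_filename(name: str) -> Optional[ScanFile]:
    """Parse a name like 2003_11_16_07:12:56B.fits, or return None."""
    match = _SCAN_FILE.match(name)
    if match is None:
        return None
    try:
        timestamp = datetime.strptime(match.group(1), "%Y_%m_%d_%H:%M:%S")
    except ValueError:
        # Digits in the right places but not a real date or time.
        return None
    return ScanFile(timestamp, match.group(2))
```

Since all files for a given scan should share a timestamp, the parsed `timestamp` can serve as the key for collecting a scan's files across the device directories.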




Important documentation for this project:


E-mail me with any feedback to this project.

-- RonDuPlain - 09 Sep 2004



Revision r1.29 - 04 Jan 2007 - 21:11 GMT - AmyShelton Content copyright © 1999-2007 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.