
Data Quality Issues

Why do we care? Programs (Python scripts, gbtmsfiller, etc.) will crash if they expect input data of a certain quality and those expectations are not met. It's not fair to blame a program when the input data is too poor for it to do its job. Rather than placing detailed error checking separately in each of those programs, we should be able to sniff out data quality up front and decide whether to run the downstream programs in the first place. This will save a lot of programming effort, and it gives us a nice "data quality dashboard". Imagine a world where you can wake up in the morning, come into the office, see just how good or bad the data was from the previous night's astronomy, and then direct your operational support efforts accordingly.

We want to understand:


Specific Quality Issues to Check For

I started with a list that came mainly from Glen and Bob. Please add more!!!

  1. Scanlog says we're supposed to have n files... do we?
  2. Do the FITS files share a common header?
  3. Do all expected FITS files exist?
  4. Are they of nonzero size?
  5. Does the number of scans actually recorded for a procedure equal the number of scans expected from that procedure, or was there an early abort?
  6. Time consistency in FITS files. Bob G ran into an error where the ScanLog inverted two files, and they ended up out of order.
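
Several of the checks above (existence, nonzero size, time ordering) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the expected file paths and scan start times have already been read out of the scan log by some other means.

```python
import os


def check_expected_files(expected_paths):
    """Return (missing, empty) lists for the expected file paths.

    expected_paths is assumed to come from the scan log; how it is
    read is outside the scope of this sketch.
    """
    missing = [p for p in expected_paths if not os.path.isfile(p)]
    empty = [p for p in expected_paths
             if os.path.isfile(p) and os.path.getsize(p) == 0]
    return missing, empty


def out_of_order(entries):
    """Find adjacent scan log entries whose start times go backwards.

    entries is a list of (name, start_time) pairs in scan log order;
    start_time can be any comparable value (seconds, datetime, ...).
    """
    return [(a[0], b[0])
            for a, b in zip(entries, entries[1:])
            if b[1] < a[1]]
```

A dashboard could run these on each project directory and report anything that comes back non-empty.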


Subj: M&C output FITS file Validator Design
Date: 2003 May 15
From: Glen Langston

This note describes a tool to automatically validate the output files from a GBT observation (a SCAN). This tool is intended to aid in immediate problem diagnosis and also aid in later data reduction.

The M&C FITS files produced during an observing session usually contain all of the critical information needed to calibrate an observation. However, software, network, or hardware problems occasionally cause a scan to fail. Often the observer and operator will be unaware of these failures until long after an observation is complete. (Almost every session has a few bad files; often more than 10% of all scans will have some component missing.)

It is intended that this tool will run during observing sessions and periodically check the latest output files. The tool will produce messages indicating whether the observing files are consistent, and will also estimate the system temperature for the most recent scan (if possible).

Because the M&C system generally produces consistent files, the occasional occurrence of bad files takes the observers and commissars by surprise. This makes later problem diagnosis difficult. Often the reduction system is blamed for problems in the GBT files. A good Validator will make the life of the entire support staff easier.

Design philosophy
=================

It is difficult to accurately check for data quality without an active data reduction goal. However, the data reduction goal must be simple enough to be achieved with minimum complexity. Therefore, in addition to a simple set of file checks, the Validator will attempt to calculate a "representative" system temperature for each polarization and beam used during a single scan.

No attempt will be made to compare data from different scans, so the "validator" will NOT be aware of GO observing procedures. In particular, the system temperature numbers WILL include a source contribution.

The Validator will be a standard part of the GBT observing system.

Development Goals
=================

Below are the development goals that must be met for a successful Validator program:

1) The Validator will not interfere with GBT observations; a GBT observation should continue to run whether or not the Validator is running.
2) The Validator should automatically find and check the latest observations.
3) The Validator will check for the presence of all FITS files in the ScanLog.fits file.
4) The Validator will check for the presence of DATA in all FITS files.
   4.1) The Validator will check for consistency in some FITS file headers, including the matching SCAN numbers.
   4.2) The Validator will check for DATA files in the directory that are not present in the ScanLog.fits file.
5) The Validator will confirm that a minimum amount of data are present to calculate an average antenna pointing location.
6) The Validator produces a system temperature that represents a "median" of the integrations and channels present in a scan. In this manner RFI and sources during scans will have a minimum contribution to the system temperature values. The DCR, Spectrometer (ACS) and Spectral Processor (SP) will be supported. In the case of no Cal-On/Off switching states, no T_sys calculation will be possible, and this will be reported.
7) The Validator will write one "validator.log" which will be an ASCII file summarizing the scans. This log will be written to the project directory.
8) The observer may optionally add comments to this log.
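
The "median" system temperature in goal 6 might look roughly like the Python sketch below. The memo does not give a formula, so this assumes the common cal-switching estimate T_sys = T_cal * off / (on - off) + T_cal / 2, applied per sample and then reduced with a median. The function name and the flat-list input layout are hypothetical.

```python
from statistics import median


def median_tsys(cal_on, cal_off, t_cal):
    """Estimate a representative T_sys (K) for one polarization/beam.

    cal_on, cal_off: equal-length sequences of raw power samples,
    flattened over all integrations and channels of a single scan.
    t_cal: noise-cal temperature in K.

    Returns None when no usable Cal-On/Off pairs exist, which is the
    case the Validator is expected to report rather than compute.
    """
    samples = []
    for on, off in zip(cal_on, cal_off):
        diff = on - off
        if diff > 0:  # skip samples where the cal signal is not visible
            samples.append(t_cal * off / diff + t_cal / 2.0)
    return median(samples) if samples else None
```

Using the median rather than the mean means a handful of RFI-contaminated channels or integrations shift the result very little, which is the behaviour the goal asks for.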

Implementation Strategy
=======================

The Validator makes only one active query to the M&C observing system: it asks for the directory where the current observations are being placed.

The Validator will monitor all files in the observing project directory. The Validator will periodically query the M&C system for project directory changes.
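
One simple way to notice new or modified files is to compare periodic snapshots of the project directory. The Python sketch below illustrates the idea only. It polls the filesystem directly rather than querying M&C, and the function names are hypothetical.

```python
import os


def snapshot(directory):
    """Map every file path under directory to its (size, mtime)."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_size, st.st_mtime)
    return state


def changed_files(previous, current):
    """Return sorted paths that are new or whose size/mtime changed."""
    return sorted(p for p, sig in current.items()
                  if previous.get(p) != sig)
```

A monitoring loop would take a snapshot, sleep for a fixed interval, take another, and hand the result of `changed_files` to the file checks.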

The programs will be written in C, using as much existing infrastructure as possible from the M&C software. They will also use existing NRAO software, in particular the VLBA and OVLBI software for antenna control and data reduction.

-- NicoleRadziwill - 29 Aug 2003

Revision r1.2 - 14 Sep 2003 - 17:59 GMT - NicoleRadziwill Content copyright © 1999-2007 by the contributing authors.