Items in red are unanswered questions that we need to resolve.



Data Editing for GBT Standard Observing Modes

1. Introduction

It is often necessary to be selective when analyzing a data set, that is, to account for missing, bad, or irrelevant data before applying algorithms. Data quality problems can arise when data are collected at times they should not be (e.g. the antenna is off source, or a switching signal is in transition), from radio frequency interference (RFI), or from other faults that produce intermittent, time- or frequency-dependent bad data.

Policy and requirements for both automatic and manual data editing are critical not only for today's scientists and data analysis, but also in support of a pipeline heuristics system that will guide the production of high-quality data from a future GBT postprocessing pipeline. To obtain science data products which are of the highest quality, pipeline production requires that invalid data is handled appropriately and automatically, and that good data is not thrown out. This process can be complicated because data quality objectives will vary depending upon frequency and observing mode, observer's intent, and other factors. As a result, the scientist's constraints on a Scheduling Block, pipeline heuristics, and the flagging/blanking strategy are closely intertwined.

The purpose of this document is to define the requirements and to outline the design issues and constraints for implementing data editing. Once complete, this document will be formally reviewed. Critical success factors for this document are as follows:

In this document we give a general description of the concepts associated with data editing as they pertain to the GBT. The attached document, "Design of Flagging and Blanking for GBTIDL," provides the model to be employed in GBTIDL for the final stage of the process where the researcher manually generates flags to use with the processed data.

Automatic determination of the flagging criteria, depending on the data quality and project's goals, is an important long term goal but beyond the scope of this analysis. Similarly, this analysis does not currently address data editing for VLBI, radar, or pulsar data.

2. Workflow through GBT System

The workflow for flagging and blanking through the GBT will be very similar to that for the EVLA and ALMA, with one exception: the GBT currently uses a file-based control system, as opposed to operating a full data capture process. In the steps below, "data" refers to one of the data objects referred to in Section 1.3.

  1. Control System Blanking: The M&C system determines that data is unusable for all scientific intentions, and should not be stored as raw data.
  2. Pre-Processing Flagging: The M&C system determines that an error is present, but the data should still be stored as raw data. These flags are used by the quick look system.
  3. Data Capture/Pipeline-Driven Flagging: The data capture process determines that an error is present, but the data should still be stored.
  4. User Flagging/Blanking: The researcher determines that there is an issue with the data, and either attaches appropriate flagging information to the exported data or removes from the exported data those sections that are unusable for his or her scientific intent.
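
The difference between blanking (the data are not kept) and flagging (the data are kept and annotated) in the steps above can be illustrated with a minimal Python sketch. All names here (Stage, Integration, flag, blank) are hypothetical illustrations and are not part of the M&C system or GBTIDL.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the four workflow steps listed above."""
    CONTROL_SYSTEM = 1
    PRE_PROCESSING = 2
    DATA_CAPTURE = 3
    USER = 4


@dataclass
class Integration:
    scan: int
    values: list
    flags: list = field(default_factory=list)  # (stage, reason) pairs


def flag(integ, stage, reason):
    """Flagging: keep the data and record why it is suspect."""
    integ.flags.append((stage, reason))
    return integ


def blank(integrations, is_unusable):
    """Blanking: drop unusable data so it is never stored or exported."""
    return [i for i in integrations if not is_unusable(i)]


raw = [Integration(18, [1.0, 2.0]), Integration(19, []), Integration(20, [3.0, 4.0])]
stored = blank(raw, lambda i: len(i.values) == 0)  # step 1: empty integration dropped
flag(stored[1], Stage.USER, "RFI")                 # step 4: kept, but annotated
```

In this sketch a blanked integration leaves no trace, whereas a flag records which stage raised it, so a downstream application can choose which flags to honor.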

INSERT SCHEMATIC HERE

3. Storing Data Quality Information

4. Impacts to Analysis Procedures

One of the main drivers for masking and flagging is the need to average data together correctly. For example, when data are flagged on the basis of whether the antenna is on source, the integration time must be adjusted accordingly. Then, when different integrations are averaged together, the correct weights must be applied (inverse-variance weights, proportional to integ time/Tsys^2, since the radiometer noise scales as Tsys/sqrt(integ time)). This section describes how ancillary data (effective integration times, weight arrays, etc.) will be modified when data editing is performed, and how data processing operations will correctly take this into account.
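
As a rough illustration of the bookkeeping involved, the Python sketch below scales each integration time by the fraction of unflagged samples and then averages with inverse-variance weights. It assumes the radiometer equation (noise proportional to Tsys/sqrt(integration time)), so that the weight is effective time divided by Tsys squared. The function names and the sample numbers are hypothetical and do not describe an existing GBT or GBTIDL routine.

```python
import math


def effective_integration(nominal_time, valid_fraction):
    """Scale the nominal integration time by the fraction of unflagged samples."""
    return nominal_time * valid_fraction


def weighted_average(spectra, tsys, t_eff):
    """Average integrations with weights t_eff / Tsys**2 (inverse variance)."""
    weights = [t / ts**2 for t, ts in zip(t_eff, tsys)]
    wsum = sum(weights)
    n_chan = len(spectra[0])
    avg = [sum(w * s[c] for w, s in zip(weights, spectra)) / wsum
           for c in range(n_chan)]
    # Weights add, so the result behaves like a single integration of the
    # summed effective time with this equivalent system temperature.
    total_time = sum(t_eff)
    tsys_avg = math.sqrt(total_time / wsum)
    return avg, total_time, tsys_avg


spectra = [[1.0, 1.0], [3.0, 3.0]]       # two integrations, two channels
tsys = [20.0, 20.0]                      # K
t_eff = [effective_integration(10.0, 1.0),   # fully on source
         effective_integration(10.0, 0.5)]   # half of the samples flagged
avg, total_time, tsys_avg = weighted_average(spectra, tsys, t_eff)
```

The half-flagged integration contributes half the weight of the other one, and the header values returned alongside the averaged spectrum (total effective time, equivalent Tsys) are what a later averaging step would need in order to weight this result correctly.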

Who can put in additional info?

5. Other Schemes

Describe how other telescopes/packages handle flagging and blanking, pros & cons, & why we need/don't need

AO/Parkes, JCMT, AIPS++/DISH, AIPS, ALMA SDM, EVLA, VLBA

6. Scenarios

7. Implications for Systems Development

  1. At some point, the control system will need to be modified to allow data taking while the antenna is activating, or alternatively, without requiring scan-based synchronization between the GBT's component devices. This is so that off-source data, which will be valuable for many types of observing, is not lost. Additionally, some high-frequency observations will require only seconds on each source, so the observing time will be dominated by scan startup latency and slew times. During these periods, valuable information could be gathered if individual devices could collect and store data.
  2. A general system should be put in place so that quality checks can be performed on any device and/or observation, and the results stored along with the raw data. This should be externally configurable so that project scientists, not software engineers, own the data checks. Project scientists will then be able to fine-tune their quality checks without any software development work (consider a generalized case of ModificationRequest12C705). Additionally, this quality information should be accessible not only to quick look but to any downstream applications.
  3. This general system should somehow be able to apply user-defined data quality rules in addition to rules defined by the project scientist.
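
One possible shape for externally configurable quality checks, as described in items 2 and 3, is a set of rules held as data rather than code. The Python sketch below is only an illustration: the rule format, the field names (pointing_error_arcsec, tsys), and the limits are all hypothetical and do not correspond to an existing GBT interface.

```python
import json
import operator

# Comparison operators a rule may name.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

# Rules owned by the project scientist, e.g. loaded from an external file.
SCIENTIST_RULES = json.loads("""
[
  {"name": "off_source", "field": "pointing_error_arcsec", "op": ">", "limit": 5.0}
]
""")

# Additional rules supplied by an individual user.
USER_RULES = [{"name": "high_tsys", "field": "tsys", "op": ">", "limit": 100.0}]


def evaluate(sample, rules):
    """Return the names of the rules that the sample violates."""
    return [r["name"] for r in rules
            if OPS[r["op"]](sample[r["field"]], r["limit"])]


sample = {"pointing_error_arcsec": 7.5, "tsys": 35.0}
failed = evaluate(sample, SCIENTIST_RULES + USER_RULES)
```

Because the rules are plain data, a threshold can be changed without touching the evaluation code, and the list of failed rule names can be stored with the raw data for quick look or any downstream application.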



Design of Flagging and Blanking for GBTIDL

This design information for flagging and blanking in GBTIDL assumes the following:

4. Examples

If one executes the flagging command:

   flag, scan=[18,19,20], int=[1,3], bchan=512, echan=514, idstring="RFI"

the flag file will have the following entry:

*,[18,19,20],[1,3],*,*,*,512,514,"RFI"

Note that the first value is a "*" to indicate that this flag entry is not parametrized by record number. When the first element in a flag entry is an integer or a list of integers, that entry is understood to be parametrized by record number. So, if one executes the following flagrec command:

   flagrec, record=15, bchan=0, echan=8, idstring="bad channels"

the flag file will have the following entry:

15,*,*,*,*,*,0,8,"bad channels"
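
The mapping from command parameters to flag file entries in the two examples above can be sketched in Python as follows. This is an illustration of the entry layout only, not GBTIDL code. The fourth through sixth fields are not named in the examples, so the placeholder names field4, field5, and field6 are assumptions, as are the function names.

```python
def format_field(value):
    """Render one field: '*' if unset, [a,b] for a list, else the bare value."""
    if value is None:
        return "*"
    if isinstance(value, (list, tuple)):
        return "[" + ",".join(str(v) for v in value) + "]"
    return str(value)


def flag_entry(record=None, scan=None, intnum=None, field4=None, field5=None,
               field6=None, bchan=None, echan=None, idstring=""):
    """Build one flag file line with the nine-field layout shown above."""
    fields = [record, scan, intnum, field4, field5, field6, bchan, echan]
    return ",".join(format_field(f) for f in fields) + ',"' + idstring + '"'


def is_record_entry(entry):
    """An entry is parametrized by record number unless its first field is '*'."""
    return not entry.startswith("*")


by_scan = flag_entry(scan=[18, 19, 20], intnum=[1, 3],
                     bchan=512, echan=514, idstring="RFI")
by_record = flag_entry(record=15, bchan=0, echan=8, idstring="bad channels")
```

Unset parameters are distinguished from zero by testing against None, so bchan=0 is written as 0 rather than as a wildcard.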

5. Issues not Addressed by the GBTIDL Scheme

The weighting of data required by averaging operations is a calibration issue as much as a flagging issue. So, weighting is not fully addressed in this document. However, the user should be aware of some potential issues.

Consider two reduced spectra, A and B, which resulted from an average of flagged data. In each of the two spectra, the individual channels have been flagged to different extents, so the final noise in each channel is different depending on how much of the raw data were flagged going into the average. For example, channels 0-10 in A may have been heavily flagged prior to averaging, and so they contain a higher noise than the other channels in A. If the observer then wishes to average A and B, the weighting in the average will be wrong because relative weights have not been stored for these spectra.
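
The size of this effect can be estimated with a short Python sketch. It assumes that every raw sample has the same noise variance, so that a channel averaged from n unflagged samples has variance proportional to 1/n. The function names and the sample counts are hypothetical.

```python
def channel_variances(n_unflagged, sigma2=1.0):
    """Noise variance per channel after averaging n unflagged raw samples."""
    return [sigma2 / n for n in n_unflagged]


def equal_weight_variance(var_a, var_b):
    """Variance of (A + B) / 2, ignoring how much data each channel kept."""
    return [(va + vb) / 4.0 for va, vb in zip(var_a, var_b)]


def inverse_variance_variance(var_a, var_b):
    """Variance when each channel of A and B is weighted by 1/variance."""
    return [1.0 / (1.0 / va + 1.0 / vb) for va, vb in zip(var_a, var_b)]


# Channel 0 of A kept only 2 of 10 raw samples; all other channels kept 10.
var_a = channel_variances([2, 10])
var_b = channel_variances([10, 10])
naive = equal_weight_variance(var_a, var_b)
optimal = inverse_variance_variance(var_a, var_b)
```

In the heavily flagged channel the equal-weight average has almost twice the variance of the per-channel weighted one, while the unflagged channel is unaffected. Recovering the better result requires that a per-channel weight (or count of unflagged samples) be stored with each reduced spectrum. A channel with no unflagged samples would need separate handling, which this sketch does not attempt.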

I think this document has to somewhere address weighting, and the correct handling of other header information (e.g. integration times). The first version of GBTIDL doesn't necessarily need to implement everything in the design, but the design needs to be thought through.

-- JimBraatz - Revised 12 Oct 2005 -- NicoleRadziwill - 18 Oct 2005

Revision r1.15 - 20 Jan 2006 - 16:21 GMT - BobGarwood Content copyright © 1999-2007 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.