
Introduction to Blanking and Flagging in GBTIDL


Introduction

RFI and other faults that cause intermittent or frequency-dependent bad data make it necessary to be selective when operating on a data set. Bad data can be addressed with a combination of Flagging and Blanking. In this document we give a general description of these concepts and describe the flagging and blanking model used in GBTIDL. It is important to remember that this flagging model is new to GBTIDL in version 2.0 and we are open to suggestions on how to improve it for future versions.

Blanking

Blanking is the process of replacing spectral intensities for a given set of channels with a special value recognized by the data analysis system. When the data analysis system encounters this value, it must handle the requested operation appropriately. In GBTIDL, the special value is NaN ("Not a Number"). In general, blanking cannot be undone easily, because the replacement values are stored as the data itself. The only way to undo a blanked value is to return to the original source of data and reload the (possibly uncalibrated) spectra.

As an example of special handling of blanked values, consider the show procedure in GBTIDL. Show handles the special values by putting gaps in the plotted spectrum at the locations of blanked data. The stats procedure simply ignores any blanked channels in computing the statistics. The hanning procedure blanks channels in the smoothed spectrum whose constituent channels are themselves blanked. In general, procedures know how to take the appropriate action when they encounter blanked data, and the appropriate action varies depending on the procedure.
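The following Python sketch is a simplified model of how NaN blanking interacts with procedures like those described above. It is not GBTIDL code. The function names blank, stats_mean, and hanning are hypothetical stand-ins. The sketch also assumes that the first and last channels of a Hanning-smoothed spectrum are simply blanked, which may not match GBTIDL's edge handling.

```python
import math

NAN = float("nan")


def blank(spectrum, bchan, echan):
    """Return a copy of spectrum with channels bchan..echan (inclusive) set to NaN."""
    out = list(spectrum)
    for ch in range(bchan, echan + 1):
        out[ch] = NAN
    return out


def stats_mean(spectrum):
    """Mean over unblanked channels only, mimicking how stats ignores blanked data."""
    good = [v for v in spectrum if not math.isnan(v)]
    return sum(good) / len(good) if good else NAN


def hanning(spectrum):
    """Hanning smoothing with weights (0.25, 0.5, 0.25).

    NaN propagates through arithmetic, so a smoothed channel is blanked whenever
    any of its three constituent channels is blanked. The two end channels are
    blanked here for simplicity (an assumption of this sketch).
    """
    out = [NAN] * len(spectrum)
    for ch in range(1, len(spectrum) - 1):
        out[ch] = 0.25 * spectrum[ch - 1] + 0.5 * spectrum[ch] + 0.25 * spectrum[ch + 1]
    return out


spec = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
blanked = blank(spec, 1, 1)
print(stats_mean(blanked))  # mean of the six unblanked channels
print(hanning(blanked))     # channels 0-2 and 6 are NaN; 3-5 are 4.0, 5.0, 6.0
```

Blanking a single channel removes it from the statistics. It also blanks every smoothed channel that depends on it.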

Flagging

Flagging is the process of marking spectral channels, integrations, or entire scans as bad without modifying the data entries themselves. A flag is a rule that describes data to be blanked when that data is retrieved from the data source (a disk file). Since data might be flagged for several different reasons, GBTIDL allows one to specify multiple flagging rules.

Flags are stored separately from the data and can be undone easily. Once a flag is undone and the data are retrieved again from the data source, data that was previously blanked by that rule will no longer be blanked (other rules may, however, still cause the same data to be blanked). The most common purpose of flagging is to identify data that needs to be excluded from a calibration or averaging operation. As such, flags will be attached primarily to raw data and data that have not yet been averaged. Flag rules (flags) are applied when the data is brought into GBTIDL (e.g. in a calibration or averaging operation such as getps, getfs, or getnod) by blanking the subset of data that matches a rule.
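Here is a toy Python model of this distinction. It is not GBTIDL's implementation. Flag rules are kept apart from the stored data and are only applied at retrieval. The names DISK, FLAGS, flag, unflag, and get are hypothetical, and a rule here covers only a scan and a channel range.

```python
import math

# Hypothetical model: DISK stands in for the sdfits file, FLAGS for its
# separate flag file. Each rule is (idstring, scan, bchan, echan).
DISK = {15: [1.0, 2.0, 3.0, 4.0]}
FLAGS = []


def flag(scan, bchan, echan, idstring="unspecified"):
    """Record a rule; the data on DISK are not touched."""
    FLAGS.append((idstring, scan, bchan, echan))


def unflag(idstring):
    """Remove every rule with this idstring."""
    FLAGS[:] = [rule for rule in FLAGS if rule[0] != idstring]


def get(scan):
    """Retrieve a copy of a scan, blanking channels matched by any current rule."""
    data = list(DISK[scan])
    for _, rule_scan, bchan, echan in FLAGS:
        if rule_scan == scan:
            for ch in range(bchan, echan + 1):
                data[ch] = math.nan
    return data


flag(15, 1, 2, idstring="RFI")
container = get(15)   # [1.0, nan, nan, 4.0]
unflag("RFI")
fresh = get(15)       # [1.0, 2.0, 3.0, 4.0]; container still holds NaNs
```

Unflagging affects only later retrievals. The spectrum already held in container keeps its blanked channels.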

Once the data is in a GBTIDL data container, the flagging rules are no longer relevant. If that data container is saved to another file, the flag rules are not transferred to the new file; the spectrum in the data container has already been blanked. To recover a spectrum with no blanking, it is necessary to remove (unflag) the original flag rule and re-retrieve that data from disk (and apply any calibration, averaging, or other operation to get back to the same point).

When data requires flagging, an iterative approach to reduction is often required. Here is one approach:

  1. Calibrate the raw data.
  2. Examine the calibrated data and determine whether any flagging is required to improve calibration.
  3. If necessary, flag the offending data and return to step 1.
  4. Write a new sdfits file with calibrated data. In general, the new sdfits file should contain an entry for each integration that will be considered as a candidate for the average.
  5. When all data are calibrated and written to disk, specify the calibrated data file as the new source of input.
  6. Again examine the data and use the flagging procedures to mark residual bad data to exclude from the average.
  7. Average the data.
  8. Examine the average and, if necessary, return to step 1 or step 5 and modify the flagging commands as necessary.
  9. Proceed with analysis of the averaged spectrum.

Because of the iterative nature of the process, it is possible to set and unset flagging commands for a given data set. It is important to emphasize that blanked data are not recoverable without going back to data retrieval, but flagged data are recoverable. Flagging (setting flag rules) allows you to iteratively decide which data should be blanked during processing.

Several types of parametrization are possible in flagging data. Following the scheme used throughout the GBTIDL data model, flags are parameterized according to: scan number, integration number, polarization number, IF number, feed number, and channel number. A second parametrization is available as well: data can be flagged according to record number (location within a file) and channel number. It is permissible to mix these parametrizations in a single flag file, if desired. The data I/O system in GBTIDL applies the flags, blanking data as appropriate (some control over which flags are applied is possible, as described later in this document). Averaging, analysis, and display procedures in GBTIDL take the appropriate action when blanked data are encountered.
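The two parametrizations can be modelled as a rule that matches either by record number or by header values. An unspecified field acts like the "*" wildcard shown by listflags. This Python sketch is only an illustration: the FlagRule class and the dict used for a record are hypothetical. Channel ranges are left out so the example focuses on which records a rule selects.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FlagRule:
    # None means "all values", like "*" in the listflags output.
    recnum: Optional[Sequence[int]] = None
    scan: Optional[Sequence[int]] = None
    intnum: Optional[Sequence[int]] = None
    plnum: Optional[Sequence[int]] = None
    ifnum: Optional[Sequence[int]] = None
    fdnum: Optional[Sequence[int]] = None
    idstring: str = "unspecified"

    def matches(self, record):
        """Return True if this rule selects the given record (a dict of header values)."""
        if self.recnum is not None:
            # Rules given by record number ignore the scan/int/pol/IF/feed fields.
            return record["recnum"] in self.recnum
        for name in ("scan", "intnum", "plnum", "ifnum", "fdnum"):
            allowed = getattr(self, name)
            if allowed is not None and record[name] not in allowed:
                return False
        return True


record = {"recnum": 7, "scan": 19, "intnum": 3, "plnum": 0, "ifnum": 0, "fdnum": 0}
print(FlagRule(scan=[18, 19, 20], intnum=[1, 3], idstring="RFI").matches(record))  # True
print(FlagRule(recnum=[7]).matches(record))                                       # True
```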

Flagging is intended mainly for uncalibrated and pre-averaged data. However, it is not forbidden to flag calibrated, averaged data. The user should use caution in such cases because the header parameters used in the parametrization of flags can be changed during averaging operations. For this reason, when flagging averaged data it is generally best to flag by record number. Flagging by record number also offers a finer level of detail. The select procedure can be useful in conjunction with flagging by record number when the normal flag procedure isn't sufficient (this is described in more detail later in this document).

In the iterative flagging scheme outlined earlier in this section, flagging in step 3 should be parametrized by scan, polarization, etc., while flagging in step 6 should be parametrized by record number.

Using Blanking in GBTIDL

Blanking works on data in memory and simply involves replacing data values with NaNs. There are two ways to apply blanking to a spectrum. First, blanking is applied automatically by data retrieval operations (get, getchunk, getfs, getnod, etc.) when a data file has a set of flag rules defined. Second, the /blank keyword of the replace procedure adds blanking to a spectrum already in a data container. The replace procedure has the following structure:

  replace, bchan, echan, /zero, /blank

Note that the default behavior of replace is not to blank, but to interpolate the data values. More sophisticated blanking procedures can be implemented on top of replace in a straightforward way, if the user wishes to blank data based on selection criteria.
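For comparison, the three behaviors of replace can be sketched in Python. This is a hypothetical model, not the GBTIDL source. It assumes the default mode interpolates linearly between the two channels just outside bchan..echan. To stay short, it refuses interpolation ranges that touch either end of the spectrum, and it gives /blank priority if both keywords are set.

```python
import math


def replace(spectrum, bchan, echan, zero=False, blank=False):
    """Return a copy of spectrum with channels bchan..echan (inclusive) replaced.

    blank=True -> NaN, zero=True -> 0.0, otherwise linear interpolation
    between channels bchan-1 and echan+1 (an assumption of this sketch).
    """
    out = list(spectrum)
    if blank or zero:
        value = math.nan if blank else 0.0
        for ch in range(bchan, echan + 1):
            out[ch] = value
        return out
    if bchan < 1 or echan + 1 >= len(spectrum):
        raise ValueError("interpolation needs a good channel on each side")
    left, right = spectrum[bchan - 1], spectrum[echan + 1]
    span = (echan + 1) - (bchan - 1)
    for ch in range(bchan, echan + 1):
        out[ch] = left + (right - left) * (ch - (bchan - 1)) / span
    return out


spike = [0.0, 9.0, 9.0, 3.0]
print(replace(spike, 1, 2))              # [0.0, 1.0, 2.0, 3.0]
print(replace(spike, 1, 2, blank=True))  # [0.0, nan, nan, 3.0]
```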

After blanking is applied, all other GBTIDL analysis procedures handle the blanked data appropriately, and no additional user interaction is necessary.

Using Flags in GBTIDL

Flag rules (flags) can be set from the command line with the procedures flag and flagrec. These procedures generate entries in the flag file associated with the current data source. The flag procedure has the following syntax:

  flag, scan, intnum=intnum, plnum=plnum, ifnum=ifnum, fdnum=fdnum, 
        bchan=bchan, echan=echan, chans=chans, chanwidth=chanwidth, 
        idstring=idstring, scanrange=scanrange, /keep

and the flagrec procedure has the following syntax:

  flagrec, record, bchan=bchan, echan=echan, chans=chans, chanwidth=chanwidth,
        idstring=idstring, /keep

The idstring keyword associates an identifying string with a rule, typically a reminder of the reason for the flag.

Examples:

The following example shows how to flag a channel range for a small number of scans and integrations. Either the scan parameter or the scanrange keyword is required, but both cannot be used at the same time. Any other parameter that is not specified defaults to "all", so in this example all polarizations are flagged:

  flag, [18,19,20], int=[1,3], bchan=512, echan=514, idstring="RFI"

Equivalently, using the scanrange keyword:

  flag, scanrange=[18,20], int=[1,3], bchan=512, echan=514, idstring="RFI"

The next example shows how to flag all channels for a given integration in one scan:

  flag, 15, int=3, idstring="spectrometer glitch"

The next example shows how one could flag all data for the given three scans:

  flag, [101,105,107]

The next example shows how one might flag a record in a processed data file (a "keep" file):

  flagrec, 15, idstring="Glitch", /keep

The next example shows how one might flag different channel ranges for a record:

   flagrec, 16, bchan=[0,100], echan=[10,110], idstring="Two RFI Spikes"

The next example flags the same channel ranges:

   flagrec, 16, chans=[5,105], chanwidth=11, idstring="Two RFI Spikes"
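The two flagrec calls are equivalent if each entry of chans is treated as the center of a window chanwidth channels wide. The Python helper below does that conversion. It is hypothetical, not part of GBTIDL, and assumes an odd chanwidth so that each window is centered exactly.

```python
def chans_to_ranges(chans, chanwidth=1):
    """Convert center channels into inclusive (bchan, echan) pairs.

    Assumes an odd chanwidth, so each window extends (chanwidth - 1) // 2
    channels on either side of its center.
    """
    if chanwidth < 1 or chanwidth % 2 == 0:
        raise ValueError("this sketch only handles a positive, odd chanwidth")
    half = (chanwidth - 1) // 2
    return [(c - half, c + half) for c in chans]


print(chans_to_ranges([5, 105], chanwidth=11))  # [(0, 10), (100, 110)]
```

These are the same ranges given by bchan=[0,100], echan=[10,110] in the previous example.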

The select procedure can be used along with flagrec to provide even more flexible flagging. In this example, the "RR" polarization of IF number 3 for all data with the source name "Orion" is flagged in channels 500 to 520.

   emptystack  ; clear the stack first
   select, source='Orion', polarization='RR', ifnum=3 ; populate the stack
   flagrec, astack(), bchan=500, echan=520, idstring='RFI-Orion'

Note that there may be more than one flag associated with a given idstring. If idstring is not specified in the flag or flagrec calls, it defaults to the string "unspecified".

Listing Flags

Use listflags to list all of the flags, or only those flags having a specific idstring. The default listflags output shows every flag in its entirety, but the format is sometimes difficult to read. The /summary keyword aligns the output in columns; to do so, it may truncate the contents of a column, so not all of a flag's information is necessarily shown. Examples:

  listflags, 'RFI'    ; shows the flag information associated with
                      ; the 'RFI' idstring
  listflags, /summary ; shows all flags with the information
                      ; aligned by column (and possibly truncated)

To list all of the unique idstring values in the flag file use the listids command.

Example flag lists

If one executes the flagging command:

   flag, [35,36,37], int=[1,3], bchan=512, echan=514, idstring="RFI"

the listflags output will look like this:

#ID,RECNUM,SCAN,INTNUM,PLNUM,IFNUM,FDNUM,BCHAN,ECHAN,IDSTRING
0 * 35:37 1,3 * * * 512 514 RFI

The first line of the output identifies the contents of each column. Most of these fields are self-explanatory. The first field is an ID number that is assigned dynamically and is simply the location of that flag rule in this list. The ID number can be used in the unflag procedure to remove a flag rule.

Flagging a few more scans, not in a nice sequence:

   flag, [40,42,44,47,48,50,56], int=[1,3], bchan=512, echan=514, idstring="More RFI"

adds one new line to the listflags output:

#ID,RECNUM,SCAN,INTNUM,PLNUM,IFNUM,FDNUM,BCHAN,ECHAN,IDSTRING
0 * 35:37 1,3 * * * 512 514 RFI
1 * 40,42,44,47,48,50,56 1,3 * * * 512 514 More RFI

And listflags,/summary produces this output:


#ID  RECNUM        SCAN    INTNUM PLNUM   IFNUM FDNUM  BCHAN  ECHAN     IDSTRING
  0       *       35:37       1,3     *       *     *    512    514          RFI
  1       *  40,42,44,+       1,3     *       *     *    512    514     More RFI

Notice how the scan information is truncated. Fields that contain more information than what is shown end in a "+". Also note that asterisks ("*") indicate all values for that parameter are flagged (as in the unformatted listflags output).
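The truncation rule can be expressed in a few lines of Python. This is an illustration only, and summary_field is a hypothetical name. The 10-character width is inferred from the SCAN column in the sample output above; it is not a documented GBTIDL setting.

```python
def summary_field(value, width):
    """Right-align value in a column; if it is too long, keep width - 1
    characters and append "+" to show that information was dropped."""
    if len(value) > width:
        value = value[: width - 1] + "+"
    return value.rjust(width)


print(repr(summary_field("40,42,44,47,48,50,56", 10)))  # '40,42,44,+'
print(repr(summary_field("35:37", 10)))                 # '     35:37'
```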

The second column, RECNUM, will only be set if flagrec is used. For example:

   flagrec, 15, bchan=0, echan=8, idstring="bad channels"
   listflags

#ID,RECNUM,SCAN,INTNUM,PLNUM,IFNUM,FDNUM,BCHAN,ECHAN,IDSTRING
0 * 35:37 1,3 * * * 512 514 RFI
1 * 40,42,44,47,48,50,56 1,3 * * * 512 514 More RFI
2 15 * * * * * 0 8 bad channels

Undoing Flags

Flags can be unset using the unflag procedure. The unflag procedure takes a single parameter, id, and it removes all flagging commands that have that id, where id can either be a string matching an idstring value or an integer matching an ID number as shown by listflags.

  unflag, id

If you want to re-flag that same data, you have to reissue the flag or flagrec commands. The id parameter can be either a scalar or an array, to unflag multiple entries at once.

Unflagging by ID number is simple and appealing, but users must be aware of one very important behavior. Because the ID number is generated dynamically, it changes after each flagging-related command, including unflag itself. Always run listflags before each use of unflag to be sure that you are using the appropriate ID value. Consider this example:

   listflags

#ID,RECNUM,SCAN,INTNUM,PLNUM,IFNUM,FDNUM,BCHAN,ECHAN,IDSTRING
0 * 35:37 1,3 * * * 512 514 RFI
1 * 40,42,44,47,48,50,56 1,3 * * * 512 514 More RFI
2 15 * * * * * 0 8 bad channels

  ; you want to unflag the last 2 IDs, so you might try the following:
  unflag, 1
  unflag, 2
% FLAGS::UNFLAG_ID: ID could not be found to unflag:        2

The error happens because the first unflag causes the remaining two flag rules to be renumbered to 0 and 1, and so there is no ID 2 to unflag any more. This would have been a more dangerous, silent error had there been more than 3 rules to begin with.

The correct way to unflag the entries:

    listflags
    unflag,1
    listflags
    unflag,1
or:
    listflags
    unflag,[1,2]
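The renumbering behavior can be modelled in Python, along with why a single call with an array of IDs is safe. The FlagList class is hypothetical. It assumes, as described above, that an ID is simply a rule's current position in the list. All IDs in one call are interpreted against the numbering that existed before that call.

```python
class FlagList:
    """Toy model of a flag list whose IDs are positions in the list."""

    def __init__(self, idstrings):
        self.rules = list(idstrings)

    def unflag(self, ids):
        ids = [ids] if isinstance(ids, int) else list(ids)
        missing = [i for i in ids if not 0 <= i < len(self.rules)]
        if missing:
            raise KeyError(f"ID could not be found to unflag: {missing}")
        # Delete from the highest ID down so earlier deletions do not shift
        # the positions of rules that are still waiting to be removed.
        for i in sorted(set(ids), reverse=True):
            del self.rules[i]


one_at_a_time = FlagList(["RFI", "More RFI", "bad channels"])
one_at_a_time.unflag(1)        # "bad channels" is renumbered from 2 to 1
try:
    one_at_a_time.unflag(2)    # fails: there is no ID 2 any more
except KeyError as err:
    print(err)

together = FlagList(["RFI", "More RFI", "bad channels"])
together.unflag([1, 2])
print(together.rules)          # ['RFI']
```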

Using Flags in GBTIDL Data Retrieval and Averaging Procedures

Flags are applied by the data I/O subsystem when data are retrieved from disk. All of the data retrieval procedures in the GUIDE layer of GBTIDL (including calibration procedures such as getnod and getfs that do data retrieval as part of their operation) use the I/O subsystem, and so flags are applied whenever you get data from disk.

All of these procedures let you fine-tune which flag rules are actually applied via the useflag and skipflag keywords. The default is /useflag, meaning that all flag rules are applied. You can turn off all flagging with /skipflag; in that case, no data will be blanked by the data retrieval process. You can also apply or skip individual flags by referring to them by their idstring. The useflag and skipflag keywords cannot be used in the same call. Unlike unflag, you cannot selectively skip or use flags by their ID number; only idstring values can be used as arguments to these keywords.

Examples

  getnod, 15                         ; apply all flags
  getnod, 15, /skipflag              ; do not use any flags
  getnod, 15, useflag="RFI"          ; only use the "RFI" flag
  getnod, 15, useflag=["RFI","wind"] ; use "RFI" and "wind" flags only
  getnod, 15, skipflag="RFI"         ; use all flags EXCEPT "RFI"

All of the standard procedures in GBTIDL that in turn use these procedures also have the useflag and skipflag keywords.
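The selection logic of the two keywords can be summarized with a short Python model. The function applied_rules is hypothetical. It takes the idstring of each rule and returns those that would be applied. None stands for an omitted keyword, and True stands for /useflag or /skipflag.

```python
def applied_rules(idstrings, useflag=None, skipflag=None):
    """Return the idstrings of the rules that would be applied."""
    if useflag is not None and skipflag is not None:
        raise ValueError("useflag and skipflag cannot be used in the same call")

    def as_set(value):
        return {value} if isinstance(value, str) else set(value)

    if skipflag is True:                      # /skipflag: apply nothing
        return []
    if skipflag is not None:                  # skip only the named idstrings
        skip = as_set(skipflag)
        return [s for s in idstrings if s not in skip]
    if useflag is None or useflag is True:    # default, same as /useflag
        return list(idstrings)
    use = as_set(useflag)                     # apply only the named idstrings
    return [s for s in idstrings if s in use]


rules = ["RFI", "wind", "RFI", "glitch"]
print(applied_rules(rules, skipflag="RFI"))  # ['wind', 'glitch']
```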

Weighting Issues not Addressed by this Flagging Scheme

The weighting of data during averaging operations is a calibration issue as much as a flagging issue. So, weighting is not fully addressed in this document. However, the user should be aware of some potential issues.

Consider two reduced spectra, A and B, each the result of averaging flagged data. Because individual channels were flagged to different extents in each spectrum, the final noise in each channel depends on how much of the raw data in that channel was flagged going into the average. For example, channels 0-10 in A may have been heavily flagged prior to averaging, so they are noisier than the other channels in A. If the observer then wishes to average A and B, the weighting in the average will be wrong, because relative weights have not been stored for these spectra on a channel-by-channel basis.
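A small numerical example shows the size of the effect. This Python sketch is an illustration, not a GBTIDL feature. Each spectrum carries a weight per channel equal to the number of unflagged integrations that went into it, which assumes every integration has the same noise. The values of A and B are arbitrary.

```python
def weighted_average(spectra, weights):
    """Channel-by-channel weighted average; weights[k][ch] is the weight of
    spectrum k in channel ch."""
    result = []
    for ch in range(len(spectra[0])):
        wsum = sum(w[ch] for w in weights)
        result.append(sum(s[ch] * w[ch] for s, w in zip(spectra, weights)) / wsum)
    return result


# Arbitrary values: channel 0 of A came from 1 of 4 integrations (3 flagged);
# every other channel came from all 4 integrations.
A = [4.0, 1.0]
B = [0.0, 1.0]

per_spectrum = weighted_average([A, B], [[4, 4], [4, 4]])  # one weight per spectrum
per_channel = weighted_average([A, B], [[1, 4], [4, 4]])   # weights tracked per channel
print(per_spectrum)  # [2.0, 1.0]
print(per_channel)   # [0.8, 1.0]
```

With only one weight per spectrum, the poorly sampled channel 0 of A counts as much as the well-sampled channel 0 of B.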

Format of the Flagging File

The flag file associated with a given sdfits file (e.g. "myfile.fits") has the ".flag" suffix ("myfile.flag"). If no flag file is associated with a given sdfits file then it is assumed that none of the data in that file are to be flagged. Flag files are ASCII files with one flagging rule per line. Each line will have the following entries:

records, scans, integrations, polarizations, IF's, beams, bchan, echan, idstring

The flagging file simply represents the parameters used in the flag or flagrec procedures. Note that the parametrization has some redundancy. If the user flags by record number, then the set of [scans, integrations, polarizations, IF's, beams] is not necessary. So when a record number or numbers are specified in a flag entry, this set of numbers is ignored even if they contain non-wildcard values (that should never happen through normal use of flag and flagrec).

Users should not edit the flag file directly. It is okay to remove a flag file to eliminate all flagging for a given data file, but GBTIDL should then be restarted to ensure that the removal of the flag file takes effect.

Future Flagging Commands

It is possible to develop more sophisticated blanking and flagging procedures on top of flag and flagrec. For example, procedures that automatically detect and flag bad data could be developed. We could provide an autoflag procedure:

  autoflag, bflag, eflag, chan_clip, int_rms, idstring

Such a procedure would use the bflag and eflag parameters to flag channels at the beginning and end of all spectra. The chan_clip parameter would flag, within each integration, channels that exceed the rms by the specified value, and the int_rms parameter would flag integrations whose rms exceeds the given value. One could imagine this routine becoming more sophisticated, if necessary.
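As a rough illustration of how the proposed autoflag might decide what to flag, here is a Python sketch. Everything in it is an assumption: autoflag does not exist in GBTIDL. The sketch treats chan_clip as a multiple of the rms about the mean. It computes the mean and rms from the channels not already removed by bflag and eflag.

```python
import statistics


def autoflag(integrations, bflag, eflag, chan_clip, int_rms):
    """Hypothetical autoflag. For each integration, return None if the whole
    integration is flagged, otherwise the set of channel indices to flag."""
    result = []
    for spec in integrations:
        n = len(spec)
        edges = set(range(bflag)) | set(range(n - eflag, n))
        interior = [spec[ch] for ch in range(n) if ch not in edges]
        mean = statistics.fmean(interior)
        rms = statistics.pstdev(interior)  # rms about the mean
        if rms > int_rms:
            result.append(None)  # int_rms: flag the entire integration
            continue
        # chan_clip: flag interior channels deviating from the mean by more
        # than chan_clip times the rms.
        spikes = {ch for ch in range(n)
                  if ch not in edges and abs(spec[ch] - mean) > chan_clip * rms}
        result.append(edges | spikes)
    return result


quiet_with_spike = [0, 0, 0, 0, 10, 0, 0, 0, 0, 0]
noisy = [0, 5, -5, 5, -5, 5, -5, 5, -5, 0]
print(autoflag([quiet_with_spike, noisy], bflag=1, eflag=1, chan_clip=2, int_rms=4))
```

A single bright spike inflates the rms it is compared against. A real implementation would probably want a robust noise estimate; this sketch keeps the simple version.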

Another flagging mode would use a visual aid, in the style of the classic AIPS flagger. A dynamic spectrum image could be used to represent the data, and the user could then interact with the image to generate flagging commands. Before this tool is developed, some assessment of its relative merit should be made.

Another flagging procedure would allow the user to specify flags according to the frequency or velocity, rather than channel number.

Revision r1.12 - 11 Dec 2006 - 15:46 GMT - AmyShelton

Data.GBTIDLBlankFlagIntro moved from Main.GBTIDLBlankFlagIntro on 24 Apr 2006 - 14:47 by BobGarwood - put it back