gbtmsfiller. In the future, data streams from the control system will be combined, according to the science data model, to form export data suitable for archiving.)
Post-Processing: Any manipulation of the data that is done after the raw data is collected, such as calibration, averaging, etc. Post-processing changes the nature of the data, whereas pre-processing and data capture do not.
Post-Processing Blanking (also called Masking): Discarding data during the offline analysis process. In the latter case, blanked values cannot be undone easily because they are replacement values, stored as data. The only way to undo a blanked value is to return to the original source of data, and in the case of a spectral line dataset, reload the (possibly uncalibrated) spectra.
Processed Data: A data file that has been transformed from the original raw state. For example, a calibrated representation of a data set would be considered a processed file. Reading and rewriting a raw data file row-for-row does not produce a processed file. The reason to distinguish processed data files is that the user may wish to take a first pass at calibrating the raw SDFITS data, and write out the result of that first step as a new data file -- the "processed data". The processed data could then be inspected and used for flagging and additional averaging.
Record: Describe this without defining SDrecord; in other words, talk about its content and not its firm structure.
Flagging: Flagging is the process of marking, but not replacing, various data objects without actually modifying the entries in the exported/archived data. Flags are stored separately from the data, and can be undone easily. The primary purpose of flagging is to identify data that needs to be excluded from operations within the data analysis software or automated pipeline, such as calibration and averaging. As such, flags are attached primarily to raw data and data that have not yet been averaged. For spectral line data, flagging can be performed with respect to various parameters, such as spectral channels, integrations, or entire scans.
It is possible to carry flagging information along through the entire data analysis process, even after averaging (or other operations) has occurred. It is not possible to "reverse" later stages, but the flag data can indicate that at least some of the data that contributed to a given data point were flagged. This could be useful, e.g. to indicate errors in the averaging algorithm. If all the "funny" processed data is associated with data for which some input was flagged, you might get suspicious. We should consider if we want to do this.
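The idea of carrying flag information through an average can be sketched as follows. This is a minimal Python illustration, not GBTIDL code: the function name `average_with_flag_trace` and the one-boolean-per-sample flag layout are assumptions made for the example, and flagged samples are assumed to be excluded from the average.

```python
def average_with_flag_trace(spectra, flags):
    """Average spectra channel by channel, skipping flagged samples.

    spectra: list of equal-length lists of floats, one per integration.
    flags:   same shape, True where a sample is flagged.
    Returns (averaged, tainted). tainted[i] is True when at least one
    input sample for channel i was flagged.
    """
    nchan = len(spectra[0])
    averaged, tainted = [], []
    for chan in range(nchan):
        good = [s[chan] for s, f in zip(spectra, flags) if not f[chan]]
        tainted.append(any(f[chan] for f in flags))
        # With no unflagged input the average is undefined, so use NaN.
        averaged.append(sum(good) / len(good) if good else float("nan"))
    return averaged, tainted


# Two integrations of three channels; one sample is flagged (hypothetical data).
spectra = [[1.0, 2.0, 3.0], [3.0, 100.0, 5.0]]
flags = [[False, False, False], [False, True, False]]
averaged, tainted = average_with_flag_trace(spectra, flags)
print(averaged)
print(tainted)
```

The `tainted` list survives the average even though the individual flags do not, so a suspicious averaged channel can be traced back to the fact that some of its inputs were flagged.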
When data requires flagging, an iterative approach to reduction is often required. Here is a typical scenario:
QualityLog.fits files that contain quality information which can be used by GFM or data capture wherever necessary. NMR
Requirements for storage?
QUALITY column, or related, in the SDFITS file. NMR
Requirements for storage?
| Description of Analysis Procedure | Default Handling of Blanked Data | Options |
|---|---|---|
| Add | When a blanked and non-blanked value are added, the result will be ??? | The user should be able to specify an optional parameter for ??? which changes the default behavior to ??? |
replace, bchan, echan, /zero, /blank
Note that the default behavior of "replace" is not to blank, but to interpolate the data values. More sophisticated blanking procedures can be implemented on top of "replace" in a straightforward way, if the user wishes to blank data based on selection criteria. After blanking is applied, all other GBTIDL analysis procedures will handle the blanked data appropriately, and no additional user interaction is necessary.
As an example of special handling of blanked values, consider the "show" procedure in GBTIDL. "Show" will handle the special values by putting gaps in the plotted spectrum at the locations of blanked data. The "stats" command may apply blanking by simply ignoring any blanked channels in computing the rms. The "hanning" command will blank channels in the smoothed spectrum whose constituent channels are themselves blanked. In general, a procedure will have to know how to take the appropriate action when it encounters blanked data, and the appropriate action varies depending on the procedure.
Blanking means replacing spectral intensities for a given set of channels with a special value recognized by the data analysis system. When the data analysis system encounters the special value, it must know to handle the requested operation in an appropriate manner. In the case of GBTIDL, the special value is "NaN", which means "Not a Number".
Not saying it isn't worthwhile, but the above seems to simply replicate the purpose of flagging. Is the intent of supporting what is described above to make operations run faster, or cut down on storage requirements, or what? I had assumed that this level of blanking would provide an interpolated value, so that e.g. plots could be produced without auto-scaling being thrown off by RFI points, whereas flagging would show "gaps" in the spectrum. We should clarify the desired behavior.
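The "stats" and "hanning" behavior described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the GBTIDL implementation: the function names are hypothetical, the Hanning kernel is assumed to be the 3-point (0.25, 0.5, 0.25) form, and the end channels are assumed to be copied unchanged.

```python
import math


def stats_rms(spectrum):
    """RMS about the mean, ignoring blanked (NaN) channels."""
    good = [v for v in spectrum if not math.isnan(v)]
    if not good:
        return math.nan
    mean = sum(good) / len(good)
    return math.sqrt(sum((v - mean) ** 2 for v in good) / len(good))


def hanning(spectrum):
    """3-point Hanning smooth; blanked inputs blank the outputs they feed."""
    out = list(spectrum)  # end channels are copied unchanged (assumption)
    for i in range(1, len(spectrum) - 1):
        # NaN propagates through arithmetic, so any blanked constituent
        # channel makes the smoothed channel NaN as well.
        out[i] = (0.25 * spectrum[i - 1]
                  + 0.5 * spectrum[i]
                  + 0.25 * spectrum[i + 1])
    return out


# Channel 2 is blanked (hypothetical data).
spectrum = [1.0, 3.0, math.nan, 3.0, 1.0, 3.0, 1.0]
rms = stats_rms(spectrum)
smoothed = hanning(spectrum)
print(rms)
print(smoothed)
```

One blanked channel blanks three channels of the smoothed spectrum, while the rms is computed from the six remaining channels only. The two procedures take different actions on the same special value, which is the point made in the text.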
flag, scan, intnum=intnum, plnum=plnum, ifnum=ifnum, fdnum=fdnum,
bchan=bchan, echan=echan, idstring=idstring
and the flagrec procedure will have the following structure:
flagrec, record, bchan=bchan, echan=echan, idstring=idstring
Examples:
The following example shows how to flag a channel range for a small, select number of scans and integrations. Note that the scan parameter is required, but otherwise if a parameter is not specified, "all" is assumed. So in this example, all polarizations are flagged.
flag, [18,19,20], int=[1,3], bchan=512, echan=514, idstring="RFI"
The next example shows how to flag all channels for a given integration in one scan:
flag, 15, int=3, idstring="spectrometer glitch"
The next example shows how one could flag all data for the given three scans:
flag, scans=[101,105,107]
Finally, the next example shows how one might flag a record in a processed data file:
flagrec, 15, idstring="Glitch"
autoflag, bflag, eflag, chan_clip, int_rms, idstring
Such a procedure would use the bflag and eflag parameters to flag channels at the beginning and end of all spectra. The chan_clip would flag, within each integration, channels that exceed the rms by the specified value, and the int_rms would flag integrations whose rms exceeds the given value. One could imagine this routine becoming more sophisticated, if necessary.
Another flagging mode would use a visual aid, in the style of the classic AIPS flagger. A dynamic spectrum image could be used to represent the data, and the user could then interact with the image to generate flagging commands. Before this tool is developed, some assessment of its relative merit should be made.
Another flagging procedure would allow the user to specify flags according to frequency or velocity, rather than channel number.
I think it would be good to describe how to flag data on the basis of the antenna being off source here. Also, need to describe somewhere how header data will be modified correctly.
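The autoflag criteria described above could look roughly like the following Python sketch. Several details are assumptions, since the text does not fix them: bflag and eflag are treated as channel counts, chan_clip is read as a multiple of the rms about the mean, the statistics exclude the edge channels, and all function names are hypothetical.

```python
import math


def rms_about_mean(values):
    """Root-mean-square deviation of values about their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def clip_channels(spectrum, bflag, eflag, chan_clip):
    """Return the sorted channel indices to flag in one integration."""
    nchan = len(spectrum)
    # Edge channels: the first bflag and the last eflag channels.
    flagged = set(range(min(bflag, nchan)))
    flagged.update(range(max(nchan - eflag, 0), nchan))
    inner = [i for i in range(nchan) if i not in flagged]
    if inner:
        values = [spectrum[i] for i in inner]
        mean = sum(values) / len(values)
        limit = chan_clip * rms_about_mean(values)
        # Single pass: the outlier itself inflates the rms it is tested against.
        flagged.update(i for i in inner if abs(spectrum[i] - mean) > limit)
    return sorted(flagged)


def noisy_integrations(integrations, int_rms):
    """Return indices of integrations whose rms exceeds int_rms."""
    return [n for n, spec in enumerate(integrations)
            if rms_about_mean(spec) > int_rms]


# Hypothetical 12-channel spectrum with bad edges and one spike at channel 10.
spectrum = [9.0] + [0.0] * 9 + [50.0, 9.0]
channels = clip_channels(spectrum, bflag=1, eflag=1, chan_clip=2.0)
bad_ints = noisy_integrations([[0.0, 2.0], [0.0, 20.0]], int_rms=5.0)
print(channels)
print(bad_ints)
```

A more sophisticated version might iterate the clip until no further channels are rejected, which is one way the routine could grow if necessary.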
unflag, idstring
records, scans, integrations, polarizations, IF's, beams, bchan, echan, idstring
The flagging file simply represents the parameters used in the flag or flagrec procedures. Note that the parametrization has some redundancy. If the user flags by record number, then the set of [scans, integrations, polarizations, IF's, beams] is not necessary. So when a record number or numbers are specified in a flag entry, this set of numbers is ignored.
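The precedence rule above (record numbers override the scan, integration, polarization, IF, and beam selectors) can be sketched as a small matching function. This is a Python illustration only: the `FlagEntry` class, its field names, and `applies_to` are hypothetical, and `None` stands in for an unspecified ("all") selector.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FlagEntry:
    idstring: str
    records: Optional[Sequence[int]] = None
    scans: Optional[Sequence[int]] = None
    integrations: Optional[Sequence[int]] = None
    polarizations: Optional[Sequence[int]] = None
    ifs: Optional[Sequence[int]] = None
    beams: Optional[Sequence[int]] = None
    bchan: Optional[int] = None
    echan: Optional[int] = None


def applies_to(entry, record, scan, integration, polarization, if_num, beam):
    """Return True if the flag entry selects the described data row."""
    if entry.records is not None:
        # Record-based entry: the other five selectors are ignored.
        return record in entry.records

    def ok(allowed, value):
        return allowed is None or value in allowed

    return (ok(entry.scans, scan)
            and ok(entry.integrations, integration)
            and ok(entry.polarizations, polarization)
            and ok(entry.ifs, if_num)
            and ok(entry.beams, beam))


by_scan = FlagEntry("RFI", scans=[18, 19, 20], integrations=[1, 3],
                    bchan=512, echan=514)
by_record = FlagEntry("bad channels", records=[15], scans=[99],
                      bchan=0, echan=8)

print(applies_to(by_scan, 7, 19, 3, 0, 0, 0))    # scan and integration match
print(applies_to(by_scan, 7, 19, 2, 0, 0, 0))    # integration 2 not selected
print(applies_to(by_record, 15, 1, 0, 0, 0, 0))  # scans=[99] is ignored
```

The channel range (bchan, echan) is not part of the row match; it says which channels of a matching row are flagged.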
For example, to list all the flags with idstring="RFI" one could do this from the unix prompt:
grep RFI mydata.flag
There will also be tools in the gbtidl environment to list and query flags as follows:
listids : list all the idstrings in the flag file attached to the input dataset
listflags, idstring : list the flag details associated with all entries matching the given idstring
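What listids and listflags might do can be sketched against plain text lines. This Python illustration is not the GBTIDL implementation: it assumes each flag entry is one line that ends with the idstring in double quotes, as in the flag file examples in this document, and it reuses the names `listids` and `listflags` only for readability.

```python
import re

# Matches the double-quoted idstring at the end of a flag entry line.
_ID_AT_END = re.compile(r'"([^"]*)"\s*$')


def idstring_of(line):
    """Return the idstring of one flag entry line, or None if absent."""
    match = _ID_AT_END.search(line)
    return match.group(1) if match else None


def listids(lines):
    """Return the distinct idstrings, in order of first appearance."""
    seen = []
    for line in lines:
        ident = idstring_of(line)
        if ident is not None and ident not in seen:
            seen.append(ident)
    return seen


def listflags(lines, idstring):
    """Return the entries whose idstring equals the given one exactly."""
    return [line for line in lines if idstring_of(line) == idstring]


lines = [
    '*,[18,19,20],[1,3],*,*,*,512,514,"RFI"',
    '15,*,*,*,*,*,0,8,"bad channels"',
]
ids = listids(lines)
rfi = listflags(lines, "RFI")
print(ids)
print(rfi)
```

Unlike the grep command, this sketch compares the whole idstring, so an entry labeled "RFI spike" would not be returned for "RFI".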
getnod, 15, /noflag
All procedures that retrieve data from disk, notably all the get* family of procedures, will need to have the /noflag option. The averaging procedures will also need the /noflag option.
If one executes the flagging command:
flag, scan=[18,19,20], int=[1,3], bchan=512, echan=514, idstring="RFI"
the flag file will have the following entry:
*,[18,19,20],[1,3],*,*,*,512,514,"RFI"
Note that the first value is a "*" to indicate that this flag entry is not parametrized by record number. When the first element in a flag entry is an integer or a list of integers, that entry is understood to be parametrized by record number. So, if one executes the following flagrec command:
flagrec, record=15, bchan=0, echan=8, idstring="bad channels"
the flag file will have the following entry:
15,*,*,*,*,*,0,8,"bad channels"
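The mapping from flag parameters to a flag file entry can be sketched as a small formatter. This is a Python illustration that reproduces the two entries shown above; the function names and keyword names are hypothetical, and the handling of anything beyond those two examples (for instance quoting rules for idstrings that contain quotes) is not specified here.

```python
def format_field(value):
    """Render one field: None -> "*", int -> "15", list -> "[18,19,20]"."""
    if value is None:
        return "*"
    if isinstance(value, int):
        return str(value)
    return "[" + ",".join(str(v) for v in value) + "]"


def format_flag_entry(idstring, records=None, scans=None, integrations=None,
                      polarizations=None, ifs=None, beams=None,
                      bchan=None, echan=None):
    """Build one flag file line in the field order used by the flag file."""
    fields = [records, scans, integrations, polarizations, ifs, beams,
              bchan, echan]
    return ",".join(format_field(f) for f in fields) + ',"' + idstring + '"'


by_scan = format_flag_entry("RFI", scans=[18, 19, 20], integrations=[1, 3],
                            bchan=512, echan=514)
by_record = format_flag_entry("bad channels", records=15, bchan=0, echan=8)
print(by_scan)
print(by_record)
```

Testing for `None` rather than truthiness matters here, because a legitimate value such as bchan=0 must be written as "0" and not as "*".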
| Revision r1.15 - 20 Jan 2006 - 16:21 GMT - BobGarwood |
Content copyright © 1999-2007 by the contributing authors. All material on this collaboration platform is the property of the contributing authors. |