SDFits Changes - Add target position and CALTYPE keywords to sdfits and other requested sdfits changes
The GBT spectrometer records correlation data for both auto-correlations (correlations of a sampler with itself) and cross-correlations (correlations between different samplers). In analyzing the output from the GBT spectrometer in cross-polarization mode, Robishaw and Heiles (GBT Memo #244) made three recommendations for changes to the sdfits tool and its output to ease reduction of cross-polarization data taken with the GBT. The first recommendation, which involves the calculation of the sdfits spectra, is implemented in ModificationRequest2C706; they recommend one change to that MR, which is described here. The other two requests are handled here. Some additional recommendations have arisen in e-mail conversations with Robishaw. We also include one long-standing sdfits request in this MR.
These requests are from the Robishaw and Heiles GBT Memo #244, "The Proper Production of Full-Stokes Spectra Using the Fully-Functioning Full-Stokes Mode of the GBT Spectrometer":
- In the current SDFITS, the requested source position is not stored in SDFITS. This is a huge problem for any observing that uses position offsets. For example, for the Spider scan type, the user specifies a source, say 3C286, and the source’s position in the specified equinox is tracked. As this is happening, the telescope is driven through the source position in such a way as to create a straight line in a specified reference frame. Now, in order to analyze the polarized beam properties, we need to calculate the offsets in azimuth and elevation from the source position. However, all that SDFITS provides is the telescope’s current pointing direction; we cannot calculate the offsets because the source position is not provided. This is not peculiar to the Spider scan: many people make maps in an offset mode.
- In the current SDFITS, there is no indication of whether the high or low cal is used. We live and die by whether the high cal is set since we do not rely on the winking cal. We would really prefer if a binary keyword named HIGH_CAL were stored in SDFITS. This can be set based on the value of the HIGH_CAL header keyword of the IF instrument FITS file. It’s certainly been the case that we have accidentally observed with the low cal set. It might be true that we have access to the band-averaged cal value stored in the variable TCAL in SDFITS, but to know whether the high cal was set for any receiver based solely on this value would require a priori knowledge of what the typical values of the high cal are as a function of frequency; for instance, when observing at the high-frequency end of X band, the high-cal temperatures are not very much larger than the low-cal temperatures: how would we know from the TCAL value alone whether the high cal was set? Moreover, the band-averaged cal temperature might not be an appropriate value for a particular observing method and the user might wish to access the cal tables directly; in this case, we would need to know whether the high cal was set.
The following requests cover additional details that came up in e-mail exchanges with Robishaw, beyond the primary points raised in the Robishaw and Heiles memo.
- The GO FITS values for the target position need to be added. This should be generalized to use the target position in whatever form it appears in the GO FITS file (e.g. GLON, GLAT). A separate MR describes a fix to a problem with the RA as stored in the GO FITS files written by Turtle, also noticed by Robishaw and Heiles. The MR described here will be implemented assuming that the other MR has also been implemented. For a tracking scan and for a Spider scan, the GO FITS target position is probably the source position. For a set of mapping scans, it may be the center of the region being mapped. "Target" seems a more comprehensive word than "source" to describe the positions found in the GO FITS files.
- It must be possible to recover the value of the HIGH_CAL keyword from the SDFITS file. Other receivers have more complicated cal options (e.g. the Ka receiver has 3 possible options for firing the noise diodes), so we will use a string keyword named CALTYPE to represent this information. The only recognized values at this time will be "HIGH" and "LOW", corresponding to the value of the HIGH_CAL column in the IF FITS file (1 corresponds to "HIGH"; anything else corresponds to "LOW"). A value of "HIGH" means that the high cal was used and a value of "LOW" means that the high cal was not used. For any given switching state, the existing value of CAL should be used to determine whether the cal was firing for that row in the SDFITS file.
- ModificationRequest15C306 describes how the lag dropouts are handled and optionally "fixed". The original implementation treated the zero-lag and the low (<1024) lags in the cross-correlation case differently from the auto-correlation case. Robishaw and Heiles have requested that they be treated the same.
- A "-timestamp" argument will be added to sdfits. This is necessary to help deal with inadvertent duplicate scan numbers within the same project directory. Currently, there is no way to avoid filling all data from all scans with the same scan number. If the scan number is reset during an observing session (as has happened at least once to Robishaw and Heiles) then it is awkward to fill the data from just one scan or to append new data without also getting a second copy of the old data. It would be useful if a range of timestamps could be entered. Timestamps here will only be used to compare against the timestamp encoded in the raw file names for each scan. They will not be used to select data from within a given scan. This is the same usage that the TIMESTAMP field has in the GBTIDL data container and selection.
- The OBSID will be added to the sdfits index. This value is already present in the sdfits files. A user has requested that this be added to the index so that selection can be done based on this value (GBTIDL bug number 1531901).
- The target position should also be added to the index so that selection can be done on those quantities using the same format as used for the existing LONGITUDE and LATITUDE fields.
- The documentation for the sdfits tool needs to be improved. A complete description of how sdfits arrives at each value in the output SDFITS file must be included in that improved documentation. Failure to fully finish the documentation improvements should not delay the release of the code changes described here if they are ready before the documentation work is done.
Both CALTYPE and the target position items must also be available in the GBTIDL data container. GBTIDL will also need to be modified to understand and, as necessary, re-generate the index file with the changes described here. The GBTIDL changes are described in ModificationRequest8C706.
4.1 GO FITS Position Info
The position information recorded in the GO FITS file may be a source position or the center of a mapping region, so the phrase "target position" seems more appropriate here than "source position". That usage is also consistent with the GO FITS file documentation. To fully convey this information, the SDFITS file must contain the longitude and latitude of the target position, an indication of the coordinate system, and possibly also an EQUINOX to fully describe that coordinate system. Indications are that the coordinate system and equinox found in the GO FITS file are the same as those used for the antenna positions already stored in the output SDFITS file, so the only missing information is the actual target position. The information from the GO FITS file is gathered in ScanData.py. The target longitude and latitude need to be gathered there and made available to the SDFITSWriter.py code. Checks should be added to make sure that the coordinate system and equinox found in the GO FITS file match those found in the Antenna FITS file for the MAJOR and MINOR columns, and a warning should be printed if they are not the same. The following new columns will be added to the SDFITS file:
- TRGTLONG the target longitude coordinate value (in degrees) copied from the GO FITS file.
- TRGTLAT the target latitude coordinate value (in degrees) copied from the GO FITS file.
- RADESYS While examining the current code, it was noticed that apparent RA, DEC coordinates are not fully identified as such in the output SDFITS file because this column is missing. This column is only important for RA, DEC coordinates. It has 3 possible values: "FK5", "FK4", and "GAPPT". When not used, it will contain an empty string. The value comes from the keyword of the same name in the GO FITS file.
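A minimal Python sketch of how the new columns might be derived, assuming the GO FITS header is available as a dict-like object with a COORDSYS keyword and longitude/latitude keywords whose names depend on the coordinate system. The keyword table (RA/DEC, GLON/GLAT) and the function name `target_columns` are illustrative assumptions, not the existing ScanData.py or SDFITSWriter.py code.

```python
import warnings

# Assumed mapping from the GO FITS coordinate-system name to the header
# keywords holding the target longitude and latitude (illustrative only).
TARGET_KEYWORDS = {
    "RADEC": ("RA", "DEC"),
    "GALACTIC": ("GLON", "GLAT"),
}


def target_columns(go_header, ant_coordsys, ant_equinox):
    """Build TRGTLONG, TRGTLAT and RADESYS from a dict-like GO FITS header.

    Warns if the GO FITS coordinate system or equinox differs from the one
    used for the MAJOR/MINOR columns of the Antenna FITS file.
    """
    coordsys = go_header["COORDSYS"]
    equinox = go_header.get("EQUINOX")
    if coordsys != ant_coordsys or equinox != ant_equinox:
        warnings.warn(
            f"GO FITS frame ({coordsys}, {equinox}) does not match "
            f"Antenna FITS frame ({ant_coordsys}, {ant_equinox})"
        )
    lon_key, lat_key = TARGET_KEYWORDS[coordsys]
    return {
        "TRGTLONG": float(go_header[lon_key]),  # degrees, copied unchanged
        "TRGTLAT": float(go_header[lat_key]),   # degrees, copied unchanged
        # Copied from the GO FITS keyword of the same name; "" when absent.
        "RADESYS": go_header.get("RADESYS", ""),
    }
```

Keying the lookup on the coordinate system is one way to honor the request that the target position be taken "in whatever form it appears" in the GO FITS file without special-casing RA/DEC.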
4.2 CALTYPE column for SDFITS
A CALTYPE column will be added to the output SDFITS file. The existing PortTable.py code has methods to determine the HIGH_CAL value by sampler name (bank and port); these simply need to be used in SDFITSWriter.py to fill this column. The code should check that the HIGH_CAL value is the same for both samplers in the cross-correlation case and print a warning if it is not. If HIGH_CAL is 1, the value of CALTYPE is "HIGH"; otherwise it is "LOW".
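The CALTYPE rule is small enough to sketch directly. This assumes the caller has already looked up the HIGH_CAL value for each sampler through the existing PortTable.py methods (their exact signatures are not shown here); `caltype` is a hypothetical helper name. On a cross-correlation mismatch the MR only asks for a warning, so this sketch arbitrarily reports the first sampler's value.

```python
import warnings


def caltype(high_cal_values):
    """Return the CALTYPE string for one SDFITS row.

    high_cal_values holds the HIGH_CAL value of each sampler contributing to
    the row: one value for an auto-correlation, two for a cross-correlation.
    """
    if len(set(high_cal_values)) > 1:
        warnings.warn(f"HIGH_CAL differs between samplers: {high_cal_values}")
    # Only 1 means the high cal; any other value maps to "LOW".
    # On a mismatch this sketch uses the first sampler's value.
    return "HIGH" if high_cal_values[0] == 1 else "LOW"
```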
The cross-correlation-specific code in the bad-lags fix will be altered in both the aips++ filler (GBTACSTable::checkForDiscontinuities) and the sdfits filler (gbt/lib/spectrometer/src/VanVleck.cpp::checkForDiscontinuities). The code will still look for bad 1024-lag blocks in the first 1024 lags, but when one is found it will mark the entire result as bad and not attempt to "fix" the bad lags. This is similar to what is done in the auto-correlation case. In the auto-correlation case, the zero-lag must be tested to determine whether it is within the expected limits. If it is not, the van Vleck correction would fail mathematically if that zero-lag were used, so the entire auto-correlation result is replaced with the blanked value (not a number, NaN). The fixbadlags code assumes, in the auto-correlation case, that if the zero-lag is within the expected limits then the entire first 1024 lags are also good. Since the zero-lag in the cross-correlation case does not serve the same role in the van Vleck correction, it is not tested. The equivalent test in the cross-correlation case is therefore to look for an offset in the first 1024 lags, as described in the previous fixbadlags MR. This code will now be altered so that, instead of attempting to fix those bad lags, the entire cross-correlation result is replaced with the blanked value if the first 1024 lags are identified as bad. This test will be done in all cases (as is the auto-correlation test). When a log file is produced, this will be logged to that file to indicate that bad data was found but not fixed.
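The new cross-correlation behavior (detect, blank, log, never fix) can be sketched as follows. The real offset test is the one defined in the earlier fixbadlags MR and lives in C++ (`VanVleck.cpp`, `GBTACSTable`). The mean-difference comparison between the first and second 1024-lag blocks, the `threshold` parameter, and the function name are stand-in assumptions used only to make the sketch runnable.

```python
import math

BLOCK = 1024  # lags per block, as in the MR


def check_cross_lags(lags, threshold, log=None):
    """Blank a whole cross-correlation if its first 1024-lag block is offset.

    Hypothetical offset test: mean of the first block minus mean of the
    second block. Returns a copy of the input, or a same-length list of NaN.
    """
    first = lags[:BLOCK]
    second = lags[BLOCK:2 * BLOCK]
    if not second:
        return list(lags)  # too short to compare; leave untouched
    offset = sum(first) / len(first) - sum(second) / len(second)
    if abs(offset) > threshold:
        if log is not None:
            # Record that bad data was found but deliberately not fixed.
            log.append(f"bad first {BLOCK} lags (offset {offset:.3g}); result blanked, not fixed")
        return [math.nan] * len(lags)
    return list(lags)
```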
A new -timestamp argument will be recognized by sdfits. This argument needs to work with -scans (when supplied) to limit the scans used to only those that satisfy both arguments (when -timestamp is not supplied, all scans selected by the optional -scans argument will be used). The syntax is described below. This will require changes in a number of places, since the current code simply uses the scan number to step through the data. Instead, the timestamp should be used wherever the scan number is used now. The user will still refer to scans by number through the -scans argument, but internally each scan number needs to be converted to one (or more) timestamps, and those timestamps need to be used to process the data from each scan. GFM also gets confused when duplicate scan numbers are present. The list of scans on the left is correct, but when a user selects a scan number that appears more than once, the display shows data from the first instance of that scan number. The changes needed for sdfits to use the -timestamp argument should be directly usable by GFM to eliminate that confusion.
We will now describe the changes needed, organized by API and Applications:
4.4.1 Project API
UML for the Project API (diagrams not reproduced here): Project API only; Project.__init__ sequence.
The project API (classes found in sparrow/gbt/api/project), like most things used by the Sdfits application, is in dire need of refactoring. Although we try to identify which parts need it, not all of the refactoring may be done this cycle in the interests of time. Even without the current need to refactor this code to use the timestamp instead of the scan number as the unique identifier, there are a number of other issues:
- The Project class is set up for use with an online system, specifically the GFM application. As a result, the code is, at best, confusing to read and, at worst, far from optimal when used for an offline application like Sdfits. Perhaps the 'online' functionality could be turned off when not needed. For example, a call to the Read method is forwarded to the Update method, which performs the following actions not needed by Sdfits:
- reads the ScanLog FITS file only if its modification time has changed
- spawns a separate thread so that the ScanInfoThread can read metainfo from the GO FITS files and save this info in the Project's list of Scan objects.
- The Scan class has a number of Acquire/Release methods for threading and FQL that currently 'pass' (do nothing). These are called throughout the class and clutter the code. If they are not needed, they should be removed.
- The ProjectScans class, for each requested scan, gathers information from the GO and Antenna FITS files, even though only the site location from the Antenna FITS file (which is the same in all Antenna files) is actually used. This is unnecessary.
- The ProjectScans class also raises an error if there is a problem reading both the GO and Antenna files, but only in debug mode. This seems superfluous, since this class also explicitly checks for missing files. Since sdfits is the only client of this class, these checks should be removed.
To support using timestamp for the unique identifier, the following changes are needed:
- Project Class:
- in the ReadScans method, the DATE-OBS column from the ScanLog FITS file should be used to initialize and organize scans, rather than just the scan number. (Currently, if two scans have the same scan number, the first one is overwritten by the second.)
- we need methods for determining the mapping between scan numbers (not unique) and scan timestamps (unique). In this way, an application like sdfits could specify just scan numbers, and this API would do the translation to timestamps internally.
- CreateScan needs to take a timestamp instead of a scan number. It is not currently used by sdfits, but it is used by other applications.
- Scan Class:
- initialization should happen with the timestamp and/or scan number, not just scan number.
- getScanName -> getTimestamp. Also, this method currently uses the list of device files to determine the name. It should instead use the member variable that is set upon initialization.
- getScanByName -> getScanByTimestamp
- getScan[Index]ByNumber: these methods currently return the first Scan object (or the index of that scan in the project) encountered with this scan number. What should they do if more than one scan has that number and no additional information is supplied? For the index case, I can imagine that it might be good to simply return a vector of all indexes for all instances of that scan. For the non-index (object) case, it should probably just return the first one. I suppose we could return a tuple (scanObject, count), where scanObject is the first scan that satisfied the request and count is the number of matches. Both of these might break things, though, so perhaps we should just return the first one and provide another method that returns the count for a given scan number.
- Project Scans:
- setScan method should take requested timestamp range in addition to scan numbers.
- The whole class needs to be refactored and cleaned up to reflect the way it is actually used by sdfits.
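One way the Project API could hold scans keyed by the unique timestamp while still answering scan-number queries is sketched below. `ScanIndex` and its methods are hypothetical, and the input is assumed to be (timestamp, scan number) pairs already derived from the ScanLog FITS file. Returning every timestamp for a number supports the "vector of all indexes" option discussed for getScan[Index]ByNumber, while `first_timestamp` keeps today's first-instance behavior and `count` is the separate count method.

```python
class ScanIndex:
    """Hypothetical index of a project's scans, keyed by the unique timestamp."""

    def __init__(self, scanlog_rows):
        # scanlog_rows: (timestamp, scan_number) pairs, one per scan. Sorting
        # works because YYYY_MM_DD_HH:MM:SS strings order chronologically.
        self._number_by_timestamp = {}
        self._timestamps_by_number = {}
        for ts, num in sorted(scanlog_rows):
            self._number_by_timestamp[ts] = num
            self._timestamps_by_number.setdefault(num, []).append(ts)

    def scan_number(self, timestamp):
        """Scan number (not unique) recorded for a timestamp (unique)."""
        return self._number_by_timestamp[timestamp]

    def timestamps_for(self, scan_number):
        """Every timestamp that carries this scan number, oldest first."""
        return list(self._timestamps_by_number.get(scan_number, []))

    def first_timestamp(self, scan_number):
        """Current getScanByNumber behavior: the first instance, or None."""
        matches = self.timestamps_for(scan_number)
        return matches[0] if matches else None

    def count(self, scan_number):
        """How many scans share this scan number."""
        return len(self._timestamps_by_number.get(scan_number, []))
```

With duplicate numbering like the AGBT02A_053_13 test project, `timestamps_for(1)` would return both instances of scan 1 in time order, so nothing is overwritten.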
The syntax will be the following:
-timestamp=starttime,endtime
If ",endtime" is omitted, then all data from starttime to the end of the available data is filled (-scans is also used to possibly limit the data to be filled). If "starttime" is omitted, so that the option is ",endtime" with the leading comma, then data will be filled up to and including endtime (again, subject to -scans). If both times are included, then all data from starttime to endtime is filled, consistent with -scans. The times are specified exactly as they appear in the raw M&C FITS file names: YYYY_MM_DD_HH:MM:SS. This is identical to how TIMESTAMP is specified within GBTIDL to resolve duplicate scan numbers. The times will be compared only against the timestamp in the M&C file names for each scan. If a given time falls in the middle of a scan, then that scan is either excluded (if the comparison is with starttime) or included (if the comparison is with endtime). No attempt will be made to select individual integrations based on the -timestamp arguments.
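A sketch of the format check sdfits.py could perform, plus the inclusive comparison against each scan's file-name timestamp. Both function names are hypothetical, and the regular expression checks only the shape of the string, not calendar validity. Because the YYYY_MM_DD_HH:MM:SS form is fixed width and zero padded, plain string comparison orders timestamps chronologically.

```python
import re

# YYYY_MM_DD_HH:MM:SS, as in the raw M&C FITS file names (shape check only).
_TS = re.compile(r"^\d{4}_\d{2}_\d{2}_\d{2}:\d{2}:\d{2}$")


def parse_timestamp_option(value):
    """Parse 'start,end', 'start', or ',end' into (start, end); None = open."""
    start, _sep, end = value.partition(",")
    start = start or None
    end = end or None
    if start is None and end is None:
        raise ValueError("-timestamp needs at least one time")
    for t in (start, end):
        if t is not None and not _TS.match(t):
            raise ValueError(f"bad timestamp {t!r}; expected YYYY_MM_DD_HH:MM:SS")
    return start, end


def selected(scan_ts, start, end):
    """True if a scan whose file-name timestamp is scan_ts is in [start, end].

    Only the scan's starting timestamp is compared, so a starttime inside a
    scan excludes that scan and an endtime inside a scan includes it.
    """
    return (start is None or scan_ts >= start) and (end is None or scan_ts <= end)
```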
- sdfits.py:
- -timestamp option must be added
- usage must be updated to include the above description
- the -timestamp option will be checked for proper formatting here only
- the mapping of timestamps to scans is most easily handled by the Project API, which SDFITSWriter has access to, therefore, we will forego this mapping in this module.
- SDFITSWriter.py:
- in general, this class must be completely refactored to use timestamp in place of scan number. This is a simple design concept, but will require lots of changes throughout this class.
- interfaces must be created so that sdfits.py, or any other client, can pass on its -timestamp and -scans arguments and receive feedback about out-of-range values, etc. (How do we want to handle -scans and -timestamp arguments that don't overlap? I think the user has asked for data that doesn't exist, so the same thing should happen that happens now when you ask for scans that don't exist, which I think is that nothing happens. I think sdfits should keep track, and if nothing has been written for any backend by the time the code exits, it should tell the user.)
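To illustrate the proposed handling, the selection can be treated as the intersection of the two arguments, with an explicit message when nothing was written. This is a sketch assuming scans are available as (timestamp, scan number) pairs; `select_scans`, `fill_message`, and the backend names in the usage are hypothetical.

```python
def select_scans(scans, scan_numbers=None, start=None, end=None):
    """Timestamps of scans that satisfy both -scans and -timestamp.

    scans: (timestamp, scan_number) pairs; scan_numbers: set of requested
    numbers, or None for all; start/end: inclusive bounds, or None for open.
    """
    return [
        ts
        for ts, num in sorted(scans)
        if (scan_numbers is None or num in scan_numbers)
        and (start is None or ts >= start)
        and (end is None or ts <= end)
    ]


def fill_message(rows_written_by_backend):
    """Summary for the user once every backend has been processed."""
    written = {b: n for b, n in rows_written_by_backend.items() if n}
    if not written:
        return "sdfits: no data matched the -scans/-timestamp selection; nothing was written"
    return "sdfits: wrote " + ", ".join(f"{b}: {n} rows" for b, n in sorted(written.items()))
```

For example, asking for scan 2 with a starttime after scan 2 began yields an empty selection, and the user sees the "nothing was written" message instead of silence.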
Who uses this application? Do we really have to waste time fixing it?
4.4.4 GFM Application
The plotting of data in response to a user's clicking on a scan number in the left-hand list view in GFM needs to be fixed. Instead of plotting the data based on the scan number that was clicked, the timestamp for that scan needs to be used to plot the data. Note that what is displayed in this list view need not change. The following code needs to change:
- sparrow/gbt/app/gfm/GFMPanel.py, OnSelectScan:
self.SetScanNo(scan.getScanNumber())
self.OnBeginScan(self.GetProjectName(), self.GetScanNo())
The references to scan number must change to represent scan timestamps
- sparrow/gbt/app/gfm/GFMNotebook.py, self.OnBeginScan: must use timestamp instead of scan number
- sparrow/gbt/app/gfm/plugins: Ouch. All the plugins here must change their use of OnBeginScan (this is just the Spectral Line related plugins), and the processing they do with the scan object in OnEndScan (all plugins).
In addition, the state engine that still resides in GFMPanel.py (the GFM server/client uses the ScanStateEngine class) must also track the timestamp (a ScanCoordinator parameter) and use it in place of the scan number, in much the same way as discussed below for the online filler (this will be a similar amount of work).
The online sdfits daemon is responsible for detecting when scans end and commanding sdfits to fill the latest scan in /home/sdfits. Currently, this daemon uses the -scans='latest scan number' option, which refills the same scan when scan numbers are duplicated. To fix this bug, the daemon must use the -timestamp option, either alone or in conjunction with the -scans option, to fill the latest scan. This will entail retrieving the timestamp value either from the M&C system or by using the current time to estimate an appropriate value for the -timestamp option (although the latter could complicate filling missed scans).
The robust solution would be to use the values derived from the M&C system, since this is what determines the actual value used in the ScanLog.fits file. This would entail the following changes:
- sparrow/gbt/api/ygor:
- ScanCoordinatorDevice:
- this class should track the values of the timestamp(?) parameter as well, with the same methods provided for this value as there are for the scan number parameter.
- ScanCoordinatorPlayback:
- this class, for simulating live data, must also simulate the changing of the timestamp parameter.
- ScanStateEngine:
- this class should use the timestamp value obtained from the ScanCoordinatorDevice class for:
- passing along to client methods which pass the scan number
- initializing the Scan object used for such methods as OnEndScan
- IntFileStateEngine:
- this class needs to use the timestamp rather than the scan number to determine which FITS file to check for integrations.
- sparrow/gbt/daemon/onlineSdfits:
- OnlineSdfits:
- throughout this class, the timestamp values from the state engine must be used in place of the scan number.
- in writeLatestScans, this may get complicated, since this method makes sure that any scans that were missed since the last time a scan was filled, or that failed previous attempts to fill, are now filled as well. This is fairly straightforward with integer scan numbers, but may get more complicated with string timestamps.
- in getBackendScans, again, the scan numbers that were missed are organized by backend so that those scans can be filled together in a call to writeScans. This logic will have to be replaced with something based on timestamps, or perhaps on the project index of each scan.
- when the final system call to sdfits is made, the -timestamp option must be used.
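The missed-scan bookkeeping in writeLatestScans and getBackendScans could be rebuilt around timestamps roughly as follows. The (timestamp, scan number, backend) input shape and both function names are assumptions. Because the -timestamp comparison is inclusive at both ends and timestamps are unique, passing the same timestamp as start and end selects exactly one scan, even when its scan number is duplicated.

```python
from collections import defaultdict


def scans_to_fill(scanlog, last_filled, failed=()):
    """Group not-yet-filled scans by backend, keyed by timestamp.

    scanlog: (timestamp, scan_number, backend) tuples (assumed shape).
    last_filled: timestamp of the last scan filled successfully, or None.
    failed: timestamps whose earlier fill attempts failed and should be retried.
    YYYY_MM_DD_HH:MM:SS strings compare chronologically, so ">" replaces the
    old integer scan-number comparison.
    """
    retry = set(failed)
    pending = defaultdict(list)
    for ts, _scan_number, backend in sorted(scanlog):
        if last_filled is None or ts > last_filled or ts in retry:
            pending[backend].append(ts)
    return dict(pending)


def timestamp_arg(ts):
    """-timestamp value selecting exactly one scan (both bounds are inclusive)."""
    return f"-timestamp={ts},{ts}"
```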
An alternative is for the online sdfits daemon alone to use the current time as an estimate for the -timestamp value, or perhaps the time since the last scan was filled, so that missed scans can be filled as well. This would avoid the many changes necessary in sparrow/gbt/api/ygor, but would still require extensive refactoring in sparrow/gbt/daemon/onlineSdfits.
The following fields will be added to the index: OBSID, TRGTLONG, and TRGTLAT. Each index column must have enough characters to preserve the precision of its value, using the same format as for the existing LONGITUDE and LATITUDE fields.
- The sdfits online documentation needs to be modified to describe the new keywords and option.
- The sdfits release notes should also note the addition of the new keywords and option.
- Changes to the Index file written by the Sdfits tool have implications for the files in /home/sdfits. For more details on how this will be handled, see ModificationRequest8C706.
- The new use of Timestamps as the unique identifier for Scans within a Project will also affect GFM. The GFM release notes should note this new feature.
- Users who need to use the new fields in the sdfits file or the new index values will need to upgrade their version of GBTIDL to the new version (ModificationRequest8C706). Users who do not will be able to read these FITS files but not use the new information.
- The sparrow sdfits and spectrometer unit tests must continue to pass.
- GFM has no unit tests, but test data exists for internal testing: Project /home/gbtdata/AGBT02A_053_13 has scans 1 - 95, then repeat scans 1-8 again.
- Pre-existing data will be re-filled and the keywords checked. We will do this in the case where both the low and high cal were fired, as well as for data with a variety of different source epochs.
- the data should continue to pass the integration and regression tests.
- After initial scans are taken, the scan number should be restarted, so that duplicate scan numbers are present. GFM and the online Sdfits filler should not be confused by this and display the latest data, not the previous scan.
APPROVED: I acknowledge that my request is fully contained in this MR, and if the SDD delivers exactly what I specified, I will be happy.
ACCEPTED: I acknowledge that I have validated the completed code according to the acceptance tests, and I am happy with the results.
CCC Discussion Area
-- KarenONeil - 16 Oct 2006
Revision r1.30 - 23 Feb 2007 - 11:16 GMT - KarenONeil
Content copyright © 1999-2007 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.