
Extract Sampler Information from Log Files & Create Antenna Characterization file

Modification Request #3 (C1 2007)



1. Introduction

Creating new GBT pointing models involves processing astronomical observations (e.g. pointing runs) together with auxiliary data from various log files. This MR specifies a utility that constructs a single binary format file containing the log data, re-sampled to a common time interval.

2. Background

This is a specification for creating a binary data file from FITS formatted files, in support of GBT pointing and focus analysis. This file will serve as one of the inputs to the (MATLAB-based) pointing analysis program.

All the metrology data (for example, from the quadrant detector, accelerometers, and inclinometers) are normally stored in FITS files, with a separate FITS file for each device or, in some cases, several FITS files per device. The files are organized by device, manager, and sampler. The proposal here is to define a flexible way to specify which FITS files, and which parameters within them, are to be extracted and placed into a binary formatted file. All the input FITS files are FITS binary tables whose first field is the MJD time tag.

The request is that the output database be a flat file, with all the data in binary form as double precision (64 bits) IEEE-754 floating point. All of the data is to be selected for a specified time range and interpolated onto a specified time interval. Each 'row' of the output file will contain a timestamp (in MJD and fraction of day) as the first field, followed by all other specified fields.

This would be a general system to consolidate any group of FITS files into a common file, and it can serve many purposes, not just PTCS. Any analysis of GBT monitoring data can use this system. As an alternative, we suggest an option to put the collected data into a single large FITS file. In this form it can be easily imported into IDL programs, or plotted with "fv".

The output database will consist of two files: the first is an ASCII file giving the selection and list of samplers, as described below; the second is a binary file containing all the selected and interpolated data.

3. Requirements

The basic requirements are to allow the specification of a directory, time range, resampling interval, a subdirectory such as 'Weather-Weather1-weather1', fields within the sampler, and interpolation method parameters in a specification file.

The application shall read this file, and read the necessary files which match the time range, and re-sample the data. The data shall be written to either a FITS binary table or a native binary format file.

3.1 Log File Selection Specification

The input specification sets the time range, the log file set, and the resampling interval. Comments are introduced by a '#' in the first column and end at the newline.

The selection specification will contain the following elements:

3.1.1 Time range and sampling interval:

The start time, end time, and interval are set by assigning values to the variables 'Starttime', 'Stoptime', and 'Interval' respectively, as in the example in section 3.2.

Dates are given as yyyy-mm-dd and times as HH:MM:SS, referenced to UTC. The resampling interval is specified in seconds.
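Because every output row starts with an MJD timestamp, the start and stop times have to be converted from calendar form to MJD at some point. The following is a minimal sketch of one way to do that in Python. The function names parse_utc and to_mjd are hypothetical and not part of this MR, the HH:MM form is accepted only because the example in section 3.2 uses it, and leap seconds are ignored.

```python
from datetime import datetime, timezone

# MJD 0.0 corresponds to 1858-11-17 00:00 UTC.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def parse_utc(text):
    """Parse 'yyyy-mm-dd HH:MM:SS' (or 'yyyy-mm-dd HH:MM') as a UTC datetime."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M"):
        try:
            return datetime.strptime(text.strip(), fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"unrecognized time: {text!r}")


def to_mjd(when):
    """Convert a timezone-aware datetime to MJD (day number plus fraction of day)."""
    return (when - MJD_EPOCH).total_seconds() / 86400.0


if __name__ == "__main__":
    print(to_mjd(parse_utc("2006-11-29 07:00")))
```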

3.1.2 Path Specification

A Path prefix is normally required to set the location of the log directories. The 'Path' variable may be set more than once, as needed. The actual pathname applied to each 'with' statement is the most recent value. The Path variable must be set at least once.

3.1.3 FITS directory name, and optional FITS extension

A log subdirectory is added with the 'with' keyword. Fields of the sampler are added with the 'add' keyword. If desired, the input field may be mapped to a different name using the 'as' keyword. The basic syntax to specify fields is intended to be readable: Square brackets indicate optional clauses. Variations on this basic statement are described below.

To add the wind velocity column from the weather1 sampler:

Additional fieldnames may be specified by providing a comma separated field list, or additional 'add' statements.

The 'using' keyword allows specification of an interpolation method, and if specified requires the window size to be set.

When writing a FITS format output file, column names from different samplers may clash. To avoid this, the add statement may contain the 'as' keyword, followed by the new field name. If a name clash is found during processing, a message will be output to the console.

Planned Enhancements not supported in the initial version:

This construct may be repeated as needed for multiple FITS files. A full example is shown in section 3.2.
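To make the statement forms above concrete, here is a rough Python sketch of a parser for them. It is inferred from the keywords described in this section and the example in section 3.2, not taken from the delivered code: the names FieldSpec, parse_add, and parse_spec are hypothetical, error reporting is omitted, and the planned 'from' clause is not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """One selected column (hypothetical container, for illustration only)."""
    path: Optional[str]
    directory: str
    name: str
    alias: str
    method: Optional[str] = None
    window: Optional[int] = None


def parse_add(clause, path, directory):
    """Parse the text following 'add': fields [as name] [using method [window n]]."""
    body = clause.strip().rstrip(",").strip()
    method = window = None
    if " using " in body:
        body, using = body.split(" using ", 1)
        tokens = using.split()
        method = tokens[0]
        if len(tokens) == 3 and tokens[1] == "window":
            window = int(tokens[2])
    specs = []
    for item in body.split(","):
        name, _, alias = item.strip().partition(" as ")
        name = name.strip()
        specs.append(FieldSpec(path, directory, name, alias.strip() or name,
                               method, window))
    return specs


def parse_spec(text):
    """Return (settings, fields) for a selection specification."""
    settings, fields = {}, []
    path = directory = None
    for raw in text.splitlines():
        if raw.startswith("#") or not raw.strip():
            continue  # comment ('#' in the first column) or blank line
        line = raw.strip()
        if line.startswith("with "):
            directory, _, tail = line[5:].strip().partition(" add ")
            directory = directory.strip()
            if tail:
                fields += parse_add(tail, path, directory)
        elif line.startswith("add "):
            fields += parse_add(line[4:], path, directory)
        elif "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "Path":
                path = value  # most recent Path applies to later 'with' lines
            else:
                settings[key] = value
    return settings, fields
```

Keeping the most recent Path value in a local variable is what makes the setting context-sensitive: each field records the path that was in effect when its 'with' statement was read.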

3.2 Example Selection Specification

As an example, consider the following:
#=========================================================
# Example: log data specification
# comments may begin with #

# UT date and time range, and sampling interval in seconds.
#
Starttime=2006-11-29 07:00
Stoptime=2006-11-30 14:45
Interval=0.1

# Set the path prefix. This may be set more than once, so it is
# context-sensitive.
Path=/home/gbtlogs
# The first set of fields from the sampler
with Inclinometer-Inclinometer-InclinometerData
    add x1_angle, y1_angle, x2_angle, y2_angle using mean window 5

# Example use of the 'as' construct. This renames the column in the output file.
with Accelerometer-Accelerometer1-AccelerometerData
    add X as X_1,
    add Y as Y_1,
    add Z as Z_1
with Accelerometer-Accelerometer2-AccelerometerData
    add X as X_2,
    add Y as Y_2,
    add Z as Z_2
# Example specifying different methods for each field
with Accelerometer-Accelerometer3-AccelerometerData
    add X as X_3 using mean,
    add Y as Y_3 using median,
    add Z as Z_3 using nearest
#
with Weather-Weather2-weather2 add WINDVEL, WINDDIR using median window 3

# Change Path to another directory
Path=/home/gbtdata/AGBT_XXX/Archivist
with ServoMonitor-ServoMonitor-Az_El_1Hz
    add El_1Hz_Az, El_1Hz_El

# Include dcr data calling a custom interpolation method with a window of 3
# (Note: Future enhancement)
# Path=/home/gbtdata/AGBT_XXX/DCR
# Note fieldname and extension name are both 'DATA' in the DCR
# with DCR add DATA from DATA using custom_dcr_mean window 3

#=========================================================

This example specifies 17 quantities from 6 different samplers. The output file will consist of groups of 18 double precision floats, i.e., the MJD time stamp followed by the 17 selected quantities. There will thus be 1,143,000 groups, corresponding to tenth-of-a-second sampling over the 31.75 hour time range, for a total file size of about 165 MBytes.
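The size estimate can be checked with a few lines of Python. This is only a worked version of the arithmetic above; the variable names are illustrative.

```python
from datetime import datetime

start = datetime(2006, 11, 29, 7, 0)    # Starttime from the example
stop = datetime(2006, 11, 30, 14, 45)   # Stoptime from the example
interval_s = 0.1                        # Interval, in seconds
n_quantities = 17                       # selected fields, excluding the timestamp

duration_s = (stop - start).total_seconds()
n_rows = round(duration_s / interval_s)   # number of groups (rows)
row_bytes = (1 + n_quantities) * 8        # MJD + fields, 8 bytes per double
total_bytes = n_rows * row_bytes

print(duration_s / 3600.0, "hours")
print(n_rows, "rows")
print(total_bytes / 1e6, "MBytes")
```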

The output format depends upon the output filename extension. If the extension is '.fits' then a FITS file will be generated. If any other extension is given, a binary file is generated.

If the specification file above were saved into a file named "specification.input", then a session with the collation utility might look like:

   $ log_collator -o output_file_name.bin -i specification.input
Or, if a FITS format output file is desired, just change the extension:
   $ log_collator -o output_file_name.fits -i specification.input

3.3 Interpolation/extrapolation:

There are three general cases:

A requirement to support custom interpolation methods is under consideration, but will not be included in the initial version. The built-in interpolation methods will be available:

3.3.1 Windowing:

If a field specification has set the window size, then the requested window will be used to select window_size samples centered on the time grid. The input data is assumed to be regularly sampled. If not specified the following heuristics will be used:

3.4 Error Handling

There will be cases where data for a portion of the time interval is unavailable. In such cases the program shall still generate the output, setting the affected fields to NaN (not-a-number) to indicate that no data was available, and shall print alert information to the standard-error stream.
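The windowing of section 3.3.1 and the NaN fill described here can be sketched together as follows. This is an illustration under stated assumptions, not the delivered algorithm: the function name resample and the max_gap parameter (the largest distance to the nearest sample before a grid point counts as having no data) are hypothetical, and the window is centered on the sample nearest to each grid time. The method names 'mean', 'median', and 'nearest' are those used in the example of section 3.2.

```python
import bisect
import math
import statistics


def resample(times, values, grid, method="mean", window=3, max_gap=1.5):
    """Resample (times, values) onto grid; times must be sorted ascending."""
    result = []
    for t in grid:
        # Index of the sample nearest to grid time t.
        i = bisect.bisect_left(times, t)
        if i == len(times) or (i > 0 and t - times[i - 1] <= times[i] - t):
            i -= 1
        # No samples at all, or nearest sample too far away: no data.
        if i < 0 or abs(times[i] - t) > max_gap:
            result.append(math.nan)
            continue
        # Select up to `window` samples centered on the nearest one.
        half = window // 2
        chunk = values[max(0, i - half):i + half + 1]
        if method == "nearest":
            result.append(values[i])
        elif method == "median":
            result.append(statistics.median(chunk))
        elif method == "mean":
            result.append(statistics.fmean(chunk))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return result


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    values = [0.0, 10.0, 20.0, 30.0, 40.0]
    print(resample(times, values, [2.0, 10.0], method="mean", window=3))
```

At the edges of the data the window is simply truncated, so fewer than window_size samples contribute there; a real implementation might prefer to treat those points differently.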

3.5 Output File Data Format

3.5.1 FITS Formatted Output

A mode will be available to write a FITS formatted file, with minimal header information such as TTYPE, TFORM, and TUNIT for each column, so that tools such as fv can be used to verify/view the final data product.

3.5.2 Binary Formatted Output

The output file in this mode will consist of rows of data, where the first field in each row is an MJD, followed by the fields in the order specified in the selection specification. All fields will be in IEEE-754 64-bit double precision format. Note that the byte ordering will be the native format of an IA-32 machine (little-endian), which is the opposite of the big-endian byte order used by binary FITS tables.
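A consumer of the binary file needs only the number of fields and the byte order to decode it. The sketch below writes and reads such rows with Python's standard struct module, using the '<' prefix to force little-endian doubles regardless of the host machine. The helper names write_rows and read_rows are hypothetical, and an in-memory buffer stands in for the real output file.

```python
import io
import struct


def write_rows(stream, rows):
    """Write each row (MJD followed by field values) as little-endian doubles."""
    for row in rows:
        stream.write(struct.pack(f"<{len(row)}d", *row))


def read_rows(stream, n_fields):
    """Yield rows of (MJD, field1, ..., fieldN) from a binary stream."""
    row_format = struct.Struct(f"<{n_fields + 1}d")
    while True:
        chunk = stream.read(row_format.size)
        if len(chunk) < row_format.size:
            break  # end of data (or a truncated final row)
        yield row_format.unpack(chunk)


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_rows(buffer, [(54068.25, 1.5, -2.0), (54068.5, 3.0, 4.0)])
    buffer.seek(0)
    for row in read_rows(buffer, n_fields=2):
        print(row)
```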

3.6 Requirement Discussion/Concerns:

Frank notes:

I'm not sure if this is going to work. The semantics of multi-valued data are encoded in the FITS headers, and are mode or setup dependent. A consumer of the output file (without the original FITS header) would not have a complete specification of what the values mean. Specifically:

  1. Backend timestamps have different definitions than ordinary log files. For example, log file timestamps indicate the 'midpoint' of the data sampling period. However, most backends time-tag the FITS data with either the beginning or end of an integration. In addition the integration may include states of reference or cal signals.
  2. Backend data files often are multi-valued, where log files are not. As an example, a DCR 'datapoint' contains a block of data, which is the product of switching signal phases and the number of channels used.

In the interest of getting this out this cycle, I'd like to exclude the requirement to allow user-defined interpolation methods and multi-valued fields, at least for the initial version. We can revisit this later as an enhancement. Of course the design shall not preclude these enhancements.

4. Design

This program will be implemented in Python, starting from a combination of existing modules. The output will be written using PyFITS; for a raw binary table, a final post-processing step will extract the information from the output FITS file.

Object Diagram
Description: The Executive uses the SpecificationParser to read the textual specification and create a number of FieldSpecification objects. The Executive then creates the OutputDataWriter and uses the list of FieldSpecifications to create processing pipelines, each of which consists of an InputDataField, an Interpolator, and an OutputDataField. These three components are managed and abstracted by the Pipeline object. Processing is driven by the Executive, which loops over the time interval and over each pipeline, producing a new time grid value for a given time t.

There is only one output file, but many files may need to be read, each of which contains a number of columns that are the input data for one or more pipelines. This means that the Executive maintains an ordered list of the InputDataSrc objects, independent in number from the list of Pipeline objects.

5. Deployment Checklist

What has to be done to integrate this completely into the system? This checklist must be completed before Cycle Integration Testing begins.

6. Test Plan

6.1 Internal Testing

6.2 Sponsor Testing

6.3 Integration/Regression Tests

See Sponsor testing.


Signatures

APPROVED: I acknowledge that my request is fully contained in this MR, and if the SDD delivers exactly what I specified, I will be happy.

ACCEPTED: I acknowledge that I have validated the completed code according to the acceptance tests, and I am happy with the results.

Written DONE - JoeBrandt - 16 Jan 07
Checked DONE - RonGrider - 16 Jan 07
Approved by Sponsor DONE - FrankGhigo - 18 Jan 07
Approved by CCC DONE - RonMaddalena - 5 Feb 07
Accepted/Delivered by Sponsor DONE - FrankGhigo - 27 Mar 07


CCC Discussion Area

Attachment: Object1.png (16422 bytes), 24 Jan 2007 - 16:56, JoeBrandt

Revision r1.20 - 27 Mar 2007 - 16:07 GMT - FrankGhigo Content copyright © 1999-2007 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.