The log collator takes log data from multiple monitor points and resamples the log streams onto a common time reference. File paths, data fields, and re-gridding methods are specified in a text file using an English-like syntax. A limited set of built-in processing functions is available, and user-written interpolation/processing functions may be added in Python.
The log_collator is a command-line-only utility. It accepts a number of options, each of which has both a long and a short form. The minimal way to run the log_collator is:
log_collator -i specificationfile -o outputfile [ options ]
(Until the utility is officially released, you must first source /home/sparrow/integration/sparrow.bash or /home/sparrow/integration/sparrow.tcsh, depending on your shell, before running it.)
The following options may be specified:
| *Option* | *Description* |
| -i, --input | Specifies the specification file to be read |
| -o, --output | Specifies the name of the output data file |
| -f, --fits | Specifies that a FITS format file should be written |
| -q, --quiet | Suppresses processing progress messages |
| -c, --check | Only parses the specification file and checks the directory paths |
| -t, --interval | Overrides the interval set in the specification file |
| -s, --start | Overrides the start time set in the specification file |
| -e, --end, --stop | Overrides the stop time set in the specification file |
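The option table, and the rule that -t, -s, and -e override values from the specification file, can be sketched with Python's standard argparse module. This is a hypothetical illustration, not the log_collator's actual parser: the helper names build_parser and effective_settings, and the spec-file dictionary keys they use, are assumptions made for this sketch.

```python
import argparse

def build_parser():
    # Hypothetical parser mirroring the option table above; the real
    # log_collator may parse its arguments differently.
    p = argparse.ArgumentParser(prog="log_collator")
    p.add_argument("-i", "--input", help="specification file to read")
    p.add_argument("-o", "--output", help="name of the output data file")
    p.add_argument("-f", "--fits", action="store_true", help="write a FITS file")
    p.add_argument("-q", "--quiet", action="store_true", help="suppress progress messages")
    p.add_argument("-c", "--check", action="store_true", help="parse spec and check paths only")
    p.add_argument("-t", "--interval", type=float, help="override spec-file Interval")
    p.add_argument("-s", "--start", help="override spec-file Starttime")
    # --end and --stop are two long spellings of the same option.
    p.add_argument("-e", "--end", "--stop", dest="end", help="override spec-file Endtime")
    return p

def effective_settings(spec, args):
    """Return spec-file settings with any command-line overrides applied."""
    settings = dict(spec)
    if args.interval is not None:
        settings["Interval"] = args.interval
    if args.start is not None:
        settings["Starttime"] = args.start
    if args.end is not None:
        settings["Endtime"] = args.end
    return settings
```

For example, parsing `-i weather.spec -o out.dat --stop "2007-02-15 00:00:00"` leaves Starttime and Interval as read from the specification file and replaces only Endtime.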
By default, the output is written as a binary file. When a binary file is written, an additional file with a ".txt" extension is also produced that describes the column order and the processing methods used to generate the data. When the --fits (-f) flag is present, a FITS format file is written instead, and no description text file is produced.
The specification file contains two main types of statements. The first type is an assignment to one of the predefined keywords Path, Starttime, Endtime, and Interval. For example:
Path=/home/gbtlogs
Starttime=2007-02-14 12:01:00
EndTime=2007-02-28 23:59:00
Interval=0.1
Starttime, Endtime, and Interval should be set only once; however, Path may be reset between with statements. Each Path assignment may list multiple directories separated by colons.
For example:
# Look in gbtlogs or the archive
Path=/home/gbtlogs:/home/archive
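The assignments above define the common time grid and the directories that are searched. The Python sketch below shows one way to turn them into grid times and candidate directories. It assumes that Interval is in seconds, that times use the "YYYY-MM-DD HH:MM:SS" form shown in the examples, that the grid includes Endtime when it falls on a step, and that colon-separated Path entries are searched in the order listed. The helper names time_grid and candidate_dirs are hypothetical.

```python
import os
from datetime import datetime, timedelta

# Assumed from the Starttime/EndTime examples above.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def time_grid(starttime, endtime, interval):
    """Return grid times from starttime to endtime inclusive (Interval in seconds)."""
    step = timedelta(seconds=interval)
    if step <= timedelta(0):
        raise ValueError("Interval must be at least one microsecond")
    start = datetime.strptime(starttime, TIME_FORMAT)
    end = datetime.strptime(endtime, TIME_FORMAT)
    # timedelta // timedelta counts whole steps in integer microseconds,
    # so a 0.1 s step does not accumulate floating-point error.
    nsteps = (end - start) // step
    return [start + i * step for i in range(nsteps + 1)]

def candidate_dirs(path_value, dirname):
    """Expand a colon-separated Path into the directories to search for dirname."""
    return [os.path.join(p, dirname) for p in path_value.split(":") if p]
```

With Interval=0.1, one second of data yields 11 grid points (both end points included), and `candidate_dirs("/home/gbtlogs:/home/archive", "Weather-Weather1-weather1")` lists the directory under each Path entry.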
The second type of statement is built from several keywords:
with somedirname add fieldname using processing_function window size
Where:
- with somedirname - means to search the directory Path/somedirname for files that fall within the Starttime/Endtime range.
- add fieldname - means to include the column fieldname in the processing
- using processing_function - means to apply the processing_function in computing the values for the re-gridded data. See below for the list of built-in functions, and how to write user-defined functions. Several columns may share a using clause.
- window size - means that a window of size data points is passed to the processing function for each computed value.
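The window clause can be pictured as selecting size samples near each grid time and handing them to the processing function. The sketch below assumes the window is centred on the grid time as far as the data allow and is shifted inward at the ends of the data. That placement is an assumption, not documented log_collator behaviour; data_window and regrid are hypothetical names.

```python
import bisect

def data_window(samples, t, size):
    """Return up to `size` (timestamp, value) samples around grid time t.

    `samples` must be sorted by timestamp. The window is centred on t where
    possible and shifted inward at the ends (an assumption for this sketch).
    """
    if size < 1 or not samples:
        return []
    times = [s[0] for s in samples]
    i = bisect.bisect_left(times, t)      # index of first sample at or after t
    lo = max(0, i - size // 2)
    hi = min(len(samples), lo + size)
    lo = max(0, hi - size)                # shift left if we ran off the end
    return samples[lo:hi]

def regrid(samples, grid, func, size):
    """Apply a processing function func(t, data) at every grid time."""
    return [func(t, data_window(samples, t, size)) for t in grid]
```

With a window of 2, a grid time between two samples receives exactly the bracketing pair, which is what the linear function needs.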
The following processing functions are pre-defined:
- mean - computes the average value from the data window
- median - computes the median value from the data window
- linear - computes a linearly interpolated value from the data window (window size must be 2)
- neighbor - returns the value of the data point nearest to the requested time.
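For illustration, here are Python sketches of the four built-in functions, written against the same (t, data) interface that user-defined functions use. They show what each method computes; they are not the log_collator's own implementations, and returning NaN for an empty or degenerate window is an assumption that follows the user-function convention.

```python
import statistics

def mean(t, data):
    """Average of the window's values; NaN for an empty window."""
    values = [v for _, v in data]
    return sum(values) / float(len(values)) if values else float("nan")

def median(t, data):
    """Median of the window's values; NaN for an empty window."""
    return statistics.median(v for _, v in data) if data else float("nan")

def linear(t, data):
    """Linear interpolation between exactly two (timestamp, value) samples."""
    if len(data) != 2:
        raise ValueError("linear requires a window of exactly 2 samples")
    (t0, v0), (t1, v1) = data
    if t1 == t0:
        return float("nan")  # cannot interpolate between coincident timestamps
    return v0 + (v1 - v0) * (t - t0) / float(t1 - t0)

def neighbor(t, data):
    """Value of the sample whose timestamp is closest to t."""
    if not data:
        return float("nan")
    return min(data, key=lambda s: abs(s[0] - t))[1]
```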
The using processing_function window size clause may be omitted, though this is not advised. In this case, the log_collator attempts to use reasonable defaults:
- If the time grid step is finer than the sampling of the data, linear interpolation is used.
- If the time grid step is coarser than the sampling of the data, a median is used when there are more than 5 data points per time grid step; otherwise a mean is used.
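The default rules above amount to a small decision function. In this Python sketch the typical spacing of the logged data is passed in as sample_interval because how the log_collator estimates it is not described here. The case where the grid step equals the sample spacing is also not covered by the rules; the sketch treats it as "not finer" and falls through to the mean. The function name is hypothetical.

```python
def default_function(grid_interval, sample_interval):
    """Name of the default processing function, per the documented rules.

    grid_interval: the spec-file Interval; sample_interval: typical spacing
    of the logged data (an input to this sketch). Same units for both.
    """
    if grid_interval < sample_interval:
        return "linear"                   # grid finer than the data
    points_per_step = grid_interval / float(sample_interval)
    return "median" if points_per_step > 5 else "mean"
```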
Example statement which defaults the processing functions and window sizes:
with Weather-Weather1-weather1 add WINDVEL, WINDDIR
Example statements that use the 'as' clause to give the output columns unique names in the output file:
with Accelerometer-Accelerometer1-AccelerometerData
add X as X_1
add Y as Y_1
add Z as Z_1
with Accelerometer-Accelerometer2-AccelerometerData
add X as X_2
add Y as Y_2
add Z as Z_2
Example specifying different processing methods for each field:
with Accelerometer-Accelerometer3-AccelerometerData
add X as X_3 using mean window 3
add Y as Y_3 using median window 5
add Z as Z_3 using neighbor
Example in which all columns share a single using clause. In this case, all columns are processed by the mean function with a window size of 10:
with Accelerometer-Accelerometer3-AccelerometerData
add X as X_3
add Y as Y_3
add Z as Z_3 using mean window 10
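One reading of the shared-clause example is that an add line without its own using clause takes the next using clause that appears after it in the same with statement. The Python sketch below resolves add lines under that assumption, and additionally assumes that columns after the last using clause fall back to the built-in defaults (shown as None). It handles one field per add line; the regular expression and the resolve_columns name are hypothetical and are not the log_collator's actual parser.

```python
import re

# One field per add line: "add FIELD [as ALIAS] [using FUNC [window N]]"
ADD_RE = re.compile(
    r"add\s+(?P<field>\w+)"
    r"(?:\s+as\s+(?P<alias>\w+))?"
    r"(?:\s+using\s+(?P<func>\w+)(?:\s+window\s+(?P<size>\d+))?)?"
)

def resolve_columns(add_lines):
    """Return (field, output_name, function, window) for each add line."""
    columns, pending = [], []
    for line in add_lines:
        m = ADD_RE.fullmatch(line.strip())
        if m is None:
            raise ValueError("cannot parse: %r" % line)
        p = m.groupdict()
        pending.append(p)
        if p["func"] is not None:
            # This using clause also covers earlier columns that lacked one.
            size = int(p["size"]) if p["size"] else None
            for q in pending:
                columns.append((q["field"], q["alias"] or q["field"], p["func"], size))
            pending = []
    # Columns after the last using clause get the defaults (None here).
    for q in pending:
        columns.append((q["field"], q["alias"] or q["field"], None, None))
    return columns
```

Under this reading, the three add lines above all resolve to mean with a window of 10, while the Accelerometer3 example with per-field clauses keeps its separate methods.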
The log_collator allows the user to provide processing functions written in Python. The name of the file containing them should be given in an import statement directly in the specification file. (Note that it may be necessary to give the full pathname of the user-defined file.)
A user-defined processing function must accept two arguments: a scalar time and a list of (timestamp, value) tuples. For example, a simple averaging function could be written as follows.
In the file "myavg.py":
# float("nan") provides the NaN value (the original example used fpconst.NaN)
import math

class WindowSizeError(Exception):
    """Raised when the data window is empty."""

def myavg(t, data):
    """
    A function to compute a simple average of a few data points.
    The time to be calculated is 't', and the data argument is a list of
    (timestamp, value) tuples. The length of data will be what was
    specified in the 'window' clause.
    If a value cannot be produced for some reason, then a not-a-number value
    should be returned instead.
    """
    if len(data) < 1:
        raise WindowSizeError("empty data window")
    # Each datapt is (timetag, data sample); skip samples that are already NaN.
    values = [datapt[1] for datapt in data if not math.isnan(datapt[1])]
    if not values:
        # No usable data in this window: return not-a-number.
        return float("nan")
    return float(sum(values)) / len(values)
In the specification file, prior to any 'with' statements, add an import of your .py file:
import myavg.py
# or alternatively:
import /users/joeastro/myfuncs/myavg.py
# To use myavg, specify it in a using clause:
with Weather-Weather1-weather1 add WINDVEL using myavg window 12
# That's it!
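Any function that follows the same contract can be plugged in the same way. As a further, hypothetical example, the Python function below weights each sample by the inverse of its distance in time from the requested grid time, returns an exact match unchanged, and returns NaN when the window contains no usable data. The file name tweighted.py and the function name tweighted are illustrative only.

```python
import math

def tweighted(t, data):
    """Hypothetical user function: inverse-time-distance weighted average.

    't' is the grid time; 'data' is a list of (timestamp, value) tuples whose
    length is the 'window' size. Returns NaN if no value can be produced.
    """
    num = 0.0
    den = 0.0
    for timetag, value in data:
        if math.isnan(value):
            continue                      # skip missing samples
        dt = abs(timetag - t)
        if dt == 0.0:
            return float(value)           # sample lies exactly on the grid time
        w = 1.0 / dt
        num += w * value
        den += w
    if den == 0.0:
        return float("nan")
    return num / den
```

Saved as tweighted.py, it would be imported and used just like myavg, for example `import tweighted.py` followed by `with Weather-Weather1-weather1 add WINDVEL using tweighted window 4`.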
1. Do not use commas to separate multiple add x as y clauses. Using commas here appears to cause columns to be excluded from processing.
Don't do this:
with Weather-Weather1-weather1 add WINDVEL as windspeed , add WINDDIR as winddirection
Do this instead:
with Weather-Weather1-weather1 add WINDVEL as windspeed add WINDDIR as winddirection
2. If all output data columns of a record are NaN (e.g., because the data were missing), the log_collator omits that record from the output. This behavior is easy to change, so I'd welcome input on what should be done in this case.
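The current omission rule is essentially a one-line filter, which is part of why it is easy to change. This Python sketch assumes each output record is a tuple whose first element is the timestamp and whose remaining elements are the data columns; that layout is an assumption made for illustration, not the documented binary format. The function name is hypothetical.

```python
import math

def drop_all_nan_records(records):
    """Drop records whose data columns (everything after the timestamp) are all NaN."""
    return [r for r in records if not all(math.isnan(v) for v in r[1:])]
```

Keeping such records instead would simply mean skipping this filter.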
-- JoeBrandt - 16 Feb 2007
Revision r1.7 - 10 Oct 2007 - 17:54 GMT - ToddHunter Parents: ModificationRequest3C107
Content copyright © 1999-2007 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.