DCR Linux Design Notes
This week:
- Dcr and clock control daemons now use device files /dev/dcr and /dev/bc635 so module use count is correct. (Prevents module from being unloaded while clock and Dcr Manager are running.)
- Hostname change. Met with Richard and revised the [[Main.MachineNamesProposal][host naming guidelines]] page.
- Created taskmaster configuration, ntpd/clock startup and installation scripts for drivers.
- Fixed the problem with total power mode not sensing switching states correctly.
- Worked with Ray to integrate Tcal from database code into Dcr.
- Conducted some initial testing of Dcr, and (Karen) found some errors in the equations being used for Tsys. Will revise MR, and fix the errors.
- Started on the LO1 design document.
This has proved to be one of the toughest problems, but one which seems to have an easy resolution (fortunately).
To recap: the problem was that during operation the DCR would occasionally get the interrupt handler dispatched,
but the hardware status of the interface card had no indication of the source of the interrupt. To make matters
worse, if the interrupt was then ignored, data loss would occur and a second error interrupt would be asserted.
The problem seems to be an anomalous behaviour of the interface card. If a VME interrupt is asserted while
VME memory is being accessed from the PCI bus, the Bit3 interface card causes a PCI interrupt, but then clears
out any indication of which VME interrupt level was active.
On the DCR, a user-space clock control program was run in the background to periodically access the VME (without
interrupts) to read the Bancomm card time. Somehow this activity was enough to cause the clearing out of the
card status. (Interesting that a user-level Linux process could have such a negative effect on an RTAI-handled
kernel-level interrupt.)
The resolution is simply to coordinate access to the Bancomm card. The DCR driver now provides data to a 'Time FIFO', which contains the raw bc635 data and a system time. This data is used by the clock control program to determine the system clock offset, which is passed on to the ntp daemon via shared memory.
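A minimal sketch of how a clock control program might turn one Time FIFO record into a clock offset. The record layout (`TimeFifoRecord`), its field names, and the assumption that the raw bc635 data has already been decoded into seconds and microseconds are all hypothetical; the driver's actual record format is not documented here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Time FIFO record: one IRIG (bc635) reading paired with the
// system time captured by the driver at the same moment.
struct TimeFifoRecord {
    std::int64_t irig_sec;   // IRIG time, whole seconds (assumed already decoded)
    std::int32_t irig_usec;  // IRIG time, microseconds in [0, 1000000)
    std::int64_t sys_sec;    // system time, whole seconds
    std::int32_t sys_usec;   // system time, microseconds in [0, 1000000)
};

// Offset of the system clock relative to IRIG, in microseconds.
// A positive result means the system clock is ahead of the reference.
std::int64_t clock_offset_usec(const TimeFifoRecord &r) {
    assert(r.irig_usec >= 0 && r.irig_usec < 1000000);
    assert(r.sys_usec >= 0 && r.sys_usec < 1000000);
    return (r.sys_sec - r.irig_sec) * 1000000LL + (r.sys_usec - r.irig_usec);
}
```

Because both timestamps come from the driver in a single record, the user-space program never has to touch the VME bus itself, which is the point of the coordination described above.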
The DCR hardware is currently hosted in a VME chassis, with a MVME-167 33MHz 68030 processor running VxWorks. The intent of the upgrade is to rehost the DCR with a modern system running Linux. The DCR hardware will still reside in a VME chassis, and will be controlled by a PCI to VME bus interface card. Real-time processing will be accomplished using the RTAI extensions to Linux.
Only a small portion of the total DCR code is truly hard real-time. This code will be revised and ported into a kernel-based loadable module. The balance of the refactored DCR code will reside in user-space as a normal Linux process. This partition of hard vs. soft real-time is a key advantage in the new design. The interface between the Manager/Linux process and the driver will be through a dual FIFO mechanism. A command FIFO will pass required configuration data (such as switching signal setup) down to the driver. When data is available, the driver will place the data onto a FIFO that is later read by the Linux process.
- Maintain existing parameter set
- Re-implement FITS file to conform to M&C standards
- Maintain or exceed current system performance
- Attempt to normalize partially blanked data??
The hardware configuration (apart from the host) includes a Bit3 PCI to VME bus adapter, the VME based DCR integrating counters, the DCR timing generator and the voltage-to-frequency interface cards. The existing VME based IRIG card is retained to maintain accurate timestamps.
The test jig wiring setup documentation is to-be-added. (After we figure it out!)
It is important to note that in both the test DCR and the GBT DCR, switching signal SW7 is routed into the SW6 input.
The original design of the DCR routed SW7 internally when the total power mode was programmed. Unfortunately this mode also unconditionally clears the SW0-SW6 signals, making them unmonitorable. (See Integrating counters schematic, sheet 5, region C3.) Because there is a SW7 to SW6 jumper, this special internal routing is unnecessary. The Linux DCR therefore does not use the hardware total power mode; instead it treats total power like any other switching configuration. In the single phase total power mode, the driver 'toggles' SW7 to produce a phase change at every phase period transition.
The kernel based device driver must manipulate the local memory management unit (MMU), a piece of hardware which maps virtual addresses into physical hardware addresses, and a second MMU on the PCI to VME interface card. The first is done through the kernel ioremap() function; the second is described below.
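The SW7 toggling can be illustrated with a short sketch. It assumes the switching word written for each phase period uses the same bit layout as the status tables in these notes (SW7 in bit 7); the function name and the idea of precomputing a sequence are illustrative only and are not taken from the driver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kSw7Mask = 1u << 7;  // SW7 assumed to be bit 7

// Build the switching word for each of `periods` consecutive phase periods.
// SW7 is inverted at every period boundary, so each boundary is a switching
// signal change even when no other signal (cal, sig/ref) changes.
std::vector<std::uint8_t> sw7_toggle_sequence(std::size_t periods,
                                              std::uint8_t first_word) {
    std::vector<std::uint8_t> words;
    words.reserve(periods);
    std::uint8_t word = first_word;
    for (std::size_t i = 0; i < periods; ++i) {
        words.push_back(word);
        word = static_cast<std::uint8_t>(word ^ kSw7Mask);  // flip SW7 only
    }
    return words;
}
```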
The PCI-VME card has 8192 registers, each of which holds the address translation, address modifier code and byte swapping settings for a 4096 byte segment of address space. The total VME address space represented is 32 MBytes (8192 x 4096 bytes). For the DCR the base address is 0x8000000 in A32D32 space. This means an address modifier code of either 0x09 or 0x0D should be used. The byte swapping setup should be "none". (The board does the correct thing by default.)
The Bit3 card comes with a driver which is also in use on the Spectrometer. The driver is actually two drivers for a plethora of card types, and is implemented in 20 or more files. I needed something much simpler, which I found in an open source driver specifically written for the model of card we are using (Model 617/618). I extended this driver to provide kernel-level interfaces for memory and interrupt management, callable from RTAI kernel processes.
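The arithmetic behind the mapping registers can be sketched as follows, taking the register count as 8192 so that 8192 x 4096 bytes equals the 32 MBytes stated. The helper names are hypothetical, the window is assumed to start at register 0, and the bit layout of the registers themselves (address modifier, swap bits) is not modelled because it is not given in these notes.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kPageSize     = 4096;  // bytes covered by one register
constexpr std::uint32_t kNumRegisters = 8192;  // 8192 * 4096 bytes = 32 MBytes
constexpr std::uint32_t kWindowSize   = kPageSize * kNumRegisters;
static_assert(kWindowSize == 32u * 1024u * 1024u, "window must span 32 MBytes");

// Which register covers a byte offset in the window, and where in its page.
struct MapSlot {
    std::uint32_t reg_index;
    std::uint32_t page_offset;
};

MapSlot locate(std::uint32_t window_offset) {
    assert(window_offset < kWindowSize);
    return MapSlot{window_offset / kPageSize, window_offset % kPageSize};
}

// VME address a given register must translate to when the window is laid
// out contiguously starting at vme_base (which must be page aligned).
std::uint32_t vme_page_address(std::uint32_t vme_base, std::uint32_t reg_index) {
    assert(vme_base % kPageSize == 0 && reg_index < kNumRegisters);
    return vme_base + reg_index * kPageSize;
}
```

With the DCR base of 0x8000000, register 2 would translate to VME address 0x8002000 under these assumptions.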
3.3.1 DCR Device Driver Hardware Interactions
A typical setup cycle includes the following steps:
- strobe the timing generator to generate the setup pulse (TBD) (TG_SETUP register)
- issue a timing generator reset (TIMING_GENERATOR_RESET register)
- initialize the counters (INIT_COUNTERS register). This sets the first set of data to come from the 'B' bank of counters.
- reset the counter values (RESET_COUNTERS register). I think this only resets the current non-integrating bank, which at this point should be bank 'B'.
- set the bank in the BANK_AND_TEST_AND_TP_SELECT register. I think this is misordered: it should always be 'A' and should happen prior to resetting the counter values.
- turn off/on the test signal, and set total power mode (BANK_AND_TEST_AND_TP_SELECT register)
- Reset the V to F errors (RESET_V_F_ERRORS register)
- set the timing generator normal delay (NORMAL_DELAY register)
- set the timing generator advance delay (ADV_DELAY register)
- reset the timing generator fifo (RESET_FIFO register)
- load the advance switch signal timing (ADV_SWITCH_SIGS_AND_PHASE register). Note that we don't use these signals.
- load the normal switching signal timing (NORMAL_SWITCH_SIGS_AND_PHASE, NORMAL_SWITCH_SIGS_HI registers)
- repeat the last two steps to load at least one full switching cycle
- clock the first value into the timing generator (PHASE_PERIOD_INIT register). This does not start the generator.
- arm the timing generator to start on the next 1PPS edge
Once the timing generator is set up with the switching signal configuration, it must be armed just prior to the 1PPS transition (usually in the doArm method) at the start of the scan. This is done by writing to the TIMING_GENERATOR_ARM register. Since there will be an interrupt at the scan start time, the driver or manager should ignore the first data sample following the start of the scan.
This is not needed, as it applies to a switching scheme no longer in use.
When any of the switching signals change state, an interrupt is generated on the VME bus signaling the event. (An important exception is that interrupts are not generated by blanking.) On a typical interrupt cycle the following processing occurs:
- acknowledge the PCI and then the VME interrupt
- verify the interrupt was generated by the DCR by reading the interrupt vector
- read the state of the timing generator (TIMING_GENERATOR_STATUS register)
- read the state of the switching signals (DCR_SWITCH_STATE register)
- read the sixteen DCR_COUNTER registers
- reload the timing generator FIFO with enough information to sequence until the next switching cycle[1]. (Remember that blanking phases do not generate interrupts.)
- reset the counters (RESET_COUNTERS register)
- time-tag the data
[1] The timing generator FIFO depth is 1024 words, typically only 2 are used.
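The counter-card portion of the interrupt cycle above could be sketched as follows. The register offsets come from the Integrating Counter Registers table in these notes; the `PhaseSample` type, the masking of each count to 28 bits, and the assumption that any written value strobes the reset are illustrative guesses, not the driver's actual code. Acknowledging the interrupt, reading the timing generator and reloading its FIFO are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t   kNumChannels     = 16;
constexpr std::uint32_t kCounterMask     = (1u << 28) - 1;  // 28 bit counters
constexpr std::size_t   kSwitchStateIx   = 0x40 / 4;  // Switching Signal Status
constexpr std::size_t   kResetCountersIx = 0x04 / 4;  // write: reset counter bank

// Hypothetical container for one phase worth of data.
struct PhaseSample {
    std::uint32_t switch_state;
    std::array<std::uint32_t, kNumChannels> counts;
};

// regs points at the memory-mapped counter card (32 bit registers).
PhaseSample read_phase(volatile std::uint32_t *regs) {
    assert(regs != nullptr);
    PhaseSample s{};
    s.switch_state = regs[kSwitchStateIx];
    for (std::size_t ch = 0; ch < kNumChannels; ++ch) {
        // Channel N lives at byte offset 4 * (N - 1).
        s.counts[ch] = regs[ch] & kCounterMask;
    }
    regs[kResetCountersIx] = 0;  // reset the bank only after reading it
    return s;
}
```

Reading before resetting matters because the write at offset 0x4 clears the bank that was just read.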
3.3.2 Bancomm VME 635 driver
The original card is retained in the new DCR configuration. A clock driver has been implemented to set up and control the Bancomm card via the SHM (shared memory) interface for the ntp daemon. A standalone program uses the vmedrv driver to memory map the Bancomm card into user-space. The program then delays approximately one second and reads the system and IRIG clocks. A difference is formed and is passed on to ntpd via the ntp SHM interface. The timing in scheduling the process is non-critical. In fact a small randomization is advantageous, because it reduces the jitter caused by other periodic events in the system. (For example, if a 1PPS interrupt was used, and the DCR was set to interrupt on the 1PPS, some minor interaction would take place. Randomizing the I/O to the Bancomm card reduces the likelihood of repeated event collisions.)
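One way to randomize the polling interval is sketched below. The helper name and the 50 ms spread are assumptions chosen for illustration; the notes only say the delay is approximately one second with a small randomization.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical helper: nominal polling period plus a uniformly distributed
// jitter, so reads of the clock card do not stay phase-locked to other
// periodic activity such as 1PPS-driven interrupts.
template <class Rng>
std::chrono::milliseconds next_poll_delay(
        Rng &rng,
        std::chrono::milliseconds nominal = std::chrono::milliseconds(1000),
        std::chrono::milliseconds spread  = std::chrono::milliseconds(50)) {
    assert(spread.count() >= 0 && spread <= nominal);
    std::uniform_int_distribution<long long> jitter(-spread.count(),
                                                    spread.count());
    return nominal + std::chrono::milliseconds(jitter(rng));
}
```

A caller would sleep for the returned duration (for example with `std::this_thread::sleep_for`) before each read of the system and IRIG clocks.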
The DCR hardware consists of three VME cards, two of which are accessible from the VME bus. The timing generator is a flexible digital signal generator which has seven signal outputs and a blanking signal output. The Counter register card has circuitry to detect phase changes, as well as sixteen 28 bit wide counter registers. The maximum count rate is 10MHz.
| Timing Generator Registers |
| Address | On Read | On Write |
| 0x0 | Input Status | Generate setup pulse |
| 0x4 | | Loads normal delay word (bits 0-23) |
| 0x8 | | Loads advanced delay word (bits 0-23) |
| 0xC | | FIFO reset pulse, write to this address before starting to load the FIFO during a setup |
The timing generator status register has the following format:
| Input Status Register Read Address 0x0 |
| Bits 31-8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| Undefined | SCAN NOT STARTED | NO 1PPS | NO 10MHZ | Counter FPGA ERROR | Timer FPGA ERROR | New Phase & Counters Not Read | FIFO FULL | FIFO EMPTY |
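A small decoder for the status word, as a sketch. The bit positions and names are taken from the table above; the assumption that each flag is active-high, and the function and type names, are mine.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One entry per defined bit of the Input Status Register.
struct TgStatusFlag {
    std::uint32_t mask;
    const char   *name;
};

const TgStatusFlag kTgStatusFlags[] = {
    {1u << 0, "FIFO EMPTY"},
    {1u << 1, "FIFO FULL"},
    {1u << 2, "New Phase & Counters Not Read"},
    {1u << 3, "Timer FPGA ERROR"},
    {1u << 4, "Counter FPGA ERROR"},
    {1u << 5, "NO 10MHZ"},
    {1u << 6, "NO 1PPS"},
    {1u << 7, "SCAN NOT STARTED"},
};

// Return the names of all flags set in a raw status word, lowest bit first.
// Bits 31-8 are undefined in the table, so they are ignored.
std::vector<std::string> decode_tg_status(std::uint32_t raw) {
    std::vector<std::string> set_flags;
    for (const TgStatusFlag &flag : kTgStatusFlags) {
        if (raw & flag.mask) {
            set_flags.push_back(flag.name);
        }
    }
    return set_flags;
}
```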
| Integrating Counter Registers |
| Address | On Read | On Write |
| 0x0 | Channel 1 count | Strobe to initialize counter select logic. When this strobe is issued after setup of the Timing Generator, the first counter to integrate good data will be Counter B. |
| 0x4 | Channel 2 count | Reset current counter bank (after reading them) |
| 0x8 | Channel 3 count | Strobe as a "last ditch effort" if all else seems to fail. (Equivalent to turning the power off then back on.) |
| 0xC | Channel 4 count | V/F input control |
| 0x10 | Channel 5 count | Chart Recorder output selection |
| 0x14 | Channel 6 count | Reconfigure FPGA. |
| 0x18 | Channel 7 count | Output channel V/F selection |
| 0x1C | Channel 8 count | Reset V/F monitor errors |
| 0x20 | Channel 9 count | |
| 0x24 | Channel 10 count | |
| 0x28 | Channel 11 count | |
| 0x2C | Channel 12 count | |
| 0x30 | Channel 13 count | |
| 0x34 | Channel 14 count | |
| 0x38 | Channel 15 count | |
| 0x3C | Channel 16 count | |
| 0x40 | Switching Signal Status | |
| 0x44 | V/F monitors Status | |
Indicates the state of the switching signals for the counters which are presently to be read. Its format is as follows:
| Switching Signal Status Register Read Offset 0x40 |
| Bit 31-16 | Bit 15-10 | Bit 9 | Bit 8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| X | X | Counter Bank==A | Bad Data | SW7 | SW6 | SW5 | SW4 | SW3 | Blank | Cal | SigRef |
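The switching signal status word can be unpacked into named fields as in this sketch. Bit positions follow the table above; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Decoded view of the Switching Signal Status register (offset 0x40).
struct SwitchStatus {
    bool sig_ref;         // bit 0
    bool cal;             // bit 1
    bool blank;           // bit 2
    bool sw[8];           // sw[3]..sw[7] hold SW3..SW7; sw[0]..sw[2] unused
    bool bad_data;        // bit 8
    bool counter_bank_a;  // bit 9
};

SwitchStatus decode_switch_status(std::uint32_t raw) {
    SwitchStatus s{};
    s.sig_ref = (raw >> 0) & 1u;
    s.cal     = (raw >> 1) & 1u;
    s.blank   = (raw >> 2) & 1u;
    for (int n = 3; n <= 7; ++n) {
        s.sw[n] = (raw >> n) & 1u;  // SWn sits in bit n
    }
    s.bad_data       = (raw >> 8) & 1u;
    s.counter_bank_a = (raw >> 9) & 1u;
    return s;  // bits 31-10 are don't-care and are dropped
}
```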
Indicates the status of the V/F monitors.
| V/F Monitor Status Register Read Offset 0x44 |
| Bit 31-4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| X | Channel B > 9.5 MHz | Channel B < 1.0 kHz | Channel A > 9.5 MHz | Channel A < 1.0 kHz |
5.0 DCRInterface class
The DCRInterface class library encapsulates the interface to the DCR kernel driver, and also interfaces to the DCR kernel simulator.
This method opens either the DCR RTAI FIFOs or the simulator FIFOs, depending on the argument provided. The FIFOs are closed automatically in the DCRInterface destructor.
The following methods are provided, with data types that should match the standard:
* void set_number_phases(int n);
* void set_phase_start(double *phase_start);
* void set_sig_ref_state(int *sigrefstate);
* void set_cal_state(int *calstate);
* void set_blanking(double *blanking);
* void set_switch_period(double);
* void is_switching_master(bool is_master);
The DCR has the following additional switching signals. sw3-sw6 are not currently connected, sw7 is used internally.
* void set_sw3_state(int *sw3state);
* void set_sw4_state(int *sw4state);
* void set_sw5_state(int *sw5state);
* void set_sw6_state(int *sw6state);
* void set_sw7_state(int *sw7state);
bool reset_device();
bool setupSwitching();
This reads (on demand) and returns the timing generator status, switching signal state, and current phase index. See the timing generator status register (4.1.1) and switching signal status register (4.2.1) above for the bit definitions.
bool get_device_status(int &tgstatus, int &swstatus, int &cphase);
bool set_inputs(int);
bool set_test_tone(bool isOn);
Selects which ports will be output to D/A's for chart-recorder.
bool chartRecorderSelect(int output1, int output2);
Selects which V/F monitors are routed to front panel display.
bool vfMonitorSelect(int output1, int output2);
Select which bank of V/F's are routed as inputs. (Selects one of two 16 channel banks.)
bool set_vf_bank(int );
This method writes the testtone, chart recorder, bank select and vfmonitor configuration to the hardware.
bool configure_io_selection();
bool arm(const TimeStamp &start, const TimeStamp &length);
Unfortunately, the hardware, once armed, does not like to be stopped. This method will stop undesired data from being sent into the data FIFO, but does not disable interrupts. (There is no such function in the hardware.) This is not a problem, since the driver handles it.
bool abort();
This routine is not useful to the final product.
bool no_op();
Data is extracted from the driver using the method:
bool read_phase_data(PhaseData *data)
Notice: I changed the semantics of data->phase_number. It now counts each phase in a scan, starting at 1.
See the header file for more info.
Note that this method will block until data is available.
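Since phase_number now counts every phase in a scan starting at 1, a consumer that wants the position within the switching cycle has to derive it. A minimal sketch, assuming phases arrive in order with none dropped; the `PhasePosition` type and `locate_phase` name are hypothetical and not part of DCRInterface.

```cpp
#include <cassert>

// Position of a phase within a scan, derived from the running phase count.
struct PhasePosition {
    int cycle;        // zero-based switching cycle within the scan
    int phase_index;  // zero-based phase within that cycle
};

// phase_number is 1-based (as delivered in PhaseData); phases_per_cycle is
// the value configured with set_number_phases().
PhasePosition locate_phase(int phase_number, int phases_per_cycle) {
    assert(phase_number >= 1 && phases_per_cycle >= 1);
    const int zero_based = phase_number - 1;
    return PhasePosition{zero_based / phases_per_cycle,
                         zero_based % phases_per_cycle};
}
```

For a four-phase configuration, phase_number 5 is the first phase of the second cycle.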
| Switching_State Bits |
| Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| SW7 | SW6 | SW5 | SW4 | SW3 | Blank | Cal | SigRef |
| Bit 15 | Bit 14 | Bit 13 | Bit 12 | Bit 11 | Bit 10 | Bit 9 | Bit 8 |
| X | X | Scan started | New Phase & Counters Not Read | FIFO FULL | FIFO EMPTY | Counter Bank==A | Bad Data |
| Bit 31/30 | Bit 29/28 | Bit 27/26 | Bit 25/24 | Bit 23/22 | Bit 21/20 | Bit 19/18 | Bit 17/16 |
| SR/C8 | SR/C7 | SR/C6 | SR/C5 | SR/C4 | SR/C3 | SR/C2 | SR/C1 |
This section is meant to document some of the special tweaks made on dozer (the DCR Linux host).
6.1 NTP configuration
6.2 BIOS Settings
Most of the on-board unused peripherals have been disabled. Of special note are the USB controllers, which tend to produce lots of SMI (System Management Interrupt) events.
6.3 SMI Interrupt Reconfiguration
Disabling the USB helps, but does not eliminate the SMI interrupt problem. A second tweak requires the program smi_user to detect and disable the global SMI enable bits of the 82801 integrated peripheral controller.
This involves running the DCR as both slave and master, and selecting each of the standard switching modes available on the Scancoordinator Cleo screen. (With the notable exception of total power in slave mode.)
Conduct a series of pointing and focus scans using the VxWorks based DCR. Switch control over to the Linux based DCR, and repeat a series of pointing and focus scans. Note the pointing offset differences, or apparent hysteresis. Run another set of scans using different rates and compare again looking for hysteresis.
This section is intended to list bugs found after release.
8.1 Dcr Overflows Timing FIFO
This problem occurs in the following scenario: a pointing scan is performed with the DCR configured as switching master. Once complete, the user removes the Dcr from the scan coordinator, and then configures another backend (and the switching signal selector) as master. Since the Dcr was removed from the scan coordinator, it is not notified about the loss of mastership. This causes the Dcr to overflow the timing generator FIFO if the last configured switching rate on the Dcr is slower than the actual master.
A fix has been deployed to prevent this from being a fatal error, but the true resolution is for the dcr manager to monitor the actual mastership, either via a panel or SIB connection.
The original Sw7 (Sw5 in the code) dependency routine has the signal toggling during each phase of the switching cycle when the system is the switching master. This should not be necessary unless there are no other signal transitions (e.g. total power without cal).
8.3 Dcr relies on NTP to maintain system time synchronization.
Since the Dcr host relies upon ntp for time synchronization, it should be added to the list of ntp monitored hosts.
8.4 Dcr driver does not update NTP, when there is no switching.
In some cases, like spectral line in total power mode, there is no switching. Since the Bancomm card is only read during switching, the userspace ntpshmem clock loses its synchronization source. This is non-fatal, and ntp knows how to compensate by using another synchronization source. Starting with M&C version 8.4 this bug is fixed.
-- JoeBrandt - 10 Mar 2005
Revision r1.29 - 06 Aug 2008 - 14:36 GMT - JoeBrandt