
DCR Linux Design Notes



Status

June 16

This week:

Resolution of 'Lost Interrupt' Problem

This has proved to be one of the toughest problems, but fortunately one which seems to have an easy resolution.

To recap: during operation the DCR would occasionally have its interrupt handler dispatched while the hardware status of the interface card gave no indication of the source of the interrupt. To make matters worse, if the interrupt was then ignored, data loss would occur and a second error interrupt would be asserted.

The problem seems to be anomalous behaviour of the interface card. If a VME interrupt is asserted while VME memory is being accessed from the PCI bus, the Bit3 interface card raises a PCI interrupt, but then clears any indication of which VME interrupt level was active.

On the DCR, a user-space clock control program ran in the background to periodically access the VME bus (without interrupts) and read the Bancomm card time. Somehow this activity was enough to clear the card status. (It is interesting that a user-level Linux process could have such a negative effect on an RTAI-handled kernel-level interrupt.)

The resolution is simply to coordinate access to the Bancomm card. The DCR driver now provides data to a 'Time FIFO', which contains the raw bc635 data and a system time. The clock control program uses this data to determine the system clock offset, which is passed on to the ntp daemon via shared memory.
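As a rough illustration of the offset calculation, here is a minimal sketch. The record layout and the names TimeFifoRecord and clock_offset_usec are assumptions made for this example; the actual Time FIFO carries raw bc635 data, which is assumed here to have already been decoded into seconds and microseconds on the same epoch as the system time.

```cpp
#include <cstdint>

// Hypothetical, already-decoded view of one Time FIFO record.
// The real record holds raw bc635 data; its layout is not given in these notes.
struct TimeFifoRecord {
    std::int64_t irig_sec;   // decoded bc635 (IRIG) time, whole seconds
    std::int32_t irig_usec;  // decoded bc635 time, microseconds
    std::int64_t sys_sec;    // system time captured with the bc635 read
    std::int32_t sys_usec;   // system time, microseconds
};

// Offset of the system clock relative to the IRIG reference, in microseconds.
// A positive result means the system clock is ahead of the reference.
constexpr std::int64_t clock_offset_usec(const TimeFifoRecord &r) {
    return (r.sys_sec - r.irig_sec) * 1000000LL + (r.sys_usec - r.irig_usec);
}
```

Because both times come from the same FIFO record, the user-space program never has to touch the VME bus itself, which is the point of the fix.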

1.0 Overall Architecture

The DCR hardware is currently hosted in a VME chassis, with an MVME-167 33MHz 68030 processor running VxWorks. The intent of the upgrade is to rehost the DCR on a modern system running Linux. The DCR hardware will still reside in a VME chassis, and will be controlled by a PCI to VME bus interface card. Real-time processing will be accomplished using the RTAI extensions to Linux.

Only a small portion of the total DCR code is truly hard real-time. This code will be revised and ported into a kernel-based loadable module. The balance of the refactored DCR code will reside in user-space as a normal Linux process. This partition of hard vs. soft real-time is a key advantage of the new design. The interface between the Manager/Linux process and the driver will be through a dual FIFO mechanism. A command FIFO will pass required configuration data (such as switching signal setup) down to the driver. When data is available, the driver will place it onto a data FIFO that is later read by the Linux process.
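The user-space side of such a dual FIFO interface can be sketched as fixed-size records moved over file descriptors. RTAI FIFOs are normally exposed to user space as character devices such as /dev/rtf0, so plain read() and write() calls apply. The SwitchCommand layout and the helper names below are assumptions for illustration only; the real driver's record formats are not shown in these notes.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical command record passed down the command FIFO.
struct SwitchCommand {
    int    opcode;         // made-up value, e.g. 1 = load switching setup
    int    number_phases;
    double switch_period;  // seconds (assumed unit)
};

// Write one fixed-size record. Returns true only if every byte was written.
template <typename Record>
bool write_record(int fd, const Record &rec) {
    const char *p = reinterpret_cast<const char *>(&rec);
    std::size_t left = sizeof(Record);
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) return false;               // real error
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    return true;
}

// Blocking read of one fixed-size record. Returns false on EOF or error.
template <typename Record>
bool read_record(int fd, Record &rec) {
    char *p = reinterpret_cast<char *>(&rec);
    std::size_t left = sizeof(Record);
    while (left > 0) {
        ssize_t n = ::read(fd, p, left);
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n <= 0) return false;               // writer closed or error
        p += n;
        left -= static_cast<std::size_t>(n);
    }
    return true;
}
```

Using whole fixed-size records keeps the two sides in step: a reader never sees half a command, and the blocking read matches the "wait until data is available" behaviour described above.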

2.0 Design Requirements

3.0 Design Details

3.1 Manager

3.2 Hardware Configuration

The hardware configuration (apart from the host) includes a Bit3 PCI to VME bus adapter, the VME based DCR integrating counters, the DCR timing generator and the voltage-to-frequency interface cards. The existing VME based IRIG card is retained to maintain accurate timestamps.

The test jig wiring setup documentation is to-be-added. (After we figure it out!)

3.2.1 Switching Signal Routing
It is important to note that in both the test DCR and the GBT DCR, switching signal SW7 is routed into the SW6 input.

The original design of the DCR routed SW7 internally when the total power mode was programmed. Unfortunately this mode also unconditionally clears the SW0-SW6 signals, making them unmonitorable. (See Integrating counters schematic, sheet 5, region C3.) Because there is a SW7 to SW6 jumper, this special internal routing is unnecessary. The Linux DCR therefore does not use the total power mode; instead it treats total power like any other switching configuration. In the single phase total power mode, the driver 'toggles' SW7 to produce a phase change at every phase period transition.

3.3 Device Driver

The kernel based device driver must manipulate the local memory management unit (MMU), a piece of hardware which maps virtual addresses into physical hardware addresses, and a second MMU on the PCI to VME interface card. The first is done through the kernel ioremap() function; the second is described below.

The PCI-VME card has 8192 mapping registers, each of which holds the address translation, address modifier code and byte swapping setup for a 4096 byte segment of address space. The total VME address space represented is 32 MBytes (8192 x 4096 bytes). For the DCR the base address is 0x8000000 in A32D32 space. This means an address modifier code of either 0x09 or 0x0D should be used. The byte swapping setup should be "none". (The board does the correct thing by default.)
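The page arithmetic behind the mapping registers can be sketched as follows. A 32 MByte window divided into 4096 byte segments works out to 8192 registers. The constant and function names are made up for this example, and the bit layout of an actual mapping register is card specific and deliberately not modelled here.

```cpp
#include <cstdint>

constexpr std::uint32_t kPageSize   = 4096;                 // bytes covered by one mapping register
constexpr std::uint32_t kWindowSize = 32u * 1024u * 1024u;  // 32 MBytes of VME space
constexpr std::uint32_t kNumMapRegs = kWindowSize / kPageSize;

static_assert(kNumMapRegs == 8192, "32 MBytes / 4096 bytes per register");

// A32 data-access address modifier codes named in the text.
constexpr std::uint8_t kAmA32UserData       = 0x09;
constexpr std::uint8_t kAmA32SupervisorData = 0x0D;

// Index of the mapping register that covers a byte offset into the window.
constexpr std::uint32_t map_reg_index(std::uint32_t window_offset) {
    return window_offset / kPageSize;
}

// VME address of the start of the 4096 byte page containing vme_addr.
constexpr std::uint32_t vme_page_base(std::uint32_t vme_addr) {
    return vme_addr & ~(kPageSize - 1);
}

// Byte offset of vme_addr within its page.
constexpr std::uint32_t page_offset(std::uint32_t vme_addr) {
    return vme_addr & (kPageSize - 1);
}
```

For the DCR base address 0x8000000, all of the registers listed in section 4 fall within a single page, so one mapping register per card would be enough under these assumptions.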

The Bit3 card comes with a driver which is also in use on the Spectrometer. The driver is actually two drivers for a plethora of card types, and is implemented in 20 or more files. I needed something much simpler, which I found in an open source driver written specifically for the model of card we are using (Model 617/618). I extended this driver to provide kernel-level interfaces for memory and interrupt management, callable from RTAI kernel processes.

3.3.1 DCR Device Driver Hardware Interactions

3.3.1.1 Basic Pre-subscan Initialization
A typical setup cycle includes the following steps:

3.3.1.2 Arming the Timing Generator
Once the timing generator is set up with the switching signal configuration, it must be armed just prior to the 1PPS transition at the start of the scan (usually in the doArm method). This is done by writing to the TIMING_GENERATOR_ARM register. Since there will be an interrupt at the scan start time, the driver or manager should ignore the first data sample following the start of the scan.

3.3.1.3 Timing Generator Setup Pulse
This is not needed, as it applies to a switching scheme no longer in use.

3.3.1.4 Phase Change Interrupts
When any of the switching signals change state, an interrupt is generated on the VME bus signaling the event. (An important exception is that interrupts are not generated by blanking.) On a typical interrupt cycle the following processing occurs:

[1] The timing generator FIFO depth is 1024 words; typically only 2 are used.

3.3.2 Bancomm VME 635 driver
The original card is retained in the new DCR configuration. A clock driver has been implemented to set up and control the Bancomm card via the SHM (shared memory) interface of the ntp daemon. A standalone program uses the vmedrv driver to memory map the Bancomm card into user-space. The program then delays approximately one second and reads the system and IRIG clocks. A difference is formed and passed on to ntpd via the ntp SHM interface. The timing of the process scheduling is non-critical. In fact a small randomization is advantageous, because it reduces the jitter caused by other periodic events in the system. (For example, if a 1PPS interrupt was used, and the DCR was set to interrupt on the 1PPS, some minor interaction would take place. Randomizing the I/O to the Bancomm card reduces the likelihood of repeated event collisions.)
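The randomized polling interval could look something like the sketch below. The function name next_poll_delay and the 50 ms jitter default are assumptions chosen for illustration; the notes do not say how much randomization the real program applies.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Returns a delay drawn uniformly from [nominal - jitter, nominal + jitter].
// The default values are illustrative, not taken from the real clock program.
template <typename Rng>
std::chrono::milliseconds next_poll_delay(
        Rng &rng,
        std::chrono::milliseconds nominal = std::chrono::milliseconds(1000),
        std::chrono::milliseconds jitter = std::chrono::milliseconds(50)) {
    assert(jitter.count() >= 0 && jitter <= nominal);
    std::uniform_int_distribution<long long> dist(-jitter.count(), jitter.count());
    return nominal + std::chrono::milliseconds(dist(rng));
}
```

A polling loop would then call something like std::this_thread::sleep_for(next_poll_delay(rng)) before each read of the system and IRIG clocks, so that the reads drift relative to any strictly periodic activity such as the 1PPS.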

4.0 Hardware

The DCR hardware consists of three VME cards, two of which are accessible from the VME bus. The timing generator is a flexible digital signal generator which has seven signal outputs and a blanking signal output. The Counter register card has circuitry to detect phase changes, as well as sixteen 28-bit-wide counter registers. The maximum count rate is 10 MHz.

4.1 Timing Generator

Timing Generator Registers
| Address | On Read | On Write |
| 0x0 | Input Status | Generate setup pulse |
| 0x4 | | Loads normal delay word (bits 0-23) |
| 0x8 | | Loads advanced delay word (bits 0-23) |
| 0xC | | FIFO reset pulse; write to this address before starting to load the FIFO during a setup |

4.1.1 Timing Generator Status Register
The timing generator status register has the following format:
Input Status Register, Read Address 0x0
| Bits 31-8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| Undefined | SCAN NOT STARTED | NO 1PPS | NO 10MHZ | Counter FPGA ERROR | Timer FPGA ERROR | New Phase & Counters Not Read | FIFO FULL | FIFO EMPTY |
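A small decoder for this register might look like the following sketch. The identifier names are invented for the example; only the bit positions come from the table. It also assumes that a set bit means the named condition is present, which the table does not state explicitly.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit masks for the Input Status Register (read address 0x0).
// Names are illustrative; bit positions follow the table.
enum TgStatus : std::uint32_t {
    TG_FIFO_EMPTY          = 1u << 0,
    TG_FIFO_FULL           = 1u << 1,
    TG_NEW_PHASE_NOT_READ  = 1u << 2,
    TG_TIMER_FPGA_ERROR    = 1u << 3,
    TG_COUNTER_FPGA_ERROR  = 1u << 4,
    TG_NO_10MHZ            = 1u << 5,
    TG_NO_1PPS             = 1u << 6,
    TG_SCAN_NOT_STARTED    = 1u << 7
};

// Returns the names of all flags set in bits 7-0, lowest bit first.
// Bits 31-8 are undefined in the hardware and are ignored.
std::vector<std::string> describe_tg_status(std::uint32_t raw) {
    static const char *const kNames[8] = {
        "FIFO EMPTY", "FIFO FULL", "New Phase & Counters Not Read",
        "Timer FPGA ERROR", "Counter FPGA ERROR", "NO 10MHZ",
        "NO 1PPS", "SCAN NOT STARTED"};
    std::vector<std::string> flags;
    for (unsigned bit = 0; bit < 8; ++bit) {
        if (raw & (1u << bit)) flags.push_back(kNames[bit]);
    }
    return flags;
}
```

Masking off the undefined upper bits before any comparison avoids treating bus noise in bits 31-8 as a status change.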

4.2 Integrating Counters

Integrating Counter Registers
| Address | On Read | On Write |
| 0x0 | Channel 1 count | Strobe to initialize counter select logic. When this strobe is issued after setup of the Timing Generator, the first counter to integrate good data will be Counter B. |
| 0x4 | Channel 2 count | Reset current counter bank (after reading them) |
| 0x8 | Channel 3 count | Strobe as a "last ditch effort" if all else seems to fail. (Equivalent to turning the power off then back on.) |
| 0xC | Channel 4 count | V/F input control |
| 0x10 | Channel 5 count | Chart Recorder output selection |
| 0x14 | Channel 6 count | Reconfigure FPGA |
| 0x18 | Channel 7 count | Output channel V/F selection |
| 0x1C | Channel 8 count | Reset V/F monitor errors |
| 0x20 | Channel 9 count | |
| 0x24 | Channel 10 count | |
| 0x28 | Channel 11 count | |
| 0x2C | Channel 12 count | |
| 0x30 | Channel 13 count | |
| 0x34 | Channel 14 count | |
| 0x38 | Channel 15 count | |
| 0x3C | Channel 16 count | |
| 0x40 | Switching Signal Status | |
| 0x44 | V/F monitors Status | |
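The read side of this register map follows a simple pattern: channel n lives at offset 4 x (n - 1). The sketch below captures that, with invented names throughout. Masking each value to 28 bits is an assumption based on the stated counter width; the notes do not say what the upper four bits contain on a read.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kSwitchStatusOffset = 0x40;
constexpr std::uint32_t kVfMonitorOffset    = 0x44;
constexpr std::uint32_t kCounterMask        = 0x0FFFFFFF;  // 28-bit counters (assumed masking)

// Byte offset of the count register for channel 1..16.
constexpr std::uint32_t channel_offset(unsigned channel) {
    return (channel - 1) * 4;
}

// Read one channel count through a pointer to the mapped register block.
// 'base' stands in for the address returned by the mapping step.
inline std::uint32_t read_channel(const volatile std::uint32_t *base, unsigned channel) {
    assert(channel >= 1 && channel <= 16);
    return base[channel_offset(channel) / 4] & kCounterMask;
}
```

At the stated 10 MHz maximum count rate, a 28-bit counter reaches full scale after 2^28 / 10^7, or roughly 26.8 seconds, which bounds the longest usable phase under these assumptions.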
4.2.1 Switching Signal Status Register
Indicates the state of the switching signals for the counters which are presently to be read. Its format is as follows:
Switching Signal Status Register, Read Offset 0x40
| Bit 31-16 | Bit 15-10 | Bit 9 | Bit 8 | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| X | X | Counter Bank==A | Bad Data | SW7 | SW6 | SW5 | SW4 | SW3 | Blank | Cal | SigRef |

4.2.2 V/F monitors Status:
Indicates the status of the V/F monitors.
V/F Monitor Status Register, Read Offset 0x44
| Bit 31-4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| X | Channel B > 9.5 MHz | Channel B < 1.0 kHz | Channel A > 9.5 MHz | Channel A < 1.0 kHz |

5.0 DCRInterface class

The DCRInterface class library encapsulates the interface to the DCR kernel driver, and also interfaces to the DCR kernel simulator.

5.1 open_fifos(is_simulator) Method

This method opens either the DCR RTAI FIFOs or the simulator FIFOs, depending on the argument provided. The FIFOs are closed automatically in the DCRInterface destructor.

5.2 Switching Signal setup methods

The following methods are provided, with data types that should match the standard:
   * void set_number_phases(int n);
   * void set_phase_start(double *phase_start);
   * void set_sig_ref_state(int *sigrefstate);
   * void set_cal_state(int *calstate);
   * void set_blanking(double *blanking);
   * void set_switch_period(double);
   * void is_switching_master(bool is_master);

The DCR has the following additional switching signals. sw3-sw6 are not currently connected; sw7 is used internally.

   * void set_sw3_state(int *sw3state);
   * void set_sw4_state(int *sw4state);
   * void set_sw5_state(int *sw5state);
   * void set_sw6_state(int *sw6state);
   * void set_sw7_state(int *sw7state);

5.3 Reset Device

    bool reset_device();

5.4 Send switching setup to the hardware

    bool setupSwitching();
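To show how the setup methods in 5.2 and setupSwitching() might fit together, here is a hedged sketch of a two-phase cal switching setup. It is written as a template so it compiles without the real DCRInterface header. The call order, the units of phase_start (fractions of the period) and blanking (seconds), and the RecordingDcr stand-in are all assumptions for illustration.

```cpp
#include <vector>

// Configure two phases (cal off, then cal on) with no sig/ref switching.
// Works with any type offering the method signatures listed in 5.2 and 5.4.
template <typename Dcr>
bool configure_cal_switching(Dcr &dcr, double period_seconds) {
    double phase_start[2] = {0.0, 0.5};  // ASSUMED: fractions of the switch period
    int    sigref[2]      = {0, 0};
    int    cal[2]         = {0, 1};
    double blanking[2]    = {0.0, 0.0};  // ASSUMED: seconds

    dcr.set_number_phases(2);            // ASSUMED: must precede the array setters
    dcr.set_phase_start(phase_start);
    dcr.set_sig_ref_state(sigref);
    dcr.set_cal_state(cal);
    dcr.set_blanking(blanking);
    dcr.set_switch_period(period_seconds);
    dcr.is_switching_master(true);

    // Sent while the local arrays are still alive, in case the class keeps
    // the pointers rather than copying the data.
    return dcr.setupSwitching();
}

// Stand-in class used only to exercise the sketch; not the real DCRInterface.
struct RecordingDcr {
    int n = 0;
    std::vector<int> cal;
    double period = 0.0;
    bool master = false;
    bool sent = false;
    void set_number_phases(int v) { n = v; }
    void set_phase_start(double *) {}
    void set_sig_ref_state(int *) {}
    void set_cal_state(int *c) { cal.assign(c, c + n); }
    void set_blanking(double *) {}
    void set_switch_period(double p) { period = p; }
    void is_switching_master(bool m) { master = m; }
    bool setupSwitching() { sent = true; return n > 0; }
};
```

With the real class, the same function would be called as configure_cal_switching(dcr, 1.0) after open_fifos() has succeeded.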

5.5 Get Device Status

This reads (on demand) and returns the timing generator status, switching signal state, and current phase index. See the timing generator status register (4.1.1) and switching signal status register (4.2.1) above for the bit definitions.
    bool get_device_status(int &tgstatus, int &swstatus, int &cphase);

5.6 Input Selection

    bool set_inputs(int);

5.7.1 Signal I/O Selection

5.7.1.1 Select Test tone input
    bool set_test_tone(bool isOn);

5.7.1.2 Chart Recorder Selection
Selects which ports are output to the D/As for the chart recorder.
    bool chartRecorderSelect(int output1, int output2);

5.7.1.3 V/F Monitor Selection
Selects which V/F monitors are routed to front panel display.
    bool vfMonitorSelect(int output1, int output2);

5.7.1.4 Select V/F Bank
Selects which bank of V/Fs is routed as inputs. (Selects one of two 16 channel banks.)
    bool set_vf_bank(int );

5.7.1.5 Write Signal I/O Selection
This method writes the testtone, chart recorder, bank select and vfmonitor configuration to the hardware.
    bool configure_io_selection();

5.8 Arming the Hardware/Starting a Scan

    bool arm(const TimeStamp &start, const TimeStamp &length);

5.9 Aborting a scan

Unfortunately, once armed, the hardware does not like to be stopped. This method stops undesired data from being sent into the data FIFO, but does not disable interrupts. (There is no such function in the hardware.) This is not a problem, since the driver handles it.
    bool abort();

5.10 My debug routine

This routine is not useful to the final product.
    bool no_op();

5.11 Reading out data

Data is extracted from the driver using the method:
    bool read_phase_data(PhaseData *data);
Notice: I changed the semantics of data->phase_number. It now counts each phase in a scan, starting at 1. See the header file for more info. Note that this method will block until data is available.

5.12 Interpreting the switching_state field

Switching_State Bits
| Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
| SW7 | SW6 | SW5 | SW4 | SW3 | Blank | Cal | SigRef |

| Bit 15 | Bit 14 | Bit 13 | Bit 12 | Bit 11 | Bit 10 | Bit 9 | Bit 8 |
| X | X | Scan started | New Phase & Counters Not Read | FIFO FULL | FIFO EMPTY | Counter Bank==A | Bad Data |

| Bit 31/30 | Bit 29/28 | Bit 27/26 | Bit 25/24 | Bit 23/22 | Bit 21/20 | Bit 19/18 | Bit 17/16 |
| SR/C8 | SR/C7 | SR/C6 | SR/C5 | SR/C4 | SR/C3 | SR/C2 | SR/C1 |
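The field can be unpacked with a handful of shifts and masks, as in the sketch below. The function names are invented for this example. The meaning of the two-bit SR/C fields is not defined in these notes, so the sketch returns them raw rather than guessing at an interpretation.

```cpp
#include <cstdint>

// Bits 7-0: SW7..SW3, Blank, Cal, SigRef, returned as one byte.
constexpr std::uint32_t switching_signals(std::uint32_t s) { return s & 0xFFu; }

// Single-bit flags in bits 13-8.
constexpr bool bad_data(std::uint32_t s)           { return ((s >> 8) & 1u) != 0; }
constexpr bool counter_bank_a(std::uint32_t s)     { return ((s >> 9) & 1u) != 0; }
constexpr bool fifo_empty(std::uint32_t s)         { return ((s >> 10) & 1u) != 0; }
constexpr bool fifo_full(std::uint32_t s)          { return ((s >> 11) & 1u) != 0; }
constexpr bool new_phase_not_read(std::uint32_t s) { return ((s >> 12) & 1u) != 0; }
constexpr bool scan_started(std::uint32_t s)       { return ((s >> 13) & 1u) != 0; }

// Raw two-bit SR/C field for channel 1..8.
// Channel 1 occupies bits 17/16 and channel 8 occupies bits 31/30.
constexpr std::uint32_t sr_c_field(std::uint32_t s, unsigned channel) {
    return (s >> (16 + 2 * (channel - 1))) & 0x3u;
}
```

Note that bit 13 here reads "Scan started", the opposite sense of the "SCAN NOT STARTED" bit in the hardware register of 4.1.1.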

6.0 System Oriented Special Notes

This section is meant to document some of the special tweaks made on dozer (the DCR Linux host).

6.1 NTP configuration

6.2 BIOS Settings

Most of the on-board unused peripherals have been disabled. Of special note are the USB controllers, which tend to produce lots of SMI (System Management Interrupt) events.

6.3 SMI Interrupt Reconfiguration

Disabling the USB controllers helps, but does not eliminate the SMI problem. A second tweak uses the program smi_user to detect and clear the global SMI enable bits of the 82801 integrated peripheral controller.

7.0 Testing

7.1 Nominal Operation Testing

This involves running the DCR as both slave and master, and selecting each of the standard switching modes available on the Scancoordinator Cleo screen (with the notable exception of total power in slave mode).

7.2 Observational Testing

Conduct a series of pointing and focus scans using the VxWorks based DCR. Switch control over to the Linux based DCR, and repeat a series of pointing and focus scans. Note the pointing offset differences, or apparent hysteresis. Run another set of scans using different rates and compare again looking for hysteresis.

8.0 Dcr Bugs

This section is intended to list bugs found after release.

8.1 Dcr Overflows Timing FIFO

This problem occurs in the following scenario: A pointing scan is performed with the DCR configured as switching master. Once complete, the user removes the Dcr from the scan coordinator, and then configures another backend (and the switching signal selector) as master. Since the Dcr was removed from the scan coordinator, it is not notified about the loss of mastership. This causes the Dcr to overflow the timing generator FIFO if the last configured switching rate on the Dcr is slower than that of the actual master.

A fix has been deployed to prevent this from being a fatal error, but the true resolution is for the dcr manager to monitor the actual mastership, either via a panel or SIB connection.

8.2 Dcr toggles Sw7 even when not in total power mode.

The original Sw7 (Sw5 in the code) dependency routine has the signal toggling during each phase of the switching cycle when the system is the switching master. This should not be necessary unless there are no other signal transitions (e.g. total power without cal).
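One way to express the intended rule is to toggle Sw7 only when some phase boundary would otherwise have no signal transition, and therefore no phase change interrupt. The sketch below is an illustration of that check, not the driver's actual dependency routine; the function name and the packing of the per-phase states into a byte are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// phase_states holds one entry per phase: the switching signal bits other
// than Sw7 (for example SigRef and Cal), in any consistent packing.
// Returns true if at least one phase boundary has no signal change, in
// which case Sw7 would need to toggle to mark the boundary.
bool needs_sw7_toggle(const std::vector<std::uint8_t> &phase_states) {
    const std::size_t n = phase_states.size();
    if (n < 2) return true;  // single phase: nothing else ever changes
    for (std::size_t i = 0; i < n; ++i) {
        // Compare each phase with the next, wrapping from last to first.
        if (phase_states[i] == phase_states[(i + 1) % n]) return true;
    }
    return false;
}
```

Under this rule, total power without cal (one phase, no transitions) still toggles Sw7, while an ordinary cal or sig/ref switching cycle leaves it alone.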

8.3 Dcr relies on NTP to maintain system time synchronization.

Since the Dcr host relies upon ntp for time synchronization, it should be added to the list of ntp monitored hosts.

8.4 Dcr driver does not update NTP, when there is no switching.

In some cases, like spectral line in total power mode, there is no switching. Since the Bancomm card is only read during switching, the userspace ntpshmem clock loses its synchronization source. This is non-fatal, since ntp knows how to compensate by using another synchronization source. This bug is fixed starting with M&C version 8.4.

-- JoeBrandt - 10 Mar 2005

Revision r1.29 - 06 Aug 2008 - 14:36 GMT - JoeBrandt