Saravana Pandian Annamalai
25. April 2019 · Write a comment · Categories: ARM, Embedded Software · Tags: , , ,

In our earlier blogs on ARM Interrupt architectures, we explored the ARM exception models and registers. Also, we went through different kinds of Interrupt controllers being used. In the upcoming blogs, we will primarily see ARM Interrupt handling from the firmware/software perspective including operating systems like FreeRTOS, Linux and WinCE. To being with, this blog will discuss interrupt handling in ARM Cortex M MCUs.

Cortex M Vector Table

As discussed earlier, the ARM Cortex M series of MCUs typically carters to lower end application with the core running between a few MHz to a maximum 150MHz. To target low cost tools and ease of development, the interrupt architecture is designed to be simpler and straight forward. The vector table in ARM Cortex M series looks like:

Cortex M Vector Table

Typically, on power-on reset, the Vector table base address is defined to be at 0. The ARM core, up on boot up, loads the stack pointer with the value stored at offset 0. And then it loads the Program counter with the address available at offset 4 and starts executing the same.

Thus, the vector table design is such that the stack is operational before the core starts executing thereby eliminating the need for an assembly code to set things up for calling functions. In the firmware perspective, this is a major advantage. There is no need to write any assembly code.

Following these 2 words, the table should hold the addresses of the exception handlers. The first 14 of them are pre-defined ad reserved for handling specific to the core and its execution. From offset, 0x40, the SoC specific interrupt handlers are defined and can be customized by the silicon vendor.

It can also be noted that there is a priority associated with each of these exceptions. Lower the number the higher the priority. Some of the major exceptions defined by ARM are

Reset Handler

At the highest priority of -3, this is the entry point of execution. Loaded to PC on power on reset, this is responsible for initializing the system peripherals and start executing the firmware/OS.

NMI Handler

At a priority only next to Reset Handler (-2), as the name suggests this cannot be masked by software. It is typically triggered by a specialized peripheral unit that can be connected to a critical functionality.

HardFault Handler

This exception is caused when there is an error during exception processing. At a higher priority of -1 than other exception, this can be used to recover from issues during exception handling.

MemManage Handler

Caused due to memory protection faults, the priority level can be configured by the firmware.

BusFault Handler

Caused during to memory access – either during instruction fetch or data access, the priority level can be configured by the firmware

UsageFault Handler

Caused during to instruction executing, the priority level can be configured by the firmware. The handler is called when one of the following errors occurs

  • Presence of an undefined instruction
  • Performing an illegal unaligned access
  • Core in invalid state on instruction execution
  • An error occurring on exception return.
  • Doing an unaligned address on word and halfword memory access
  • Performing division by zero

SVCall Handler

Known as the Supervisor Call, this handler is called up on the core executing a SVC instruction. This is typically used in OS environments to execute system services.

PendSV Handler

This is typically used in OS environments to perform context switching.

SysTick Handler

The ARM Cortex M core defines a specialize timer module to keep track of the System time. This handler is executed once this timer value reaches 0.

With this understanding of Cortex M vector table, now we will see how the firmware handles exceptions in software.

Cortex M Vector Table

To practically understand Cortex M Interrupt handling, we will take an example of software implementation of FreeRTOS running on NXP K66 MCU compiled using GCC tool chain. The first and foremost step is to define the vector table and place it in the Vector base location. The vector table for the device looks like this:

__attribute__ ((used, section(“.isr_vector”)))

void (* const g_pfnVectors[])(void) = {

// Core Level – CM4

&_vStackTop,                            // The initial stack pointer

ResetISR,                                   // The reset handler

NMI_Handler,                           // The NMI handler

HardFault_Handler,                   // The hard fault handler

MemManage_Handler,              // The MPU fault handler

BusFault_Handler,                     // The bus fault handler

UsageFault_Handler,                 // The usage fault handler

0,                                                // Reserved

0,                                                // Reserved

0,                                                // Reserved

0,                                                // Reserved

SVC_Handler,                           // SVCall handler

DebugMon_Handler,                // Debug monitor handler

0,                                               // Reserved

PendSV_Handler,                     // The PendSV handler

SysTick_Handler,                     // The SysTick handler

 

// Chip Level – MK66F18

DMA0_DMA16_IRQHandler,       // 16 : DMA Channel 0, 16 Transfer Complete

DMA1_DMA17_IRQHandler,       // 17 : DMA Channel 1, 17 Transfer Complete

DMA2_DMA18_IRQHandler,       // 18 : DMA Channel 2, 18 Transfer Complete

:

:

UART0_RX_TX_IRQHandler,      // 47 : UART0 Receive/Transmit interrupt

:

:

CAN1_Tx_Warning_IRQHandler,  // 113: CAN1 Tx warning interrupt

CAN1_Rx_Warning_IRQHandler,  // 114: CAN1 Rx warning interrupt

CAN1_Wake_Up_IRQHandler,      // 115: CAN1 wake up interrupt

 

}; /* End of g_pfnVectors */

As it can be seen, the interrupt handlers are clubbed together as an array with the address to the top of the stack (lowest address as it grows towards higher address) as the first element. This array is placed in address 0, via linker script mechanism. In this example, using the section (.isr_vector) keyword, the location of the vector table is set to 0.

Also, it can be noted that there are hundreds of interrupt handlers supported. Even this example has close to 115 handlers. Obviously, a firmware system will not have all these implemented, hardly 20-30 interrupts will be used in a system. Instead of asking the user to define handlers, the start-up code provided by the silicon vendors, will have dummy functions (a while (1) loop) defined with weak reference.

weak void NMI_Handler(void)

{ while(1) {}

}

WEAK_AV void SVC_Handler(void)

{ while(1) {}

}

So, when the developer defines an interrupt handler with the same name in his code, this weak function will be discarded and user-defined function linked against instead.

Cortex M Interrupt Handling

With the vector table installed, the functions that are needed can be added one by one in the firmware. As a first step, the Reset Handler has to be created. At typical, reset handler will look like as below.

void ResetISR(void) {

// Disable interrupts

__asm volatile (cpsid i”);

// If __USE_CMSIS defined, then call CMSIS SystemInit code

SystemInit();

//

// Copy the data sections from flash to SRAM.

//

unsigned int LoadAddr, ExeAddr, SectionLen;

unsigned int *SectionTableAddr;

// Load base address of Global Section Table

SectionTableAddr = &__data_section_table;

// Copy the data sections from flash to SRAM.

while (SectionTableAddr < &__data_section_table_end) {

LoadAddr = *SectionTableAddr++;

ExeAddr = *SectionTableAddr++;

SectionLen = *SectionTableAddr++;

data_init(LoadAddr, ExeAddr, SectionLen);

}

// At this point, SectionTableAddr = &__bss_section_table;

// Zero fill the bss segment

while (SectionTableAddr < &__bss_section_table_end) {

ExeAddr = *SectionTableAddr++;

SectionLen = *SectionTableAddr++;

bss_init(ExeAddr, SectionLen);

}

// Reenable interrupts

__asm volatile (cpsie i”);

main();

//

// main() shouldn’t return, but if it does, we’ll just enter an infinite loop

//

while (1) {

;

}

}

This minimalistic handler, disables all interrupts up on entry, configures the core and major peripherals via SystemInit function. Then it initializes the data and bss sections. Finally, it enables the interrupts before jumping to the main function.

Now we will see how a peripheral is configured for interrupt operation based on the Systick unit.

uint32_t SysTick_Config(uint32_t ticks)

{

if ((ticks – 1UL) > SysTick_LOAD_RELOAD_Msk)

{

return (1UL); /* Reload value impossible */

}

SysTick->LOAD = (uint32_t)(ticks – 1UL); /* set reload register */

NVIC_SetPriority (SysTick_IRQn, (1UL << __NVIC_PRIO_BITS) – 1UL); /* set Priority for Systick Interrupt */

SysTick->VAL = 0UL; /* Load the SysTick Counter Value */

SysTick->CTRL = SysTick_CTRL_CLKSOURCE_Msk |

SysTick_CTRL_TICKINT_Msk |

SysTick_CTRL_ENABLE_Msk; /* Enable SysTick IRQ and SysTick Timer */

return (0UL); /* Function successful */

}

As the code listing shows, the function configures various registers needed for proper operation of the Systick interrupt such as LOAD, VAL, CTRL etc. IT=t also configures the priority of the interrupt. As mentioned earlier, in Cortex M architecture, each of the interrupts has an associated priority. Depending on the implementation, there could be n number of bits corresponding to each interrupt number. Lower the number, higher the priority/urgency and can be set via the NVIC_SetPriority CMSIS API. With this mechanism, it is possible for the interrupts to be prioritized and only the higher priority one will be serviced if more than one interrupt is pending at the same time.

Typical handler for SysTick looks like:

void SysTickHandler( void )

{

/* The SysTick runs at the lowest interrupt priority, so when this interrupt

executes all interrupts must be unmasked. There is therefore no need to

save and then restore the interrupt mask value as its value is already

known. */

         portDISABLE_INTERRUPTS();

         {

                    /* Increment the RTOS tick. */

                    if( xTaskIncrementTick() != pdFALSE )

                   {

                                  /* A context switch is required. Context switching is performed in the PendSV interrupt. Pend the PendSV interrupt. */

                            portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;

                  }

         }

     portENABLE_INTERRUPTS();

}

Cortex M Interrupt Handling via Assembly

So far, we have not writing a single line of assembly code and still able to do all the required functionality in software. In some cases, there will be a need to do assembly code. In these cases, the handler can be defined as naked function (Which will not generate any stack push/pop and return codes) and implement them in assembly.

For example, in FreeRTOS, the SVC Handler implementation is as below:

void vPortSVCHandler( void ) __attribute__ (( naked ));

void vPortSVCHandler( void )

{

     __asm volatile (

     ldr r3, pxCurrentTCBConst2 \n” /* Restore the context. */

     ldr r1, [r3] \n” /* Use pxCurrentTCBConst to get the pxCurrentTCB address. */

     ldr r0, [r1] \n” /* The first item in pxCurrentTCB is the task top of stack. */

     ldmia r0!, {r4-r11, r14} \n” /* Pop the registers that are not automatically saved on exception entry and the critical nesting count. */

     msr psp, r0 \n” /* Restore the task stack pointer. */

     isb \n”

     mov r0, #0 \n”

     msr basepri, r0 \n”

     bx r14 \n”

     ” \n”

     ” .align 4 \n”

     “pxCurrentTCBConst2: .word pxCurrentTCB \n”

     );

}

Thus with compiler extensions, it is possible to include assembly based interrupt handling as well.

Switching Vector tables

In the above examples, we have noted that the vector table is located at address 0. But there are cases, where we will need to place it at a different location. For example, the flash location could be at address 0, and the RAM, where the user wants to place the vector table could be elsewhere. Or it could be a dual A/B firmware or a bootloader/application firmware each sitting at a different location. So, based on the current code being executed, the vector table location differs.

ARM provides a simple mechanism to switch the base address of the vector table. It provides a Vector table offset register at address 0xE000ED08 in the NVIC group, where the base address can be programmed. For example, to switch the vector location to address 0x20000000, following lines will suffice.

Saravana Pandian Annamalai
27. October 2016 · Write a comment · Categories: ARM, Embedded Software, Technology · Tags: ,

With an understanding of ARM registers and Exception model, now we are ready to explore the ARM interrupt controllers. These are the modules that sit in between the interrupt sources (peripherals) and ARM cores deciding how to route the interrupts to.

The ARM core accepts only two input signals handling unscheduled interrupts from the external systems – nIRQ and nFIQ. If IRQ is asserted, the system enters IRQ mode as discussed earlier or if FIQ is asserted, FIQ mode is entered.

It is up to the ARM licences i.e SoC manufacturer to decide the mechanism to route the interpret signals to the core. There are many ways to implement the interrupt controllers each of which are discussed in detail in this blog.

Vendor Specific Model

In the early ARM implementations where there used to be only one core in general, the logic for this routing of interrupts to the core is done by the SoC manufacturer mostly based on their design philosophy. More likely there will be a set of following registers

  • RAW Interrupt Status register
  • Interrupt Enable Register
  • Interrupt Status Register
  • IRQ Priority Encoding
  • FIQ/IRQ Selection

Up on assertion of any interrupt line, the interrupt source is checked if it is configured as FIQ. If so, the signal is routed to the core immediately. If it is an IRQ interrupt, ARM interrupt status is updated. If multiple interrupts occur simultaneously, the priority is resolved between other pending interrupts and finally the core is given the interrupt signal.

The below diagram gives a model implementation for the Interrupt Controller by Freescale called the Interrupt Collector (ICOLL) used in Freescale/NXP iMx233/iMx28 series of MCUs.

Freescale Interrupt Collector

Vendor specific Interrupt Controller

Vectored Interrupt Controller – VIC

ARM itself came up with a model called Vectored Interrupt Controller (VIC). Sitting directly on the AMBA High Speed bus, the latency is significantly reduced. Being an early generation controller it supports 32 interrupt sources each of which can be routed to either FIQ or IRQ signals. A unique feature of VIC is that, as it name suggests, it supports 16 vectored interrupts. In this there are 16 registers where the address of the corresponding interrupt service routines (ISR) can be saved. Based on the priority, the VIC identifies the high priority interrupt and loads its ISR address to a register called VICVectAddr. The firmware can simply use a LDR PC instruction to jump to this ISR. This saves a lot of software effort in branching to the ISR there by reducing latency.

Interrupt Controller from ARM

Vectored Interrupt Controller (VIC)

But VIC supports only level sensitive interrupts that must remain active HIGH till the ISR services it. Thus for these reasons, many SoC designers preferred their own implementation rather than the VIC.

Generic Interrupt Controller – GIC

As processors evolved, soon the number of interrupts became quiet large and also it was not uncommon to have more than 1 core (either symmetric or asymmetric), there was a need for a more standardized way of handling interrupts.

ARM defines Generic Interrupt Controller that suits this need. Though vendors are still free to choose their own mechanism, GIC has become very popular and is almost present in all modern SoCs. It consists of primarily two components – Distributor and CPU interfaces. The primary functionalities of the same include

Distributor:

This is the peripheral facing component that is available as only one implementation (instance). It is responsible for managing interrupts in the whole system and decides priorities between them and routing mechanism of the same.

CPU Interfaces:

For each CPU core available, there is a corresponding CPU Interface present bridging the Distributor interface with the core. It implements the priority masking for the processor.

The below diagram explains the same.

 

ARM's Interrupt Controller for Multi-core SoCs

ARM Generic Interrupt Controller (GIC)

The interrupts are identified by unique ID and could be in any one of the following 4 states (as explained by the GIC Architecture Specification):

  • Inactive: An interrupt that is not active or pending.
  • Pending: An interrupt from a source to the GIC that is recognized as asserted in hardware or generated by software and is waiting to be serviced by a target processor.
  • Active: An interrupt from a source to the GIC that has been acknowledged by a processor, and is being serviced but has not completed.
  • Active and pending: A processor is servicing the interrupt and the GIC has a pending interrupt from the same source.

Based on the source, there are three major types of interrupts defined.

Shared Peripheral Interrupts – SPI: These interrupts sources are typically from different peripherals in the system. They can be routed to (i.e shared with) any one or more of the cores as per the requirement and will be handled suitably. For example, UART0 and UART1 interrupts could be SPI and configured in the Distributor to be routed to two available cores. Up on UART0 interrupt, the signal is routed to first available processor. If at that instance, UART1 interrupt is received, the distributor routes it to the other core for handling.

These are assigned Interrupt ID from 32 to 1019.

Private Peripheral Interrupts – PPI: There could be interrupts that are only specific to one processor. In such cases, they are routed to PPI of only that processor. For example in an asymmetric system with a Cortex A5 and Cortex M4, a Watchdog interrupt corresponding to A5 will be routed only to the A5 core. There might be no need to share it with the M4. Hence it will be routed as a PPI.

The interrupt ID is defined from 16 to 31.

Software Generated Interrupts – SGI: ARM defines interrupt IDs 0 through 15 specifically for Inter processor communication. It is possible a SGI can be routed to one or more processors through the Distributor.

The Interrupts 0 to 31 are banked by the distributor for each CPU Interface i.e. each processor sees them different and are identified by the CPUID. For example, PPI 16 could be pending in CPU0 but not in CPU1. Whereas, in case of the SPI, it will be same across the CPU’s as they are not banked.

Irrespective of all these, each core is provided signal through either nIRQ or nFIQ lines for interrupting program execution.

Nested Vectored Interrupt Controller – NVIC

While the above implementations are suitable for powerful processors, there is a need for specialized handling in microcontroller profiles that typically run in sub-100MHz speed and with few tens of kilobytes of RAM and flash. It is important to reduce the interrupt latency and to leverage the fact that the number of peripherals is less and hence fewer interrupt sources. For that, ARM defines a NVIC model for the Cortex-M implementations.

In Cortex-M implementation, the interrupt service routine addresses are to be provided in a set of consecutive addresses at offsets corresponding to the vector number. As soon as the interrupt signals are received, the NVIC finds the ISR corresponding to the highest priority interrupt and jumps to it.

The Cortex-M core accepts only nIRQ interrupt and there is no option for a FIQ.

With our understanding of hardware implementation, in the upcoming blogs, we will discuss software mechanism in handling interrupts in ARM architectures.

As discussed in the previous blog, the LEDs in a DMD panel are organized as small groups in a matrix form. With this the number of pins and power required are significantly reduced. But it is preferable to be able to control with 4-5 pins so that even an 8 bit MCUs can manage such display. This calls for a bit complex electronics in the panel and software in the MCU, but should be manageable in terms of cost and usability.

This blog covers the internal circuitry inside DMD in detail. The block diagram of the overall set up is depicted below.

DMD - System Block Diagram

Dot Matrix Display – Overall Setup Block Diagram

As it can be seen, there are shift registers and demulitplexers used to simplify the effort in MCU. The dual P-channel MOSFET feeds the positive supply voltage to the LED anodes which denotes the row control and the shift registers provides the return path for the LED’s which denotes the column control. The MCU controls the demultiplexer with GPIO interface and interfaces with the shift registers through serial interface, most commonly SPI. More details of them are covered below. For further explanation, we will consider a DMD panel of 512 LEDs, each row is a collection of anode of 32 LEDs and each column is a collection of cathode of 4 groups of 4 LEDs (one column of 16 LEDs divided into 4 groups each of 4 LEDs for reducing IO pins required). The following figure depicts the arrangement of 16X32 LED panel

Row and Column arrangement in 16 X 32 DMD panel

16 X 32 DMD panel arrangement

Role of Serial Shift Register (74HC595)

The purpose of the shift register is to reduce the number of GPIOs required to drive the column of the LED matrix. 74HC595 is an 8-bit serial shift register with output latches and storage register.  The block diagram of the shift register commonly used inside DMD is given below (courtesy of 74HC595 datasheet)

74HC595 shift register block diagram

Serial Shift Register – Block Diagram

With one shift register we get 8 GPIOs possible. Hence for driving 32 columns of 16 LEDs, (total 128 GPIOs) we require 16 serial shift register. SPI clock, SPI MOSI and a latch signal acts a shift register inputs. The main advantage of this serial shift register is that it outputs the serial input fed to it in its serial output pin only when the latch signal is provided. The serial output of the first shift register is connected to the serial input second register. Likewise 16 shift registers are cascaded in series resulting in 128 GPIO pins.

Each of the 128 pin will in turn control 4 LEDs in a column resulting 512 LEDs on the whole. The data to be displayed can be fed as a 128 bit data with 128 clock pulse to the serial shift register. The data will not appear on the output unless the data is transferred to the storage register. Only upon the positive transition of the latch signal, the data will be transferred to the storage register. The data will automatically appear on the output since the output enable pin is permanently grounded.

Since the shift register corresponds to the control of 4 groups of LEDs in 32 columns, the Demultiplexer IC is required to drive the LEDs in 16 rows corresponding to the required data bits.

The following figure depicts the serial shift register circuitry in 16X32 DMD panel, with 128 output lines. Each shift register outputs has 8 outputs, hence 16 shift registers are serially connected for 128 output lines for column control. One output line is connected to 1 group of 4 LEDs in a column. Hence 128 output lines will be connected to 128 groups of 4 LEDs across 32 columns (i.e. one column has 4 groups of 4 LEDs).

Serial shift register with SPI interface

Serial Shift Register Circuit in 16 X 32 DMD panel

Role of Demultiplexer

Demultiplexer is dedicated for row control. 74HC138 is a 3 to 8 line demultiplexer with eight mutually exclusive inverting outputs. Out of three available address inputs only two inputs are selected for 4 individual inverting outputs. The four outputs from the demux will control the gates of four dual P-channel MOSFET where we get 4 pairs of drive outputs which in turn will drive the necessary current to the LEDs in the 16 rows. Finally there are 4 individual sets of multiplexed rows within the DMD. The block diagram of demultiplexer is depicted below (courtesy of 74HC138 datasheet)

3 to 8 line demultiplexer block diagram

Demultiplexer Block Diagram

With this arrangement only four rows will be illuminated at a time while the other is not illuminated. Hence the values of 4 outputs from the demux should be toggled periodically to illuminate all the sets of multiplexed rows. With persistence of human eye, if the LEDs are refreshed once around 20 ms, it is sufficient to show a flicker free display.

Following figure shows the four individual sets of the multiplexed rows inside DMD. The Color coding differentiates the 4 multiplexed rows and the four demux outputs are connected in the following manner

Y0 – connected to the rows R16, R12, R8, R4

Y1 – connected to the rows R15, R11, R7, R3

Y2 – connected to the rows R14, R10, R6, R2

Y3 – connected to the rows R13, R9, R5, R1

Row multiplex inside Dot Matrix Display

Multiplexed rows inside DMD

The demultiplexer input/output combination and the DMD row illumination sequence is illustrated in the following table

DMD-TB

Example

  1. Consider a 32X16 DMD panel
  2. The following figures illustrates the sequence of bit shifting

A single data bit is shifted in to the DMD and it is effectively present in the [R16, C8] location.

Bit 1 shifted into R16, C8

First Bit shift into DMD

On further data in, the Old data moves one bit to [R16, C7] and new data bit is loaded at [R16, C8]

Bit 2 loaded at R16, C8

After 2nd bit input

After the input of 9th bit, the first bit is moved to the twelfth row at [R12, C8] and the bit 9 is loaded at [R16, C8]

Bit 9 loaded at R16, C8

After 9th Bit input

Similarly with 32nd bit input, the bit one is moved to the forth row at [R4, C1] and the bit 32 is loaded at [R16, C8]

Bit 32 is loaded at R16, C8

After 32nd bit input

Upon the input of 33rd bit, the bit one is again moved to the sixteenth row but this time at [R16, C16] and the new bit 33 is loaded at [R16, C8]

New bit 33 is loaded at R16, C8

After 33rd bit input

Likewise on input of 128 bits, the bit 1 is moved to the consecutive rows and columns till [R4, C25] and the bit 128 is loaded at [R16, C8]

Bit 128 is loaded at R16, C8

After 128th bit input

Thus the entire pattern to be displayed can be loaded bit by bit. Running even at a low clock frequency of 8KHz, the 128 bits can be easily shifted in 16 ms, more than needed for human eye to detect the change.

DMD – Daisy Chain

It is possible to connect the multiple DMD panels in series using ribbon cables. This is called daisy-chaining. The number of the DMD in series is limited to the RAM size and the SPI clock frequency.
Even though the DMD can also come with multiple LEDs of varying color such as RED, GREEN, BLUE, etc, the underlying connection mechanism is same and each of the colored LED’s are controlled separately.

Now that we can control each LED of the DMD display independently, we can create any pattern to be displayed. In the upcoming blogs, we will discuss in detail about the software based control mechanism and creating rolling displays.