STM32 Getting started

Introduction

This post begins a series of investigations and examples for the STM32 processors. Specifically, I will be looking at the STM32F405RGT6, the processor used on the VESC (by Benjamin Vedder), one of which I bought on eBay. The long-term goal is to replace the software on that board with my own. To debug software I prefer to have an evaluation board, so I also bought a CorexxxR (again on eBay). This board gives me direct access to the processor pins with a logic analyser.

For the first part of this series we will only be using the CorexxxR board with the following equipment and tools:

  • Segger J-link (JTAG) or ST-link v.2
  • Logic analyser
  • USB to RS232 TTL UART
  • GCC toolchain, which can be downloaded from https://launchpad.net/gcc-arm-embedded
  • Any editor will do; I prefer Emacs or Vim, but that is just a personal choice

First example

The first program simply toggles a GPIO pin at a constant rate. Since the development board does not have an LED, I will use the logic analyser to verify the program.

In the spirit of this series we will begin by writing a minimal program in assembly code. The first thing we need to be aware of is the vector table located at the beginning of program memory (flash). Each entry in the vector table contains the jump address of a function that is called when the corresponding interrupt occurs. The exception is the first entry, which contains the initial stack pointer value (automatically loaded after reset).

Addr   Description
0x00   Initial stack pointer value (automatically loaded after reset)
0x04   Jump address to start of our code
0x08   NMI handler
0x0c   Hard fault handler
0x10   Memory fault handler
0x14   Bus fault handler
0x18   Usage fault handler

Normally the vector table would be bigger, as it would also contain the jump addresses of the other interrupt routines. We won’t enable those, so we only need to implement the ones we cannot mask. All our fault handlers will point to the same function, which is just an infinite loop that makes the program hang.

As an aside, there is a little trick to the jump addresses in the vector table: they need to be odd. The Cortex-M4 only executes Thumb code, and bit 0 of any address loaded into the program counter selects the Thumb state, so it has to be set. We therefore artificially add 1 to the address of each function in the table.
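
The odd-address rule can be sketched in a few lines of C. This is only an illustration; the function names are made up for this example and are not part of any toolchain. The sample values in the note below are the ones that show up in the disassembly later in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn a code address into a vector table entry.
   Bit 0 is the Thumb state bit, not part of the address itself. */
uint32_t vector_entry(uint32_t code_addr)
{
    assert((code_addr & 1u) == 0u);   /* Thumb code is halfword aligned */
    return code_addr | 1u;
}

/* Hypothetical helper: recover the real instruction address from an entry. */
uint32_t entry_target(uint32_t entry)
{
    return entry & ~1u;
}
```

For example, vector_entry(0x1c) gives 0x1d for _start and vector_entry(0x5c) gives 0x5d for the shared fault handler.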

With that out of the way, we can discuss our main program. After reset the processor runs from its internal 16 MHz oscillator (HSI), and we won’t bother changing this for our first example. What we do need is to enable the clock for the peripheral we will use, configure the GPIO pin as a push-pull output, and finally toggle the pin with a suitable delay loop in between so that we can actually observe it toggling.

Now we need to dig out the addresses of the registers we have to write to in order to do the above. It turns out there are only a few of them, and their addresses can be found in the reference manual belonging to the processor of choice.
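
Before looking at the assembly, here is a small C sketch of where the constants in the listing come from. It assumes the STM32F4 register layout: two mode bits per pin in MODER, one bit per pin in ODR, and one clock enable bit per GPIO port in the register at 0x40023800 + 0x30 (called AHB1ENR in the reference manual). The helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Register addresses as used in the assembly listing (STM32F405). */
#define RCC_BASE     0x40023800u
#define RCC_AHB1ENR  (RCC_BASE + 0x30u)
#define GPIOC_BASE   0x40020800u
#define GPIOC_MODER  (GPIOC_BASE + 0x00u)
#define GPIOC_ODR    (GPIOC_BASE + 0x14u)

/* MODER holds two bits per pin; 01 selects general purpose output. */
uint32_t moder_output(unsigned pin)
{
    assert(pin < 16u);
    return 1u << (pin * 2u);
}

/* ODR holds one bit per pin. */
uint32_t odr_mask(unsigned pin)
{
    assert(pin < 16u);
    return 1u << pin;
}

/* Clock enable bits: GPIOA is bit 0, GPIOB bit 1, GPIOC bit 2, and so on. */
uint32_t gpio_clock_enable(char port)
{
    assert(port >= 'A' && port <= 'I');
    return 1u << (unsigned)(port - 'A');
}
```

For port C, pin 4 this gives gpio_clock_enable('C') = 0x4, moder_output(4) = 256 and odr_mask(4) = 0x10, which are the values written in the listing.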

// Directives

        .thumb
        .syntax unified

// Equates
        .equ RCC_AHB1ENR,   0x40023800 + 0x30 //  GPIO clock enables are in AHB1ENR

        .equ GPIOC_OFFSET,  0x40020800
        .equ GPIOC_MODER,   GPIOC_OFFSET +  0x0
        .equ GPIOC_OTYPER,  GPIOC_OFFSET +  0x4
        .equ GPIOC_OSPEEDR, GPIOC_OFFSET +  0x8
        .equ GPIOC_PUPDR,   GPIOC_OFFSET +  0xc
        .equ GPIOC_ODR,     GPIOC_OFFSET +  0x14

        .equ STACKINIT,     0x20020000 //  End of RAM
        .equ LEDDELAY,      800000

        .section .text
        .org 0

// Vectors
vectors:
        .word STACKINIT          // stack pointer value when stack is empty
        .word _start + 1         // reset vector (manually adjust to odd for thumb)
        .word _nmi_handler + 1
        .word _hard_fault  + 1
        .word _memory_fault + 1
        .word _bus_fault + 1
        .word _usage_fault + 1

_start:
        //  Enable the Port C peripheral clock by setting bit 2 (GPIOCEN)
        ldr r6, = RCC_AHB1ENR
        mov r0, #0x4
        str r0, [r6]

        //  Configure Port C pin 4 as a low-speed push-pull output with
        //  pull-up. In MODER, bits 9-8 select the mode of pin 4 and
        //  '01' means general purpose output.

        ldr r6, = GPIOC_MODER
        ldr r0, = 256 //  pin 4 to output mode, all others to input mode
        str r0, [r6]

        ldr r6, = GPIOC_OTYPER
        ldr r0, = 0 //  output push-pull (all pins), not really necessary to write
        str r0, [r6]

        ldr r6, = GPIOC_OSPEEDR
        ldr r0, = 0  //  low speed
        str r0, [r6]

        ldr r6, = GPIOC_PUPDR
        ldr r0, = 256  //  pull-up on pin 4
        str r0, [r6]


        //  Load R2 and R3 with the "on" and "off" constants

        mov r2, #0x10         //  value to drive pin 4 high
        mov r3, #0x0          //  value to drive pin 4 low

        ldr r6, = GPIOC_ODR   //  point to Port C output data register

loop:
        str r2, [r6]           //  set Port C, pin 4
        ldr r1, = LEDDELAY
delay1:
        subs r1, #1
        bne delay1

        str r3, [r6]           //  clear Port C, pin 4
        ldr r1, = LEDDELAY
delay2:
        subs r1, #1
        bne delay2

        b loop                 //  continue forever

_dummy:                        //  if any exception gets triggered, just hang in a loop
_nmi_handler:
_hard_fault:
_memory_fault:
_bus_fault:
_usage_fault:
        add r0, #1
        add r1, #1
        b _dummy
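
Save the listing above as ex1.asm. The linker script, stm32.ld, looks like this:

/* Simple linker script for the STM32 ARM Cortex M4.  Link the text
   of the program into on-board flash and use on-board RAM for data and stack.
*/

SECTIONS
{
        /* interrupt vectors start at zero */
        . = 0x0;  /* flash (physically at 0x08000000) is aliased to address 0 at boot */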

        .text :  {  *(.text)   }

        /* constant data follows code but still in flash */
        .data :
        { 
          *(.data) 
          *(.rom)
        }

        /* internal RAM starts at 0x20000000 */
        . = 0x20000000; 
        .ram : { *(.ram) }

        .bss :
        {
          *(.bss)
          *(.ram)
        }
}        

To assemble the source into an object file:

~/Projects/external/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-as -mcpu=cortex-m4 -mthumb -gstabs -o ex1.o ex1.asm

To link:

~/Projects/external/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-ld -v -T stm32.ld -nostartfiles -o ex1.elf ex1.o

To create binary:

~/Projects/external/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-objcopy -O binary ex1.elf ex1.bin

To program:

st-flash write ex1.bin 0x8000000

Congratulations, we are done!

Now the only thing left is to connect the logic analyser and verify that the program works!
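
As a rough guide to what the logic analyser should show, the toggle period can be estimated from LEDDELAY. The sketch below assumes a 16 MHz core clock and about 3 cycles per loop iteration (one for subs, roughly two for the taken bne). Both figures are assumptions rather than measurements, and the function name is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated duration of one delay loop in microseconds.
   cycles_per_iter is an assumption, not a measured value. */
uint32_t delay_us(uint32_t iterations, uint32_t cycles_per_iter, uint32_t clock_hz)
{
    assert(clock_hz > 0u);
    /* 64-bit intermediate so that iterations * cycles * 1e6 cannot overflow */
    uint64_t cycles = (uint64_t)iterations * cycles_per_iter;
    return (uint32_t)(cycles * 1000000u / clock_hz);
}
```

With delay_us(800000, 3, 16000000) each half period comes out at 150000 µs, so under these assumptions the pin should toggle every 150 ms or so. If the measured period is far off, the clock or the cycle count assumption is the first thing to check.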

A closer look at the first example

Let’s look at the generated machine code, while we’re at it.

~/Projects/external/gcc-arm-none-eabi-5_4-2016q3/bin/arm-none-eabi-objdump -D ex1.elf > ex1.lst

The resulting ex1.lst:

ex1.elf:     file format elf32-littlearm


Disassembly of section .text:

00000000 <vectors>:
   0:	20020000 	andcs	r0, r2, r0
   4:	0000001d 	andeq	r0, r0, sp, lsl r0
   8:	0000005d 	andeq	r0, r0, sp, asr r0
   c:	0000005d 	andeq	r0, r0, sp, asr r0
  10:	0000005d 	andeq	r0, r0, sp, asr r0
  14:	0000005d 	andeq	r0, r0, sp, asr r0
  18:	0000005d 	andeq	r0, r0, sp, asr r0

0000001c <_start>:
  1c:	4e12      	ldr	r6, [pc, #72]	; (68 <_bus_fault+0xc>)
  1e:	f04f 0004 	mov.w	r0, #4
  22:	6030      	str	r0, [r6, #0]
  24:	4e11      	ldr	r6, [pc, #68]	; (6c <_bus_fault+0x10>)
  26:	f44f 7080 	mov.w	r0, #256	; 0x100
  2a:	6030      	str	r0, [r6, #0]
  2c:	4e10      	ldr	r6, [pc, #64]	; (70 <_bus_fault+0x14>)
  2e:	2000      	movs	r0, #0
  30:	6030      	str	r0, [r6, #0]
  32:	4e10      	ldr	r6, [pc, #64]	; (74 <_bus_fault+0x18>)
  34:	2000      	movs	r0, #0
  36:	6030      	str	r0, [r6, #0]
  38:	4e0f      	ldr	r6, [pc, #60]	; (78 <_bus_fault+0x1c>)
  3a:	f44f 7080 	mov.w	r0, #256	; 0x100
  3e:	6030      	str	r0, [r6, #0]
  40:	f04f 0210 	mov.w	r2, #16
  44:	f04f 0300 	mov.w	r3, #0
  48:	4e0c      	ldr	r6, [pc, #48]	; (7c <_bus_fault+0x20>)

0000004a <loop>:
  4a:	6032      	str	r2, [r6, #0]
  4c:	490c      	ldr	r1, [pc, #48]	; (80 <_bus_fault+0x24>)

0000004e <delay1>:
  4e:	3901      	subs	r1, #1
  50:	d1fd      	bne.n	4e <delay1>
  52:	6033      	str	r3, [r6, #0]
  54:	490a      	ldr	r1, [pc, #40]	; (80 <_bus_fault+0x24>)

00000056 <delay2>:
  56:	3901      	subs	r1, #1
  58:	d1fd      	bne.n	56 <delay2>
  5a:	e7f6      	b.n	4a <loop>

0000005c <_bus_fault>:
  5c:	f100 0001 	add.w	r0, r0, #1
  60:	f101 0101 	add.w	r1, r1, #1
  64:	e7fa      	b.n	5c <_bus_fault>
  66:	38300000 	ldmdacc	r0!, {}	; <UNPREDICTABLE>
  6a:	08004002 	stmdaeq	r0, {r1, lr}
  6e:	08044002 	stmdaeq	r4, {r1, lr}
  72:	08084002 	stmdaeq	r8, {r1, lr}
  76:	080c4002 	stmdaeq	ip, {r1, lr}
  7a:	08144002 	ldmdaeq	r4, {r1, lr}
  7e:	35004002 	strcc	r4, [r0, #-2]
  82:	Address 0x0000000000000082 is out of bounds.


Note that objdump -D disassembles everything, including data. The vector table at 0x0 to 0x18 and the words from 0x66 onwards (the literal pool holding the register addresses and the delay constant that the ldr instructions load) are shown as instructions, but they are not code. The same goes for the rest of the output, the .ARM.attributes, .stab and .stabstr sections, which only hold build attributes and debug information and are omitted here.

Trick for scaling in embedded processors

Suppose you have an ADC signal, e.g. a 12-bit value, that needs to be scaled. If your processor is, for instance, an 8-bit AVR and/or you have few cycles to spare, you cannot afford floating point arithmetic, or even division. Often the following scheme gives sufficient accuracy:

Suppose it is a current you want to find, with the following “ideal” floating point scaling:

curr_A = adc_raw_value * 0.121

This could be done in integer math as curr_A = adc_raw_value * 121/1000, but that uses a division, which is a costly operation on many processors. Instead of dividing by 1000 we divide by, say, 1024 (or any other power of two, 2^k), so that the division becomes a shift, which is usually very fast. That is, we pick an integer m = round(0.121 * 2^k) and compute curr_A = (adc_raw_value * m) >> k. By trial and error we search for the “k” that introduces the lowest rounding error. Just make sure that “k” is not so large that the product adc_raw_value * m overflows…
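
The scheme can be sketched in C as follows. The function names are made up for this example. The multiplier would typically be computed once on the PC (or by hand) and hard-coded, so the floating point helper never has to run on the target. The sketch assumes a 12-bit ADC value and 32-bit intermediate math.

```c
#include <assert.h>
#include <stdint.h>

/* Offline helper: multiplier m = round(scale * 2^k),
   so that (x * m) >> k is approximately x * scale. */
uint32_t scale_multiplier(double scale, unsigned k)
{
    assert(k < 32u && scale >= 0.0);
    return (uint32_t)(scale * (double)((uint64_t)1 << k) + 0.5);
}

/* On-target scaling: one multiply and one shift, no division.
   The caller must choose m and k so that adc_raw * m fits in 32 bits. */
uint16_t scale_adc(uint16_t adc_raw, uint32_t m, unsigned k)
{
    assert(k < 32u);
    return (uint16_t)(((uint32_t)adc_raw * m) >> k);
}
```

For k = 10 the multiplier is scale_multiplier(0.121, 10) = 124, and 124/1024 is about 0.1211, an error of less than 0.1 % relative to 0.121. With that choice scale_adc(1000, 124, 10) returns 121, and the full-scale product 4095 * 124 is nowhere near overflowing 32 bits. Like the integer division it replaces, the shift truncates rather than rounds.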

 

Fun with Factor and cellular automata

As a follow-up to my last post, I created a GitHub repository for code that can be used to visualise cellular automata in Factor (using OpenGL).

https://github.com/lonelab/cellularautomata

Just copy the contents into the factor source tree under work/cellular and you should be able to run it the usual way… (see http://www.factorcode.org)

Then pictures like these can be produced: