• Docker-based development environment IDE support

    I’ve been talking quite a bit lately about Docker-based build environments (in Hamburg, in Munich, and at many of our clients).

    One of the biggest drawbacks of the technique is the poor integration story for IDEs. Many IDEs require that your build environment (eg. target JVM and associated build tools) is installed locally to enable all of their useful features like code completion and test runner integration. But if this is isolated away in a container, the IDE can’t access it, so all these handy productivity features won’t work.

    However, it looks like JetBrains in particular is starting to integrate these ideas into their products more:

    • WebStorm will now allow you to configure a ‘remote’ Node.js interpreter in a local Docker image (details here)
    • RubyMine takes this one step further: you can configure a Ruby interpreter based on a service definition in a Docker Compose file (details here). A similar feature is available for Python in PyCharm.

    Both of these are great steps forward, and if you’re using Docker-based build environments, I’d encourage you to take a look at this.

    Now, if only they’d do this for JVMs in IntelliJ…

  • Docker's not just for production: using containers for your development environment

    Tonight I spoke at the ThoughtWorks Munich meetup, presenting an updated version of my presentation Docker’s not just for production: using containers for your development environment. There are two major changes over the previous version:

    • I’ve added a section about using containers to manage test environments, in addition to the existing content about build environments
    • I’ve reworked the structure of the presentation based on feedback to focus more on the technique itself rather than a discussion of existing alternatives

    The slides are available to download as a PDF, and the sample environment I used during the presentation is available on GitHub.

  • Docker's not just for production: using containers for your development environment

  • A deeper look at the STM32F4 project template: the sample application

    Over three months ago (yeah… sorry about that) we took a look at the linker script for the STM32F4 project template. I promised that we’d examine the sample application next.

    Note: the sample app assumes that you’re using the STM32F4 Discovery development board. If you’re not using that board, you should still be able to easily follow along, but some of the pin assignments might be slightly different.

    The app is pretty straightforward: all it does is blink the four LEDs on the board in sequence, like this:

    Blinking LEDs

    If you’ve familiar with Arduinos, you’d probably expect to have something along these lines, perhaps without the repetition:

    while (1) {
    	digitalWrite(LED_1, HIGH);
    	digitalWrite(LED_1, LOW);
    	digitalWrite(LED_2, HIGH);
    	digitalWrite(LED_2, LOW);
    	digitalWrite(LED_3, HIGH);

    However, I’ve taken a different approach. While it’s definitely possible to do something like that, I wanted to illustrate the use of the timer hardware and interrupts.


    Most of the work is just configuring all the peripherals we need, and this all happens in main():

    void main() {
    	for (auto i = 0; i < pinCount; i++) {
    		enableOutputPin(GPIOD, pins[i]);
    	setPrescaler(TIM2, 16 - 1); // Set scale to microseconds, based on a 16 MHz clock
    	setPeriod(TIM2, millisecondsToMicroseconds(300) - 1);

    What does this all mean? Why is it necessary? (I’ve omitted the individual method definitions above and below for the sake of brevity, but you can find them in the source.)

    • enableGPIOD(): Like most peripherals on ARM CPUs, GPIO banks (groups of I/O pins) are turned off by default to save power. All four of the LEDs are in GPIO bank D, so we need to enable it.

    • enableOutputPin(GPIOD, pins[i]): just like pinMode() for Arduino, we need to set up each GPIO pin. Each pin can operate in a number of modes, so we need to specify which mode we want to use (digital input and output are the most common, but there are some other options as well).

    • enableTIM2(): just like for GPIO bank D, we need to enable the timer we want to use (TIM2). We’ll use the timer to trigger changing which LED is turned on at the right moment.

    • enableIRQ(TIM2_IRQn) and enableTimerUpdateInterrupt(TIM2): in addition to enabling the TIM2 hardware, we also need to enable its corresponding IRQ, and select which events we want to receive interrupts for. In our case, we want timer update events, which occur when the timer reaches the end of the time period we specify.

    • setPrescaler(TIM2, 16 - 1): timers are based on clock cycles, so one clock cycle equates to one unit of time. However, that’s usually not a convenient scale to use – we’d prefer to think in more natural units like microseconds or milliseconds. So the timers have what is called a prescaler: something that scales the clock cycle time units to our preferred time units.

      In our case, the CPU is running at 16 MHz, so setting the prescaler value to 16 sets up a 16:1 scaling – 16 CPU cycles is one timer time unit. But there’s an additional complication: the value we set in the register is not exactly the divisor used. The divisor used is actually one more than the value we set, so we set the prescaler value to 15 to achieve a divisor of 16.

    • setPeriod(TIM2, millisecondsToMicroseconds(300) - 1): this does exactly what it says on the tin. We want the timer to fire every 300 ms, so we configure the timer’s period, or auto-reload value, to be 300 ms.

      The reason it’s called an ‘auto-reload value’ is due to how the timer works internally. The timer counts down ticks until its counter reaches zero, at which point the timer update interrupt fires. Once the interrupt has been handled, the auto-reload value is loaded into the counter, and the timer starts counting down again. So by setting the auto-reload value to our desired period, we’ll receive interrupts at regular intervals.

      And, just like the prescaler value, the value used is one more than the value we set, so we subtract one to get the interval we’re after.

    • enableAutoReload(TIM2): we need to enable resetting the counter with the auto-reload value, otherwise the timer will count down to zero and then stop.

    • enableCounter(TIM2): the counter won’t actually start updating its counter in response to CPU cycles until we enable the counter

    • resetTimer(TIM2): any changes we make in the timer configuration registers don’t take effect until we reset the timer, at which point it pulls in the values we’ve just configured.

    So, after all that, we’ve setup the GPIO pins for the LEDs and configured the timer. Now all we have to do is wait for the timer interrupt to fire, and then we’ll change which LED is turned on.

    You might be wondering how to find out what you need to do to use a piece of hardware. After all, there was a lot of stuff that needed to be done to set up that timer, and not all of it was particularly intuitive. The answer is usually a combination of trawling through the datasheet, looking at examples provided by the manufacturer (ST in this case) and Googling.

    Timer interrupt handling

    In comparison to the configuration of everything, actually responding to the timer interrupts and blinking the LEDs is relatively straightforward.

    First of all, we need an interrupt handler:

    extern "C" {
    	void TIM2_IRQHandler() {
    		if (TIM2->SR & TIM_SR_UIF) {

    Because this method is called directly by the startup assembly code, we have to mark it as extern "C". This means that the method uses C linkage, which prevents C++’s name-mangling from changing the name. We don’t want the name to be changed because we want to be able to refer to it by name in the assembly code. This Stack Overflow question has a more detailed explanation if you’re interested.

    The handler itself is relatively straightforward:

    • we check if the reason for the interrupt is the update event we’re interested in
    • if it is, we call out to our handler function onTIM2Tick()
    • we reset the timer interrupt – otherwise our interrupt handler will be called again straight away

    onTIM2Tick() is also straightforward:

    void onTIM2Tick() {
    	lastPinOn = (lastPinOn + 1) % pinCount;
    	for (auto i = 0; i < pinCount; i++) {
    		BitAction value = (i == lastPinOn ? Bit_SET : Bit_RESET);
    		GPIO_WriteBit(GPIOD, 1 << pins[i], value);

    All we do is loop over each of the four LEDs, turning on the next one and turning off all of the others. (GPIO_WriteBit() is a function from the standard peripherals library that does exactly what it sounds like.)

    The end

    That’s all there is to it – a lot of configuration wrangling and then it’s smooth sailing. Next time (which hopefully won’t be in another three months), we’ll take a quick look at the build system in the project template.

  • WindyCityThings wrap-up

  • WindyCityThings: TDD in an IoT World

    I’ve been selected to speak at WindyCityThings in Chicago on June 23 and 24. I’ll be speaking about ‘TDD in an IoT World’.

    The full agenda is available here – hope to see you there!

  • A deeper look at the STM32F4 project template: linking it all together

    Last time, we saw how the microcontroller starts up and begins running our code, and I mentioned that the linker script is responsible for making sure the right stuff ends up in the right place in our firmware binary. So today I’m going to take a closer look at the linker script and how it makes this happen.

    And like last time, while I’ll be using the code in the project template as an example, the concepts are broadly applicable to most microcontrollers.

    What does the linker do again?

    Before we jump into the linker script, it’s worthwhile going back and reminding ourselves what the linker’s job is. It has two main roles:

    1. Combining all the various object files and statically-linked libraries, resolving any references between them so that symbols (eg. printf) can be turned into memory locations in the final executable

    2. Producing an executable in the format required for the target environment (eg. ELF for Linux and some embedded systems, an .exe for Windows, Mach-O for OS X)

    In order to do the second part above, it needs to know about the target environment. In particular, it needs to know where different parts of the code need to reside in the binary, and where they will end up in memory. This is where the linker script comes in. It takes the different parts of your program (arranged into groups called sections) and tells the linker how to arrange them. The linker then takes this arrangement and produces a binary in the required format, with all symbols replaced with their memory locations.

    (Dynamic linking, where some libraries aren’t combined into the binary but are instead loaded at runtime, is a bit different. I won’t cover it here because it’s less common in embedded software.)

    By default, the linker will use a standard linker script appropriate for your platform – so if you’re building an application for OS X, then the default linker script will be one appropriate for OS X, for example. However, because there are so many microcontrollers out there, each with their own memory layout, there is no one standard linker script that could just work for every possible target. So we have to provide our own. Many manufacturers provide sample linker scripts for a variety of toolchains, so you usually don’t have to write it yourself. However, as you’re about to see, they’re not complicated.

    What does a linker script look like?

    The best way to understand how a linker script works is to work through an example and explain what’s going on. So I’m going to do just that with the one I’m using in the project template.

    Just like for the startup assembly code, I’ve used Philip Munts’ example linker script in the project template. (The startup assembly code and linker script are fairly closely related, as you’ll see in a minute.)

    Memory and some constants

    The first part defines what memory blocks are available:

      flash (rx): ORIGIN = 0x08000000, LENGTH = 1024K
      ram (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
      ccm (rwx) : ORIGIN = 0x10000000, LENGTH = 64K

    This is fairly self-explanatory. We have three different types of memory available on our microcontroller, either read-only executable (rx) flash ROM, or read-write executable (rwx) RAM and core-coupled memory (CCM). Each has a particular memory location and size, given by ORIGIN and LENGTH, respectively. These values are shown on the memory map diagram in the STM32F405 / STM32F407 datasheet.

    Next up, we define some symbols, some of which we used in the startup assembly code and some of which are used by library functions:

    __rom_start__   = ORIGIN(flash);
    __rom_size__    = LENGTH(flash);
    __ram_start__   = ORIGIN(ram);
    __ram_size__    = LENGTH(ram);
    __ram_end__     = __ram_start__ + __ram_size__;
    __stack_end__   = __ram_end__;      /* Top of RAM */
    __stack_size__  = 16K;
    __stack_start__ = __stack_end__ - __stack_size__;
    __heap_start__  = __bss_end__;      /* Between bss and stack */
    __heap_end__    = __stack_start__;
    __ccm_start__   = ORIGIN(ccm);
    __ccm_size__    = LENGTH(ccm);
    end             = __heap_start__;


    And then we come to the meat of the linker script – the SECTIONS command. As I mentioned before, sections are used to differentiate between different kinds of data so they can be treated appropriately. For example, code has to end up in a executable memory location, while constants can go in a read-only location, so each of these are in different sections.

    Before we get into the details, a quick note on terminology. There are two kinds of sections we talk about when working with the linker:

    • Input sections come from the object files we load (usually as the result of compiling our source code), or the libraries we use.
    • Output sections are exactly what they sound like – sections that appear in the output, the final executable.

    In most scenarios, you’ll start with many different input sections that are then combined into far fewer output sections.

    .text section

    The first section we define is the .text output section. .text holds all of the executable code. It also contains any values that can remain in read-only memory, such as constants.

    The definition of .text in the linker script looks like this:

      .text : {
        KEEP(*(.startup))         /* Startup code */
        *(.text*)                 /* Program code */
        KEEP(*(.rodata*))         /* Read only data */
        . = ALIGN(4);
        __ctors_start__ = .;
        KEEP(*(.init_array))      /* C++ constructors */
        KEEP(*(.ctors))           /* C++ constructors */
        __ctors_end__ = .;
        . = ALIGN(16);
        __text_end__ = .;
      } >flash

    Let’s work through this and understand what’s going on. KEEP(*(.startup)) tells the linker that the first thing in the output section should be anything in the .startup input section. For example, this includes our startup assembly code. (That’s why there’s the .section .startup, "x" bit at the start of the assembly code). KEEP tells the linker that it shouldn’t remove any .startup input sections, even if they’re unreferenced – particularly important for the startup code.

    Next up is our program code, *(.text*). You’ll notice that there are two asterisks:

    • the one before the parentheses means ‘include sections that matches the inner pattern from any input file’
    • the one near the end is a wildcard – so any section that starts with .text will be included

    For comparison, *(.startup) means ‘include any .startup section from any input file’.

    We then include some more sections: .rodata for read-only data, .glue_7 and .glue_7t to allow ARM instructions to call Thumb instructions and vice-versa, and .eh_frame to assist in exception unwinding.

    Finally, we set up the global and static variable constructors area. We saw this used in the call_ctors / ctors_loop part of the startup code. We define __ctors_start__ and __ctors_end__ so that the startup code knows where the list of constructors starts and ends.

    You’ll notice there’s . = ALIGN(4); just before this, and . = ALIGN(16); just afterwards. . refers to the current address, so . = ALIGN(x) advances the current address forward until it is a multiple of x bytes. (If it’s already a multiple of x, nothing changes.) This is used to ensure that data is aligned with particular boundaries. For example, if an instruction is 4 bytes long, there might be a requirement that all instructions have to be aligned to 4 byte boundaries. So we can use . = ALIGN(4); to ensure that we start in a valid location. (We’ll see why . = ALIGN(16); and __text_end__ = .; are necessary in a second.)

    Now that we’ve finished specifying what needs to go inside the section, we need to tell the linker which memory block to put it in. Given that .text just contains read-only instructions and data, we use >flash to tell the linker to put .text in flash memory.

    .data section

    That’s .text sorted, so let’s take a look at .data next. .data contains initial values of mutable global and static values.

    This is how it’s defined in the linker script:

      .data : ALIGN(16) {
        __data_beg__ = .;         /* Used in crt0.S */
        *(.data*)                 /* Initialized data */
        __data_end__ = .;         /* Used in crt0.S */
      } >ram AT > flash

    .data is fairly similar to .text, with a few small differences used to achieve the one purpose: initialising initial values for global and static variables.

    >ram AT > flash instructs the linker that the .data section should be placed in the flash memory block, but that all symbols that refer to anything in it should be allocated in the ram memory block. Why? Because .data contains just the initial values, and they’re not constants – they’re mutable values we can modify in our program. Therefore we need them to be in RAM, not read-only flash. But we can’t just set values in RAM directly when we flash our microcontroller, as the only thing we can flash is flash memory. So the solution is to store them in flash, and then as part of the startup code, we copy them into RAM, ready to be modified.

    If you think back to the startup code, you’ll remember copy_data and copy_data_loop were responsible for copying the initial values from flash to RAM. There are a couple of values that the linker script sets so that code knows what to copy, and to where:

    • __text_end__, which was defined at the end of the .text section, gives us the first flash memory location of the .data section. Why this is the case might not be clear at first: the . = ALIGN(16); advances . to the next 16 byte boundary, and then we store that value in __text_end__. When the linker comes to .data, which is the next section, it starts allocating flash memory locations for .data from __text_end__ onwards, because that is the next available location in flash.
    • __data_beg__ and __data_end__ give the start and end locations of .data in RAM. ALIGN(16) in the section definition ensures that the start location is aligned to a 16 byte boundary.

    So copy_data follows this pseudocode:

    • if __data_beg__ equals __data_end__, there is nothing to initialise, so skip all of this
    • otherwise:
      • set current_text to __text_end__ and current_data to __data_beg__
      • copy the byte at address current_text to address current_data (X)
      • advance current_text and current_data each by one
      • if we’ve reached the end (ie. our updated current_data equals __data_end__), stop, otherwise go back to (X)

    .bss section

    Two down, one to go… .bss is the last major output section:

      .bss (NOLOAD) : ALIGN(16) {
        __bss_beg__ = .;          /* Used in crt0.S */
        *(.bss*)                  /* Uninitialized data */
        *(COMMON)                 /* Common data */
        __bss_end__ = .;          /* Used in crt0.S */
      } >ram

    Again, this is a section we’ve seen references to in the startup code. zero_bss and zero_bss_loop was the part of startup that sets up any global or static variables that have a zero initial value. You’ll remember that rather than storing a whole bunch of zeroes in flash and copying them over, we instead store how many zeroes we need and then initialise that many memory locations to zero when starting up.

    A few different parts combine to produce this result:

    • NOLOAD specifies that this section should just be allocated addresses in whichever memory block it belongs to, without including its contents in the final executable.
    • >ram at the end specifies that addresses should be allocated in the RAM memory block.
    • We use a similar trick to what we saw in .data, where we store the first and last RAM locations of the zero values as __bss_beg__ and __bss_end__ respectively. These are then used by the startup code to actually initialise those locations with zeroes.

    .ARM.extab and .ARM.exidx sections

    These are sections that are used for exception unwinding and section unwinding, respectively. This presentation gives a short overview of the information contained in them and how they’re used.


    Last, but certainly not least, we need to define the entrypoint of the application. This is done with the command ENTRY(_vectors). _vectors is the starting point of the exception vector table, which we defined in the startup code. This isn’t used by the microcontroller, as it always starts at the same memory address after a reset. However, it’s still necessary so that the linker doesn’t just optimise everything away as unused code.

    The end

    So that’s the linker script… next time we’ll take a look at the sample application and how it makes those LEDs blink.


  • Developers and Designers Can Pair Too

  • A deeper look at the STM32F4 project template: getting things started

    As promised in my post about my STM32F4 project template, I’m going to be publishing a series of posts about various interesting aspects of it. (Well, interesting to me…)

    The first topic: how does the microcontroller start up and get to the point where it’s running the awesome flashing LEDs code?

    Most of this is controlled by the startup assembly code, stm32f407vg.S. I didn’t write this myself – I used Philip Munts’ examples, although I did make some small changes.

    Side note: different compilers have different syntaxes for assembly code. The instructions are the same for all compilers for the same processor architecture. However, how you write them might be slightly different from compiler to compiler. This is unlike languages like C++ or Ruby, where the syntax is the same everywhere. For example, some compilers denote comments with // and /* ... */ like C or C++, some use @, while others use ;. If you’re getting unexpected compiler errors, especially with things you’ve borrowed from the internet, make sure the syntax matches what your compiler expects.

    Step 1: the exception vector table

    If you take a look in stm32f407vg.S, you’ll see a section like this:

    // Exception vector table--Common to all Cortex-M4
    _vectors:   .word   __stack_end__
        .word   _start
        IRQ   NMI_Handler
        IRQ   HardFault_Handler
        IRQ   MemManage_Handler
        IRQ   BusFault_Handler
        IRQ   UsageFault_Handler
        .word   0
        .word   0
        .word   0
        .word   0
        IRQ   SVC_Handler
        IRQ   DebugMon_Handler
        .word   0
        IRQ   PendSV_Handler
        IRQ   SysTick_Handler
    // Hardware interrupts specific to the STM32F407VG
        IRQ   WWDG_IRQHandler
        IRQ   PVD_IRQHandler
        IRQ   TAMP_STAMP_IRQHandler
        IRQ   RTC_WKUP_IRQHandler
        IRQ   FLASH_IRQHandler
        IRQ   RCC_IRQHandler
        ... and so on

    This is where the first bit of magic happens. This sets up the exception vector table, which is what is used by the microcontroller to work out what to do when starting up. This is not something specific to the STM32F4 series – the memory layout of the table is standard for all ARM Cortex-M4 processors. It’s documented in section 2.3.4 of the Cortex-M4 Devices Generic User Guide. Note that the diagram in the documentation has the memory going from the lowest address (0x0000) at the bottom of the diagram to the highest at the top, whereas the assembly code has the lowest address first.

    Cortex-M4 exception vector table layout Image source: section 2.3.4 of the Cortex-M4 Devices Generic User Guide

    The first entry is the initial value of the stack pointer, and here we’re initialising it to __stack_end__. Setting the initial value of the stack pointer to the end of the stack rather than the beginning may seem counterintuitive, but keep in mind that adding something to the stack decrements the stack pointer, moving it to a lower value. The value of __stack_end__ is set by the linker script, and imported into this file by the .extern __stack_end__ statement near the beginning of the file. The linker script is a topic for a whole other post, but it is worthwhile mentioning that it’s responsible for making sure the exception vector table ends up in the right place in our firmware binary, so that it ends up in the right place in memory on the device when we flash our program on to it.

    (If, like me, you’re a bit rusty on what the stack pointer is and how it is used, this page has a good explanation. It talks about the MIPS architecture, but the concepts are the same for ARM and pretty much every other processor architecture.)

    Next up, in the second entry (memory offset 0x0004), is the reset vector. The reset vector is a pointer to the first instruction the processor should execute when it is reset. In our case, this is _start. We’ll come back to _start in a second.

    The following entries give the addresses of various interrupt handlers. ARM calls them exception vectors, hence the name ‘exception vector table’. The first few handle failure scenarios, and then the rest cover the interrupts we’re used to dealing with, such as timers. I’m not going to go into too much detail here about this here, but Philip has defined a handy IRQ macro to use to help set these up. This enables us to define handlers only for the interrupts we’re interested in – if we don’t set up a handler for a particular interrupt, it’ll use a default handler that just returns immediately.

    Step 2: preparation for user code: _start

    So once the processor has initialised the stack pointer with __stack_end__, it starts executing the code specified in the reset vector. As we saw before, in our case, this is _start. Philip has done a pretty good job of explaining each instruction here, so I’m not going to go through it line by line, but I will call out the rough steps:

    1. copy_data and copy_data_loop: Copy anything in the .data segment from flash to RAM. The .data segment includes global and static variables that have a non-zero initial value.

    2. zero_bss and zero_bss_loop: Similar to the previous step, this initialises global and static variables that have a zero initial value in what is called the .bss segment. Why are zero values handled separately to non-zero values? It saves flash memory space, and is quicker to load: it takes much less space to store that X zero values are needed and initialise that many locations to zero than it does to record zero X times and then copy all those zeroes from flash into RAM.

      (Wikipedia and this page both have good explanations of the .data and .bss segments if you want to read more.)

    3. call_ctors and ctors_loop: This does what the name suggests – it calls constructors for static and global variables.

    4. run: The final step before we run our main() method is to call SystemInit(). SystemInit() is a function in system_stm32f4xx.c that sets up the processor’s clock. ST provides a utility to generate this file based on your application’s requirements and hardware. (The one I’m using should be suitable for the Discovery board.)

    Step 3: main()

    At this point, run branches to main() and we’ve finally made it! Everything has been initialised and the processor is running our code, happily doing whatever we’ve asked of it, whether that be flashing LEDs or controlling a chainsaw-wielding drone.

    Step 4: life after main()

    In most applications, main() is the last stop on our journey: it will usually eventually loop forever or put the microcontroller to sleep, and so main() will never return. However, we still need to handle the case where it does return.

    If it does return, the processor will execute the next instruction after the call to main():

        ...other stuff in run
        bl     main                         // Call C main()
        // What's next?

    If we don’t put something there, the processor could do anything – that memory location could potentially contain anything if we don’t explicitly set it to something. In our case, I’ve added an infinite loop:

        ...other stuff in run
        bl     main                         // Call C main()
        // Fallthrough
        b      loop_if_end_reached          // Loop forever


  • A project template for the STM32F4 Discovery board

    I set myself a bit of a challenge recently: pull together a working toolchain for a microcontroller. I’ve always used environments automagically generated by tools or prepared by others and so it’s been a bit of a mystery to me – I wanted to dig into the details to understand what’s going on.

    The result of this challenge is a project template for the STM32F4 Discovery board, which you can find on GitHub.

    I picked the STM32F4 series as it’s a relatively popular MCU choice and has a low-cost development board in the form of the Discovery board. It’s also an ARM processor, so there are plenty of good tools and resources available. The project template should work for any part in the STM32F4 series, although a bit of tweaking of some configuration files may be needed.

    It still has some rough edges, but it should be ready for use by people who aren’t me. It has everything you need to get started:

    You can grab the template from GitHub. If you run into any issues or have any suggestions, creating issues on the repository is the best way to get in touch.

    I’m planning on writing up a few posts explaining some of the more interesting parts soon… stay tuned.

  • Fast(er) database integration testing with snapshots

    On a recent project, we were struggling with our integration test suite. A full run on a developer PC could take up to 15 minutes. As you’d expect, that was having a significant impact on our productivity.

    So we decided to investigate the issue and quickly discovered that the vast majority of that 15 minutes was spent spinning up and then destroying test databases for each test, rather than executing the tests themselves. (In the interests of test isolation, each individual test was given its own fresh database.) We didn’t want to stop giving each test a clean database, because we liked the independence guarantee that gave us, but at the same time, a 15 minute test run was well past the point of being bearable. There had to be a better way…

    Some context

    We were using SQL Server and the database schema we were testing against was a behemoth that had grown over many years to contain somewhere around 50 to 100 tables, plus dozens of stored procedures, views and indices. Creating a new database from scratch with this schema took 15-30 seconds.

    Our production environment had nearly 1 TB of data, with most of that concentrated in two key tables that we frequently queried in different ways. As you can imagine, achieving an acceptable level of performance with this volume of data was challenging at times. (Thankfully we didn’t need that much data for each test.)

    Because of this, we had a large amount of hand-crafted, artisanal SQL and were using Dapper. We had previously been using Fluent NHibernate to construct our queries, but found that NHibernate introduced a significant performance penalty which was not acceptable in our case. Furthermore, many queries involved a large number of joins and conditions that were awkward to express using the fluent interface and often resulted in inefficient generated SQL. They were much more expressive, concise and efficient in hand-written SQL.

    Furthermore, some parts of the application wrote to the database, and different tests required different sets of data, so we couldn’t have one read-only database shared by all tests. Due to the number of database interactions under test and the number of tests, we wanted something that would have a minimal impact on the existing code.

    The existing solution

    The existing test setup looked like this:

    1. Restore database backup with schema and base data
    2. Insert test-specific data from CSV files
    3. Run test
    4. Destroy test database
    5. Repeat steps 1 to 4 for each test

    Apart from the performance issue I’ve already talked about, we also wanted to try to address some of these issues:

    • The database was created from a backup stored on a shared network folder, meaning anyone could break the tests for everyone at any time. Moreover, it wasn’t under source control, so we couldn’t easily revert back to an old version if need be.

    • The database was a binary file that was not easy to diff, which made it difficult to work out what had changed when tests suddenly started breaking.

    • Any schema changes that were made needed to be manually applied to the database backup file (applying any new database migrations was not included in the test set up process). This meant tests weren’t always running against the most recent version of the schema.

    What are snapshots?

    Before we jump into talking about how we improved the performance of our tests, it’s important to understand SQL Server’s snapshot feature. Other database engines have similar mechanisms available, some built-in, some relying on file system support, but we were using SQL Server so that’s what I’ll talk about here. (Snapshots are unrelated to other things with the word ‘snapshot’ in them, such as snapshot isolation.)

    Snapshots are just a read-only copy of an existing database, but the way in which they are implemented is critical to the performance gains we saw. Creating a snapshot is a low-cost operation as there is no need to create a full copy of the original database on disk. Instead, when changes are made to the original database, the affected database pages are duplicated on disk before those changes are applied to the original pages.

    Then, when the time comes to restore the snapshot, it is just a matter of deleting the modified pages and replacing them with the untouched copies made earlier. And the fewer changes you make after the snapshot, the faster it is, as fewer pages need to be restored. This means that the act of restoring the database from a snapshot is very, very fast, and certainly much faster than recreating the database from scratch. This makes them well suited to our needs – we were recreating the database to get to a known clean state before the start of each test, but this achieves the same effect in much less time.

    (If you’re interested in learning more, there are more details about how snapshots work and how to use them on MSDN.)

    Introducing snapshots into our test process

    After a few iterations, we arrived at this point:

    1. Create empty test database
    2. Run initial schema and base data creation script
    3. Apply any migrations created since the initial schema and data script had been created
    4. Take a snapshot of the database
    5. Restore database from snapshot
    6. Insert test-specific data from CSV files
    7. Run test
    8. Repeat steps 5 to 7 for each subsequent test
    9. Destroy snapshot and test database

    While this approach might be slightly more complicated, it gave us a huge performance boost (down to around 8 minutes). We were no longer spending a significant amount of time creating and destroying databases for each test.

    There are a few other things of note:

    • Step 5 (restoring the database from the snapshot) isn’t strictly necessary for the first test. We left it there anyway, instead of moving it to after running the test, just to make it clear that each test started with a clean slate. Also, as I mentioned earlier, the cost of restoring from a snapshot is negligible, especially when there are no changes, so this does not significantly increase the run time of the test.

    • We addressed our first two ‘nice to haves’ by turning the binary database backup into an equivalent SQL script that recreated the schema and base data (used in step 2), and put this file under source control. Running the script rather than restoring the backup was marginally slower, but we were happy to trade speed for maintainability in this case, especially given that we’d only have to run the script once per test run.

    • Step 3 (running any outstanding migrations) addressed the final ‘nice to have’. We couldn’t just build up the database from scratch with migrations because for some of the earliest parts of the schema, there weren’t any migration scripts. Also, we found that building a minimal schema script (and leaving the rest of the work to the migrations we did have) was a time-consuming, error-prone manual process.

    Adding a dash of parallelism

    Despite our performance gains, we still weren’t satisfied – we knew we could do even better. The final piece of the puzzle was to take advantage of NUnit 3’s parallel test run support to run our integration tests in parallel.

    However, we couldn’t just sprinkle [Parallelizable] throughout the code base and head home for the day. If we did that without making any further changes, each test running at the same time would be using the same test database and would step on each other’s toes. We didn’t want to go back to having each test create its own database from scratch either though, because then we’d lose the performance gains we’d achieved.

    Instead, we introduced the concept of a test database pool shared amongst all testing threads. The idea was that as each test started, it would request a database from the pool. If one was available, it was returned to the test and it could go ahead and run with that database. If there weren’t any databases available in the pool, we’d create one using the same process as before (steps 1 to 4 above) and then return that database. At the end of the test, it would then return the database to the pool for other tests to use.

    This meant we created the minimum number of test databases (remember that creating the test database was one of the most expensive operations for us), while still realising another significant performance improvement – our test run was now down to around 5 minutes. While this still wasn’t as fast as we’d like (let’s be honest, we’re impatient creatures, tests can never be fast enough), we were pretty happy with the progress we’d made and needed to start looking at the details of individual tests to improve performance further.

    Other approaches that we tried and discarded

    We experimented with a number of other approaches that we chose not to use, but might work in other situations:

    • Using a lightweight in-process database engine (eg. SQL Server Compact or SQLite), or something very quick to spin up (eg. Postgres in a Docker container): while this option looked promising early on, we needed to use something from the SQL Server family so that we were testing against something representative of the production environment, which left SQL Server Compact as the only option. Sadly, SQL Server Compact was missing some key features that we were using, such as views, which instantly ruled it out. (There’s a list of the major differences between Compact and the full version on MSDN.)

    • Wrapping each test in a transaction, and rolling back the transaction at the end of the test to return the database to a known clean state: if we were writing our system from scratch, this is the approach I’d take. However, we already had a large amount of code written, and it would have been very time-consuming to rework it all to support being wrapped in a transaction. Most of the database code we had was responsible for creating a database connection, starting a transaction if needed and so on, and so reworking it to support an externally-managed transaction would have required a significant amount of work not only in that code but throughout the rest of the application.

    • Using a read-only test database for code that made no changes to the test database: at one point in time, a large part of our application only read data from the database, so we considered having two different patterns for database integration testing: one for when we were only reading from the database, and another for when it was modified as part of the test (eg. commands involving INSERT statements). We shied away from this approach for the sake of having just one database testing approach.

    Updated April 3: minor edits for clarity, thanks to Ken McCormack

  • Yow Connected presentation: 'Meet me halfway: developers and designers pairing for the win'

    Greg Skinner and I gave a presentation at Yow Connected in Melbourne on September 17, where we discussed how pairing can be a very effective form of collaboration for developers and designers.

    The talk was recorded, so I hope there will be a video of it available soon. In the meantime, you can download a PDF version of the slides here.

    The final version of the app we developed during the presentation is still live at http://yc.charleskorn.com, and the source code is also on GitHub.

  • DevOpsDays Melbourne presentation: 'Deploying two apps, three microservices and one website with zero hair loss: what worked for us'

    I’ve been holding off posting about this because I was hoping the video would be posted, but it doesn’t seem to be coming… I’ll update this post if it does come out.

    I gave a lightning / Pecha Kucha-style talk at DevOpsDays Melbourne on July 16 about our how we were able to deliver multiple production deployments a week at Trim for Life. This involved the use of (amongst other things) a microservices architecture, a backwards- and forwards-compatible data storage mechanism, fully automated deployment pipelines and automated monitoring.

    The slides are available to download as a PDF.

  • Two Raspberry Pi Ansible roles published

    Over the weekend I published two Ansible roles for automating common set up tasks for a Raspberry Pi:

    They’re also available on Ansible Galaxy.

    The readme files for both roles include more information on how they work and an example of how to use them.

    Both of them have some opportunities for improvement, and pull requests and issue reports are most welcome.

  • My blogging setup

    The purpose of this post is to document what I’m using to run this blog (and thus serve as some kind of prompt when I’m trying to fix stuff in six months’ time) and hopefully also help you set up your own blog.

    The tl;dr version: Jekyll + GitHub Pages + custom domain name = charleskorn.com


    I use GitHub Pages to host this blog. If you’ve never heard of GitHub Pages before, it is a completely free static website hosting service provided by GitHub. As you might expect with something coming from GitHub, each site is backed by a Git repository (mine is charleskorn/charleskorn.github.io, for example). This means you instantly get things like version history, backups and offline editing with zero extra effort.

    You can use GitHub Pages to create a site for a user (which is what I have done for this site), a single repository or an organisation. More details on the differences are explained in the help area.

    You can use it to host purely static content, or you can have your final HTML generated by the Jekyll static site generator, which is what I use. As soon as you push your changes up to your GitHub repository, all the static files are automatically regenerated within a matter of seconds.

    With Jekyll, you write your posts as Markdown files, and then it generates HTML from your posts based on templates that you define. It also has built in SASS and CoffeeScript compilation, RSS feed generation and support for permalinks, amongst other things. It’s also straightforward to set up a custom 404 page.

    Side note: as you can imagine, some features of Jekyll are restricted by GitHub. For example, there is a limited set of allowed plugins, but by and large, you can use the full functionality of the core Jekyll project. If you need more flexibility, you can always generate the pages yourself and push the result up to a branch called gh-pages, or just turn Jekyll processing off altogether. (We use this technique for the ThoughtWorks LevelUp website, for example.)

    The final missing piece of the puzzle is setting up a custom domain name to point at the site hosted by GitHub. This is fairly straightforward to do. It requires creating a CNAME file in the root directory of your repository with the domain name you’d like your site to use (you can see mine for an example), and then configuring your DNS server to point at GitHub’s servers.

    Writing process

    My writing process looks something like this at the moment:

    • bundle exec jekyll draft "My cool new article"
    • bundle exec guard
    • writing, checking a live preview in my browser
    • procrastinating
    • more writing
    • editing and proofreading
    • bundle exec jekyll publish _drafts/my-cool-new-article.md
    • git commit -m "My cool new article"
    • git push

    Not all of this is built in to Jekyll:

    The jekyll draft and jekyll publish commands come from a handy plugin called jekyll-compose. They don’t do anything you couldn’t do yourself – jekyll draft just sets up the scaffolding of a draft post and jekyll publish just moves that post into the _posts folder and adds the date to the file and the front matter – but they certainly make things that little bit easier.

    There are a couple of things that work together to produce live previews in a browser. The guard-jeykll-plus plugin for guard makes use of Jekyll’s built-in auto-regeneration support to automatically regenerate the site when any change is detected. Once that’s done, the guard-livereload plugin sends a notification to a small JavaScript snippet embedded into every page to trigger a reload once the build process is finished.


    I also used the opportunity of setting up this blog to learn about some standard SEO practices for websites (given the topic sometimes comes up in my day job).

    The first thing was setting up Google Analytics (for traffic reporting and analysis) and Google Webmaster Tools (for alerts from Google when something goes awry). The easiest way to do this is to use Google Tag Manager. This makes it easy to manage all the necessary JavaScript tags for your site, and only requires a single JavaScript tag on your pages – the rest are inserted dynamically based on your configuration.

    The next step was adding an XML sitemap, which helps search engines discover all of the pages on your site. This is can be generated for you as part of the Jekyll build process by the jekyll-sitemap plugin.

    However, simply generating the sitemap isn’t enough – you also need to tell Google about it. You can use Webmaster Tools to do this instantly, or list it in your robots.txt and Google’s crawler will pick it up next time it indexes your site. Furthermore, when the content on your site changes, it’s a good idea to ping Google to let it know. I have a Travis CI job set up to do this automatically whenever I push changes to GitHub. (This also checks for simple issues like broken links and malformed HTML using html-proofer.) It’s a similar process for other search engines as well.

    Another thing that I’m not 100% sold on the value of is schema.org markup. The idea is that you decorate your HTML with extra attributes that describe what their content means, for example, that a span contains the author’s name, or that a h2 is the blog post title. It enables some of the extra information you sometimes see next to search results such as the date the post was written and the author’s name, but I’m not sure how much it helps with people actually finding content relevant to their query. (Although I’m sure that if Google told us that it did impact search ranking, you can bet that it would be exploited within hours, diminishing its usefulness somewhat.)

    This turned out to be much longer than I expected… I hope you found this useful (even if this is just Future Charles reading this).

  • An Ansible role for running ACIs with systemd

  • My experiences with rkt, an alternative to Docker

    This post was written for the v0.5.5 release of rkt, so some things may be slightly outdated.

    I’ve recently been experimenting with rkt (pronounced ‘rock-it’), a tool for running app containers. rkt comes from the team at CoreOS, who you might know from other projects such as etcd and fleet. rkt is conceptually similar to Docker, but was developed by CoreOS to address some of the shortcomings they perceive in Docker, particularly around security and modularity. They wrote up a great general overview of their approach in the rkt announcement blog post.

    I was initially drawn to rkt because of the difficulty running Docker on a Raspberry Pi without recompiling the kernel. (Although, as of now, I still haven’t got rkt running on it either - that should change soon though, see below.)

    My goal with this blog post is to discuss some of the things I’ve liked about rkt, and some of the frustrations.

    Image build process

    The first thing that struck me about rkt was the simplicity and speed of the image build process. A simple image for a personal project takes four seconds to build, and most of that is spent compiling the Golang project that makes up the image. (This is the shell script that I use to compile that image if you’re interested.) This simplicity and speed stems from the fact that neither the image nor its base image (if there is one) are actually started during the build process when using the standard build tool (actool). Instead, each image is created from the files it requires and a list of images it depends on to run. Not booting the container also eliminates the need to download the base images to the build machine, saving further time. Moreover, the runtime provides a standard base image upon which all other images are run (called the stage 1 image), eliminating the need for a base image in many simple cases. (Similar to Docker, the “merging” of the top-level image with its dependencies happens at runtime, using overlayfs if available on the host.)

    Removing the need to boot up an image during the build process also has some other advantages. It makes it far easier to run on different architectures or operating systems to those on the target machine. For example, it’s quite easy to build an image targeting Linux from OS X, and no tools like boot2docker are required. Furthermore, the build process can be run as an unprivileged (non-root) user, which can make life easier in restricted hosted build environments.

    However, this approach does have one major drawback that can make life very painful. As the container is not booted, tools such as apt-get can’t be used to easily install dependencies - you’re responsible for ensuring that all dependencies find their way onto the image yourself. With a simple Golang server, for example, this isn’t a big issue, as it’s quite straightforward to produce a statically-linked binary, but for many other languages this can be problematic as the entire runtime and their dependencies are required (eg. a JVM, Ruby runtime etc.).

    Separation of build tools and runtime

    As I mentioned earlier, one of the major focus areas has been on modularity and enabling developers to use the most appropriate tool for the job, rather than the more one-size-fits-all approach of Docker. As such, the runtime (rkt) is completely separate from the build tool (actool), and either could be completely replaced by another tool - the specification they conform to is available on GitHub. There are already some alternative runtime implementations and a variety of different tools springing up.

    Security / image verification model

    One of the most interesting aspects of rkt is its security model. rkt comes with cryptographic verification of images enabled by default (and defaults to trusting no signing keys). This enables you to ensure that not only are public images are from those you trust, but also that any private images you upload to services outside your control are unmodified when you pull them back down again. It’s also pretty straightforward to set up and running, with a good guide available on GitHub.

    Compatibility with Docker

    Although I haven’t tried this myself, it’s possible to run Docker images with the rkt runtime, if you need to run both Docker and rkt images side-by-side. (source)


    rkt is still very much under active development - there have been 14 releases since the first at the end of November last year - which means what little documentation and online resources are out there are generally made obsolete very quickly, which can be a bit of a pain at times. There is a focus on stabilising things with the upcoming 0.6 release, but until that materialises, keep in mind that anything you’re reading now could be very out of date.

    Furthermore, the official documentation is patchy - some places (such as the app container specification) are thorough, whilst other places are full of placeholders.

    Multiple architectures

    Because the stage 1 image is provided by the host environment, building images for different architectures is (theoretically) easier - if your image does not require a specific base image, all you need to do is recompile your binaries for the other architecture. (I say theoretically because I haven’t yet tried it myself - I’m still waiting for support for running containers on ARM processors, which is expected in the 0.6 release.)


    I’ve really liked working with rkt so far - it’s been very quick to get up and running and there wasn’t a lot of fussing around to get something simple working, even in its current form. There are still some rough edges, particularly around documentation, and some things that can make life frustrating, like dealing with dependencies, but there is definitely a lot of potential. It’s going to be very interesting to watch rkt evolve.

  • Welcome to my blog

    Thanks for visiting… I’m going to try to post semi-regularly.