Faster digital read on Arduino nano 33 BLE

This is for anyone looking to read digital pins faster than a stock Nano 33 BLE running Arduino’s framework can manage. I used the information from here, the repository files here (relying heavily on NRF51822.h) and the “nRF52840 Product Specification v1.7”.

Did the toolchain setup as specified here. Set up CLion and VS Code as described here and here (wasn’t sure which I’d like better so did both). Installed git and downloaded HelloWorld (rather unsuccessfully, if I might add). Downloaded a bunch of additional stuff in response to error prompts from CLion while trying to get HelloWorld to work. Gave up on CLion when pyocd failed to install properly and openocd returned errors during project configuration. Finally got HelloWorld working after switching over to VS Code, which automatically downloaded a whole other bunch of stuff.

Changed the code in main.cpp of HelloWorld to the code below.

#include "mbed.h"

void ISRcounter(void);

Ticker counterTicker;

volatile unsigned long Tesbuf1[10000];//remove after testing
volatile unsigned long Tesbuf2[10000];//remove after testing
volatile unsigned long Tesbuf3[10000];//remove after testing

volatile int bufpresE[2500];//Set to reflect memory limits
volatile int bufpresF[2500];//Set to reflect memory limits
volatile unsigned long buft3[2500];//Set to reflect memory limits

volatile int i = 0;
volatile int k = 1;
volatile int n = 0;

int main()
{
  counterTicker.attach(ISRcounter, 3s); // Call ISRcounter after 3 s; it detaches itself on entry, so it runs once.

  NRF_P0->PIN_CNF[27] = 0x0C;
  NRF_P1->PIN_CNF[2] = 0x0C;

  bufpresE[0] = ((NRF_P0->IN & 0x08000000) ? 1 : 0) ;
  bufpresF[0] = ((NRF_P1->IN & 0x00000004) ? 1 : 0) ;
  bufpresE[1] = bufpresE[0];
  bufpresF[1] = bufpresF[0];
  buft3[0] = 0;

  
  CoreDebug->DEMCR |= 0x01000000; //Enable the use of DWT.
  DWT->CYCCNT = 0; //Reset cycle counter.
  DWT->CTRL |= 0x1; //Enable cycle counter.

  while(true) 
  {
    if (i == 1)
    {
      for (n = 0; n < 10000; n++)
      {
        printf("%lu, %lu, %lu\n", Tesbuf1[n], Tesbuf2[n], Tesbuf3[n]);//remove after testing
      }
      
      for (n = 0; n < k; n++)
      {
        printf("%lu, %i, %i\n", buft3[n], bufpresE[n], bufpresF[n]);
      }
      printf("end of sensor data\n");
      while(1)
      ;
    }
  }

  // main() is expected to loop forever.
  // If main() actually returns the processor will halt
  return 0;
}

void ISRcounter(void)
{
  counterTicker.detach();

  for (n = 0; n < 10000; n++)
  {
    Tesbuf1[n] = DWT->CYCCNT;//remove after testing
    bufpresE[k] = ((NRF_P0->IN & 0x08000000) ? 1 : 0) ;
    bufpresF[k] = ((NRF_P1->IN & 0x00000004) ? 1 : 0) ;
    buft3[k] = DWT->CYCCNT;
    // Note: k is not bounds-checked; if it reaches 2500 the buffer writes above overrun.
    if ((bufpresE[k-1] != bufpresE[k]) || (bufpresF[k-1] != bufpresF[k]))
    {
      k++;
    }
    Tesbuf2[n] = DWT->CYCCNT;//remove after testing
    Tesbuf3[n] = DWT->CYCCNT;//remove after testing
  }

  i = 1;
}
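
The magic numbers in the register version follow from the pin numbers and the PIN_CNF bit layout. Here is a minimal host-side sketch, assuming D9 maps to P0.27 and D10 to P1.02 (as the masks in the code suggest) and using the PIN_CNF field positions from the nRF52840 Product Specification. The helper names pin_mask and input_pullup_cnf are made up for illustration and are not part of Mbed or the nRF headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only.

// Bit mask selecting one pin in a 32-bit GPIO IN register.
constexpr std::uint32_t pin_mask(unsigned pin) {
    return std::uint32_t{1} << pin;
}

// PIN_CNF field positions (nRF52840 Product Specification, GPIO chapter).
constexpr std::uint32_t DIR_POS   = 0;  // 0 = input, 1 = output
constexpr std::uint32_t INPUT_POS = 1;  // 0 = connect input buffer, 1 = disconnect
constexpr std::uint32_t PULL_POS  = 2;  // 0 = no pull, 1 = pull-down, 3 = pull-up

constexpr std::uint32_t PULL_UP = 3;

// PIN_CNF value for an input with the buffer connected and pull-up enabled.
constexpr std::uint32_t input_pullup_cnf() {
    return (0u << DIR_POS) | (0u << INPUT_POS) | (PULL_UP << PULL_POS);
}

// These match the constants used in the listing.
static_assert(pin_mask(27) == 0x08000000u, "mask for P0.27");
static_assert(pin_mask(2) == 0x00000004u, "mask for P1.02");
static_assert(input_pullup_cnf() == 0x0Cu, "input with pull-up");
```

Writing the constants this way makes it easier to move the test to other pins without recomputing hex values by hand.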

Uploaded to the board by entering the following line of text in the integrated terminal.

C:\Users\xxxxxxx\AppData\Local\Arduino15\packages\arduino\tools\bossac\1.9.1-arduino2/bossac -d --port=COM4 -U -i -e -w C:/Users/xxxxxxx/CLionProjects/mbed-ce-hello-world/build/HelloWorld.bin -R

“xxxxxxx” represents your Windows user name. The first path points to the location of the bossac upload tool in Arduino’s IDE installation. I expect that a standalone bossac tool located elsewhere will work too. The second path points to the binary file to be uploaded. Note that CLion and VS Code have different storage locations for binaries.

Read the output using CoolTerm. Here is a sample of the board’s output.

51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19
51	16	19

The code prints raw DWT->CYCCNT readings; each column shown here is the difference between two successive readings. The first column (Tesbuf2[n] − Tesbuf1[n]) is the estimated time to read two pins, get a DWT->CYCCNT-based timestamp for the read events and do two comparisons to update a count. The second column (Tesbuf3[n] − Tesbuf2[n]) is an estimate of the error introduced by the measurement process. The third column (Tesbuf1[n+1] − Tesbuf3[n]) is an estimate of the time it takes to loop back to Tesbuf1[n]. Column 2 should be subtracted from columns 1 and 3 to remove offsets added by the measurement process.
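
Since the board prints raw counter values, the three columns have to be derived on the host. Here is a rough sketch of that step, assuming each printed line holds Tesbuf1[n], Tesbuf2[n], Tesbuf3[n] in that order. The type and function names are illustrative and not from the original code. Unsigned 32-bit subtraction is used so that a wrap of the 32-bit cycle counter between two readings still gives the right difference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One printed line of raw cycle-counter values (assumed order: Tesbuf1, Tesbuf2, Tesbuf3).
struct RawSample {
    std::uint32_t t1, t2, t3;
};

// The three derived columns for one loop iteration.
struct Deltas {
    std::uint32_t work;      // t2 - t1: reads, timestamp and comparisons
    std::uint32_t overhead;  // t3 - t2: cost of taking a measurement
    std::uint32_t loopback;  // next t1 - t3: time to get back to the top of the loop
};

// Elapsed cycles between two counter readings; correct across a single 32-bit wrap.
inline std::uint32_t elapsed(std::uint32_t earlier, std::uint32_t later) {
    return static_cast<std::uint32_t>(later - earlier);
}

// Convert raw samples to delta rows. The last sample has no successor, so it yields no row.
inline std::vector<Deltas> to_deltas(const std::vector<RawSample>& raw) {
    std::vector<Deltas> out;
    for (std::size_t n = 0; n + 1 < raw.size(); ++n) {
        out.push_back(Deltas{elapsed(raw[n].t1, raw[n].t2),
                             elapsed(raw[n].t2, raw[n].t3),
                             elapsed(raw[n].t3, raw[n + 1].t1)});
    }
    return out;
}
```

For example, the raw samples {100, 151, 167} followed by {186, 237, 253} produce the row 51, 16, 19.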

Based on the output, reading two pins, getting a DWT->CYCCNT-based timestamp for the read events and doing two comparisons to update a count took an estimated 51 − 16 = 35 clock cycles (546.88 ns on the Nano 33 BLE, which runs at 64 MHz).

Did the same test for DigitalIn using the code below.

#include "mbed.h"

void ISRcounter(void);

Ticker counterTicker;

DigitalIn line9(D9);
DigitalIn line10(D10);

volatile unsigned long Tesbuf1[10000];//remove after testing
volatile unsigned long Tesbuf2[10000];//remove after testing
volatile unsigned long Tesbuf3[10000];//remove after testing

volatile int bufpresE[2500];//Set to reflect memory limits
volatile int bufpresF[2500];//Set to reflect memory limits
volatile unsigned long buft3[2500];//Set to reflect memory limits

volatile int i = 0;
volatile int k = 1;
volatile int n = 0;

int main()
{
  line9.mode(PullUp);
  line10.mode(PullUp);

  bufpresE[0] = line9;
  bufpresF[0] = line10;
  bufpresE[1] = bufpresE[0];
  bufpresF[1] = bufpresF[0];
  buft3[0] = 0;

  counterTicker.attach(ISRcounter, 3s); // Call ISRcounter after 3 s; it detaches itself on entry, so it runs once.


  CoreDebug->DEMCR |= 0x01000000; //Enable the use of DWT.
  DWT->CYCCNT = 0; //Reset cycle counter.
  DWT->CTRL |= 0x1; //Enable cycle counter.

  while(true) 
  {
    if (i == 1)
    {
      for (n = 0; n < 10000; n++)
      {
        printf("%lu, %lu, %lu\n", Tesbuf1[n], Tesbuf2[n], Tesbuf3[n]);//remove after testing
      }
      
      for (n = 0; n < k; n++)
      {
        printf("%lu, %i, %i\n", buft3[n], bufpresE[n], bufpresF[n]);
      }
      printf("end of sensor data\n");
      while(1)
      ;
    }  
  }

  // main() is expected to loop forever.
  // If main() actually returns the processor will halt
  return 0;
}

void ISRcounter(void)
{
  counterTicker.detach();

  for (n = 0; n < 10000; n++)
  {
    Tesbuf1[n] = DWT->CYCCNT;//remove after testing
    bufpresE[k] = line9;
    bufpresF[k] = line10;
    buft3[k] = DWT->CYCCNT;
    // Note: k is not bounds-checked; if it reaches 2500 the buffer writes above overrun.
    if ((bufpresE[k-1] != bufpresE[k]) || (bufpresF[k-1] != bufpresF[k]))
    {
      k++;
    }
    Tesbuf2[n] = DWT->CYCCNT;//remove after testing
    Tesbuf3[n] = DWT->CYCCNT;//remove after testing
  }

  i = 1;
}

Here is a sample of the board’s output for DigitalIn.

145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18
145	10	18

This reports 145 − 10 = 135 clock cycles (2.109 microseconds). That’s almost four times as long as the fast read process.

Repeated the test for digitalRead in the Arduino environment using the code below.

unsigned long Tesbuf1;//remove after testing
unsigned long Tesbuf2;//remove after testing
unsigned long Tesbuf3;//remove after testing

byte bufpresE[2500];//Set to reflect memory limits
byte bufpresF[2500];//Set to reflect memory limits
unsigned long buft3[2500];//Set to reflect memory limits

int k = 1;
int y = 0;//remove after testing

void setup() {
  Serial.begin(2000000); 
  while (!Serial); //Wait for serial port to connect. Needed for native USB on nano 33 ble sense.

  pinMode(9, INPUT_PULLUP);
  pinMode(10, INPUT_PULLUP);

  bufpresE[0] = digitalRead(9);
  bufpresF[0] = digitalRead(10);
  bufpresE[1] = bufpresE[0];
  bufpresF[1] = bufpresF[0];
  buft3[0] = 0;

  CoreDebug->DEMCR |= 0x01000000; //Enable the use of DWT.
  DWT->CYCCNT = 0; //Reset cycle counter.
  DWT->CTRL |= 0x1; //Enable cycle counter.
}

void loop() {
  while (1) {
    Tesbuf1 = DWT->CYCCNT;//remove after testing
    bufpresE[k] = digitalRead(9);
    bufpresF[k] = digitalRead(10);
    buft3[k] = DWT->CYCCNT;
    // Note: k is not bounds-checked; if it reaches 2500 the buffer writes above overrun.
    if ((bufpresE[k-1] != bufpresE[k]) || (bufpresF[k-1] != bufpresF[k])) {
      k++;
    }
    Tesbuf2 = DWT->CYCCNT;//remove after testing
    Tesbuf3 = DWT->CYCCNT;//remove after testing
    Serial.print(Tesbuf1);//remove after testing
    Serial.print(",");//remove after testing
    Serial.print(Tesbuf2);//remove after testing
    Serial.print(",");//remove after testing
    Serial.println(Tesbuf3);//remove after testing
    y++;//remove after testing
    //set y as memory permits
    if (y == 5000) {//remove after testing
      while(1);//remove after testing
    }//remove after testing
  }
}

Here is a sample of the board’s output for digitalRead.

235	4	46878
235	4	50477
235	4	57283
235	4	58792
235	4	49821
235	4	61696
235	4	63799
235	4	57201
1995	4	56483
235	4	46689
235	4	50213
235	4	57628

This reports 235 − 4 = 231 clock cycles (3.609 microseconds). That’s about six and a half times as long as the fast read process. The loop-back time is crappy here because of all the print statements included within each iteration (got lazy knowing this wasn’t gonna fly anyway).
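
The three results can be put side by side with a little arithmetic. Here is a minimal sketch, assuming the 64 MHz core clock mentioned earlier and taking the typical rows from the three sample outputs as input. The function and constant names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Core clock of the Nano 33 BLE as stated in the post.
constexpr double CPU_HZ = 64.0e6;

// Column 1 minus column 2: cycles left after removing the measurement overhead.
constexpr int net_cycles(int col1, int col2) { return col1 - col2; }

// Convert a cycle count to nanoseconds at CPU_HZ.
constexpr double cycles_to_ns(int cycles) { return cycles * 1.0e9 / CPU_HZ; }

// How many times longer `slow` takes than `fast`.
constexpr double slowdown(int slow, int fast) {
    return static_cast<double>(slow) / fast;
}

constexpr int REGISTER_READ = net_cycles(51, 16);   // direct NRF_Px->IN reads
constexpr int DIGITAL_IN    = net_cycles(145, 10);  // Mbed DigitalIn
constexpr int DIGITAL_READ  = net_cycles(235, 4);   // Arduino digitalRead

static_assert(REGISTER_READ == 35, "register read cycles");
static_assert(DIGITAL_IN == 135, "DigitalIn cycles");
static_assert(DIGITAL_READ == 231, "digitalRead cycles");
```

With these definitions, cycles_to_ns(REGISTER_READ) works out to 546.875 ns, slowdown(DIGITAL_IN, REGISTER_READ) to roughly 3.9 and slowdown(DIGITAL_READ, REGISTER_READ) to roughly 6.6.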

One question that may pop up is “why use an interrupt for the read calls?”. Well, as seen in the output for the Arduino case (the 1995 entry in the first column), something (I assume an interrupt of some sort) kept jumping in and messing up the flow of the read operation. I’m not very familiar with the NVIC, so I used a timer interrupt to block that “something” until the process was done, then set it loose. Note that the for-loop in ISRcounter appears to interact with the USB virtual COM port. Setting very high loop counts (above 999999) delays the creation of the virtual port, which affects data display. Getting rid of the while(1); after the print operation in main()'s infinite loop should work though.

I should also point out that I tried to get the FastIO library to work but that didn’t turn out so well. Kept having issues with the classes not recognising one another or not having access to supposedly shared resources/variables.

I suppose this approach can be applied to DigitalOut and DigitalInOut too (not particularly interested in those right now). Hope this will save someone a couple hours and a few tufts of hair.

PS. CLion now works fine and CCache is great. Turns out it has faster build times than VS Code which builds from scratch every time.
