Performance question - BufferedSerial::write byte by byte or in chunks

Hi there!

While optimizing/refactoring our logger, I found myself scratching my head about how to hand large buffers to BufferedSerial::write().

Should we write the buffer byte by byte or in chunks?

The data is stored in a CircularBuffer (FIFO), and BufferedSerial::write() is called through an event queue running on a low-priority thread, so logging never blocks important work and logs only go out when nothing else is happening.
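
For context, the wiring looks roughly like this. This is a minimal sketch, not our exact code: the fifo capacity, pins, baud rate, and the event_queue/log_thread names are assumptions; only buffer::fifo and _default_serial match the snippets below.

#include "mbed.h"

void process_fifo(); // defined further down

namespace buffer {
inline auto fifo = mbed::CircularBuffer<char, 1024> {}; // capacity is an assumption
}

inline auto _default_serial = mbed::BufferedSerial {USBTX, USBRX, 115200};
inline auto event_queue     = events::EventQueue {};
inline auto log_thread      = rtos::Thread {osPriorityLow};

void start_logger()
{
	// Dispatch log events forever on the low priority thread
	log_thread.start([] { event_queue.dispatch_forever(); });
}

void log_char(char c)
{
	buffer::fifo.push(c);
	// Defer the serial write so callers never block on the UART
	event_queue.call(process_fifo);
}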

The current implementation, writing byte by byte, looks like this:

inline void process_fifo()
{
	while (!buffer::fifo.empty()) {
		auto c = char {};
		buffer::fifo.pop(c);
		_default_serial.write(&c, 1);
	}
}

The chunk implementation looks like this:

namespace buffer {
inline auto process_buffer = std::array<char, 64> {};
} // namespace buffer

// ...

inline void process_fifo()
{
	while (!buffer::fifo.empty()) {
		auto length = buffer::fifo.pop(buffer::process_buffer.data(), std::size(buffer::process_buffer));
		_default_serial.write(buffer::process_buffer.data(), length);
	}
}

We first pop the data into a 64-byte std::array, then pass that buffer to BufferedSerial::write().

The assumptions are that using the temporary 64-byte std::array buffer can:

  • reduce the number of calls to BufferedSerial::write()
  • empty the CircularBuffer (FIFO) faster, so it can be refilled sooner
  • allow the compiler to copy/move bigger chunks of memory and optimize things

But to be honest I’m not sure :joy:

Using chunks adds 64 bytes of RAM and 64 bytes of flash, which is something we can live with.

But are my assumptions correct? Am I optimizing anything? Or doing premature pessimization?

Our test code behaves the same either way, and character throughput is identical (a timing sketch to reproduce the measurement follows the numbers):

  • input: 1988 characters/ms into the FIFO
  • output: 12 characters/ms to the serial port
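
For anyone wanting to reproduce the comparison, one drain of the fifo can be timed with Mbed OS 6's Timer. A minimal sketch; time_one_drain is a hypothetical helper, not part of our code:

#include "mbed.h"
#include <chrono>

// Time one full drain of the fifo through process_fifo()
auto time_one_drain() -> std::chrono::microseconds
{
	auto timer = mbed::Timer {};
	timer.start();
	process_fifo();
	timer.stop();
	return timer.elapsed_time(); // std::chrono::microseconds in Mbed OS 6
}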

So what do you guys think? Should we make the change? :slight_smile:

For reference, @Tom_Skevington’s answer:

From a brief read of BufferedSerial, it looks like it pushes the buffer you supply into its own, one byte at a time, so I’m not sure how much block writing saves: a few if statements and a critical section lock/unlock per byte.
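
Purely as a mental model of that reading (this paraphrases the description above; it is not the actual BufferedSerial source), the per-byte cost would have roughly this shape:

#include "mbed.h"

// Stand-in for the driver's internal tx ring (size is a guess)
inline auto _txbuf = mbed::CircularBuffer<char, 256> {};

// Mental model of the per-byte cost; NOT the real implementation
ssize_t write_model(const char *buffer, size_t length)
{
    size_t written = 0;
    for (size_t i = 0; i < length; ++i) {
        core_util_critical_section_enter();  // lock/unlock per byte
        if (!_txbuf.full()) {                // a couple of checks per byte
            _txbuf.push(buffer[i]);          // copy one byte into the ring
            ++written;
        }
        core_util_critical_section_exit();
    }
    return static_cast<ssize_t>(written);
}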

Given that, if you set BufferedSerial to non-blocking mode, you should be able to do something like:

inline void process_fifo()
{
    char c;
    // Assumes the serial is already set to non-blocking
    while (buffer::fifo.pop(c)) {
        if (_default_serial.write(&c, 1) != 1) {
            // Write buffer full: block once for this char, then stop
            // and leave the rest of the fifo for the next run
            _default_serial.set_blocking(true);
            _default_serial.write(&c, 1);
            _default_serial.set_blocking(false);
            break;
        }
    }
}

This is off the top of my head, so it likely has something wrong, but it should rapid-fire chars into the serial buffer until it’s full or there are none left to pop. When it fills up we’re left with one unwritten char, so we block until there’s space, write it, and bail out until the next run.
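
Combining both ideas (bulk pop plus non-blocking writes) could look something like the sketch below; the partial-write bookkeeping is the part that is easy to get wrong:

inline void process_fifo()
{
    // Assumes the serial is already set to non-blocking
    while (!buffer::fifo.empty()) {
        auto length = buffer::fifo.pop(buffer::process_buffer.data(),
                                       std::size(buffer::process_buffer));
        auto *data     = buffer::process_buffer.data();
        auto remaining = length;
        while (remaining > 0) {
            auto written = _default_serial.write(data, remaining);
            if (written < 0) {
                // Write buffer full: block once until there is room
                _default_serial.set_blocking(true);
                written = _default_serial.write(data, remaining);
                _default_serial.set_blocking(false);
            }
            if (written <= 0) {
                return; // hard error: drop this chunk rather than spin forever
            }
            data += written;
            remaining -= static_cast<size_t>(written);
        }
    }
}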

If there are no other operations done by the thread, then I’d say it’s premature optimisation. A block write spends a little less time in the critical section and saves a couple of usually-false if statements per byte; I’d be surprised if it was noticeable.

If only BufferedSerial exposed the size() method of its write buffer…

Thanks a lot @Tom_Skevington for the detailed answer!

I’ll keep it in mind.

We’ve made the move to sending in chunks; everything still works, so I’ll keep it as is for now :slight_smile: