Callback runtime

JojoS · October 13, 2021, 8:18pm

Hello,
I have written a stepper motor driver that uses a variable hardware timer period for generating step pulses.
The hardware timer uses a CThunk to call a class method from the timer ISR. This method then invokes a callback for the user function of the timer component.
Together, these two callbacks take about 4 µs on my STM32F407 @ 168 MHz. This seems like a lot: calculating the period time with several floating point operations takes less than 2 µs.
Are callbacks really that expensive? The program is already compiled as a release build with optimization. Times are measured with a Saleae logic analyzer sampling at 50 MHz, measuring the step pulse.

Edit:
I have published the source code here:

JojoS · October 14, 2021, 5:29pm

It works, but I haven’t tried to improve the callback runtime yet. The CThunk/Callbacks are so convenient…

hudakz · October 14, 2021, 8:06pm

Very impressive! Especially as you run the display driver, the touch-screen driver and the stepper driver in parallel with each other.

The hardware timer uses a CThunk to call a class method from the timer ISR.

The CThunk/Callbacks are so convenient…

The Callback documentation (Docs › API references and tutorials › Platform › Other platform APIs › Callback) makes no reference to CThunk, and I could not find anything about it elsewhere apart from the mbed-os/platform/include/platform/CThunk.h header file. However, according to Wikipedia, a thunk can use a dispatch table (a table of pointers or memory addresses to functions or methods). Such a table is a common technique for implementing late binding (similar to the vtable used for virtual functions). So I think this could contribute to longer callback execution times, much as with virtual functions.

This seems to be a lot, the calculation of the period time with several floating point operations takes less than 2 µs.

The STM32F407 is equipped with an FPU so the floating point calculations are very fast.

JojoS · October 14, 2021, 9:21pm

Thanks for your feedback Zoltan,
yes, this was a test to run everything at the same time.

I still built it with CLI1, but I want to convert some more libraries to CLI2, and then I will publish this code as well.
LVGL is a great library for GUI work and not difficult to use. The driver for the ILI9341 needs only one function to copy a render buffer (which is at minimum 10 lines x display width) to the display. This board uses a parallel 16-bit interface, which is very fast when driven by the FSMC interface on the F4. I’m using DMA with low priority, so the fast timer interrupts have low jitter.

I don’t remember where I discovered CThunk. It is used to bind the timer ISR to a class method. An ISR has no parameters, so there are some levels of indirection and a static table is used. The number of CThunks is also limited and adjustable via an mbed_lib.json setting.
But like the callbacks, it also takes about 2 µs, which is time for many instructions on a 168 MHz CPU. So I want to investigate further why it takes so long. It may be faster to use my own jump table. I remember that the callbacks need good compiler optimization, but I’m already using the release build.

And yes, the FPU is amazingly fast, and there is really no need to attempt error-prone integer optimization. One thing to remember is to always use float constants with the suffix ‘f’, otherwise you will quickly incur a penalty of several µs! The bad thing when converting libraries written for AVR controllers is that their authors did not pay attention to the difference between float and double. The AVR C library treats double as float, so there is no difference on AVR, but on a tiny LPC8xx you blow your flash with a single double operation.
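To illustrate the ‘f’ suffix point: the FPU on the Cortex-M4F is single precision only, so any expression that gets promoted to double is computed by software routines instead. The sketch below is a minimal, host-compilable illustration; the function names are hypothetical and not taken from the stepper code.

```cpp
#include <type_traits>

// Hypothetical period scaling; the names are illustrative only.
inline float scaleWithDoubleConstant(float speed) {
    // 0.5 is a double literal: speed is promoted to double, the multiply
    // is done in double precision and the result is narrowed back to float.
    return speed * 0.5;
}

inline float scaleWithFloatConstant(float speed) {
    // 0.5f keeps the whole expression in single precision.
    return speed * 0.5f;
}

// The expression types make the promotion visible at compile time.
static_assert(std::is_same<decltype(1.0f * 0.5), double>::value,
              "an unsuffixed constant promotes the expression to double");
static_assert(std::is_same<decltype(1.0f * 0.5f), float>::value,
              "the f suffix keeps the expression in float");
```

Both functions return the same value here; only the precision of the intermediate calculation differs. With GCC, the -Wdouble-promotion warning can help to find such places.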

boraozgen · October 15, 2021, 9:59am

I bet they are expensive, what annoys me most is the call stack they produce. It can become annoying to read through a call stack with mbed callbacks during debugging.

I wonder why this was developed instead of using std::function. A performance/feature comparison with std::function would be interesting…

JojoS · October 15, 2021, 10:37am

Thanks, that sounds interesting, I haven’t used std::function before.
Callbacks were already in use long ago in mbed-os2, so I guess it is for historical reasons.
I will try to compare both variants.

First google hit for ‘std::function performance’ is

ladislas · October 15, 2021, 11:28am

We use std::function for code not related to mbed.

A PR to extend the different APIs to use std::function might also be welcome, if that’s possible.

hudakz · October 15, 2021, 11:47am

If possible, you can also try to trade convenience for speed by using static callbacks. Of course, in that case all the data members used in the callback have to be static as well (in practice, such a callback is a “global” C function wrapped in the class’s namespace). It’s a somewhat awkward technique that I use in my Arduino libraries and in this mbed library.

ladislas · October 15, 2021, 1:41pm

Yes, we’ve actually also used this solution.

But it works only if you have one instance of the class or if all the instances can share the same callback. You can differentiate the caller by having an id parameter that’s different for each object.
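A minimal sketch of the static callback with an id parameter, as described above. All names (Stepper, FakeTimer, kMaxSteppers) are hypothetical stand-ins, and FakeTimer only imitates a timer driver that stores a plain function pointer; it is not an Mbed API.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stepper class; every name here is illustrative.
class Stepper {
public:
    static constexpr std::size_t kMaxSteppers = 4;

    // Plain function pointer type: no object pointer is bound, so no thunk is needed.
    using StepCallback = void (*)(std::size_t id);

    // Static callback shared by all instances; the id selects the per-instance state.
    static void onTimer(std::size_t id) {
        if (id < kMaxSteppers) {
            ++stepCount_[id];
        }
    }

    // Returns the step count for an id, or 0 for an id that is out of range.
    static std::uint32_t steps(std::size_t id) {
        return id < kMaxSteppers ? stepCount_[id] : 0;
    }

private:
    // State touched by the static callback has to be static as well.
    static std::array<std::uint32_t, kMaxSteppers> stepCount_;
};

std::array<std::uint32_t, Stepper::kMaxSteppers> Stepper::stepCount_{};

// Stand-in for a timer driver that stores a callback and the id to pass to it.
struct FakeTimer {
    Stepper::StepCallback callback;
    std::size_t id;
    void fire() const { callback(id); }
};
```

Two FakeTimer objects created with ids 0 and 1 share Stepper::onTimer but update separate counters. The call is a direct function pointer call, at the cost of keeping all state in static storage.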

JojoS · October 15, 2021, 3:59pm

I found some discussion about mbed::callback and std::function here:

github.com/ARMmbed/mbed-os

Align C++ version with compiler supported.

opened 06:52PM - 15 Nov 17 UTC

closed 08:29AM - 27 Aug 19 UTC

pan-

### Description
- Type: Enhancement
- Related issue: #5329

--------------…--------------------------------------------------

## Enhancement

**Reason to enhance or problem with existing solution**

Mbed OS 5.8 will be a leap forward in compiler supports. ARM compiler v5.06 will be deprecated in favor of v6.7 and IAR v7.8 will be deprecated in favor of IAR v8.2. For both of these compilers the major version change induce a change in the version of C++ they support:

### mbed os 5.6

| | C++98 | C++11 | C++14 |
|----------|:-----:|:---------:|:---------------------:|
| GCC 6 | X | X | X |
| ARM 5.06 | X | ~ partial | - |
| IAR 7.80 | X | - | - |

### mbed-os 5.8

| | C++98 | C++11 | C++14 |
|----------|:-----:|:---------:|:---------------------:|
| GCC 6 | X | X | X |
| ARMC 6.7 | X | X | beta |
| IAR 8.2 | - | - | X |

The choice of C++98 as the C++ standard supported by mbed was obvious for mbed OS 5.6 but it is not for mbed OS 5.8 because IAR v8.2 only compile in C++ 14 mode and does not provide backward compatibility mode. As close as they are C++98, C++11 and C++14 standards are not 100% backward compatible. As a consequence an mbed os 5.8 application may compile on one compiler and not the other or act slightly differently. The introduction of new keywords may also breaks existing applications.

**Suggested enhancement**

There is no lowest common denominator between the compiler that mbed OS 5.8 will support. However C++14 is much closer to C++11 than C++98. I'd suggest to define C++11 as the mode used by C++ compilers when they compile an mbed application and use C++14 when the ARM compiler supports it officially (in beta actually). From my perspective it doesn't mean that our codebase should allow new features of the standards mentioned above. I believe this question should be addressed in a different thread; we can use the CI to ensure that our code base follows our actual guideline.

## Notes:

Breaking changes introduced by C++11 can be found in the Section C2 of the [C++11 specification](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3485.pdf) and breaking changes introduced by C++14 can be found in the section C3 of the [C++14 specification](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf).

But that is heavy stuff…

JojoS · October 15, 2021, 7:44pm

I have created a simple test for measuring the cycles for a call:

The ‘penalty’ for using callbacks does not look that big:

callback performance test
Hello from STM32F407VE_BLACK
Mbed OS version: 6.15.0

cyles staticFn          : 16  0.095 us
cyles callback staticFn : 35  0.208 us
cyles callback memberFn : 51  0.304 us

For the ISR, there should be an additional 12 + 17 (for FPU) cycles.

edit:
OK, I added CThunk to the test, and there are my 2 µs (once I add the ISR cycles):

cyles cthunk            : 277  1.649 us
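For reference, the cycle counts above convert to time as cycles divided by the 168 MHz core clock. The sketch below redoes that arithmetic and adds the 12 + 17 ISR cycles mentioned earlier; the helper name and constants are made up for illustration, and double is used only because everything is evaluated at compile time on the host.

```cpp
#include <cstdint>

// Core clock of the STM32F407 used in this thread.
constexpr double kCoreClockHz = 168000000.0;

// Convert a cycle count to microseconds at the given core clock.
constexpr double cyclesToMicroseconds(std::uint32_t cycles,
                                      double clockHz = kCoreClockHz) {
    return static_cast<double>(cycles) * 1.0e6 / clockHz;
}

// Measured CThunk call plus the 12 + 17 cycles quoted for ISR entry with FPU.
constexpr std::uint32_t kCthunkCycles = 277;
constexpr std::uint32_t kIsrEntryCycles = 12 + 17;
constexpr double kCthunkInIsrUs =
    cyclesToMicroseconds(kCthunkCycles + kIsrEntryCycles);

// 306 cycles at 168 MHz is a little over 1.8 us.
static_assert(kCthunkInIsrUs > 1.8 && kCthunkInIsrUs < 1.9,
              "CThunk plus ISR entry lands just below 2 us");
```

So 277 cycles alone are about 1.65 µs, and with the ISR cycles added the total is roughly 1.8 µs, which matches the "about 2 µs" seen on the logic analyzer.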

Can I improve the runtime in my code by putting the code into RAM? How can I do this?

hudakz · October 15, 2021, 9:44pm

The STM32F407 is equipped with 64 kB of Core Coupled Memory (CCM) allowing zero-wait-state execution. The CCM is usually used to store critical data. However, it can be used to store code instead of data. To use it, we need to define this memory region in the linker script, as discussed in this post. But to store code in CCM rather than data, the ccm region should be modified as below:

.ccm :
{
    . = ALIGN(8);
    *(.ccm .ccm*)
} > CCM

To relocate a specific function inside the CCM we can use the GCC keyword __attribute__.
For example:

void __attribute__((section(".ccm"))) function_name() 
{
...
}

JojoS · October 16, 2021, 11:39am

Thanks for reminding me of my older thread, I had already forgotten it.
In my custom_target lib, I do not use a modified linker script yet; I must check how to include this in CLI2.

Another question about this: is it possible with CLI2 to define a different linker script in the application configuration? I could create multiple custom targets with different linker scripts, but I’m not sure if this is good practice.
Different linker scripts are needed when using F7/H7 with RAM on different buses; for DMA it’s necessary to have control over the RAM sections used.

To optimize HWTimer, I will now remove the CThunk and use a fixed number of static handlers that are assigned during Timer initialization.
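One possible shape for such a fixed set of static handlers is sketched below. It is only an illustration under assumptions: HwTimer, kMaxTimers and the handler template are hypothetical names, the slot count of three is arbitrary, and the code is not taken from the actual HWTimer source.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch; HwTimer, kMaxTimers and handler<> are illustrative names.
class HwTimer {
public:
    static constexpr std::size_t kMaxTimers = 3;
    using IsrHandler = void (*)();

    // Claims the next free slot and returns the matching parameterless handler,
    // or nullptr if all slots are taken.
    IsrHandler init() {
        for (std::size_t i = 0; i < kMaxTimers; ++i) {
            if (instances_[i] == nullptr) {
                instances_[i] = this;
                return handlers_[i];
            }
        }
        return nullptr;
    }

    unsigned ticks() const { return ticks_; }

private:
    void onInterrupt() { ++ticks_; }

    // One static handler per slot; each knows its slot index at compile time.
    template <std::size_t Slot>
    static void handler() {
        if (instances_[Slot] != nullptr) {
            instances_[Slot]->onInterrupt();
        }
    }

    static std::array<HwTimer*, kMaxTimers> instances_;
    static const std::array<IsrHandler, kMaxTimers> handlers_;
    unsigned ticks_ = 0;
};

std::array<HwTimer*, HwTimer::kMaxTimers> HwTimer::instances_{};
const std::array<HwTimer::IsrHandler, HwTimer::kMaxTimers> HwTimer::handlers_{
    {&HwTimer::handler<0>, &HwTimer::handler<1>, &HwTimer::handler<2>}};
```

The pointer returned by init() is what would be installed as the interrupt vector for that timer. Each handler does one table lookup and one direct member call, so the indirection is fixed at compile time rather than built at run time.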

hudakz · October 18, 2021, 10:00am

Is it possible with CLI2 to define a different linker script in the application configuration?

As for CLI2, so far I have managed to complete only four steps of the online CMake tutorial. So I’m afraid this question should be answered by more skilled people (like Ladislas, Bora, Jamie Smith …).

JojoS · October 18, 2021, 10:39am

Which online tutorial do you mean? The link is not working.

Yes, learning CMake takes some time. Yesterday I prepared the sources of my StepperController project for CLI2 and stepped into the same trap as before. The log files and generated files are hard to read.

For some commands the order in CMakeLists matters, but in general, I like it already.

About the CThunk: it is used in the interrupt-driven async APIs for I2C, SPI and UART, so it looks like it is only missing from the documentation. But I’m still thinking about replacing it with static ISR handlers.

hudakz · October 18, 2021, 10:49am

Which online tutorial do you mean? The link is not working.

I’m sorry, I pasted the wrong link. It should work now.

ladislas · October 18, 2021, 11:38am

By “custom target”, you mean hardware or cmake? I’ll assume hardware for my answer.

I haven’t tried it, but in theory you could. Having a linker script/custom target does make sense as your custom target might be used for different things.

That being said, you will need to recompile everything when you change your custom target as configuration is global for one hardware target.

In the future it would be nice to be able to have multiple hardware targets used for different cmake targets. Especially useful when you have a MCU with two or more cores or if you have more than one mcu on your custom board and want to build firmware for each one of them.

The way we “handle” that for now is to use different build directories for the cmake configuration steps. For example, we have one for unit tests, one for the tools, one for the main product, and one for the prototypes.

JojoS · October 18, 2021, 1:45pm

By ‘custom target’ I mean it in the way this term is used in Mbed. So yes, it’s hardware that is not included in the main mbed-os repo. Especially for STM32, it’s easy to derive from existing MCU definitions, thanks to Jerome.

If I understand it correctly, an mbed-mcu library is created for each MCU, e.g. mbed-stm32f103x8

github.com

ARMmbed/mbed-os/blob/06f234e3af8d9718218ec2e7e53721e0b1472768/targets/TARGET_STM/TARGET_STM32F1/TARGET_STM32F103xB/CMakeLists.txt#L27


      
        INTERFACE
            system_clock.c
            ${STARTUP_FILE}
)

target_include_directories(mbed-stm32f103xb
    INTERFACE
        .
)

mbed_set_linker_script(mbed-stm32f103xb ${CMAKE_CURRENT_SOURCE_DIR}/${LINKER_FILE})

target_link_libraries(mbed-stm32f103xb INTERFACE mbed-stm32f1)

This library contains the definitions for the startup code and linker script. A particular piece of hardware then has definitions for its peripheral pins and is linked to its mbed-mcu.
So do I need to create a custom target with its own mbed-mcu definition to use a different .ld file? I don’t know if mbed_set_linker_script can be overridden; I haven’t checked yet.

And one missing piece of the puzzle for me: how does a target such as NUCLEO_STM32F103RB come into play so that this link chain is chosen?

ok, I found an answer for the last question:
the magic is in mbed-tools configure, this writes the

github.com

ARMmbed/mbed-tools/blob/9719a6bbbe332d2a1dc8e2d2e47f980488ad5389/src/mbed_tools/build/config.py#L42-L43


      
          cmake_config_file_path = program.files.cmake_build_dir / CMAKE_CONFIG_FILE
          write_file(cmake_config_file_path, cmake_file_contents)

Then this mbed_config.cmake is used in mbed-os/CMakeLists.txt

ladislas · October 18, 2021, 2:32pm

Yes, that would be it. Not sure it would work out of the box but with cmake we’ll find a way.

I don’t think you would need that.

Cowessess_Nuka · January 5, 2022, 9:31am

A callback function is a function passed into another function as an argument, which is then invoked inside the outer function to complete some kind of routine or action.
