Debug print hangs the MCU

I am working with an STM32L4S7ZIT6 MCU. The MCU is on a NUCLEO-L4A6ZG board. I have to prepare a bootloader for this MCU, so I started with some very basic tests based on the blinky example. Here is my main.cpp:

#include <mbed.h>
#include "serial_wire.hpp"
#include "PinNames.h"
// Blinking rate in milliseconds
#define BLINKING_RATE     500ms

#ifdef BOOTLOADER
DigitalOut led(LED1);
#else
DigitalOut led(LED2);
#endif

Thread blink_thread(osPriorityNormal, OS_STACK_SIZE, nullptr, "blinky-klinky");

FileHandle *mbed::mbed_override_console(int) {
    static SWO_Channel debugport("swo_channel");
    return &debugport;
}

void blinky(void) {
    debug("blinky starts...\n");
    while(true) {
        led.write(0);
        ThisThread::sleep_for(BLINKING_RATE);
        led.write(1);
        ThisThread::sleep_for(BLINKING_RATE);
        debug("run blinky run\n");
    }
}

int main() {
    debug("fw starts here\n");
    // ThisThread::sleep_for(1ms); <- Here is the problem
    blink_thread.start(blinky);
    ThisThread::sleep_for(2000ms);
#ifdef BOOTLOADER
    debug("switching to the application at: 0x%x\n", POST_APPLICATION_ADDR);
    mbed_start_application(POST_APPLICATION_ADDR);
#endif
    debug("app starts here\n");
    while (true) {
        ThisThread::sleep_for(BLINKING_RATE);
        debug("#GRN#infinite loop\r\n");
    }
}

It is simple and straightforward. My only purpose was to make sure that I am jumping from the bootloader to the application. However, I discovered that the MCU hangs as soon as it starts the thread when the 1 ms delay is commented out. To get it to work I must either uncomment the delay or remove the debug() calls, either in the thread or at the beginning of main(). What I do not understand is that this does not happen when I run the same code on an F407VGT MCU. Why do I observe this issue? It is very annoying because I observe the same behaviour in the release firmware as well. Here someone had the same issue when calling printf() in an ISR. One of the answers says: printf() is not normally reentrant. I believe I am having the same problem, but I still do not have an answer as to why the same code runs on an F407VGT without any issue. Does anybody have an idea?
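One way to test the reentrancy theory is to serialize every write to the debug channel behind a lock and see whether the hang disappears. The sketch below shows the idea in standard C++ on a host machine, not in Mbed OS. LockedSink, run_two_writers and count_intact_lines are hypothetical names made up for illustration, and the string buffer only stands in for the real SWO channel. It is a minimal sketch under those assumptions, not a diagnosis of the hang.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a shared, non-reentrant debug channel.
// All names here are illustrative; none of them come from Mbed OS.
class LockedSink {
public:
    // Append one complete message while holding the lock, so that
    // messages from different threads cannot interleave mid-line.
    void write(const std::string &msg) {
        std::lock_guard<std::mutex> guard(lock_);
        for (char c : msg) {
            buffer_.push_back(c);
        }
    }

    // Return a copy of everything written so far.
    std::string contents() {
        std::lock_guard<std::mutex> guard(lock_);
        return buffer_;
    }

private:
    std::mutex lock_;
    std::string buffer_;
};

// Two threads write their own line repeatedly into the same sink,
// mimicking main() and the blinky thread both calling debug().
std::string run_two_writers(int lines_per_thread) {
    LockedSink sink;
    auto writer = [&sink, lines_per_thread](const std::string &line) {
        for (int i = 0; i < lines_per_thread; ++i) {
            sink.write(line);
        }
    };
    std::thread blinky(writer, std::string("run blinky run\n"));
    std::thread main_loop(writer, std::string("infinite loop\n"));
    blinky.join();
    main_loop.join();
    return sink.contents();
}

// Count the lines that arrived intact, i.e. that match one of the
// two messages exactly.
std::size_t count_intact_lines(const std::string &log) {
    std::size_t intact = 0;
    std::size_t start = 0;
    while (start < log.size()) {
        std::size_t end = log.find('\n', start);
        if (end == std::string::npos) {
            break;
        }
        std::string line = log.substr(start, end - start);
        if (line == "run blinky run" || line == "infinite loop") {
            ++intact;
        }
        start = end + 1;
    }
    return intact;
}
```

On the target, the equivalent experiment would be to take an RTOS mutex around each write inside the custom console class. Whether that removes the hang on the L4S7 is exactly what the experiment would show.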

Quick notes: the F407 runs at 168 MHz while the L4S7 runs at 120 MHz. At the very beginning I was building the code without bootloader functionality (without any shift of the text section), but the behaviour was the same from the start.

Thanks

Hi

Regarding POST_APPLICATION_ADDR:
Is it also handled correctly in your linker script?

Hello,
This is actually a bit confusing for me. I am not sure whether I have to define it somewhere explicitly. I thought this was done automatically at build time. Here is my custom_targets.json:

{
    "NUCLEO_L4PICO_MOD": {
        "inherits": [
            "MCU_STM32L4S7xI"
        ],
        "supported_form_factors": [
            "ARDUINO_UNO"
        ],
        "device_name": "STM32L4S7ZITx",
        "device_has_add": [
            "USBDEVICE",
            "QSPI"
        ],
        "bootloader_supported": true
    },

    "NUCLEO_L4PICO_APP": {
        "inherits": ["NUCLEO_L4PICO_MOD"]
    },

    "NUCLEO_L4PICO_BTL": {
        "inherits": ["NUCLEO_L4PICO_MOD"],
        "macros_add": ["BOOTLOADER"]   
    }
}

I see all the debug messages in the correct order when I run the code, and LED1 stops flashing at the correct time. I also checked that POST_APPLICATION_ADDR is correct with respect to the two settings "target.restrict_size": "0x40000" and "target.app_offset": "0x40000", which I set for NUCLEO_L4PICO_BTL and NUCLEO_L4PICO_APP in my mbed_app.json.
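The arithmetic behind that check can be written down as a small compile-time sketch. It assumes that main flash starts at 0x08000000, as on STM32 parts in general, and that the application image is placed at the flash base plus the app offset. The constant and function names are made up for illustration and are not Mbed OS symbols.

```cpp
#include <cstdint>

// Assumption: main flash on STM32 parts starts at 0x08000000.
constexpr std::uint32_t kFlashBase = 0x08000000u;
// Values taken from the mbed_app.json settings quoted above.
constexpr std::uint32_t kBootloaderRestrictSize = 0x40000u; // "target.restrict_size"
constexpr std::uint32_t kAppOffset = 0x40000u;              // "target.app_offset"

// Absolute address at which the application image is expected to start.
constexpr std::uint32_t expected_app_address(std::uint32_t flash_base,
                                             std::uint32_t app_offset) {
    return flash_base + app_offset;
}

// The bootloader must not grow past the start of the application.
constexpr bool bootloader_fits(std::uint32_t restrict_size,
                               std::uint32_t app_offset) {
    return restrict_size <= app_offset;
}

static_assert(expected_app_address(kFlashBase, kAppOffset) == 0x08040000u,
              "application is expected to start 256 KiB into flash");
static_assert(bootloader_fits(kBootloaderRestrictSize, kAppOffset),
              "bootloader region must end at or before the application");
```

Under those assumptions, the address passed to mbed_start_application() should be 0x08040000. A mismatch between the two settings would show up here as a failed static_assert instead of a failed jump at run time.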

At the very beginning I compiled the code for NUCLEO_L4PICO_MOD and discovered this problem with the thread. Then, once I was sure that it was not blocking anything, I proceeded with further tests on jumping to the application. Now everything works as desired.
Thanks