USBMSD transfer speed

A long time ago I tried USBMSD on my MAX32630FTHR with this example, which relies on Mbed OS 5.7.5.

What I noticed back then was:
If I compiled the example with the online compiler or with Mbed Studio, I got ~350 KB/s throughput.
If I compiled the same example with Mbed CLI using GCC_ARM, I got ~700 KB/s.

There is a new PR in the GitHub repository that will bring Mbed’s new USB stack to this board and will hopefully ship with v6.7. That is why I am revisiting the topic. This time, however, I simply cannot reach the ~700 KB/s I got last year with Mbed CLI + GCC_ARM, despite using the same old example on the same hardware. Even the SD card is the same. This puzzles me somewhat.

I have already tried the new PR with the current USB APIs and I get ~350 KB/s. The board itself supports USB full speed only (12 Mbit/s, but that should still allow ~700-800 KB/s of actual payload). So the question is: how can I improve the transfer speed to close the gap?
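
For reference, here is a rough sketch of the arithmetic behind the full-speed numbers. It assumes the USB 2.0 full-speed limits for bulk endpoints (1 ms frames, 64-byte maximum packet size, at most 19 bulk packets per frame) and takes 1 KB as 1000 bytes. The function names are made up for illustration, and the result is only a bus-level ceiling: it ignores mass-storage protocol overhead and SD card latency.

```python
# Rough ceiling for bulk transfers on a USB full-speed link.
# Assumptions (not measurements): USB 2.0 full-speed limits for bulk endpoints.
FRAME_RATE_HZ = 1000             # full speed uses one frame per millisecond
MAX_PACKET_BYTES = 64            # largest bulk packet size at full speed
MAX_BULK_PACKETS_PER_FRAME = 19  # upper bound from the USB 2.0 specification


def bulk_ceiling_bytes_per_s(packets_per_frame=MAX_BULK_PACKETS_PER_FRAME):
    """Payload bytes per second if every frame carries this many full packets."""
    return packets_per_frame * MAX_PACKET_BYTES * FRAME_RATE_HZ


def packets_per_frame_for(rate_bytes_per_s):
    """Average number of full 64-byte packets per frame implied by a throughput."""
    return rate_bytes_per_s / (MAX_PACKET_BYTES * FRAME_RATE_HZ)


if __name__ == "__main__":
    print("bus ceiling (bytes/s):", bulk_ceiling_bytes_per_s())
    for rate_kb in (350, 700):
        # 1 KB is taken as 1000 bytes here, which is an assumption
        print(rate_kb, "KB/s ->", packets_per_frame_for(rate_kb * 1000), "packets/frame")
```

Under these assumptions, ~350 KB/s works out to roughly 5.5 packets per frame and ~700 KB/s to roughly 11, against a bus ceiling of 19. So both results are well below what the link can carry, which suggests the gap comes from how quickly the device side services the endpoint rather than from the bus itself.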