The above code works like a charm on https://www.onlinegdb.com/ (it does not run on most other online compilers).
However, it does not work as expected with Mbed OS 6.12 and the GNU Arm compiler (8-2018-q4-major) or ARMC 6.15.
Not even if I switch to the standard printf() in mbed_app.json ("target.printf_lib": "std").
With ARMC 6.15 I also face the problem that printf() outputs nothing at all on my boards (MAX32630FTHR and Artemis Thing Plus). That is why I modified the code to use BufferedSerial instead, but even then only English characters are printed correctly; non-English characters display wrong.
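Since BufferedSerial::write() sends bytes unchanged, one way to take printf() and the C library's locale support out of the picture entirely is to convert the text to UTF-8 yourself and write the resulting bytes. Below is a minimal sketch in portable C. The helper names utf8_encode() and wide_to_utf8() are hypothetical, and it assumes a 32-bit wchar_t that holds whole code points (typical for arm-none-eabi-gcc); with a 16-bit wchar_t, surrogate pairs would have to be combined first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written (1-4), or 0 if cp is not a
 * valid Unicode scalar value (a surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* ASCII: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {    /* UTF-16 surrogates are not characters */
        return 0;
    }
    if (cp < 0x10000) {                    /* 3 bytes, e.g. CJK, Ethiopic */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes, e.g. emojis */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert a NUL-terminated wide string into a NUL-terminated UTF-8
 * buffer without consulting the C locale. Returns the UTF-8 length in
 * bytes, or -1 on an invalid code point or if buf is too small. */
long wide_to_utf8(const wchar_t *ws, char *buf, size_t size)
{
    size_t used = 0;
    for (; *ws != L'\0'; ++ws) {
        unsigned char tmp[4];
        size_t n = utf8_encode((uint32_t)*ws, tmp);
        if (n == 0 || used + n + 1 > size) {
            return -1;
        }
        memcpy(buf + used, tmp, n);
        used += n;
    }
    if (size == 0) {
        return -1;
    }
    buf[used] = '\0';
    return (long)used;
}
```

On the board, the filled buffer could then be handed to BufferedSerial::write(buf, len), so the bytes reach the terminal exactly as encoded.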
I’m on Linux, and none of the serial terminals mentioned by Simon are available to test his UTF-8 example.
I think the results also depend on the font selected in the serial terminal program.
Thanks for the tip. I am using Arduino’s serial monitor and also PuTTY. Both display UTF-8 characters sent from Arduino code correctly, so the serial monitor side seems to be OK.
Forgot to mention: I am using Mbed Studio on a Windows 10 computer.
If I run the `locale -a` command in PowerShell inside my project folder, a ton of locales get listed, including C, C.utf8, POSIX, etc. That is why I have a feeling these are not to be confused with Windows(?) locales, though I am not sure about this.
If I also add

```
if (setlocale(LC_ALL, "C.utf8") == NULL) {
    printf("setlocale failed!\n");
}
```

to my code, it does not find the C.utf8 locale. Only “”, “C” and “POSIX” do not return NULL.
So, to sum up: on a Windows 10 computer it seems the “C” locale is used even if you set “POSIX” or “”, and no other locales are recognized by the code, even though `locale -a` lists many of them.
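As far as I can tell, setlocale() is executed by the C library linked into the firmware (newlib for GCC_ARM, Arm's own C library for ARMC6), not by Windows, so what `locale -a` lists on the host says nothing about which names work on the board. A sketch for checking this directly on the target, assuming printf() reaches the serial terminal; the helper names and the candidate list are hypothetical.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Try one locale name with setlocale(LC_ALL, ...). Returns 1 and copies
 * the name the C library reports as active into `actual`, or returns 0
 * if the runtime C library does not support the name. */
int probe_locale(const char *name, char *actual, size_t size)
{
    const char *result = setlocale(LC_ALL, name);
    if (result == NULL) {
        return 0;
    }
    /* Copy right away: the returned string may be overwritten by the
     * next setlocale() call. */
    snprintf(actual, size, "%s", result);
    return 1;
}

/* Print which candidate names the firmware's C library accepts and
 * what each one is reported as (e.g. "" or "POSIX" may map to "C"). */
void probe_locales(void)
{
    static const char *const candidates[] = {
        "", "C", "POSIX", "C.UTF-8", "C.utf8", "en_US.UTF-8"
    };
    char actual[64];
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i) {
        if (!probe_locale(candidates[i], actual, sizeof actual)) {
            printf("\"%s\": not supported\n", candidates[i]);
        } else if (strcmp(actual, candidates[i]) != 0) {
            printf("\"%s\": accepted, reported as \"%s\"\n", candidates[i], actual);
        } else {
            printf("\"%s\": accepted\n", candidates[i]);
        }
    }
    setlocale(LC_ALL, "C");   /* restore the startup default */
}
```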
Would my originally posted code work if I installed Linux in a virtual machine on top of Windows 10 and used the Linux version of Mbed Studio?
Meanwhile I also got my hands dirty with Linux and installed Ubuntu 20.04 in a virtual machine on my Win10 computer. My code does not work even with the Linux version of Mbed Studio. So either setlocale() is meant for something different, or this might be a bug in Mbed OS.
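One possible explanation: per the C standard, printf("%ls") converts each wide character to multibyte as if by wcrtomb(), using the current locale's LC_CTYPE. If the firmware's C library only offers a single-byte "C" locale, that conversion can fail for characters such as 世, and printf() may report an encoding error or emit wrong bytes. Because this happens inside the C library on the microcontroller, the host OS (Windows or Ubuntu) would make no difference. The sketch below reproduces that conversion step so it can be checked on the target; the function name is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert a wide string to multibyte the way printf("%ls") does,
 * character by character via wcrtomb() in the *current* locale.
 * Returns the number of bytes produced, or -1 with errno set to
 * EILSEQ (character not representable) or ERANGE (buffer too small).
 * Shift-state reset is ignored; it is not needed for ASCII or UTF-8. */
long wide_to_locale_mb(const wchar_t *ws, char *buf, size_t size)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t used = 0;

    for (; *ws != L'\0'; ++ws) {
        char tmp[MB_LEN_MAX];
        size_t n = wcrtomb(tmp, *ws, &state);
        if (n == (size_t)-1) {
            return -1;              /* errno is EILSEQ */
        }
        if (used + n + 1 > size) {
            errno = ERANGE;
            return -1;
        }
        memcpy(buf + used, tmp, n);
        used += n;
    }
    if (size == 0) {
        errno = ERANGE;
        return -1;
    }
    buf[used] = '\0';
    return (long)used;
}
```

Calling it on L"世界, 你好!" right after setlocale() on the board would show whether the active locale can represent those characters at all (a result of -1 with errno == EILSEQ means it cannot).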
I also tried Simon’s really old UTF-8 example (Mbed 2) that Zoltan suggested. It contains `#pragma import(__use_utf8_ctype)`.
However, Mbed Studio comes with ARMC 6.15, and Studio does not even compile it because of this error:
‘#pragma import’ is an ARM Compiler 5 extension, and is not supported by ARM Compiler 6 [-Warmcc-pragma-import]
So I configured Mbed Studio to use GCC, in which case I only get a warning:
Unknown pragma ignored clang(-Wunknown-pragmas)
However, that way a crucial line has no effect, there is no magic, and we get the wrong characters.
Toyo’s solution is a good fit for outputting hard-coded text; however, I was envisioning user input in any language.
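For user input in any language, the bytes arriving over the serial port from a UTF-8 terminal (for example text typed into PuTTY) can be reassembled into code points without any locale support. A minimal sketch of a byte-at-a-time decoder, assuming the terminal sends UTF-8; all names here are hypothetical. Each byte obtained from, say, BufferedSerial::read() would be fed in as it arrives.

```c
#include <stdint.h>

/* State of an incremental UTF-8 decoder. */
typedef struct {
    uint32_t cp;      /* code point being assembled */
    uint32_t min;     /* smallest value allowed for this length (rejects overlong forms) */
    int pending;      /* continuation bytes still expected */
} utf8_decoder;

enum { UTF8_NEED_MORE = -1, UTF8_INVALID = -2 };

void utf8_decoder_init(utf8_decoder *d)
{
    d->cp = 0;
    d->min = 0;
    d->pending = 0;
}

/* Feed one byte. Returns a completed code point (>= 0), UTF8_NEED_MORE,
 * or UTF8_INVALID. On error the decoder resets and the offending byte
 * is dropped (a stricter decoder might re-process it as a lead byte). */
long utf8_decode_byte(utf8_decoder *d, unsigned char b)
{
    if (d->pending == 0) {
        if (b < 0x80) {
            return b;                               /* plain ASCII */
        }
        if ((b & 0xE0) == 0xC0) {
            d->cp = b & 0x1F; d->pending = 1; d->min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            d->cp = b & 0x0F; d->pending = 2; d->min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            d->cp = b & 0x07; d->pending = 3; d->min = 0x10000;
        } else {
            return UTF8_INVALID;                    /* stray continuation or bad lead byte */
        }
        return UTF8_NEED_MORE;
    }
    if ((b & 0xC0) != 0x80) {                       /* expected a continuation byte */
        d->pending = 0;
        return UTF8_INVALID;
    }
    d->cp = (d->cp << 6) | (b & 0x3F);
    if (--d->pending > 0) {
        return UTF8_NEED_MORE;
    }
    if (d->cp < d->min || d->cp > 0x10FFFF || (d->cp >= 0xD800 && d->cp <= 0xDFFF)) {
        return UTF8_INVALID;                        /* overlong, out of range, or surrogate */
    }
    return (long)d->cp;
}
```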
Since my original post I have managed to display Unicode characters, and even emojis, on my device's screen, and those also display correctly in the serial monitor, which is great compared to where I started.
In the past few days I read further into the issue and possible C/C++ solutions. Based on my new knowledge I extended the Mbed code I posted in my first comment. The updated Mbed code is:
```
#include "mbed.h"
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    // Select the default locale; printf("%ls") converts wide characters according to it.
    setlocale(LC_ALL, "");

    wchar_t hello_eng[] = L"Hello World!";
    wchar_t hello_china[] = L"世界, 你好!";
    wchar_t hello_japan[] = L"こんにちは日本!";
    char plainString_asText[] = "ሄሎ ዓለም!"; // Local encoding, whatever that may be.
    char plainString_asHex[] = "\xe1\x88\x84\xe1\x88\x8e\x20\xe1\x8b\x93\xe1\x88\x88\xe1\x88\x9d\x21"; // The same text as explicit UTF-8 bytes.
    wchar_t wideString[] = L"Përshëndetje botë!"; // Wide characters, usually UTF-16 or UTF-32.
    //char utf8String[] = u8"Всем привет!"; // UTF-8 encoding. error: use of undeclared identifier 'u8'
    //char16_t utf16String[] = u"Përshëndetje botë!"; // UTF-16 encoding. error: use of undeclared identifier 'u'
    //char32_t utf32String[] = U"Сәлемет пе әлем!"; // UTF-32 encoding. error: use of undeclared identifier 'U'

    while (1) {
        printf("%ls\n", hello_eng);
        printf("%ls\n", hello_china);
        printf("%ls\n", hello_japan);
        printf("%s\n", plainString_asText);
        printf("%s\n", plainString_asHex);
        printf("%ls\n", wideString);
        //printf("%s\n", utf8String);
        //printf("%ls\n", (wchar_t*)utf16String); // corrupt output
        //printf("%ls\n", (wchar_t*)utf32String);
        wait_ms(2000); // deprecated in Mbed OS 6; ThisThread::sleep_for() is the suggested replacement
    }
}
```
The above code works on Mbed Simulator (except for the commented-out lines, which do work on onlinegdb.com), but the same code just outputs garbage when compiled locally with the GCC_ARM compiler + Mbed Studio V1.4.1 + Mbed OS 6.12 on my Artemis board.
So there are some inconsistencies, which does not help much.
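Regarding the commented-out `(wchar_t*)utf16String` line: if wchar_t is 32 bits wide (typical for arm-none-eabi-gcc), that cast makes printf("%ls") read two 16-bit UTF-16 units as a single wchar_t, which would explain the corrupt output. UTF-16 has to be decoded instead, including surrogate pairs, which is also how emojis are stored in UTF-16. A minimal sketch; the function name is hypothetical and uint16_t stands in for char16_t so it compiles without <uchar.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the UTF-16 units s[0..len-1].
 * Returns the number of units consumed (1, or 2 for a surrogate pair)
 * and stores the code point in *cp; returns 0 on empty input, a lone
 * low surrogate, or a high surrogate without a matching low one. */
size_t utf16_decode(const uint16_t *s, size_t len, uint32_t *cp)
{
    if (len == 0) {
        return 0;
    }
    uint16_t hi = s[0];
    if (hi < 0xD800 || hi > 0xDFFF) {     /* Basic Multilingual Plane: one unit */
        *cp = hi;
        return 1;
    }
    if (hi > 0xDBFF || len < 2) {         /* lone low surrogate or truncated pair */
        return 0;
    }
    uint16_t lo = s[1];
    if (lo < 0xDC00 || lo > 0xDFFF) {     /* high surrogate must be followed by a low one */
        return 0;
    }
    /* Each surrogate carries 10 bits of (code point - 0x10000). */
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 2;
}
```

Walking a u"..." string with this function and encoding each resulting code point to UTF-8 before sending it would avoid relying on the size of wchar_t at all.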