I didn’t notice this comment previously. I want to follow up on this:
wchar does not carry any meaning of text encoding. char is an 8bit type and wchar is a 16bit type (at least on Windows); either can represent any number of encodings. However, if you have US ASCII you don't use wchar: US ASCII is 7bit, so using a 16bit type is wasting bits.
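Just to make the size difference concrete, here is a minimal sketch (assuming a Windows toolchain where wchar_t is 16 bits; on Linux it is usually 32 bits):

```cpp
#include <cstdio>

int main()
{
    // Plain US ASCII text: every character fits in one 8-bit char.
    const char    narrow[] = "Hello";   // 5 chars + NUL = 6 bytes
    const wchar_t wide[]   = L"Hello";  // same text, one wchar_t per character

    printf("char string:  %zu bytes\n", sizeof(narrow)); // 6
    printf("wchar string: %zu bytes\n", sizeof(wide));   // 12 on Windows (16-bit wchar_t)
}
```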
UTF-8 and UTF-16 are both encodings where each character is represented by a varying number of bytes. UTF-8 can be from 1 to 4 bytes per character. Because of that you normally use char* so you can represent the lower ranges without excess bits. (The 1-byte UTF-8 characters match the US ASCII characters.)
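A small sketch of what that means in practice (utf8_codepoints is just a helper I made up for this example): the byte length of a UTF-8 char* string and its number of code points differ as soon as non-ASCII characters appear.

```cpp
#include <cstdio>
#include <cstring>

// Count Unicode code points in a UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
size_t utf8_codepoints(const char* s)
{
    size_t count = 0;
    for (; *s; ++s)
        if ((*s & 0xC0) != 0x80)   // not a continuation byte -> start of a code point
            ++count;
    return count;
}

int main()
{
    const char* text = "na\xC3\xAFve";   // "naive" with i-diaeresis: 6 bytes, 5 code points
    printf("bytes: %zu, code points: %zu\n", strlen(text), utf8_codepoints(text));
}
```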
UTF-16 can be from 2 to 4 bytes, so you typically use wchar* for this. But note that one wchar does not always represent a character (or code point), so to get the length of the string in characters you cannot simply count bytes (or wchar units).
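Here is a quick sketch of that, using char16_t instead of wchar_t so it is not tied to the platform's wchar_t size: a character outside the Basic Multilingual Plane takes two 16-bit units (a surrogate pair).

```cpp
#include <cstdio>
#include <string>

int main()
{
    // U+1D11E (musical G clef) lies outside the BMP, so in UTF-16 it is
    // stored as a surrogate pair: two 16-bit units for one code point.
    std::u16string s = u"G clef: \U0001D11E";

    size_t units = s.size();            // number of 16-bit units
    size_t codepoints = 0;
    for (char16_t c : s)
        if (c < 0xDC00 || c > 0xDFFF)   // skip low (trailing) surrogates
            ++codepoints;

    printf("UTF-16 units: %zu, code points: %zu\n", units, codepoints); // 10 vs 9
}
```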
UTF-32 is 4 bytes per character, so this one is of fixed length. But I don't know what might use this encoding. For Latin-based languages it wastes a lot of bytes; for other languages the difference might be less. But still, I think UTF-32 is rather rare.
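For comparison, with UTF-32 (char32_t in C++) the unit count and the code point count are always the same, which is the one thing the fixed width buys you:

```cpp
#include <cstdio>
#include <string>

int main()
{
    // Same text as the UTF-16 example, but every code point is one 32-bit unit.
    std::u32string s = U"G clef: \U0001D11E";
    printf("UTF-32 units (= code points): %zu\n", s.size()); // 9
}
```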
That being said, Windows was one of the first to implement Unicode, and originally it was in the form of UCS-2. I think this was a fixed 16 bits per character (so wchar* fit well for that). These days the Win32 API uses UTF-16, which can be seen as a superset of UCS-2.
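If you keep your strings as UTF-8 in char* / std::string internally, you normally convert at the Win32 boundary. A minimal sketch with MultiByteToWideChar (Utf8ToUtf16 is just a name I picked, and real code needs error handling):

```cpp
#include <windows.h>
#include <string>

// Convert a UTF-8 string to the UTF-16 std::wstring that the
// wide ("W") Win32 functions expect.
std::wstring Utf8ToUtf16(const std::string& utf8)
{
    // First call asks for the required length (including the terminating NUL,
    // because we pass -1 as the input length).
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, nullptr, 0);
    if (len <= 0)
        return std::wstring();

    std::wstring wide(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), -1, &wide[0], len);
    wide.resize(len - 1);   // drop the terminating NUL that was written into the buffer
    return wide;
}

int main()
{
    std::wstring text = Utf8ToUtf16("na\xC3\xAFve");   // UTF-8 bytes for "naive" with i-diaeresis
    MessageBoxW(nullptr, text.c_str(), L"UTF-16 example", MB_OK);
}
```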