Isn’t UTF-8’s byte order different on big-endian machines than on little-endian machines? If so, why doesn’t UTF-8 require a BOM?

The byte order differs between big-endian and little-endian machines only for words/integers larger than a byte.

E.g. on a big-endian machine a 2-byte short integer stores its 8 most significant bits in the first byte and its 8 least significant bits in the second byte. On a little-endian machine the 8 most significant bits are in the second byte and the 8 least significant bits are in the first byte.

So, if you write the memory content of such a short int directly to a file/network, the byte ordering within the short int will be different depending on the endianness.
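As a rough illustration of the point above, here is a small Python sketch that serializes the same 2-byte integer in both byte orders. The value 0x1234 is just an arbitrary example, and `struct.pack` is used here to stand in for "writing the memory content" on each kind of machine.

```python
import struct

value = 0x1234  # arbitrary example value that fits in a 2-byte unsigned short

# ">H" = big-endian unsigned short: most significant byte comes first
big = struct.pack(">H", value)

# "<H" = little-endian unsigned short: least significant byte comes first
little = struct.pack("<H", value)

print(big.hex())     # 1234
print(little.hex())  # 3412
```

The same number produces two different byte sequences, so a reader has to know which order the writer used.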

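UTF-8 is byte oriented, so there’s no issue regarding endianness. Its code units are single bytes, and the order of the bytes within a multi-byte character is fixed by the encoding itself: the first byte is always the first byte, the second byte is always the second byte, etc., regardless of endianness. That is why UTF-8 doesn’t need a BOM to indicate byte order.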
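A minimal sketch of the difference, assuming Python's built-in codecs: the sample string (U+00E9 followed by U+20AC) is an arbitrary choice, and UTF-16 is brought in only as a contrast, since its 2-byte code units are exactly the kind of "larger than a byte" values that endianness affects.

```python
text = "\u00e9\u20ac"  # arbitrary sample: e-acute (U+00E9) and the euro sign (U+20AC)

# UTF-8 has one defined byte sequence per character, on every machine.
utf8 = text.encode("utf-8")

# UTF-16 uses 2-byte code units, so the two byte orders give different bytes.
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

print(utf8.hex())      # c3a9e282ac
print(utf16_be.hex())  # 00e920ac
print(utf16_le.hex())  # e900ac20
```

There is only one "utf-8" result to print, whereas the UTF-16 output depends on the byte order chosen, which is the ambiguity a BOM is meant to resolve.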
