2013-05-07

JavaScript Character Encoding, UTF-8

If you don't know already, let me tell you that JavaScript is, well, let's say "quirky".

As I write this entry, it appears that the world has finally settled on UTF-8 as the new character encoding standard to replace ASCII. JavaScript, however, came into being in a different age -- a long, dark period of transition when everyone had realized that ASCII needed to be replaced, but nobody knew what the replacement would look like.

One of the first ideas floated was to double the size of a character, from 8 bits to 16. There are various names for this scheme (like "UCS-2"), some more "kosher" than others, but they all share one thing: they are all obsolete. And JavaScript, in its quirkiness, uses this obsolete convention. Yes, in JavaScript every character in a string is internally a 16-bit code unit, and any character that doesn't fit in 16 bits has to be faked with a pair of them (a "surrogate pair").
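You can see this for yourself with a quick sketch pasted into a browser console (the musical clef here is just an arbitrary example of a character outside the 16-bit range):

// The musical G clef, U+1D11E, doesn't fit in 16 bits, so JavaScript
// stores it as two 16-bit code units (a surrogate pair).
var clef = '\uD834\uDD1E'; // U+1D11E MUSICAL SYMBOL G CLEF
console.log(clef.length); // 2, even though it's "one character"
console.log(clef.charCodeAt(0).toString(16)); // 'd834' (high surrogate)
console.log(clef.charCodeAt(1).toString(16)); // 'dd1e' (low surrogate)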

This can lead to no end of grief the moment non-ASCII text shows up. Happily, most of that grief can be avoided by telling the browser to decode the page (and any inline scripts) as UTF-8, by putting the following line in the <head> block of the HTML document carrying your JavaScript code:
<meta charset='utf-8'>

(You can use double-quotes instead of single-quotes to quote the string, if you like.)
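For context, here is a minimal sketch of a page using the tag (the title and the alert text are placeholders of my own, not anything you need to copy):

<!DOCTYPE html>
<html>
<head>
<meta charset='utf-8'>
<title>UTF-8 demo</title>
<script>
// With the charset declared above, non-ASCII literals in inline
// scripts (and external scripts saved as UTF-8) decode correctly.
alert('Grüße, 世界');
</script>
</head>
<body></body>
</html>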

This is all I needed, but if you want more verbiage concerning JavaScript character encoding, you can read:
