Languages, Unicode and Charset
If your application needs to support multiple languages, or languages with different character sets such as simplified Chinese (GB2312, GBK, GB18030, HZ, ...) or traditional Chinese (Big5, HKSCS, EUC-TW), you’ll need to become familiar with Unicode and with the different character sets. In this article, we’ll focus on introducing character sets, manipulating and converting between charsets, and the challenges you may encounter while handling Unicode text files. If you plan to support multiple languages, you’ll also have to internationalize your application, for example by using PO files for each language and a PO file editor, and, if your project is open source, possibly having the translations done on Launchpad. That, however, is a subject for another article.

Go for Unicode

If you are building a new application, make sure its structure is based on Unicode (UTF-8, UCS-2, UTF-16 or UTF-32), since those charsets can handle most written languages (UTF: […]
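Converting a legacy charset such as GB2312 or Big5 to Unicode always follows the same pattern: decode the bytes into Unicode characters, then re-encode them in the target charset. The sketch below assumes Python 3 and only its built-in codecs (`gb2312`, `gbk`, `gb18030`, `big5`, ...); Python ships no EUC-TW codec, so that charset is left out. The helper names `convert`, `can_encode` and `byte_sizes` are hypothetical, chosen for illustration only.

```python
# Minimal sketch, assuming Python 3 and its standard codecs.
# Helper names are hypothetical; EUC-TW has no built-in codec and is not covered.

def convert(data: bytes, src: str, dst: str = "utf-8") -> bytes:
    """Decode `data` from charset `src` into Unicode, then re-encode it as `dst`."""
    text = data.decode(src)  # bytes -> str (a sequence of Unicode code points)
    return text.encode(dst)  # str -> bytes in the target charset


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of `text` can be represented in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def byte_sizes(text: str) -> dict:
    """Number of bytes `text` needs in each Unicode encoding form (without a BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # "-le" variant: no byte order mark
        "utf-32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    simplified = "中文".encode("gb2312")  # legacy simplified-Chinese bytes
    traditional = "中文".encode("big5")   # same text in a traditional-Chinese charset
    print(convert(simplified, "gb2312"))  # both print the same UTF-8 bytes
    print(convert(traditional, "big5"))
    print(can_encode("😀", "gb2312"))     # False: not in GB2312
    print(can_encode("😀", "gb18030"))    # True: GB18030 covers all of Unicode
    print(byte_sizes("A中"))              # {'utf-8': 4, 'utf-16': 4, 'utf-32': 8}
```

The last call illustrates the trade-off between the Unicode encodings: UTF-8 keeps ASCII at one byte but needs three bytes for most Chinese characters, while UTF-32 spends four bytes on every character.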