Guide for avoiding Unicode/UTF-8 issues

In a sense, improper Unicode handling is more of an issue in PHP itself than something you run into while debugging PHP code. However, it has never been adequately addressed.

PHP 6 was originally supposed to be Unicode-aware, but that effort was put on hold when PHP 6 development was suspended in 2010.

But this does not excuse developers from handling UTF-8 properly, or from avoiding the erroneous assumption that all strings will be "plain old ASCII".

Code that fails to handle non-ASCII strings properly is notorious for introducing hard-to-trace heisenbugs. Even a simple strlen($_POST['name']) call can cause problems if someone with a last name like "Schrödinger" tries to sign up for your system: strlen() counts bytes, not characters, so the two-byte "ö" is counted twice.
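As a rough illustration of that mismatch, the sketch below compares strlen() with mb_strlen() on the same name. It assumes PHP 7.0 or later (for the "\u{...}" escape, used here so the result does not depend on how this file is saved) and that the mbstring extension is loaded. The variable names are only for illustration.

```php
<?php
// Stand-in for $_POST['name']; "\u{00F6}" is "ö" encoded as UTF-8.
$name = "Schr\u{00F6}dinger";

// strlen() counts bytes, and "ö" occupies two bytes in UTF-8.
$byteLength = strlen($name);

// mb_strlen() counts characters when given the string's encoding.
$charLength = mb_strlen($name, 'UTF-8');

echo $byteLength, "\n"; // 12
echo $charLength, "\n"; // 11
```

A length check such as "name must be at most 11 characters" would therefore reject this name if it relied on strlen().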

Therefore, it's important to address the issue deliberately. Here's a small checklist that helps you avoid such problems in your code:

First and foremost, make yourself familiar with Unicode and UTF-8.
Make sure your PHP code files are also UTF-8 encoded to avoid collisions when concatenating strings with hardcoded or configured string constants.
Be sure to always use the mb_* functions instead of the old string functions (make sure the mbstring "multibyte" extension is included in your PHP build).
Make sure your database and tables are set to use Unicode (many builds of MySQL still use latin1 by default).
Remember that json_encode() escapes non-ASCII characters by default (e.g., "Schrödinger" becomes "Schr\u00f6dinger"), but serialize() does not.
A particularly valuable resource in this regard is the UTF-8 Primer for PHP and MySQL.
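
Two of the checklist items above can be sketched in a few lines: the difference between byte-oriented and mb_* string functions, and how json_encode() and serialize() treat the same non-ASCII string. This is a minimal sketch, assuming PHP 7.0 or later with the mbstring extension loaded. The variable names are hypothetical.

```php
<?php
// "\u{00F6}" is "ö" encoded as UTF-8, independent of this file's encoding.
$name = "Schr\u{00F6}dinger";

// substr() cuts by bytes, so 5 bytes splits the two-byte "ö" in half.
$brokenCut = substr($name, 0, 5);

// mb_substr() cuts by characters and keeps "ö" intact.
$firstFive = mb_substr($name, 0, 5, 'UTF-8');

// mb_strtoupper() also uppercases the non-ASCII letter.
$upper = mb_strtoupper($name, 'UTF-8');

// json_encode() escapes non-ASCII characters by default...
$escaped = json_encode($name);

// ...unless JSON_UNESCAPED_UNICODE is passed.
$unescaped = json_encode($name, JSON_UNESCAPED_UNICODE);

// serialize() keeps the raw bytes and records the byte length (12).
$serialized = serialize($name);

var_dump(mb_check_encoding($brokenCut, 'UTF-8')); // bool(false)
echo $firstFive, "\n";  // Schrö
echo $upper, "\n";      // SCHRÖDINGER
echo $escaped, "\n";    // "Schr\u00f6dinger"
echo $unescaped, "\n";  // "Schrödinger"
echo $serialized, "\n"; // s:12:"Schrödinger";
```

Note that the serialized form stores the byte count, so converting its encoding after the fact can leave the recorded length out of sync with the data.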

These simple steps can help you prevent encoding-related errors in your software.

As a software developer, it's important that you constantly challenge and update your knowledge to stay ahead of the competition.