Tags: charset
A PHP string is a series of characters, where a character is the same as a byte. This means that PHP only supports a 256-character set and hence does not offer native Unicode support. Byte-oriented functions such as strlen() and substr() therefore count bytes, not characters.
1. Encoding: a string literal is encoded the same way as the script file
Given that PHP does not dictate a specific encoding for strings, one might wonder how string literals are encoded. For instance, is the string "á" equivalent to "\xE1" (ISO-8859-1), "\xC3\xA1" (UTF-8, normalization form C), "\x61\xCC\x81" (UTF-8, normalization form D), or some other representation? The answer is that the string is encoded in whatever way the script file is encoded. Thus, if the script is written in ISO-8859-1, the string will be encoded in ISO-8859-1, and so on. However, this does not apply if Zend Multibyte is enabled; in that case the script may be written in one encoding and is converted to an internal encoding, which is then the encoding used for string literals.
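As a small illustration of the point above, the sketch below builds the three byte sequences with escape codes, so the result does not depend on how the script file itself is saved. The variable names are only for illustration.

```php
<?php
// Three different byte sequences that can all display as "á".
$latin1   = "\xE1";          // ISO-8859-1
$utf8_nfc = "\xC3\xA1";      // UTF-8, precomposed character (form C)
$utf8_nfd = "\x61\xCC\x81";  // UTF-8, "a" plus combining acute accent (form D)

// strlen() counts bytes, not characters.
$lengths = array(strlen($latin1), strlen($utf8_nfc), strlen($utf8_nfd));
echo implode(",", $lengths), "\n";   // 1,2,3

// bin2hex() shows which bytes a string really contains.
echo bin2hex($utf8_nfc), "\n";       // c3a1

// PHP compares strings byte by byte, so the three are not equal.
var_dump($latin1 === $utf8_nfc);     // bool(false)
```

Running bin2hex() on a literal typed directly into a script is a quick way to find out which encoding the file was saved in.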
2. BOM issue
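A BOM (byte order mark) is a short byte sequence that some editors write at the beginning of an encoded file or stream. For example, the UTF-8 BOM is the three bytes EF BB BF.
By default PHP does not strip the BOM from a script file. The bytes sit before the opening <?php tag, so they are sent as output, and functions that need to send headers first, such as session_start(), then fail with a "headers already sent" warning. Save the script without a BOM, for example with the save-as options of an editor such as UltraEdit.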
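The same three bytes can also show up at the start of data that a script reads, such as an uploaded text file. A minimal sketch of removing them, assuming a hypothetical helper named strip_utf8_bom() (it is not a built-in function):

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (EF BB BF) if present.
function strip_utf8_bom($text)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($text, $bom, 3) === 0) {
        // The cast covers older PHP versions, where substr() can return false.
        return (string) substr($text, 3);
    }
    return $text;
}

$with_bom = "\xEF\xBB\xBF" . "hello";
echo strlen($with_bom), "\n";                  // 8
echo strlen(strip_utf8_bom($with_bom)), "\n";  // 5
echo strip_utf8_bom("hello"), "\n";            // hello
```

This only helps for data handled at run time. A BOM in the script file itself still has to be removed in the editor, because it is output before any PHP code runs.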
3. String functions
Pass the character set explicitly to the functions that accept one:
$chars = htmlentities($chars, ENT_QUOTES, 'UTF-8');
$chars = htmlspecialchars($chars, ENT_QUOTES, 'UTF-8');
$chars = html_entity_decode($chars, ENT_QUOTES, 'UTF-8');
$chars = iconv('Big5', 'UTF-8', $chars); // convert string from Big5 to UTF-8
The iconv extension also provides character-aware counterparts of the byte-oriented string functions:
iconv_strlen: returns the character count of a string
iconv_strpos: finds the position of the first occurrence of a needle within a haystack
iconv_strrpos: finds the position of the last occurrence of a needle within a haystack
iconv_substr: cuts out part of a string
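To make the difference between bytes and characters concrete, here is a short sketch of these functions in use. It assumes the iconv extension is enabled, that the script file is saved as UTF-8, and that the local iconv implementation knows the Big5 charset. The sample strings are only illustrative.

```php
<?php
// Assumes this file is saved as UTF-8 and the iconv extension is loaded.
$title = "a我是标题";

$byte_len = strlen($title);                      // 13 bytes
$char_len = iconv_strlen($title, 'UTF-8');       // 5 characters
$pos      = iconv_strpos($title, "标", 0, 'UTF-8'); // 3, counted in characters
$head     = iconv_substr($title, 0, 3, 'UTF-8');    // "a我是"
echo "$byte_len $char_len $pos $head\n";

// Escape markup characters while leaving the UTF-8 text intact.
$safe = htmlspecialchars('<b>"我"</b>', ENT_QUOTES, 'UTF-8');
echo $safe, "\n"; // &lt;b&gt;&quot;我&quot;&lt;/b&gt;

// Round trip through Big5, using traditional characters that Big5 contains.
$big5 = iconv('UTF-8', 'Big5', "中文");
$back = iconv('Big5', 'UTF-8', $big5);
echo strlen($big5), " ", $back, "\n"; // 4 中文
```

Note that iconv() returns false when a character cannot be represented in the target charset, so real code should check the return value before using it.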
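4. Truncating a UTF-8 string that contains Chinese characters: get_utf8_sub_string()
Most common Chinese characters take 3 bytes in UTF-8, so cutting a string at an arbitrary byte offset can split a character in the middle. The function below counts the non-ASCII bytes before the cut and extends the cut to the end of the current character. It assumes that every non-ASCII character in the string is exactly 3 bytes long.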
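echo get_utf8_sub_string("a我是标题", 8); // prints "a我是标..."

// Truncate $str near $max_length bytes without splitting a character.
// Assumes every non-ASCII character is exactly 3 bytes long in UTF-8.
function get_utf8_sub_string($str, $max_length)
{
    if (strlen($str) > $max_length) {
        // Count the non-ASCII bytes (0x80 and above) before the cut.
        $check_num = 0;
        for ($i = 0; $i < $max_length; $i++) {
            if (ord($str[$i]) > 127) {
                $check_num++;
            }
        }
        // A remainder means the cut falls inside a 3-byte character,
        // so move the cut forward to the end of that character.
        if ($check_num % 3 == 1) {
            $max_length += 2;
        } elseif ($check_num % 3 == 2) {
            $max_length += 1;
        }
        $str = substr($str, 0, $max_length) . "...";
    }
    return $str;
}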
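The function above relies on every non-ASCII character being 3 bytes long, which does not hold for accented Latin letters (2 bytes) or emoji (4 bytes). A more general sketch is shown below. It is not the original method: the helper name utf8_truncate_bytes() is hypothetical, the input is assumed to be valid UTF-8, and the cut is rounded down rather than up, so the result never exceeds the byte limit (before the suffix is added).

```php
<?php
// Hypothetical helper: cut $str to at most $max_bytes bytes of UTF-8,
// backing up so the cut never lands inside a multi-byte character.
// Assumes $str is valid UTF-8.
function utf8_truncate_bytes($str, $max_bytes, $suffix = "...")
{
    if (strlen($str) <= $max_bytes) {
        return $str;
    }
    $cut = $max_bytes;
    // Continuation bytes have the bit pattern 10xxxxxx (0x80 to 0xBF).
    // If the first dropped byte is one, the cut is inside a character,
    // so step back to that character's lead byte and drop it entirely.
    while ($cut > 0 && (ord($str[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($str, 0, $cut) . $suffix;
}

echo utf8_truncate_bytes("a我是标题", 8), "\n";          // a我是...
echo utf8_truncate_bytes("h\xC3\xA9llo", 2), "\n";       // h...
echo utf8_truncate_bytes("ab\xF0\x9F\x98\x80", 4), "\n"; // ab...
```

Where the mbstring extension is available, mb_strcut() offers similar byte-limited cutting on character boundaries, and iconv_substr() can be used when the limit should be counted in characters instead of bytes.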