How To Read And Write Windows Folders Or Files With UTF-8 Names

On February 6, 2015, in Other Online Technology, by James Liu

When working on Windows, you may sometimes find that your code cannot find a folder or file by the given name, even though a manual check shows that it exists at the given path. This problem typically appears when the folder or file name contains non-ASCII characters, such as Chinese, Japanese, or Tamil.

For example, on a Simplified Chinese Windows system, the legacy (non-Unicode) file APIs expect Chinese file names in CP936, not UTF-8 (CP936 is often treated as the same as GBK, though the two differ slightly). Code page 936 (CP936) is Microsoft’s character encoding for Simplified Chinese. Although it has been superseded by GB18030 (code page 54936), it is still used in Windows 7 and later versions.
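To make the difference concrete, the short sketch below converts one Chinese character from UTF-8 to CP936 and prints the raw bytes of each form. The character is written as explicit byte escapes, an illustrative choice so that the encoding of the script file itself does not affect the result.

```php
<?php
// "中" (U+4E2D) written as explicit UTF-8 bytes.
$utf8 = "\xE4\xB8\xAD";

// Convert to the Windows code page for Simplified Chinese.
$cp936 = iconv("UTF-8", "CP936", $utf8);

echo bin2hex($utf8), "\n";   // e4b8ad (three bytes in UTF-8)
echo bin2hex($cp936), "\n";  // d6d0   (two bytes in CP936)
```

The same visible name is therefore a different byte sequence in each encoding, which is why a lookup with the UTF-8 bytes can fail even though the file exists.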

This encoding problem also occurs when you copy files from Windows to Linux. On Linux, the locale is usually set to UTF-8, so files copied from Windows whose names are encoded in GBK or GB18030 show up with unreadable names.
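For the Linux case, one possible repair is to convert the name in the opposite direction. The sketch below uses a hypothetical helper, `toUtf8Name()`, which is not part of the original article. It assumes that any name that is not valid UTF-8 is GB18030, which is largely backward compatible with GBK. That assumption can occasionally be wrong, because a few GBK byte sequences also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper: returns a UTF-8 version of a file name
// that may be encoded in GBK or GB18030.
function toUtf8Name($name)
{
    // An empty pattern with the /u modifier matches only valid UTF-8,
    // so names that are already UTF-8 are returned unchanged.
    if (preg_match('//u', $name) === 1) {
        return $name;
    }
    // Assumption: anything else is GB18030 (or its subset GBK).
    $converted = iconv("GB18030", "UTF-8", $name);
    // Fall back to the original name if the conversion fails.
    return $converted === false ? $name : $converted;
}

// "\xD6\xD0" is the GBK encoding of "中".
$fixed = toUtf8Name("\xD6\xD0.txt");
echo bin2hex($fixed), "\n"; // e4b8ad2e747874 ("中.txt" in UTF-8)
```

To repair files on disk, the result could be passed to `rename()` for each entry returned by `scandir()`, ideally after trying it on a copy of the directory first.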

Here is a solution for reading and writing a folder or file whose name is in Chinese. It assumes the Chinese name is supplied in UTF-8. To find the right folder or file, convert the name from UTF-8 to CP936 before passing it to the file functions.

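<?php
	// Name of the file (without extension), submitted as UTF-8.
	$char = isset($_POST['char']) ? trim($_POST['char']) : '';
	// Convert the name to the Windows code page; "GBK" also works in place of "CP936".
	$windowsChar = iconv("UTF-8", "CP936", $char);
	if ($windowsChar === false || $windowsChar === '') {
		exit("Invalid file name");
	}
	$filePath = $windowsChar.".txt";
	$content = file_get_contents($filePath);
?>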
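The snippet above only reads a file. Writing works the same way, since the converted name can be passed to any file function. The following is a minimal sketch, assuming a PHP setup whose file functions expect CP936 names as described in this article. The helper `toWindowsName()` and the use of the system temporary directory are illustrative choices, not part of the original solution.

```php
<?php
// Hypothetical helper: converts a UTF-8 name to the code page Windows expects.
function toWindowsName($utf8Name, $codePage = "CP936")
{
    $converted = iconv("UTF-8", $codePage, $utf8Name);
    if ($converted === false) {
        throw new InvalidArgumentException("Name cannot be represented in $codePage");
    }
    return $converted;
}

// Create a folder named "中文" (given here as UTF-8 bytes) in the temp directory.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . toWindowsName("\xE4\xB8\xAD\xE6\x96\x87");
if (!is_dir($dir)) {
    mkdir($dir);
}

// Write a file named "中.txt" inside it, then read it back.
$path = $dir . DIRECTORY_SEPARATOR . toWindowsName("\xE4\xB8\xAD") . ".txt";
file_put_contents($path, "hello");
echo file_get_contents($path), "\n"; // hello
```

Keeping the conversion in one helper means the target code page can be changed in a single place if the files live on a system that uses a different encoding.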
 
