When working on Windows, you may find that your code cannot locate a folder or file by the given name, even though a manual check shows that it exists at the given path. This problem typically appears when the folder or file name contains non-ASCII characters, such as Chinese, Japanese, or Tamil.
For example, a Simplified Chinese Windows system does not use UTF-8 as its default code page, but CP936 (CP936 is commonly treated as the same as GBK, though the two differ slightly). Code page 936 (CP936) is Microsoft’s character encoding for Simplified Chinese. Although it has been superseded by GB18030 (code page 54936), it is still used in Windows 7 and later versions.
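To make the difference concrete, here is a minimal Python sketch that encodes the same two-character name under both encodings. The sample string is only an illustration, and the sketch relies on Python's built-in "cp936" codec, which Python treats as an alias of GBK.

```python
# Encode the same text under CP936 and UTF-8 and compare the raw bytes.
# The sample name is an arbitrary illustration, not taken from a real system.
name = "中文"

cp936_bytes = name.encode("cp936")  # Python maps "cp936" to its GBK codec
utf8_bytes = name.encode("utf-8")

print(cp936_bytes.hex(" "))  # d6 d0 ce c4
print(utf8_bytes.hex(" "))   # e4 b8 ad e6 96 87

# The two byte sequences have nothing in common, so a program that assumes
# one encoding cannot match a name that was stored in the other.
print(cp936_bytes == utf8_bytes)  # False
```

The same characters take two bytes each in CP936 and three bytes each in UTF-8, which is why a name compared byte for byte under the wrong assumption never matches.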
This encoding problem also occurs when you copy files from Windows to Linux. On Linux, the locale is usually set to UTF-8. Hence, when you copy files from Windows whose names are encoded in GBK or GB18030, the file names become unreadable.
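The effect can be reproduced without touching a real file system. The following is a minimal sketch, assuming the name arrives as raw GBK bytes; the helper names `is_valid_utf8` and `decode_legacy_name` and the sample file name are hypothetical and chosen only for illustration.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decode_legacy_name(raw: bytes, source_encoding: str = "gb18030") -> str:
    """Decode raw file name bytes that were written under a legacy code page.

    GB18030 is used as the default because it is a superset of GBK,
    so it also decodes names that were written as GBK.
    """
    return raw.decode(source_encoding)


# Simulate a name as it might arrive from Windows (assumed to be GBK bytes).
raw_name = "中文.txt".encode("gbk")

# A UTF-8 system cannot interpret these bytes, so the name looks garbled.
garbled = raw_name.decode("utf-8", errors="replace")
print(is_valid_utf8(raw_name))      # False
print("\ufffd" in garbled)          # True: replacement characters appear

# Decoding with the encoding that was actually used recovers the name.
repaired = decode_legacy_name(raw_name)
print(repaired == "中文.txt")        # True
```

On a real Linux system, passing a bytes path to `os.listdir` returns the names as raw bytes, which could then be checked and decoded in the same way before renaming.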