Chardet: The Universal Character Encoding Detector
Detects:
• ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)
• Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)
• EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)
• EUC-KR, ISO-2022-KR, Johab (Korean)
• KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)
• ISO-8859-5, windows-1251 (Bulgarian)
• ISO-8859-1, windows-1252 (Western European languages)
• ISO-8859-7, windows-1253 (Greek)
• ISO-8859-8, windows-1255 (Visual and Logical Hebrew)
• TIS-620 (Thai)
The ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily disabled.
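Some of the Unicode entries in the list above can be recognised from a byte-order mark (BOM) alone, before any statistical analysis is needed. The snippet below is a minimal sketch of that idea using only the Python standard library. It is not chardet's code or API: `sniff_bom` is a hypothetical helper written for illustration, and it covers only the common little- and big-endian BOMs (not every UTF-32 variant chardet lists). In real use, chardet's documented entry point is `chardet.detect()`, which also handles input that has no BOM.

```python
import codecs
from typing import Optional

# Hypothetical helper for illustration; not part of chardet.
# BOMs are checked longest-first: the UTF-32-LE BOM (FF FE 00 00) starts
# with the same two bytes as the UTF-16-LE BOM (FF FE), so order matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8-SIG"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return an encoding label if `data` starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM: a real detector would fall back to statistical probers here.
    return None


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
    print(sniff_bom(sample))
```

A BOM is a strong but not infallible signal. For example, UTF-16-LE text whose first character is U+0000 begins with the same four bytes as a UTF-32-LE BOM. Text without a BOM, which covers most of the legacy encodings above, needs the frequency- and sequence-based analysis that a full detector performs.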