How to auto detect text file encoding?
Problem
There are many plain text files that were encoded in various charsets. I want to convert them all to UTF-8, but before running iconv I need to know each file's original encoding. Most browsers have an Auto Detect option for encodings; however, I can't check those text files one by one because there are too many. Only once the original encoding is known can I convert the texts with iconv. Is there any utility to detect the encoding of plain text files? It does NOT have to be 100% perfect; I don't mind if 100 files are misconverted out of 1,000,000.
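Once the original encoding is known, the conversion step itself is mechanical. A minimal Python sketch of that step (the function name and file paths are illustrative, and the source encoding is whatever the detection step supplies):

```python
# Sketch of the conversion step only: decode the file using its
# detected source encoding, then re-encode the text as UTF-8.
# `src_encoding` must come from a detection tool; it is not guessed here.
def convert_to_utf8(src_path, dst_path, src_encoding):
    with open(src_path, "r", encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

This is the Python equivalent of `iconv -f <src_encoding> -t UTF-8`; a decode error here usually means the detected encoding was wrong for that file.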
Fix
Try the chardet Python module, which is available on PyPI: pip install chardet. Then run chardetect on each file. chardet is based on the detection code used by Mozilla, so it should give reasonable results, provided that the input text is long enough for statistical analysis. Do read the project documentation. As mentioned in the comments it is quite slow, but some distributions also ship the original C++ version, as @Xavier found in https://superuser.com/a/609056. There is also a Java version somewhere.
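For batch processing it can be easier to call chardet from Python than to shell out to chardetect. A small sketch, assuming chardet is installed and reading only a leading sample of each file (the helper name and sample size are choices made here, not part of chardet's API):

```python
# Detect a file's probable encoding with chardet. Reading a sample
# rather than the whole file keeps large batches fast; chardet needs
# enough text for its statistical analysis, so don't make it too small.
import chardet

def detect_encoding(path, sample_size=64 * 1024):
    with open(path, "rb") as f:
        raw = f.read(sample_size)
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
    return result["encoding"], result["confidence"]
```

The returned confidence (0.0 to 1.0) is useful for the "doesn't have to be perfect" requirement: low-confidence files can be set aside for manual inspection instead of being converted blindly.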