What is the most accurate encoding detector? [closed]

I’ve checked juniversalchardet and ICU4J on some CSV files, and the results are inconsistent; juniversalchardet did better. UTF-8: both detected it. Windows-1255: juniversalchardet detected it when the file had enough Hebrew letters, while ICU4J still thought it was ISO-8859-1. With even more Hebrew letters, ICU4J detected it as ISO-8859-8, which is the other Hebrew encoding (and so the text …
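Neither juniversalchardet nor ICU4J is reproduced here; both use statistical models. A much cheaper first step that often helps in practice is to check whether the bytes are strictly valid UTF-8, because legacy single-byte text containing letters above 0x7F (such as Hebrew in Windows-1255) almost never is. The sketch below assumes a runtime with the standard `TextDecoder` global (modern browsers, Node.js); `guessEncoding` and its `'windows-1255'` fallback are hypothetical choices for Hebrew data, not a general detector.

```javascript
// Returns true if the bytes are valid UTF-8. With { fatal: true } the
// decoder throws a TypeError instead of inserting U+FFFD replacements.
function isValidUtf8(bytes) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: report UTF-8 when the bytes decode cleanly,
// otherwise fall back to a caller-chosen legacy encoding label.
// Pure ASCII also reports 'utf-8', which is harmless since ASCII is a subset.
function guessEncoding(bytes, fallback = 'windows-1255') {
  return isValidUtf8(bytes) ? 'utf-8' : fallback;
}
```

This only separates "valid UTF-8" from "something else"; telling Windows-1255 from ISO-8859-8 or ISO-8859-1 still needs a statistical detector like the two compared above.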

Convert non-ASCII characters (umlauts, accents…) to their closest ASCII equivalent (for slug creation)

The easiest way I’ve found:
var str = "Rånades på Skyttis i Ö-vik";
var combining = /[\u0300-\u036F]/g;
console.log(str.normalize('NFKD').replace(combining, ''));
For reference, see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
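Since the question is about slugs, here is one way to build on that normalize-and-strip approach. `slugify` is a hypothetical name, and the lowercase, hyphen-separated output format is an assumption about what a slug should look like.

```javascript
// Sketch of a slug builder on top of the NFKD + combining-mark approach.
function slugify(input) {
  return input
    .normalize('NFKD')               // "å" becomes "a" + U+030A, "Ö" becomes "O" + U+0308
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')     // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, '');        // trim leading/trailing hyphens
}

console.log(slugify('Rånades på Skyttis i Ö-vik')); // ranades-pa-skyttis-i-o-vik
```

Letters that have no decomposition, such as "ø", "æ" or "ß", are not transliterated by NFKD, so this sketch simply drops them; a lookup table would be needed to map them to "o", "ae" or "ss".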

What is the most efficient binary to text encoding?

This really depends on the nature of the binary data, and the constraints that “text” places on your output. First off, if your binary data is not compressed, try compressing before encoding. We can then assume that the distribution of 1/0 or individual bytes is more or less random. Now: why do you need text? …
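The "compress first, then encode" advice can be sketched with Node.js's built-in `zlib` and `Buffer`. Base64 is used here only as a common example; the visible excerpt does not say which text encoding the answer ends up recommending. The helper names are hypothetical.

```javascript
const zlib = require('zlib');

// Compress first (per the advice above), then Base64-encode the result.
function toText(data) {
  return zlib.deflateSync(data).toString('base64');
}

// Reverse: Base64-decode, then decompress.
function fromText(text) {
  return zlib.inflateSync(Buffer.from(text, 'base64'));
}

// Output size for n input bytes: hex (Base16) needs 2 chars per byte,
// padded Base64 needs 4 chars per 3-byte group (rounded up).
function hexLength(n) { return 2 * n; }
function base64Length(n) { return 4 * Math.ceil(n / 3); }

console.log(hexLength(300), base64Length(300)); // 600 400
```

For already-random bytes, hex doubles the size while Base64 adds about 33%, which is why compressing before encoding matters more than the choice between them.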

Why do symbols like apostrophes and hyphens get replaced with black diamonds on my website?

It’s an encoding problem. You have to set the correct encoding in the HTML head via a meta tag: <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> Replace "ISO-8859-1" with whatever your encoding is (e.g. "UTF-8"). You must find out which encoding your HTML files use. If you’re on a Unix system, just run file file.html and it should show …
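To see why a mismatched charset produces the black diamond, the following Node.js sketch (an illustration, not part of the original answer) assumes the page was saved as Windows-1252, where the curly apostrophe ’ is the single byte 0x92. Decoded as UTF-8, that byte is invalid and becomes U+FFFD REPLACEMENT CHARACTER, which browsers typically draw as a black diamond with a question mark.

```javascript
// "I’m" as saved by an editor using Windows-1252: the curly
// apostrophe (U+2019) is stored as the single byte 0x92.
const cp1252Bytes = Uint8Array.of(0x49, 0x92, 0x6d);

// A browser told the page is UTF-8 decodes leniently, so the stray
// 0x92 (not valid UTF-8 on its own) becomes U+FFFD, the black diamond.
const shown = new TextDecoder('utf-8').decode(cp1252Bytes);

// The same text actually saved as UTF-8 (3 bytes for the apostrophe)
// decodes cleanly when the declared charset matches.
const utf8Bytes = new TextEncoder().encode('I\u2019m');
const fixed = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(JSON.stringify(shown), JSON.stringify(fixed));
```

Browsers treat the "ISO-8859-1" label as windows-1252 (per the WHATWG Encoding Standard), which is why the meta tag in the answer also covers files containing Windows curly quotes.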

C programming: How can I program for Unicode?

C99 or earlier: The C standard (C99) provides for wide characters and multi-byte characters, but since there is no guarantee about what those wide characters can hold, their value is somewhat limited. For a given implementation, they provide useful support, but if your code must be able to move between implementations, there is insufficient guarantee …
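To make "multi-byte characters" concrete independently of any C implementation's `wchar_t`, here is a sketch of how UTF-8, the most common multi-byte encoding, turns one code point into 1 to 4 bytes. It is written in JavaScript to match the other examples on this page, not in C; the same bit manipulation ports directly to C with `unsigned char` output. `utf8Encode` is a hypothetical name.

```javascript
// Encode one Unicode scalar value as an array of UTF-8 bytes.
function utf8Encode(cp) {
  // Reject values outside Unicode and UTF-16 surrogate code points.
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError('not a Unicode scalar value');
  }
  if (cp < 0x80) return [cp];                                // 0xxxxxxx
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)]; // 110xxxxx 10xxxxxx
  if (cp < 0x10000) {                                        // 1110xxxx + 2 continuation bytes
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [                                                   // 11110xxx + 3 continuation bytes
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

console.log(utf8Encode(0x20ac).map((b) => b.toString(16))); // [ 'e2', '82', 'ac' ]
```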

How can I determine the character encoding of an excel file? [duplicate]

For Excel 2010 it should be UTF-8, per Microsoft’s documentation (http://msdn.microsoft.com/en-us/library/bb507946): “The basic document structure of a SpreadsheetML document consists of the Sheets and Sheet elements, which reference the worksheets in the Workbook. A separate XML file is created for each Worksheet. For example, the SpreadsheetML for a workbook that has two worksheets name …
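The quoted documentation describes the XML-based format, so before asking about text encoding it helps to confirm which kind of file you have. This Node.js sketch (an assumption-laden illustration, not from the answer) sniffs the first bytes: a ZIP local-file header means an Office Open XML package such as .xlsx, and the OLE2 compound-file header means a legacy binary .xls. The helper names and the returned labels are hypothetical, and a ZIP signature alone cannot distinguish .xlsx from .docx or a plain .zip.

```javascript
const fs = require('fs');

// Magic numbers: ZIP local-file header (.xlsx packages) and the
// OLE2 Compound File header (legacy .xls).
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];
const OLE2_MAGIC = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

function startsWith(bytes, magic) {
  return magic.every((b, i) => bytes[i] === b);
}

// Hypothetical helper: classify the container from its leading bytes.
function sniffExcelContainer(bytes) {
  if (startsWith(bytes, ZIP_MAGIC)) return 'xlsx';  // ZIP of XML parts (or another ZIP)
  if (startsWith(bytes, OLE2_MAGIC)) return 'xls';  // binary compound file
  return 'unknown';                                 // e.g. CSV or other text
}

// Read only the first 8 bytes of the file and classify them.
function sniffExcelFile(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const buf = Buffer.alloc(8);
    const n = fs.readSync(fd, buf, 0, 8, 0);
    return sniffExcelContainer(buf.subarray(0, n));
  } finally {
    fs.closeSync(fd);
  }
}
```

A CSV has no signature at all, so for those the detectors discussed in the first question are still needed.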