How to avoid tripping over UTF-8 BOM when reading files

With Ruby 1.9.2 and later you can use the mode r:bom|utf-8:

    text_without_bom = nil # define the variable outside the block to keep the data
    File.open('file.txt', 'r:bom|utf-8') { |file| text_without_bom = file.read }

or

    text_without_bom = File.read('file.txt', encoding: 'bom|utf-8')

or

    text_without_bom = File.read('file.txt', mode: 'r:bom|utf-8')

It doesn't matter whether the file contains a BOM or not. You may …
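To see that the mode behaves the same with and without a BOM, here is a minimal sketch. The helper name read_without_bom and the temp-file prefix are made up for this illustration; only File.read with mode "r:bom|utf-8" comes from the answer above.

```ruby
require "tempfile"

# The UTF-8 BOM as raw bytes (the UTF-8 encoding of U+FEFF).
BOM = "\xEF\xBB\xBF".b

# Hypothetical helper for this demo: writes the given raw bytes to a
# temporary file, then reads them back using the BOM-aware mode.
def read_without_bom(bytes)
  Tempfile.create("bom_demo") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    File.read(f.path, mode: "r:bom|utf-8")
  end
end

from_bom_file   = read_without_bom(BOM + "hello".b)
from_plain_file = read_without_bom("hello".b)

# Both reads should yield the same text, tagged as UTF-8.
puts from_bom_file == from_plain_file
puts from_bom_file.encoding
```

The point of the sketch is that calling code does not need to check for a BOM first; the same read works for both kinds of file.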

XDocument: saving XML to file without BOM

Use an XmlTextWriter and pass it to the XDocument's Save() method; that way you have more control over the encoding used:

    var doc = new XDocument(
        new XDeclaration("1.0", "utf-8", null),
        new XElement("root", new XAttribute("note", "boogers"))
    );
    using (var writer = new XmlTextWriter(".\\boogers.xml", new UTF8Encoding(false)))
    {
        doc.Save(writer);
    }

The UTF8Encoding class constructor has …

How to GetBytes() in C# with UTF8 encoding with BOM?

Try it like this:

    public ActionResult Download()
    {
        var data = Encoding.UTF8.GetBytes("some data");
        var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
        return File(result, "application/csv", "foo.csv");
    }

The reason is that the UTF8Encoding constructor that takes a boolean parameter doesn't do what you would expect:

    byte[] bytes = new UTF8Encoding(true).GetBytes("a");

The resulting array would contain a single byte with the value …

Adding UTF-8 BOM to string/Blob

Prepend \ufeff to the string. See http://msdn.microsoft.com/en-us/library/ie/2yfce773(v=vs.94).aspx

See the discussion between @jeff-fischer and @casey for details on UTF-8, UTF-16, and the BOM. What actually makes the above work is that the code point U+FEFF (written \ufeff in a string literal) always represents the BOM, whether UTF-8 or UTF-16 is used; only its byte form differs between the encodings. See p.36 in The Unicode Standard 5.0, Chapter …
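The claim that one code point serves as the BOM in every encoding can be checked outside the browser too. The sketch below uses Ruby purely for illustration (the original answer concerns JavaScript strings and Blobs); the names BOM_CHAR, hex_bytes, and prepend_bom are made up for this example.

```ruby
# U+FEFF is a single code point; its bytes depend on the target encoding.
BOM_CHAR = "\uFEFF"

# Hypothetical helper: renders a string's bytes as space-separated hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

# Hypothetical helper: prepends the BOM unless the text already starts with it.
def prepend_bom(text)
  text.start_with?(BOM_CHAR) ? text : BOM_CHAR + text
end

puts hex_bytes(BOM_CHAR)                      # EF BB BF (UTF-8)
puts hex_bytes(BOM_CHAR.encode("UTF-16LE"))   # FF FE
puts hex_bytes(BOM_CHAR.encode("UTF-16BE"))   # FE FF
puts hex_bytes(prepend_bom("a,b"))            # EF BB BF 61 2C 62
```

Because the BOM is prepended as a character rather than as bytes, whatever later encodes the string produces the matching byte sequence for that encoding.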
