Python 3: os.walk() file paths UnicodeEncodeError: ‘utf-8’ codec can’t encode: surrogates not allowed

On Linux, filenames are ‘just a bunch of bytes’, and are not necessarily encoded in a particular encoding. Python 3 tries to turn everything into Unicode strings. In doing so the developers came up with a scheme to translate byte strings to Unicode strings and back without loss, and without knowing the original encoding. They …
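The lossless scheme mentioned above is presumably the `surrogateescape` error handler. A minimal sketch, assuming a UTF-8 filesystem encoding and a made-up filename (`raw_name` is hypothetical), of how such a name round-trips and why a strict encode fails:

```python
# Hypothetical raw filename: Latin-1 bytes that are not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Undecodable bytes are mapped to lone surrogates instead of raising.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode rejects those surrogates ("surrogates not allowed").
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")

# For printing or logging, replace whatever cannot be encoded.
printable = name.encode("utf-8", errors="replace").decode("utf-8")
```

The `printable` form is only suitable for display; the original `name` is the one to keep for reopening the file.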

UnicodeDecodeError: (‘utf-8’ codec) while reading a csv file [duplicate]

Known encoding: If you know the encoding of the file you want to read in, you can use pd.read_csv('filename.txt', encoding='encoding'). These are the possible encodings: https://docs.python.org/3/library/codecs.html#standard-encodings

Unknown encoding: If you do not know the encoding, you can try to use chardet; however, this is not guaranteed to work. It is more guesswork. import …
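When no detection library is at hand, one rough alternative is to try a short list of candidate encodings in order. This is a sketch using only the standard library, not the approach from the answer; `decode_with_fallback` and the candidate list are hypothetical choices:

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes data."""
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


# Sample bytes standing in for the start of a file read in binary mode.
sample = "café".encode("cp1252")
text, used = decode_with_fallback(sample)
```

Because latin-1 accepts every byte sequence, it should come last; a successful decode only means the bytes were valid in that encoding, not that the guess is right. The resulting name could then be passed as the `encoding` argument when reading the file.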

Removing unicode \u2026 like characters in a string in python2.7 [duplicate]

Python 2.x

    >>> s
    'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
    >>> print(s.decode('unicode_escape').encode('ascii','ignore'))
    This is some  text that has to be cleaned! it's annoying!

Python 3.x

    >>> s="This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"
    >>> s.encode('ascii', 'ignore')
    b"This is some  text that has to be …
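A small Python 3 sketch of the same cleanup, carried through to a `str` result. The whitespace collapse and the handling of literal backslash escapes are additions for illustration, not part of the answer:

```python
s = "This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!"

# Drop every character that has no ASCII representation.
ascii_bytes = s.encode("ascii", "ignore")

# Decode back to str so the result is text rather than bytes.
cleaned = ascii_bytes.decode("ascii")

# Removing a character can leave a double space behind; collapse it.
tidy = " ".join(cleaned.split())

# If the string holds literal backslash escapes (as in the Python 2 case),
# interpret them first with the unicode_escape codec.
literal = "caf\\u00e9"
interpreted = literal.encode("ascii").decode("unicode_escape")
```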

Python string argument without an encoding

You are passing a string object to bytearray():

    bytearray(content[current_pos:(final_pos)])

You’ll need to supply an encoding argument (second argument) so that it can be encoded to bytes. For example, you could encode it to UTF-8:

    bytearray(content[current_pos:(final_pos)], 'utf8')

From the bytearray() documentation: The optional source parameter can be used to initialize the array in a …
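The difference can be checked with a short sketch. The names `content`, `current_pos`, and `final_pos` come from the question; the values assigned here are made up for illustration:

```python
# Hypothetical values; only the variable names come from the question.
content = "héllo wörld"
current_pos, final_pos = 0, 5

# Without an encoding, a str argument is rejected with a TypeError.
try:
    bytearray(content[current_pos:final_pos])
    raised = False
except TypeError:
    raised = True

# With an encoding, the slice is encoded to bytes first.
chunk = bytearray(content[current_pos:final_pos], "utf8")
```

Note that the slice is five characters long but encodes to six bytes, because "é" takes two bytes in UTF-8.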

“SyntaxError: Non-ASCII character …” or “SyntaxError: Non-UTF-8 code starting with …” trying to use non-ASCII text in a Python script

I’d recommend reading that PEP the error gives you. The problem is that your code is trying to use the ASCII encoding, but the pound symbol is not an ASCII character. Try using UTF-8 encoding. You can start by putting # -*- coding: utf-8 -*- at the top of your .py file. To get more …
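For illustration, a minimal script with the declaration in place might look like the following. The variable names are hypothetical, and the file itself is assumed to be saved as UTF-8:

```python
# -*- coding: utf-8 -*-
# The declaration must be on the first or second line of the file.

# A non-ASCII literal is now accepted by the parser.
price = "£5"

# Encoded as UTF-8, the pound sign becomes two bytes.
encoded = price.encode("utf-8")
```

In Python 3 the default source encoding is already UTF-8, so the declaration mainly matters for Python 2 or for files saved in another encoding.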

Why does ENcoding a string result in a DEcoding error (UnicodeDecodeError)?

    "你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don’t have the u). So Python has to convert the string to a unicode object first. So it does the equivalent of

    "你好".decode().encode('utf-8')

But the decode fails because the string isn’t valid ASCII. …
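The explanation above describes Python 2 behaviour. As a point of comparison, here is a sketch of the same operations in Python 3, where text and bytes are separate types and no implicit decode takes place:

```python
text = "你好"

# str -> bytes: encode works directly, with no hidden decode step.
data = text.encode("utf-8")

# bytes objects have no encode method at all in Python 3.
has_encode = hasattr(data, "encode")

# bytes -> str: decode with the encoding the bytes were written in.
restored = data.decode("utf-8")

# Decoding as ASCII fails, which is the error Python 2 hit implicitly.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```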

Python – ‘ascii’ codec can’t decode byte

    "你好".encode('utf-8')

encode converts a unicode object to a string object. But here you have invoked it on a string object (because you don’t have the u). So Python has to convert the string to a unicode object first. So it does the equivalent of

    "你好".decode().encode('utf-8')

But the decode fails because the string isn’t valid ASCII. …
