encoding – Row Coding

One hot encoding of string categorical features

November 29, 2023 by Tarik

If you are on sklearn>0.20.dev0:

    In [11]: import numpy as np
        ...: from sklearn.preprocessing import OneHotEncoder
        ...: cat = OneHotEncoder()
        ...: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
        ...: cat.fit_transform(X).toarray()
        ...:
    Out[11]:
    array([[1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1.],
           [1., 0., 0., 1., 0.],
           [0., 0., 1., 0., 1.]])

If you are on … Read more
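To make the output matrix easier to read, here is a minimal dependency-free sketch of what one-hot encoding computes for the same four rows. The helper name `one_hot` is hypothetical and is not part of scikit-learn; it only assumes that the indicator columns follow the sorted unique values of each input column.

```python
def one_hot(rows):
    """Encode each column of `rows` as 0/1 indicator columns, one per category."""
    columns = list(zip(*rows))
    # Sorted unique values per column fix the order of the indicator columns.
    categories = [sorted(set(col)) for col in columns]
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            # Exactly one indicator per input column is set to 1.0.
            out.extend(1.0 if value == cat else 0.0 for cat in cats)
        encoded.append(out)
    return categories, encoded


# Same data as the session above: a string column and an integer column.
rows = [("a", 0), ("b", 1), ("a", 0), ("c", 1)]
categories, encoded = one_hot(rows)

print(categories)
for line in encoded:
    print(line)
```

The first three output columns stand for 'a', 'b' and 'c', and the last two for 0 and 1, which is how the five columns in the array above come about.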

UTF-8 in Windows 7 CMD [duplicate]

November 28, 2023 by Tarik

This question has already been answered in "Unicode characters in Windows command line – how?". You missed one step: you need to use the Lucida Console font in addition to running chcp 65001 from the cmd console.

How to config visual studio to use UTF-8 as the default encoding for all projects?

November 27, 2023 by Tarik

Visual Studio supports EditorConfig files (https://editorconfig.org/). Visual Studio (VS2017 and later) searches for a file named ‘.editorconfig’ in the directory containing your source files, or anywhere above this directory in the hierarchy. This file can be used to direct the editor to use utf-8. I use the following:

    [*]
    end_of_line = lf
    charset = utf-8

… Read more

What is the most efficient binary to text encoding?

November 27, 2023 by Tarik

This really depends on the nature of the binary data, and the constraints that “text” places on your output. First off, if your binary data is not compressed, try compressing before encoding. We can then assume that the distribution of 1/0 or individual bytes is more or less random. Now: why do you need text? … Read more
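As a rough illustration of how much different text encodings expand binary data, the sketch below measures three encoders from Python's standard `base64` module. The 300-byte sample and the helper name `expansion` are assumptions made for this example; they are not taken from the answer.

```python
import base64

# Deterministic 300-byte sample; compressed real-world data would look random.
payload = bytes(i % 256 for i in range(300))

encoders = {
    "base16": base64.b16encode,
    "base64": base64.b64encode,
    "base85": base64.b85encode,
}


def expansion(encode, data):
    """Return how many output bytes the encoder emits per input byte."""
    return len(encode(data)) / len(data)


ratios = {name: expansion(fn, payload) for name, fn in encoders.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} output bytes per input byte")
```

A larger alphabet packs more bits into each character, so the overhead shrinks, but more characters may then clash with whatever restrictions "text" carries in your setting.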

What is the difference between #encode and #force_encoding in ruby?

November 26, 2023 by Tarik

The difference is pretty big. force_encoding sets the given string's encoding but does not change the string itself, i.e. it does not change its representation in memory:

    'łał'.bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII') #=> "\xC5\x82a\xC5\x82"

encode assumes that the current encoding is correct and tries to change the string … Read more
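The same distinction can be sketched in Python, with the caveat that this is only an analogy: Python strings do not carry an encoding label the way Ruby strings do. Relabelling is approximated here by decoding the unchanged bytes with a different codec, and transcoding by encoding the text into another charset. The choice of latin-1 and iso-8859-2 is an assumption for the example.

```python
text = "łał"
utf8_bytes = text.encode("utf-8")

# Rough analogue of force_encoding: the bytes stay the same, only their
# interpretation changes (here: read them as latin-1 instead of UTF-8).
relabelled = utf8_bytes.decode("latin-1")
same_bytes = relabelled.encode("latin-1")

# Rough analogue of encode: the characters stay the same, the bytes change.
latin2_bytes = text.encode("iso-8859-2")

print(list(utf8_bytes))    # [197, 130, 97, 197, 130]
print(list(same_bytes))    # [197, 130, 97, 197, 130]
print(list(latin2_bytes))  # [179, 97, 179]
```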

Base64 Encoding safe for filenames?

November 25, 2023 by Tarik

Modified Base64 (where /, = and + are replaced) is safe for creating file names, but it does not guarantee a reversible transformation, because many file systems and URLs are case insensitive. Base64 is case sensitive, so it cannot guarantee a 1-to-1 mapping on case-insensitive file systems (all Windows file systems, ignoring POSIX subsystem cases). … Read more
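A small Python sketch of the collision described above: two different byte strings whose URL-safe Base64 names differ only in letter case. The sample inputs are picked for the demonstration, and Base32 is shown as one possible case-insensitive alternative, not as something the answer recommends.

```python
import base64

# Two distinct payloads, built by decoding names that differ only in case.
first = base64.urlsafe_b64decode("AAAA")
second = base64.urlsafe_b64decode("aaaa")

name_first = base64.urlsafe_b64encode(first).decode("ascii")
name_second = base64.urlsafe_b64encode(second).decode("ascii")

# On a case-insensitive file system these two names refer to the same file.
collide = name_first.lower() == name_second.lower()
print(first != second, name_first, name_second, collide)

# Base32 uses only A-Z and 2-7, so folding case cannot merge distinct names.
b32_first = base64.b32encode(first).decode("ascii")
b32_second = base64.b32encode(second).decode("ascii")
print(b32_first, b32_second)
```

Base32 output is longer than Base64 for the same input, and it still pads with `=`, which may need to be stripped or replaced depending on where the name is used.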

How to Base64 encoding on the iPhone

November 24, 2023 by Tarik

You can see an example here. This is for iOS7+. I copy the code here, just in case:

    // Create NSData object
    NSData *nsdata = [@"iOS Developer Tips encoded in Base64" dataUsingEncoding:NSUTF8StringEncoding];

    // Get NSString from NSData object in Base64
    NSString *base64Encoded = [nsdata base64EncodedStringWithOptions:0];

    // Print the Base64 encoded string
    NSLog(@"Encoded: %@", base64Encoded);

    // … Read more

XML and JSON tags for a Golang struct?

November 22, 2023 by Tarik

Go tags are space-separated. From the manual: By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax. … Read more

Python 3 CSV file giving UnicodeDecodeError: ‘utf-8’ codec can’t decode byte error when I print

November 21, 2023 by Tarik

We know the file contains the byte b'\x96' since it is mentioned in the error message:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

    import pkgutil
    import encodings
    import os
    … Read more
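The script in the excerpt is cut off, so the following is an independent sketch of the same idea, not the author's code. It tries every module shipped in Python's `encodings` package and keeps the codecs that decode a given byte string to a wanted character. The function name `codecs_decoding_to` is hypothetical.

```python
import encodings
import pkgutil


def codecs_decoding_to(raw, wanted):
    """Return the names of bundled codecs that decode `raw` to `wanted`."""
    matches = []
    for module in pkgutil.iter_modules(encodings.__path__):
        try:
            if raw.decode(module.name) == wanted:
                matches.append(module.name)
        except Exception:
            # Not a text codec, not available on this platform,
            # or the byte is invalid in that encoding.
            continue
    return sorted(matches)


candidates = codecs_decoding_to(b"\x96", "ñ")
print(candidates)
```

Each candidate can then be tried as the `encoding` argument when opening the CSV file. Which one is right still has to be judged by reading the decoded text.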

Text files uploaded to S3 are encoded strangely?

November 21, 2023 by Tarik