One-hot encoding of string categorical features

If you are on sklearn>0.20.dev0:

    In [11]: import numpy as np
        ...: from sklearn.preprocessing import OneHotEncoder
        ...: cat = OneHotEncoder()
        ...: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
        ...: cat.fit_transform(X).toarray()
        ...:
    Out[11]:
    array([[1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1.],
           [1., 0., 0., 1., 0.],
           [0., 0., 1., 0., 1.]])

If you are on …
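To make the output above concrete, here is a minimal plain-Python sketch of what a one-hot encoding computes for the same four rows. The function name `one_hot` is hypothetical and this is not scikit-learn's implementation; it only assumes that categories are sorted per column, which matches the column order in the output shown.

```python
def one_hot(rows):
    """Return (categories per column, one-hot encoded rows)."""
    columns = list(zip(*rows))
    # One sorted list of distinct values per input column.
    categories = [sorted(set(col)) for col in columns]
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            # Emit one indicator per known category of this column.
            out.extend(1.0 if value == c else 0.0 for c in cats)
        encoded.append(out)
    return categories, encoded


# Same data as the excerpt: a string column and an integer column.
rows = [("a", 0), ("b", 1), ("a", 0), ("c", 1)]
categories, encoded = one_hot(rows)
for line in encoded:
    print(line)
```

The first three output columns correspond to 'a', 'b', 'c' and the last two to 0 and 1.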

How to configure Visual Studio to use UTF-8 as the default encoding for all projects?

Visual Studio supports EditorConfig files (https://editorconfig.org/). Visual Studio (VS2017 and later) searches for a file named '.editorconfig' in the directory containing your source files, or anywhere above this directory in the hierarchy. This file can be used to direct the editor to use UTF-8. I use the following:

    [*]
    end_of_line = lf
    charset = utf-8
    …

What is the most efficient binary to text encoding?

This really depends on the nature of the binary data, and the constraints that “text” places on your output. First off, if your binary data is not compressed, try compressing before encoding. We can then assume that the distribution of 1/0 or individual bytes is more or less random. Now: why do you need text? …
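As a rough illustration of the trade-off, the sketch below compares the output size of three encodings from Python's standard `base64` module, and shows the effect of compressing first with `zlib`. The payloads and variable names are made up for the example; which encoding is acceptable still depends on what characters your "text" channel allows.

```python
import base64
import zlib

# 300 bytes of sample binary data (illustrative only).
payload = bytes(i % 256 for i in range(300))

encoders = {
    "hex (base16)": base64.b16encode,  # 2 characters per byte
    "base64": base64.b64encode,        # 4 characters per 3 bytes
    "base85": base64.b85encode,        # 5 characters per 4 bytes
}
sizes = {name: len(encode(payload)) for name, encode in encoders.items()}
for name, size in sizes.items():
    print(f"{name}: {size} characters for {len(payload)} bytes")

# Compressing redundant data before encoding shrinks the final text.
text = b"the quick brown fox " * 50
plain_len = len(base64.b64encode(text))
packed_len = len(base64.b64encode(zlib.compress(text)))
print(plain_len, packed_len)
```

Denser alphabets give shorter output but use more punctuation characters, which may not survive every transport.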

What is the difference between #encode and #force_encoding in Ruby?

The difference is pretty big. force_encoding sets the string's encoding, but does not change the string itself, i.e. it does not change its representation in memory:

    'łał'.bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII') #=> "\xC5\x82a\xC5\x82"

encode assumes that the current encoding is correct and tries to change the string …
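Python has no direct counterpart to these two Ruby methods, but the same distinction can be sketched as an analogy: relabelling keeps the bytes and changes how they are interpreted, while transcoding keeps the characters and changes the bytes. The choice of Latin-1 and ISO-8859-2 below is an assumption made for illustration, not something taken from the answer.

```python
text = "łał"
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))  # [197, 130, 97, 197, 130]

# Relabelling (in the spirit of force_encoding): same bytes, read under
# a different encoding, so the characters change but the bytes do not.
relabelled = utf8_bytes.decode("latin-1")
same_bytes = relabelled.encode("latin-1") == utf8_bytes

# Transcoding (in the spirit of encode): same characters, new bytes.
transcoded = text.encode("iso-8859-2")
print(len(utf8_bytes), len(transcoded))
```

After relabelling, the text is five characters of mojibake; after transcoding, it is still the original three characters, stored in three bytes instead of five.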

Base64 Encoding safe for filenames?

Modified Base64 (where /, = and + are replaced) is safe for creating names, but it does not guarantee a reversible transformation, because many file systems and URLs are case-insensitive. Base64 is case-sensitive, so it cannot guarantee a one-to-one mapping on case-insensitive file systems (all Windows file systems, ignoring POSIX subsystem cases). …
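The collision is easy to reproduce. The sketch below uses Python's standard `base64` module to produce two different names that differ only in letter case, then shows Base32 as one possible case-insensitive alternative. Base32 is my suggestion for illustration and is not mentioned in the excerpt.

```python
import base64


def b64_name(data):
    # URL-safe alphabet ('-' and '_' instead of '+' and '/'), padding removed.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


name_a = b64_name(b"A")
name_b = b64_name(b"\xa9")
print(name_a, name_b)

# Distinct names that a case-insensitive file system treats as the same file.
collides = name_a != name_b and name_a.lower() == name_b.lower()

# Base32 uses a single-case alphabet, so a lower-cased name still decodes.
b32_name = base64.b32encode(b"\xa9").decode("ascii")
round_trip = base64.b32decode(b32_name.lower(), casefold=True)
```

The price of a single-case alphabet is length: Base32 needs 8 characters per 5 bytes, against 4 per 3 for Base64.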

How to do Base64 encoding on the iPhone

You can see an example here. This is for iOS 7+. I copy the code here, just in case:

    // Create NSData object
    NSData *nsdata = [@"iOS Developer Tips encoded in Base64" dataUsingEncoding:NSUTF8StringEncoding];

    // Get NSString from NSData object in Base64
    NSString *base64Encoded = [nsdata base64EncodedStringWithOptions:0];

    // Print the Base64 encoded string
    NSLog(@"Encoded: %@", base64Encoded);

    // …

XML and JSON tags for a Golang struct?

Go tags are space-separated. From the manual:

    By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax.

…

Python 3 CSV file giving UnicodeDecodeError: ‘utf-8’ codec can’t decode byte error when I print

We know the file contains the byte b'\x96' since it is mentioned in the error message:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

    import pkgutil
    import encodings
    import os
    …
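The script in the excerpt is cut off, so the following is a separate minimal sketch of the same idea rather than the original code. It tries every module in Python's standard `encodings` package and keeps those under which the byte decodes to the expected character. The function name `codecs_decoding_to` is hypothetical.

```python
import encodings
import pkgutil


def codecs_decoding_to(raw, expected):
    """Return the names of standard codecs that decode `raw` to `expected`."""
    matches = []
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module.name
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except Exception:
            # Not a text encoding, unavailable on this platform,
            # or the byte is invalid under this codec.
            continue
    return sorted(matches)


matches = codecs_decoding_to(b"\x96", "ñ")
print(matches)
```

Any name in the result is a candidate to pass as `encoding=` when opening the CSV file; the surrounding text of the file should then be checked to see which candidate reads sensibly.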