One hot encoding of string categorical features

If you are on sklearn>0.20.dev0:

    In [11]: from sklearn.preprocessing import OneHotEncoder
        ...: cat = OneHotEncoder()
        ...: X = np.array([['a', 'b', 'a', 'c'], [0, 1, 0, 1]], dtype=object).T
        ...: cat.fit_transform(X).toarray()
    Out[11]:
    array([[1., 0., 0., 1., 0.],
           [0., 1., 0., 0., 1.],
           [1., 0., 0., 1., 0.],
           [0., 0., 1., 0., 1.]])

If you are on …
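To make the output above easier to read, here is a minimal pure-Python sketch of what one-hot encoding does to the same data. It is not scikit-learn's implementation; `one_hot` is a hypothetical helper, and it assumes the values within each column can be sorted against each other.

```python
def one_hot(rows):
    """One-hot encode a list of rows, column by column (illustrative only)."""
    # Collect the sorted distinct values seen in each column.
    columns = list(zip(*rows))
    categories = [sorted(set(column)) for column in columns]

    encoded = []
    for row in rows:
        encoded_row = []
        for value, column_categories in zip(row, categories):
            # One indicator per category: 1.0 where the value matches, else 0.0.
            encoded_row.extend(
                1.0 if value == category else 0.0
                for category in column_categories
            )
        encoded.append(encoded_row)
    return encoded, categories


# Same data as the excerpt: a string column and an integer column.
rows = [("a", 0), ("b", 1), ("a", 0), ("c", 1)]
encoded, categories = one_hot(rows)
for encoded_row in encoded:
    print(encoded_row)
```

The first three output columns correspond to the categories 'a', 'b', 'c' of the string feature, and the last two to the values 0 and 1 of the numeric feature.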


How to configure Visual Studio to use UTF-8 as the default encoding for all projects?

Visual Studio supports EditorConfig files (https://editorconfig.org/). Visual Studio (VS2017 and later) searches for a file named '.editorconfig' in the directory containing your source files, or anywhere above this directory in the hierarchy. This file can be used to direct the editor to use UTF-8. I use the following:

    [*]
    end_of_line = lf
    charset = utf-8
    …


What is the difference between #encode and #force_encoding in Ruby?

The difference is pretty big. force_encoding sets the encoding of the given string but does not change the string itself, i.e. it does not change its representation in memory:

    'łał'.bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII') #=> "\xC5\x82a\xC5\x82"

encode assumes that the current encoding is correct and tries to change the string …
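The same distinction between relabelling bytes and transcoding them can be sketched in Python. This is only an analogy: Python strings carry no encoding label, so there is no direct counterpart to force_encoding, and the choice of Latin-1 and ISO-8859-2 here is an assumption made for illustration.

```python
text = "\u0142a\u0142"  # the string 'łał' from the excerpt

# The UTF-8 bytes of the string, matching the byte list shown above.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))  # [197, 130, 97, 197, 130]

# "Relabelling": read the same bytes under a different encoding.
# The characters look different, but the underlying bytes are untouched.
relabelled = utf8_bytes.decode("latin-1")
same_bytes = relabelled.encode("latin-1") == utf8_bytes
print(same_bytes)  # True

# "Transcoding": keep the characters and produce new bytes for them.
latin2_bytes = text.encode("iso-8859-2")
bytes_changed = latin2_bytes != utf8_bytes
print(bytes_changed)  # True
```

Relabelling keeps the bytes and changes how they are interpreted; transcoding keeps the characters and changes the bytes.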


How to do Base64 encoding on the iPhone

You can see an example here. This is for iOS 7+. I copy the code here, just in case:

    // Create NSData object
    NSData *nsdata = [@"iOS Developer Tips encoded in Base64" dataUsingEncoding:NSUTF8StringEncoding];
    // Get NSString from NSData object in Base64
    NSString *base64Encoded = [nsdata base64EncodedStringWithOptions:0];
    // Print the Base64 encoded string
    NSLog(@"Encoded: %@", base64Encoded);
    // …


XML and JSON tags for a Golang struct?

Go tags are space-separated. From the manual: By convention, tag strings are a concatenation of optionally space-separated key:"value" pairs. Each key is a non-empty string consisting of non-control characters other than space (U+0020 ' '), quote (U+0022 '"'), and colon (U+003A ':'). Each value is quoted using U+0022 '"' characters and Go string literal syntax. …


Python 3 CSV file giving UnicodeDecodeError: 'utf-8' codec can't decode byte error when I print

We know the file contains the byte b'\x96' since it is mentioned in the error message:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 7386: invalid start byte

Now we can write a little script to find out if there are any encodings where b'\x96' decodes to ñ:

    import pkgutil
    import encodings
    import os
    …
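The excerpt's script is cut off, so the following is not a reconstruction of it, only a minimal sketch of one way such a search could be written. The helper name `codecs_decoding_to` is hypothetical; the sketch walks the modules of the standard `encodings` package and skips any name that is not a usable text codec on the current platform.

```python
import encodings
import pkgutil


def codecs_decoding_to(byte_value, target):
    """Return the names of standard codecs that decode byte_value to target."""
    matches = []
    for module in pkgutil.iter_modules(encodings.__path__):
        try:
            decoded = byte_value.decode(module.name)
        except Exception:
            # Not a text codec, unavailable on this platform,
            # or the byte is invalid in this encoding.
            continue
        if decoded == target:
            matches.append(module.name)
    return sorted(matches)


# "\u00f1" is the character ñ.
candidates = codecs_decoding_to(b"\x96", "\u00f1")
print(candidates)
```

The exact list depends on the Python version, but it should include mac_roman, where byte 0x96 is ñ. Each candidate can then be tried as the `encoding` argument when opening the CSV file.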
