Get week start date (Monday) from a date column in Python (pandas)?

Another alternative:
df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
This sets 'week_start' to the Monday on or before the time in 'myday'. You can choose a different week start via anchored offsets. Note that the anchor names the day on which the weekly period ends ('W' is shorthand for 'W-SUN'), so 'W-WED' gives weeks that start on Thursday, whereas 'W-THU' gives weeks that start on Friday. (Thanks @Henry Ecker for that suggestion)
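As a minimal sketch of the approach above: the frame and its three sample dates are made up for illustration, and the vectorized `.dt.start_time` accessor is used here in place of the `apply` call. The second column relies on the fact that a weekly anchor names the day the period ends on, so weeks ending on Wednesday ('W-WED') begin on Thursday.

```python
import pandas as pd

# Hypothetical sample data: a Monday, a Wednesday and a Sunday of the same week.
df = pd.DataFrame(
    {"myday": pd.to_datetime(["2021-06-07", "2021-06-09", "2021-06-13"])}
)

# 'W' is shorthand for 'W-SUN': each weekly period ends on Sunday,
# so its start_time falls on the preceding Monday.
df["week_start"] = df["myday"].dt.to_period("W").dt.start_time

# 'W-WED' periods end on Wednesday, so they start on Thursday.
df["thu_start"] = df["myday"].dt.to_period("W-WED").dt.start_time

print(df)
```

Dates that already fall on the week start map to themselves, which is why the first row keeps 2021-06-07 in `week_start`.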

What is the difference between numpy.linalg.lstsq and scipy.linalg.lstsq?

If I read the source code right (NumPy 1.8.2, SciPy 0.14.1), numpy.linalg.lstsq() uses the LAPACK routine xGELSD and scipy.linalg.lstsq() uses xGELSS. The LAPACK manual, Sec. 2.4, states: "The subroutine xGELSD is significantly faster than its older counterpart xGELSS, especially for large problems, but may require somewhat more workspace depending on the matrix dimensions." That means …
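The two drivers can be compared side by side in a rough sketch like the one below. It assumes a SciPy release recent enough to accept the `lapack_driver` argument of `scipy.linalg.lstsq` (this argument is newer than the 0.14.1 release discussed above), and the matrix and right-hand side are made-up test data. On a well-conditioned problem both drivers should return the same solution up to rounding; any difference would be in speed and workspace, which this sketch does not measure.

```python
import numpy as np
from scipy import linalg as sla

# Hypothetical over-determined system with a known exact solution.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

# NumPy's solver (first return value is the least-squares solution).
x_np, *_ = np.linalg.lstsq(A, b, rcond=None)

# SciPy's solver with the LAPACK driver chosen explicitly.
x_gelsd, *_ = sla.lstsq(A, b, lapack_driver="gelsd")
x_gelss, *_ = sla.lstsq(A, b, lapack_driver="gelss")

print(np.allclose(x_np, x_gelsd), np.allclose(x_np, x_gelss))
```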

In NumPy, how to zip two 2-D arrays?

You can use dstack:
>>> np.dstack((a,b))
array([[[0, 0], [1, 1], [2, 2], [3, 3]],
       [[4, 4], [5, 5], [6, 6], [7, 7]]])
If you must have tuples:
>>> np.array(zip(a.ravel(),b.ravel()), dtype=('i4,i4')).reshape(a.shape)
array([[(0, 0), (1, 1), (2, 2), (3, 3)],
       [(4, 4), (5, 5), (6, 6), (7, 7)]],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
For Python 3+ you need …
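A self-contained sketch of both variants follows. The inputs `a` and `b` are reconstructed from the output shown above, which is an assumption about the original question. The excerpt is cut off before its Python 3 advice, so the `list(...)` wrapper below is only one common way to handle `zip` returning an iterator in Python 3 and may not be what the answer goes on to recommend.

```python
import numpy as np

# Assumed inputs, inferred from the printed result in the excerpt.
a = np.arange(8).reshape(2, 4)
b = np.arange(8).reshape(2, 4)

# Stack along a new third axis: element [i, j] becomes [a[i, j], b[i, j]].
stacked = np.dstack((a, b))

# Structured-array variant: materialize the zip iterator with list()
# before handing it to np.array, then restore the 2-D shape.
pairs = np.array(list(zip(a.ravel(), b.ravel())), dtype="i4,i4").reshape(a.shape)

print(stacked.shape)
print(pairs.dtype.names)
```

The `dstack` result is an ordinary integer array of shape (2, 4, 2), while `pairs` is a 2-D structured array whose elements have two named fields.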

Find out if/which BLAS library is used by Numpy

numpy.show_config() doesn’t always give reliable information. For example, if I apt-get install python-numpy on Ubuntu 14.04, the output of np.show_config() looks like this:
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
    NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = …
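For reference, the report discussed above can be captured as text with a small sketch like this one. It assumes `np.show_config()` writes to standard output, and the layout of the report differs between NumPy versions, so it will not necessarily match the listing above. As the answer points out, the report describes the build configuration and may not identify the library actually loaded at run time.

```python
import contextlib
import io

import numpy as np

# Redirect stdout so the printed configuration report can be inspected as a string.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
report = buf.getvalue()

print(report)
```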

Pandas pd.Series.isin performance with set versus array

This might not be obvious, but pd.Series.isin uses an O(1) lookup per element. After an analysis that proves the above statement, we will use its insights to create a Cython prototype that can easily beat the fastest out-of-the-box solution. Let’s assume that the “set” has n elements and the “series” has m elements. The running time is then: …
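To make the two call styles from the question concrete, here is a small sketch with made-up data. It only checks that passing a set and passing an array select the same rows. It measures nothing, so timing the two calls (for example with `timeit` on much larger inputs) is left to the reader.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a short series and a handful of values to look up.
series = pd.Series(np.arange(10))
wanted = [2, 3, 5, 7]

# Same membership test, with the lookup values given as a set and as an array.
mask_from_set = series.isin(set(wanted))
mask_from_array = series.isin(np.array(wanted))

print(mask_from_set.equals(mask_from_array))
print(series[mask_from_set].tolist())
```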