Calculate summary statistics of columns in dataframe

`describe` may give you everything you want; otherwise you can perform aggregations using `groupby` and pass a list of aggregation functions to `agg`: http://pandas.pydata.org/pandas-docs/stable/groupby.html#applying-multiple-functions-at-once

    In [43]: df.describe()
    Out[43]:
           shopper_num is_martian  number_of_items count_pineapples
    count      14.0000         14        14.000000               14
    mean        7.5000          0         3.357143                0
    std         4.1833          0         6.452276                0
    min         1.0000      False         0.000000                0
    25%         4.2500          0         0.000000                0
    …
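As a minimal sketch of both approaches mentioned above, the snippet below calls `describe` for a quick overview and then passes a list of function names to `agg` on a `groupby`. The data and the `planet` grouping column are hypothetical, made up for illustration; only `shopper_num` and `number_of_items` echo the excerpt.

```python
import pandas as pd

# Hypothetical data: values and the "planet" column are invented for this sketch.
df = pd.DataFrame({
    "shopper_num": [1, 2, 3, 4],
    "planet": ["earth", "earth", "mars", "mars"],
    "number_of_items": [0, 4, 10, 2],
})

# One-call overview (count, mean, std, min, quartiles, max) of the numeric columns.
summary = df.describe()

# Per-group statistics: pass a list of aggregation names to agg().
per_group = df.groupby("planet")["number_of_items"].agg(["min", "max", "mean"])

print(summary)
print(per_group)
```

Passing a list to `agg` yields one output column per function, which is handy when `describe` reports more (or less) than you need.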

Max and Min date in pandas groupby

You need to combine the functions that apply to the same column, like this:

    In [116]: gb.agg({'sum_col' : np.sum,
         ...:         'date' : [np.min, np.max]})
    Out[116]:
                           date              sum_col
                           amin        amax      sum
    type weekofyear
    A    25          2014-06-22  2014-06-22        1
         26          2014-06-25  2014-06-25        1
         27          2014-07-05  2014-07-05        2
    B    26          2014-06-24  2014-06-24        2
         27          2014-07-02  2014-07-02        1
    C    …
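A self-contained sketch of the same idea: a single dict passed to `agg`, with a list of functions for the column that needs more than one. The data here is hypothetical, and string function names (`"sum"`, `"min"`, `"max"`) are swapped in for the NumPy callables used in the excerpt, so the resulting column labels read `min`/`max` rather than `amin`/`amax`.

```python
import pandas as pd

# Hypothetical data shaped like the excerpt's columns.
df = pd.DataFrame({
    "type": ["A", "A", "B"],
    "weekofyear": [25, 25, 26],
    "date": pd.to_datetime(["2014-06-20", "2014-06-22", "2014-06-24"]),
    "sum_col": [1, 2, 3],
})

gb = df.groupby(["type", "weekofyear"])

# One dict entry per column; "date" gets a list because two functions apply to it.
result = gb.agg({"sum_col": "sum", "date": ["min", "max"]})

print(result)
```

The result has two-level column labels, so an individual value is addressed with a tuple such as `("date", "min")`.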

Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

In pandas 0.18.0 and later, there are datetime `floor`, `ceil` and `round` methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

    >>> df['dt2'] = df['dt'].dt.floor('h')
    >>> df
                       dt                 dt2
    0 2014-10-01 10:02:45 2014-10-01 10:00:00
    1 2014-10-01 13:08:17 2014-10-01 13:00:00
    2 2014-10-01 17:39:24 2014-10-01 17:00:00

Here's another …
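To compare the three methods side by side, here is a small runnable sketch using the timestamps from the excerpt. The output column names (`floor`, `ceil`, `round`) are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Each method takes a frequency string; "h" means one hour.
df["floor"] = df["dt"].dt.floor("h")   # always rounds down
df["ceil"] = df["dt"].dt.ceil("h")     # always rounds up
df["round"] = df["dt"].dt.round("h")   # rounds to the nearest hour

print(df)
```

For truncation, `floor` is the one you want; `round` would push 17:39:24 up to 18:00:00.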

Python pandas: how to remove nan and -inf values

Use `pd.DataFrame.isin` to flag the unwanted values, then `pd.DataFrame.any` with `axis=1` to find the rows that contain at least one of them. Finally, use the negated boolean array to slice the dataframe.

    df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

                 time    X   Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
    4        0.037389    3  10     3  0.333333   2.0  0.500000   1.0  1.000000
    5        0.037393    4  10     4  0.250000   3.0  0.333333   2.0  0.500000
    1030308  9.962213  256  …
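The filter can be tried on a toy frame as follows. The data is hypothetical, and the `replace` plus `dropna` variant at the end is an alternative added here for comparison, not something taken from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one NaN and one -inf.
df = pd.DataFrame({
    "X": [1.0, np.nan, 3.0, 4.0],
    "Y": [10.0, 20.0, -np.inf, 40.0],
})

# True for every row holding at least one NaN, inf or -inf.
bad = df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
clean = df[~bad]

# Alternative (an assumption of this sketch): turn infinities into NaN, then drop.
alt = df.replace([np.inf, -np.inf], np.nan).dropna()

print(clean)
```

Both routes keep only the rows whose values are all finite; the `isin` version leaves the original values untouched while building the mask.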
