time-series
Extract date and time from pandas timestamp
Do this first:
df['time'] = pd.to_datetime(df['timestamp'])
Then do your extraction as usual:
df['dates'] = df['time'].dt.date
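Since the question asks for both the date and the time, here is a minimal sketch that extracts each into its own column. The sample values and the 'times' column name are assumptions made for illustration; 'timestamp', 'time' and 'dates' follow the answer above.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'timestamp' comes from the answer.
df = pd.DataFrame({"timestamp": ["2021-03-04 10:15:30", "2021-03-05 23:59:59"]})

# Parse the strings into datetime64 values first.
df["time"] = pd.to_datetime(df["timestamp"])

# The .dt accessor then exposes the date part and the time part separately.
df["dates"] = df["time"].dt.date
df["times"] = df["time"].dt.time  # 'times' is an illustrative column name

print(df[["dates", "times"]])
```

Both new columns hold plain Python date and time objects (object dtype), so they no longer support the .dt accessor themselves.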
How to convert Pandas Series of dates string into date objects?
Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality, e.g. dayfirst):
In [11]: df
Out[11]:
   a        time
0  1  2013-01-01
1  2  2013-01-02
2  3  2013-01-03

In [12]: pd.to_datetime(df['time'])
Out[12]:
0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
Name: time, dtype: datetime64[ns]

In …
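To illustrate the dayfirst option mentioned above, here is a small sketch. The day/month/year strings are made up for the example and are not from the original question.

```python
import pandas as pd

# Hypothetical ambiguous date strings (not from the original question).
s = pd.Series(["01/02/2013", "03/02/2013"])

# By default the first field is read as the month.
month_first = pd.to_datetime(s)

# dayfirst=True reads the first field as the day instead.
day_first = pd.to_datetime(s, dayfirst=True)

# If plain datetime.date objects are wanted, take .dt.date afterwards.
date_objects = day_first.dt.date

print(month_first.dt.month.tolist(), day_first.dt.month.tolist())
```

The same strings land in different months depending on the flag, which is why it helps to state it explicitly whenever the input format is ambiguous.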
Apache Spark Moving Average
You can use the sliding function from MLlib, which probably does the same thing as Daniel's answer. You will have to sort the data by time before using the sliding function.
import org.apache.spark.mllib.rdd.RDDFunctions._

sc.parallelize(1 to 100, 10)
  .sliding(3)
  // convert to Double so the average is not truncated by integer division
  .map(curSlice => curSlice.sum.toDouble / curSlice.size)
  .collect()
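For readers who want to see what the window-then-average step computes, here is a plain-Python sketch of the same idea. It does not use Spark, and the helper names sliding and moving_average are hypothetical, chosen only to mirror the snippet above.

```python
def sliding(values, size):
    """Return every run of `size` consecutive items (hypothetical helper)."""
    return [values[i:i + size] for i in range(len(values) - size + 1)]


def moving_average(values, size):
    """Average each window produced by sliding()."""
    return [sum(window) / len(window) for window in sliding(values, size)]


# Same numbers as `1 to 100` in the Scala snippet; the data must already be in time order.
data = list(range(1, 101))
averages = moving_average(data, 3)

print(len(averages), averages[:3])
```

A series of n points yields n - size + 1 windows, so the result is shorter than the input by size - 1 entries.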
Python & Pandas – Group by day and count for each day
You can use dt.floor to convert to dates and then value_counts or groupby with size:
df = (pd.to_datetime(df['date & time of connection'])
        .dt.floor('d')
        .value_counts()
        .rename_axis('date')
        .reset_index(name="count"))
print(df)
        date  count
0 2017-06-23      6
1 2017-06-21      5
2 2017-06-19      3
3 2017-06-22      3
4 2017-06-20      2
Or:
s = pd.to_datetime(df['date & time of connection'])
df = …
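The groupby-with-size variant mentioned above could look roughly like the sketch below. This is one possible reading of that approach, not the truncated original code, and the sample rows are invented for the example.

```python
import pandas as pd

# Hypothetical connection log; only the column name comes from the question.
df = pd.DataFrame({"date & time of connection": [
    "2017-06-19 08:00:00",
    "2017-06-19 17:30:00",
    "2017-06-20 09:15:00",
]})

s = pd.to_datetime(df["date & time of connection"])

# Floor each timestamp to midnight, then count the rows per day.
counts = (s.groupby(s.dt.floor("d"))
            .size()
            .rename_axis("date")
            .reset_index(name="count"))

print(counts)
```

Unlike value_counts, which orders by frequency, groupby returns the days in chronological order.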
How to properly add hours to a pandas.tseries.index.DatetimeIndex?
You can use pd.DateOffset:
test[1].index + pd.DateOffset(hours=16)
pd.DateOffset accepts the same keyword arguments as dateutil.relativedelta.
The problem you encountered was due to this bug, which has been fixed in pandas version 0.14.1. Note that the result below is 16 nanoseconds rather than 16 hours:
In [242]: pd.to_timedelta(16, unit="h")
Out[242]: numpy.timedelta64(16,'ns')
If you upgrade, your original code should work.
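As a quick check on a current pandas version, the sketch below adds 16 hours to a DatetimeIndex in both ways. The index values are invented for the example; test[1].index from the question is not reproduced here.

```python
import pandas as pd

# Hypothetical daily index standing in for test[1].index.
idx = pd.date_range("2014-01-01", periods=3, freq="D")

# Option 1: a DateOffset, as suggested in the answer.
shifted_offset = idx + pd.DateOffset(hours=16)

# Option 2: a timedelta, which behaves correctly once the bug is fixed.
shifted_delta = idx + pd.to_timedelta(16, unit="h")

print(shifted_offset[0], shifted_delta[0])
```

For fixed-length units such as hours the two forms should agree; DateOffset matters more for calendar-aware steps such as months.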
Pandas compare next row
Looks like you want to use the Series.shift method. Using this method, you can generate new columns which are offset from the original columns. Like this:
df['qty_s'] = df['qty'].shift(-1)
df['t_s'] = df['t'].shift(-1)
df['z_s'] = df['z'].shift(-1)
Now you can compare these:
df['is_something'] = (df['qty'] == df['qty_s']) & (df['t'] < df['t_s']) & (df['z'] == df['z_s'])
Here is …
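A compact variant of the same comparison, shifting the whole frame once instead of adding helper columns, might look like this. The sample values and the name nxt are assumptions for illustration; the column names follow the answer above.

```python
import pandas as pd

# Hypothetical data using the column names from the answer.
df = pd.DataFrame({"qty": [1, 1, 2], "t": [10, 20, 30], "z": [5, 5, 5]})

# shift(-1) moves every row up by one, so row i of nxt holds the values of row i + 1.
nxt = df.shift(-1)

# Compare each row with the row that follows it.
df["is_something"] = (
    (df["qty"] == nxt["qty"]) & (df["t"] < nxt["t"]) & (df["z"] == nxt["z"])
)

print(df)
```

The last row has no successor, so its shifted values are NaN and every comparison against them evaluates to False.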