time-series – Row Coding

storing massive ordered time series data in bigtable derivatives

November 27, 2023 by Tarik

Extract date and time from pandas timestamp

November 25, 2023 by Tarik

Convert the column to datetime first:

    df['time'] = pd.to_datetime(df['timestamp'])

Then do your extraction as usual:

    df['dates'] = df['time'].dt.date
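A minimal, self-contained sketch of the same two steps, also pulling out the time part with `.dt.time`. The sample values and the `times` column name are made up for illustration; only the `timestamp`, `time` and `dates` column names come from the text above.

```python
import pandas as pd

# Hypothetical sample data; real timestamps would come from your own source.
df = pd.DataFrame({"timestamp": ["2023-11-25 08:30:00", "2023-11-26 17:45:10"]})

# Step 1: parse the strings into datetime64 values.
df["time"] = pd.to_datetime(df["timestamp"])

# Step 2: extract the parts through the .dt accessor.
df["dates"] = df["time"].dt.date   # datetime.date objects
df["times"] = df["time"].dt.time   # datetime.time objects (assumed column name)

print(df[["dates", "times"]])
```

Note that `.dt.date` and `.dt.time` return plain Python objects, so the resulting columns have `object` dtype rather than `datetime64[ns]`.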

How to convert Pandas Series of dates string into date objects?

October 31, 2023 by Tarik

Essentially equivalent to @waitingkuo, but I would use pd.to_datetime here (it seems a little cleaner, and offers some additional functionality e.g. dayfirst):

    In [11]: df
    Out[11]:
       a        time
    0  1  2013-01-01
    1  2  2013-01-02
    2  3  2013-01-03

    In [12]: pd.to_datetime(df['time'])
    Out[12]:
    0   2013-01-01 00:00:00
    1   2013-01-02 00:00:00
    2   2013-01-03 00:00:00
    Name: time, dtype: datetime64[ns]

In … Read more
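To illustrate the `dayfirst` option mentioned above, here is a small sketch with made-up, ambiguous date strings (they are not from the original question). It only shows how the flag changes the interpretation of day and month.

```python
import pandas as pd

# Hypothetical ambiguous strings: is "01/02/2013" January 2 or February 1?
s = pd.Series(["01/02/2013", "03/02/2013"])

month_first = pd.to_datetime(s)                # default: month comes first
day_first = pd.to_datetime(s, dayfirst=True)   # treat the first field as the day

print(month_first.dt.date.tolist())
print(day_first.dt.date.tolist())
```

If you need plain date objects rather than timestamps, append `.dt.date` as in the last two lines.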

How to convert dataframe into time series?

September 14, 2023 by Tarik

Apache Spark Moving Average

September 6, 2023 by Tarik

You can use the sliding function from MLlib, which probably does the same thing as Daniel's answer. You will have to sort the data by time before using the sliding function.

    import org.apache.spark.mllib.rdd.RDDFunctions._

    sc.parallelize(1 to 100, 10)
      .sliding(3)
      // convert to Double so the mean is not truncated by integer division
      .map(curSlice => curSlice.sum.toDouble / curSlice.size)
      .collect()
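For readers who want to check what the sliding-window mean should produce, here is a plain-Python sketch of the same computation. It is not Spark code and does not use MLlib; `sliding_mean` is a hypothetical helper written only for this illustration.

```python
def sliding_mean(values, window):
    """Return the mean of every run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Same input as the Spark snippet: the integers 1 to 100, window of 3.
result = sliding_mean(list(range(1, 101)), 3)
print(result[:3], len(result))
```

As with the Spark version, the input must already be ordered by time, and the output is shorter than the input by `window - 1` elements.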

Basic lag in R vector/dataframe

September 3, 2023 by Tarik

Python & Pandas – Group by day and count for each day

August 29, 2023 by Tarik

You can use dt.floor to convert to dates and then value_counts or groupby with size:

    df = (pd.to_datetime(df['date & time of connection'])
            .dt.floor('d')
            .value_counts()
            .rename_axis('date')
            .reset_index(name='count'))
    print(df)

            date  count
    0 2017-06-23      6
    1 2017-06-21      5
    2 2017-06-19      3
    3 2017-06-22      3
    4 2017-06-20      2

Or:

    s = pd.to_datetime(df['date & time of connection'])
    df = …

Read more
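The groupby-with-size variant is cut off in the excerpt above, so the following is only one possible way to write it, not necessarily the original author's code. The sample rows are invented; the column name is taken from the text.

```python
import pandas as pd

# Hypothetical connection log with the column name used above.
df = pd.DataFrame({
    "date & time of connection": [
        "2017-06-19 09:00:00",
        "2017-06-19 12:30:00",
        "2017-06-20 08:15:00",
    ]
})

s = pd.to_datetime(df["date & time of connection"])

# Group the timestamps by their day (floored to midnight) and count rows per day.
counts = (s.groupby(s.dt.floor("d"))
            .size()
            .rename_axis("date")
            .reset_index(name="count"))
print(counts)
```

Unlike `value_counts`, which orders by frequency, `groupby(...).size()` returns the days in chronological order.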

How to properly add hours to a pandas.tseries.index.DatetimeIndex?

August 24, 2023 by Tarik

You can use pd.DateOffset:

    test[1].index + pd.DateOffset(hours=16)

pd.DateOffset accepts the same keyword arguments as dateutil.relativedelta.

The problem you encountered was due to a bug, fixed in pandas 0.14.1, in which pd.to_timedelta ignored the unit and returned nanoseconds:

    In [242]: pd.to_timedelta(16, unit="h")
    Out[242]: numpy.timedelta64(16,'ns')

If you upgrade, your original code should work.
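A short sketch showing both approaches on a made-up index (the dates are illustrative; `test[1]` from the original question is not reproduced here). On a current pandas version the two give the same result.

```python
import pandas as pd

# Hypothetical daily index standing in for test[1].index.
idx = pd.date_range("2023-08-24", periods=3, freq="D")

# Option 1: calendar-aware offset.
shifted_offset = idx + pd.DateOffset(hours=16)

# Option 2: fixed-length timedelta (the call affected by the old bug).
shifted_delta = idx + pd.to_timedelta(16, unit="h")

print(shifted_offset)
```

For a fixed number of hours a timedelta is sufficient; `pd.DateOffset` matters more for calendar units such as months, whose length varies.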

Pandas compare next row

August 12, 2023 by Tarik

Looks like you want to use the Series.shift method. Using this method, you can generate new columns which are offset to the original columns. Like this:

    df['qty_s'] = df['qty'].shift(-1)
    df['t_s'] = df['t'].shift(-1)
    df['z_s'] = df['z'].shift(-1)

Now you can compare these:

    df['is_something'] = (df['qty'] == df['qty_s']) & (df['t'] < df['t_s']) & (df['z'] == df['z_s'])

Here is … Read more
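The same comparison can be sketched without adding helper columns by shifting the whole frame once. The data below is invented for illustration; only the column names `qty`, `t`, `z` and `is_something` come from the text above.

```python
import pandas as pd

# Hypothetical rows using the column names from the answer.
df = pd.DataFrame({
    "qty": [5, 5, 7],
    "t": [1, 2, 3],
    "z": ["a", "a", "b"],
})

# nxt holds, for each row, the values of the following row (NaN for the last row).
nxt = df.shift(-1)

df["is_something"] = (
    (df["qty"] == nxt["qty"])
    & (df["t"] < nxt["t"])
    & (df["z"] == nxt["z"])
)
print(df)
```

The last row has no successor, so its shifted values are missing and every comparison against them evaluates to False.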

rolling joins data.table in R

August 9, 2023 by Tarik