How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?

You want to use GROUP_CONCAT and SUBSTRING_INDEX:

SUBSTRING_INDEX( GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime), ',', 1 ) AS open,
SUBSTRING_INDEX( GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC), ',', 1 ) AS close

This avoids expensive subqueries, and I find it generally more efficient for this particular problem. Check out the manual pages for both …
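
To make the trick above concrete, here is a minimal Python sketch of what the two expressions compute per group. It is an emulation, not MySQL itself: the sample rows, the group key, and the helper names (`substring_index_first`, `first_open_last_close`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (group_key, datetime, open, close)
rows = [
    ("A", "2020-01-01 09:00", 10.0, 10.5),
    ("A", "2020-01-01 09:01", 10.5, 10.2),
    ("A", "2020-01-01 09:02", 10.2, 10.8),
    ("B", "2020-01-01 09:00", 20.0, 19.5),
    ("B", "2020-01-01 09:01", 19.5, 19.9),
]


def substring_index_first(joined, delim=","):
    """Mimic SUBSTRING_INDEX(joined, delim, 1): the text before the first delimiter."""
    return joined.split(delim, 1)[0]


def first_open_last_close(rows):
    """Return {group_key: (first open, last close)}, both as strings."""
    groups = defaultdict(list)
    for key, dt, open_, close in rows:
        groups[key].append((dt, open_, close))

    result = {}
    for key, items in groups.items():
        # GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime)
        opens = ",".join(str(o) for _, o, _ in sorted(items, key=lambda r: r[0]))
        # GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC)
        closes = ",".join(
            str(c) for _, _, c in sorted(items, key=lambda r: r[0], reverse=True)
        )
        result[key] = (substring_index_first(opens), substring_index_first(closes))
    return result


summary = first_open_last_close(rows)
print(summary)
```

As in the SQL version, the results come back as strings, and the approach only works when the concatenated values never contain the separator themselves.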


How to define and use a User-Defined Aggregate Function in Spark SQL?

Supported methods

Spark >= 3.0

Scala UserDefinedAggregateFunction is being deprecated (SPARK-30423 Deprecate UserDefinedAggregateFunction) in favor of registered Aggregator.

Spark >= 2.3

Vectorized udf (Python only):

from pyspark.sql.functions import pandas_udf
from pyspark.sql.functions import PandasUDFType
from pyspark.sql.types import *
import pandas as pd

df = sc.parallelize([ ("a", 0), ("a", 1), ("b", 30), ("b", -50) ]).toDF(["group", "power"])

def …


Using GROUP BY with FIRST_VALUE and LAST_VALUE

SELECT MIN(MinuteBar) AS MinuteBar5, Opening, MAX(High) AS High, MIN(Low) AS Low, Closing, Interval
FROM (
  SELECT
    FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
    FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
    DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
    …
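
The bucketing logic in the query above (minutes since a reference time, integer-divided by 5, then first open, last close, max high, min low per bucket) can be sketched in plain Python. This is only an illustration under stated assumptions: the sample bars and the function name `five_minute_bars` are hypothetical, and timestamps are assumed to fall on whole minutes on or after the reference time.

```python
from datetime import datetime

# Same reference point as DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar)
EPOCH = datetime(2015, 1, 1)

# Hypothetical one-minute bars: (MinuteBar, Open, High, Low, Close)
minute_bars = [
    (datetime(2015, 1, 1, 9, 0), 100, 102, 99, 101),
    (datetime(2015, 1, 1, 9, 1), 101, 103, 100, 102),
    (datetime(2015, 1, 1, 9, 4), 102, 105, 101, 104),
    (datetime(2015, 1, 1, 9, 5), 104, 106, 103, 105),
    (datetime(2015, 1, 1, 9, 7), 105, 105, 98, 99),
]


def five_minute_bars(bars):
    """Aggregate one-minute bars into five-minute bars."""
    buckets = {}
    # Sort by time so the first/last element of each bucket is the earliest/latest bar
    for bar in sorted(bars, key=lambda b: b[0]):
        minutes = int((bar[0] - EPOCH).total_seconds() // 60)
        buckets.setdefault(minutes // 5, []).append(bar)

    out = []
    for interval, group in sorted(buckets.items()):
        out.append({
            "MinuteBar5": group[0][0],          # MIN(MinuteBar)
            "Opening": group[0][1],             # FIRST_VALUE([Open]) ... ORDER BY MinuteBar
            "High": max(b[2] for b in group),   # MAX(High)
            "Low": min(b[3] for b in group),    # MIN(Low)
            "Closing": group[-1][4],            # FIRST_VALUE([Close]) ... ORDER BY MinuteBar DESC
            "Interval": interval,
        })
    return out


result = five_minute_bars(minute_bars)
for row in result:
    print(row)
```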


Spark SQL replacement for MySQL’s GROUP_CONCAT aggregate function

Before you proceed: This operation is yet another groupByKey. While it has multiple legitimate applications, it is relatively expensive, so be sure to use it only when required.

Not exactly a concise or efficient solution, but you can use UserDefinedAggregateFunction, introduced in Spark 1.5.0:

object GroupConcat extends UserDefinedAggregateFunction {
  def inputSchema = new StructType().add("x", StringType)
  …


count without group

Update for 8.0+: This answer was written well before MySQL version 8, which introduced window functions with mostly the same syntax as the existing ones in Oracle. In this new syntax, the solution would be:

SELECT t.name, t.phone, COUNT('x') OVER (PARTITION BY t.name) AS namecounter FROM Guys t

The answer below still works on newer …
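
The window-function query can be tried out quickly from Python. The sketch below uses the standard-library sqlite3 module as a stand-in for MySQL 8.0, which is an assumption made purely for convenience (SQLite supports window functions from version 3.25). The sample rows are hypothetical, and the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL 8.0 (assumption for illustration)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Guys (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO Guys VALUES (?, ?)",
    [("Ann", "555-0100"), ("Ann", "555-0101"), ("Bob", "555-0102")],  # hypothetical rows
)

# Each row is kept, and namecounter holds the number of rows sharing its name
rows = conn.execute(
    """
    SELECT t.name, t.phone,
           COUNT('x') OVER (PARTITION BY t.name) AS namecounter
    FROM Guys t
    ORDER BY t.name, t.phone
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Unlike GROUP BY, the window version does not collapse rows: every phone number stays in the result alongside the per-name count.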
