aggregate-functions - Row Coding

How to fetch the first and last record of a grouped record in a MySQL query with aggregate functions?

You want to use GROUP_CONCAT and SUBSTRING_INDEX:
    SUBSTRING_INDEX( GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime), ',', 1 ) AS open
    SUBSTRING_INDEX( GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC), ',', 1 ) AS close
This avoids expensive subqueries, and I find it generally more efficient for this particular problem. Check out the manual pages for both …
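To make the mechanics concrete, here is a plain-Python emulation of what the two expressions compute: concatenate the group's values in datetime order, then keep everything before the first comma. It is only a sketch of the idea, not MySQL itself; the function name, the tuple layout and the sample rows are made up for illustration.

```python
from collections import defaultdict

def first_open_last_close(rows):
    """rows: iterable of (group_key, datetime_str, open, close) tuples (hypothetical layout)."""
    groups = defaultdict(list)
    for key, ts, open_, close in rows:
        groups[key].append((ts, open_, close))

    result = {}
    for key, items in groups.items():
        # ISO-formatted timestamps sort chronologically as plain strings.
        items.sort(key=lambda item: item[0])
        # Mirrors GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime)
        opens = ",".join(str(o) for _, o, _ in items)
        # Mirrors GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC)
        closes = ",".join(str(c) for _, _, c in reversed(items))
        # Mirrors SUBSTRING_INDEX(..., ',', 1): keep the text before the first comma
        result[key] = (opens.split(",", 1)[0], closes.split(",", 1)[0])
    return result

# Made-up sample data, deliberately out of order within the first group.
ticks = [
    ("2024-01-01", "2024-01-01 09:00", 10.0, 10.5),
    ("2024-01-01", "2024-01-01 16:00", 10.7, 11.2),
    ("2024-01-01", "2024-01-01 12:00", 10.5, 10.7),
    ("2024-01-02", "2024-01-02 09:00", 11.3, 11.1),
]
summary = first_open_last_close(ticks)
```

As in the SQL version, the results come back as strings, and the trick relies on the values themselves containing no commas.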

Aggregate SQL Function to grab only the first from each group

by Tarik

Rather than grouping, go about it like this:
    select *
    from account a
    join (
        select account_id,
               row_number() over (order by account_id, id)
               - rank() over (order by account_id) as row_num
        from user
    ) first
    on first.account_id = a.id and first.row_num = 0
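The row_number() minus rank() difference is 0 exactly on the first row of each account, because rank() repeats the position of the first tied row. A small sketch using Python's sqlite3 module shows this; it assumes an SQLite build with window functions (3.25 or later), invented sample data, and a subquery alias renamed to first_user for clarity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user (id INTEGER PRIMARY KEY, account_id INTEGER);
    INSERT INTO account VALUES (10, 'acme'), (20, 'globex');
    INSERT INTO user VALUES (1, 10), (2, 10), (3, 20), (4, 20), (5, 20);
""")

# row_number: 1,2,3,4,5  rank: 1,1,3,3,3  difference: 0,1,0,1,2
first_users = conn.execute("""
    SELECT a.id, a.name, first_user.id
    FROM account a
    JOIN (
        SELECT id, account_id,
               row_number() OVER (ORDER BY account_id, id)
             - rank()       OVER (ORDER BY account_id) AS row_num
        FROM user
    ) first_user
      ON first_user.account_id = a.id AND first_user.row_num = 0
    ORDER BY a.id
""").fetchall()
```

Each account is paired with its lowest-id user only.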

Two SQL LEFT JOINS produce incorrect result

by Tarik

Joins are processed left to right (unless parentheses dictate otherwise). If you LEFT JOIN (or just JOIN, similar effect) three groceries to one user you get 3 rows (1 x 3). If you then join 4 fishmarkets for the same user, you get 12 (3 x 4) rows, multiplying the previous count in the result, …
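The multiplication is easy to reproduce with Python's sqlite3 module. The sketch below uses invented table and column names (users, grocery, fishmarket, user_id). It also shows one common remedy, aggregating each child table before joining, which is an assumption on my part and not necessarily what the truncated answer goes on to recommend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE grocery (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE fishmarket (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (1);
    INSERT INTO grocery (user_id) VALUES (1), (1), (1);
    INSERT INTO fishmarket (user_id) VALUES (1), (1), (1), (1);
""")

# Joining both child tables first yields 3 x 4 = 12 rows, so both counts are inflated.
naive_counts = conn.execute("""
    SELECT count(g.id), count(f.id)
    FROM users u
    LEFT JOIN grocery g ON g.user_id = u.id
    LEFT JOIN fishmarket f ON f.user_id = u.id
    GROUP BY u.id
""").fetchone()

# Aggregating each child table on its own before joining keeps one row per user.
separate_counts = conn.execute("""
    SELECT coalesce(g.n, 0), coalesce(f.n, 0)
    FROM users u
    LEFT JOIN (SELECT user_id, count(*) AS n FROM grocery GROUP BY user_id) g
           ON g.user_id = u.id
    LEFT JOIN (SELECT user_id, count(*) AS n FROM fishmarket GROUP BY user_id) f
           ON f.user_id = u.id
""").fetchone()
```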

Create a pivot table with PostgreSQL

by Tarik

First compute the average with the aggregate function avg():
    SELECT neighborhood, bedrooms, avg(price)
    FROM listings
    GROUP BY 1,2
    ORDER BY 1,2;
Then feed the result to the crosstab() function as instructed in great detail in this related answer: PostgreSQL Crosstab Query
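crosstab() is specific to PostgreSQL's tablefunc extension, so the following sketch only illustrates the two-step shape of the approach: aggregate first, then pivot. It runs the avg() query in SQLite and does the pivot in plain Python as a stand-in for crosstab(); the sample listings are invented, and the GROUP BY spells out the column names that the positional 1,2 refers to.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listings (neighborhood TEXT, bedrooms INTEGER, price REAL);
    INSERT INTO listings VALUES
        ('downtown', 1, 100), ('downtown', 1, 200), ('downtown', 2, 300),
        ('riverside', 1, 150), ('riverside', 2, 250), ('riverside', 2, 350);
""")

# Step 1: one row per (neighborhood, bedrooms) with the average price.
averages = conn.execute("""
    SELECT neighborhood, bedrooms, avg(price)
    FROM listings
    GROUP BY neighborhood, bedrooms
    ORDER BY neighborhood, bedrooms
""").fetchall()

# Step 2: pivot so each neighborhood becomes a row and each bedroom count a column.
pivot = defaultdict(dict)
for neighborhood, bedrooms, avg_price in averages:
    pivot[neighborhood][bedrooms] = avg_price
```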

date_trunc 5 minute interval in PostgreSQL [duplicate]

by Tarik

    SELECT date_trunc('hour', date1) AS hour_stump
         , (extract(minute FROM date1)::int / 5) AS min5_slot
         , count(*)
    FROM table1
    GROUP BY 1, 2
    ORDER BY 1, 2;
You could GROUP BY two columns: a timestamp truncated to the hour and a 5-minute-slot. The example produces slots 0 – 11. Add 1 if you prefer 1 – 12. …
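The slot arithmetic can be checked outside the database. This is a minimal Python sketch of the same grouping, with a hypothetical function name and made-up timestamps: truncate to the hour, integer-divide the minute by 5, and count rows per pair.

```python
from collections import Counter
from datetime import datetime

def five_minute_slots(timestamps):
    """Count timestamps per (hour, 5-minute slot), with slots numbered 0 to 11."""
    counts = Counter()
    for ts in timestamps:
        # Equivalent of date_trunc('hour', date1)
        hour_stump = ts.replace(minute=0, second=0, microsecond=0)
        # Equivalent of extract(minute FROM date1)::int / 5 (integer division)
        min5_slot = ts.minute // 5
        counts[(hour_stump, min5_slot)] += 1
    # Equivalent of ORDER BY 1, 2
    return sorted(counts.items())

samples = [
    datetime(2024, 1, 1, 10, 0, 30),
    datetime(2024, 1, 1, 10, 4, 59),
    datetime(2024, 1, 1, 10, 5, 0),
    datetime(2024, 1, 1, 10, 59, 10),
    datetime(2024, 1, 1, 11, 2, 0),
]
slot_counts = five_minute_slots(samples)
```

Minutes 0 to 4 land in slot 0 and minutes 55 to 59 in slot 11, which is where the 0 – 11 range comes from.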

How to define and use a User-Defined Aggregate Function in Spark SQL?

by Tarik

Supported methods
Spark >= 3.0
Scala UserDefinedAggregateFunction is being deprecated (SPARK-30423 Deprecate UserDefinedAggregateFunction) in favor of registered Aggregator.
Spark >= 2.3
Vectorized udf (Python only):
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.functions import PandasUDFType
    from pyspark.sql.types import *
    import pandas as pd
    df = sc.parallelize([ ("a", 0), ("a", 1), ("b", 30), ("b", -50) ]).toDF(["group", "power"])
    def …

COUNT CASE and WHEN statement in MySQL

by Tarik

Use:
    SELECT SUM(CASE WHEN t.your_column IS NULL THEN 1 ELSE 0 END) AS numNull,
           SUM(CASE WHEN t.your_column IS NOT NULL THEN 1 ELSE 0 END) AS numNotNull
    FROM YOUR_TABLE t
That counts the NULL and non-NULL values of the column across the entire table. You will likely need a GROUP BY clause, depending on your needs.
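A quick way to see both the table-wide and the grouped variant is to run them against a throwaway SQLite table from Python. The grp column and the sample rows below are assumptions added purely for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (grp TEXT, your_column INTEGER);
    INSERT INTO your_table VALUES
        ('a', 1), ('a', NULL), ('b', NULL), ('b', NULL), ('b', 7);
""")

# Whole-table counts of NULL and non-NULL values.
totals = conn.execute("""
    SELECT SUM(CASE WHEN t.your_column IS NULL THEN 1 ELSE 0 END) AS numNull,
           SUM(CASE WHEN t.your_column IS NOT NULL THEN 1 ELSE 0 END) AS numNotNull
    FROM your_table t
""").fetchone()

# The same counts per group, using the GROUP BY the answer mentions.
per_group = conn.execute("""
    SELECT t.grp,
           SUM(CASE WHEN t.your_column IS NULL THEN 1 ELSE 0 END) AS numNull,
           SUM(CASE WHEN t.your_column IS NOT NULL THEN 1 ELSE 0 END) AS numNotNull
    FROM your_table t
    GROUP BY t.grp
    ORDER BY t.grp
""").fetchall()
```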

Spark SQL replacement for MySQL’s GROUP_CONCAT aggregate function

by Tarik

Before you proceed: This operation is yet another groupByKey. While it has multiple legitimate applications, it is relatively expensive, so be sure to use it only when required. Not exactly a concise or efficient solution, but you can use UserDefinedAggregateFunction, introduced in Spark 1.5.0:
    object GroupConcat extends UserDefinedAggregateFunction {
      def inputSchema = new StructType().add("x", StringType)
      …

count without group

by Tarik

Update for 8.0+: This answer was written well before MySQL version 8, which introduced window functions with mostly the same syntax as the existing ones in Oracle. In this new syntax, the solution would be
    SELECT t.name, t.phone, COUNT('x') OVER (PARTITION BY t.name) AS namecounter
    FROM Guys t
The answer below still works on newer …
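The window-function form can be tried without a MySQL 8 server, since SQLite (3.25 or later) accepts the same OVER (PARTITION BY ...) syntax. This sketch runs the query through Python's sqlite3 module with invented rows; an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Guys (name TEXT, phone TEXT);
    INSERT INTO Guys VALUES ('Ann', '111'), ('Ann', '222'), ('Bob', '333');
""")

# Every row is kept; namecounter repeats the per-name count on each row.
name_counts = conn.execute("""
    SELECT t.name, t.phone, COUNT('x') OVER (PARTITION BY t.name) AS namecounter
    FROM Guys t
    ORDER BY t.name, t.phone
""").fetchall()
```

Unlike GROUP BY, the window keeps one output row per input row, which is the point of "count without group".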