Deterministic function in MySQL

From the MySQL 5.0 Reference Manual: Assessment of the nature of a routine is based on the "honesty" of the creator: MySQL does not check that a routine declared DETERMINISTIC is free of statements that produce nondeterministic results. However, misdeclaring a routine might affect results or performance. Declaring a nondeterministic routine as DETERMINISTIC might lead … Read more
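As a sketch of what the manual is describing, here is a hypothetical stored function that is genuinely deterministic (same input always yields the same output), so the DETERMINISTIC characteristic is honest; a routine calling NOW() or RAND() declared the same way would be the misdeclaration the manual warns about:

```sql
-- Hypothetical example: string concatenation is deterministic,
-- so declaring the routine DETERMINISTIC is safe here.
DELIMITER //
CREATE FUNCTION hello_name(name VARCHAR(64))
RETURNS VARCHAR(80)
DETERMINISTIC
BEGIN
  RETURN CONCAT('Hello, ', name, '!');
END //
DELIMITER ;
```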

How to define and use a User-Defined Aggregate Function in Spark SQL?

Supported methods

Spark >= 3.0: Scala UserDefinedAggregateFunction is deprecated (SPARK-30423 Deprecate UserDefinedAggregateFunction) in favor of a registered Aggregator.

Spark >= 2.3: Vectorized udf (Python only):

from pyspark.sql.functions import pandas_udf
from pyspark.sql.functions import PandasUDFType
from pyspark.sql.types import *
import pandas as pd

df = sc.parallelize([
    ("a", 0), ("a", 1), ("b", 30), ("b", -50)
]).toDF(["group", "power"])

def … Read more
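The Aggregator that replaces UserDefinedAggregateFunction is built around four operations: zero, reduce, merge, and finish. The contract can be sketched in plain Python (the class and runner below are illustrative, not Spark API; names mirror Spark's Scala Aggregator methods):

```python
# Plain-Python sketch of Spark's Aggregator contract (zero/reduce/merge/finish).
# SumOfSquares and run_aggregation are hypothetical, for illustration only.

class SumOfSquares:
    def zero(self):
        return 0.0                       # initial aggregation buffer

    def reduce(self, buffer, value):
        return buffer + value * value    # fold one input row into the buffer

    def merge(self, b1, b2):
        return b1 + b2                   # combine partial buffers across partitions

    def finish(self, buffer):
        return buffer                    # produce the final output value


def run_aggregation(agg, partitions):
    # Simulate per-partition reduction followed by a merge, as Spark would.
    partials = []
    for part in partitions:
        buf = agg.zero()
        for v in part:
            buf = agg.reduce(buf, v)
        partials.append(buf)
    merged = agg.zero()
    for p in partials:
        merged = agg.merge(merged, p)
    return agg.finish(merged)


print(run_aggregation(SumOfSquares(), [[0, 1], [30, -50]]))  # 3401.0
```

Separating reduce (within a partition) from merge (across partitions) is what lets Spark run the aggregation in parallel without shuffling raw rows.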

How can I create a user-defined function in SQLite?

SQLite does not support user-defined functions in the way that Oracle or MS SQL Server does. For SQLite, you must create a callback function in C/C++ and register it with the sqlite3_create_function call. Unfortunately, the SQLite API for Android does not expose sqlite3_create_function directly through Java. In order … Read more
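By contrast, Python's standard-library sqlite3 module does expose sqlite3_create_function, as Connection.create_function, so in Python the hookup the answer describes is a one-liner (reverse_text below is an illustrative function, not a built-in):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def reverse_text(s):
    # Plain Python callable that will be registered as a SQL function.
    return s[::-1]

# (name, number of arguments, callable) -- wraps sqlite3_create_function.
conn.create_function("reverse_text", 1, reverse_text)

result = conn.execute("SELECT reverse_text('hello')").fetchone()[0]
print(result)  # olleh
```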

Applying UDFs on GroupedData in PySpark (with functioning python example)

Since Spark 2.3 you can use pandas_udf. GROUPED_MAP takes Callable[[pandas.DataFrame], pandas.DataFrame], in other words a function which maps from a Pandas DataFrame of the same shape as the input to the output DataFrame. For example, if the data looks like this:

df = spark.createDataFrame(
    [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2)],
    … Read more
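The split-apply-combine semantics of a GROUPED_MAP udf can be sketched without Spark: partition the rows by key, run a per-group function over each partition, and concatenate the results. Rows below are (group, x, y) tuples mirroring the example data; subtract_mean is a hypothetical per-group function of the kind typically passed to pandas_udf:

```python
from collections import defaultdict

rows = [("a", 1, 0), ("a", -1, 42), ("b", 3, -1), ("b", 10, -2)]

def subtract_mean(group_rows):
    # Per-group transform: center the x column around the group mean.
    mean_x = sum(r[1] for r in group_rows) / len(group_rows)
    return [(g, x - mean_x, y) for g, x, y in group_rows]

def group_apply(rows, fn):
    # Split rows by the first field ...
    groups = defaultdict(list)
    for r in rows:
        groups[r[0]].append(r)
    # ... apply fn to each group, then combine the per-group outputs.
    out = []
    for key in groups:
        out.extend(fn(groups[key]))
    return out

print(group_apply(rows, subtract_mean))
```

In Spark the split happens across executors and each group must fit in one executor's memory, which is the main practical constraint of the grouped-map pattern.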

Derive multiple columns from a single column in a Spark DataFrame

Generally speaking, what you want is not directly possible: a UDF can return only a single column at a time. There are two different ways you can overcome this limitation:

Return a column of complex type. The most general solution is a StructType, but you can also consider ArrayType or MapType.

import org.apache.spark.sql.functions.udf

val df … Read more
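The "return a complex type, then expand it" pattern can be sketched in plain Python: the UDF returns one struct-like tuple per input value, and the individual fields are pulled out as separate columns afterwards (split_udf and the sample rows are hypothetical):

```python
def split_udf(value):
    # Hypothetical UDF: derive two fields from a single input column.
    head, _, tail = value.partition("_")
    return (head, tail)              # one "struct" result, not two columns

rows = ["a_1", "b_2"]
structs = [split_udf(v) for v in rows]   # single column of complex type
col1 = [s[0] for s in structs]           # expand the struct fields into
col2 = [s[1] for s in structs]           # plain top-level columns
print(col1, col2)
```

In Spark the expansion step is done by selecting the struct's fields (e.g. col("result.*")) after applying the UDF.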