How are UDFs and UDAFs defined and used in Hive?
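In Hive, user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) are written in Java, packaged into a JAR, and registered with Hive so that they can be called from HiveQL. A UDF takes the values of a single row and returns one value for that row; a UDAF consumes the values of many rows and returns one value per group.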
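To define a UDF, write a Java class that extends Hive’s UDF class (org.apache.hadoop.hive.ql.exec.UDF) and implement an evaluate method. Hive locates evaluate by reflection, so its parameter and return types determine the SQL signature of the function. Then register the function in Hive using the CREATE FUNCTION statement.
For example, the following UDF converts a string to upper case and returns NULL for NULL input: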
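    // The package must match the class name used in CREATE FUNCTION.
    package com.example;

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    public class MyUDF extends UDF {
        // Called once per row; a NULL input yields a NULL result.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toUpperCase());
        }
    }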
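Next, compile the class into a JAR, make the JAR available to Hive (for example with ADD JAR, or with the USING JAR clause of CREATE FUNCTION), and register the UDF under a SQL name: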
CREATE FUNCTION my_udf AS 'com.example.MyUDF';
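Because Hive calls evaluate once per input row, NULL handling has to live inside the method. The sketch below imitates that row-by-row behaviour in plain Java so it can be run without Hive on the classpath. It is an illustration only, not Hive code: String stands in for Text, and the names UdfRowDemo and applyToColumn are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UdfRowDemo {
    // Stand-in for MyUDF.evaluate, with String in place of Hadoop's Text
    // (an assumption made so the sketch has no Hive dependency).
    public static String evaluate(String input) {
        if (input == null) {
            return null; // NULL in, NULL out
        }
        return input.toUpperCase();
    }

    // Hypothetical helper: applies evaluate to every row of one column,
    // producing exactly one output value per input row.
    public static List<String> applyToColumn(List<String> column) {
        List<String> out = new ArrayList<>();
        for (String value : column) {
            out.add(evaluate(value));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("apache", null, "Udf");
        System.out.println(applyToColumn(column)); // prints [APACHE, null, UDF]
    }
}
```

The output has the same number of rows as the input, which is the defining property of a UDF as opposed to a UDAF.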
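To define a UDAF, write a Java class that extends Hive’s UDAF class and contains a public static inner class implementing the UDAFEvaluator interface. The evaluator, not an evaluate method, holds the aggregation logic: init resets the state, iterate consumes one input row, terminatePartial returns a partial result, merge combines a partial result from another task into the current state, and terminate returns the final value. Then register the function in Hive using the CREATE FUNCTION statement. Note that this simple UDAF interface is deprecated in Hive in favor of the generic UDAF API (AbstractGenericUDAFResolver and GenericUDAFEvaluator), but it remains the shortest way to show the idea.
For example, the following UDAF sums an integer column, ignoring NULL values: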
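    // The package must match the class name used in CREATE FUNCTION.
    package com.example;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
    import org.apache.hadoop.io.IntWritable;

    public class MyUDAF extends UDAF {
        // Hive discovers the evaluator as a static inner class implementing UDAFEvaluator.
        public static class MyUDAFEvaluator implements UDAFEvaluator {
            // Running sum; stays null until the first non-NULL value is seen.
            private IntWritable result;

            public MyUDAFEvaluator() {
                init();
            }

            // Resets the aggregation state (the only method declared by UDAFEvaluator).
            @Override
            public void init() {
                result = null;
            }

            // Called once per input row.
            public boolean iterate(IntWritable value) {
                if (value == null) {
                    return true;
                }
                if (result == null) {
                    result = new IntWritable(value.get());
                } else {
                    result.set(result.get() + value.get());
                }
                return true;
            }

            // Returns the partial sum computed by one task.
            public IntWritable terminatePartial() {
                return result;
            }

            // Folds a partial sum from another task into this one.
            public boolean merge(IntWritable other) {
                if (other == null) {
                    return true;
                }
                if (result == null) {
                    result = new IntWritable(other.get());
                } else {
                    result.set(result.get() + other.get());
                }
                return true;
            }

            // Returns the final sum, or NULL if no non-NULL value was seen.
            public IntWritable terminate() {
                return result;
            }
        }
    }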
Next, register this UDAF in Hive, again with its JAR available to the session:
CREATE FUNCTION my_udaf AS 'com.example.MyUDAF';
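The evaluator methods are easier to follow when you see the order in which they are called. The following plain-Java sketch imitates a two-stage aggregation: each input split gets its own evaluator that is reset and fed rows through iterate, and the partial results are then combined through merge before terminate returns the final value. This is a simplified model under stated assumptions, not how Hive is implemented: Integer stands in for IntWritable, and the names UdafLifecycleDemo, SumEvaluator, and aggregate are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class UdafLifecycleDemo {
    // Stand-in for MyUDAFEvaluator, with Integer in place of IntWritable
    // (an assumption made so the sketch has no Hive dependency).
    public static class SumEvaluator {
        private Integer result;

        // Resets the aggregation state.
        public void init() {
            result = null;
        }

        // Consumes one input row; NULL values are skipped.
        public boolean iterate(Integer value) {
            if (value != null) {
                if (result == null) {
                    result = value;
                } else {
                    result = result + value;
                }
            }
            return true;
        }

        // Partial result of one split.
        public Integer terminatePartial() {
            return result;
        }

        // Folds another split's partial result into this evaluator.
        public boolean merge(Integer other) {
            if (other != null) {
                if (result == null) {
                    result = other;
                } else {
                    result = result + other;
                }
            }
            return true;
        }

        // Final result, or null if no non-null value was seen.
        public Integer terminate() {
            return result;
        }
    }

    // Hypothetical driver: one evaluator per split, then a final merge step.
    public static Integer aggregate(List<List<Integer>> splits) {
        SumEvaluator finalEvaluator = new SumEvaluator();
        finalEvaluator.init();
        for (List<Integer> split : splits) {
            SumEvaluator partial = new SumEvaluator();
            partial.init();
            for (Integer value : split) {
                partial.iterate(value);
            }
            finalEvaluator.merge(partial.terminatePartial());
        }
        return finalEvaluator.terminate();
    }

    public static void main(String[] args) {
        List<List<Integer>> splits = Arrays.asList(
                Arrays.asList(1, 2, null),
                Arrays.asList(3, 4));
        System.out.println(aggregate(splits)); // prints 10
    }
}
```

Because iterate and merge apply the same addition, the result does not depend on how the rows are divided among splits, which is what makes a sum safe to compute in partial stages.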
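Once registered, both functions can be called in Hive queries like built-in functions. The UDF returns one value per row, while the UDAF returns one value for the whole table (or one per group when used with GROUP BY):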
SELECT my_udf(column_name) FROM table_name;
SELECT my_udaf(column_name) FROM table_name;