Reading an HDFS file from a Hive UDF: Execution Error, return code -101 from FunctionTask, "Could not initialize class"

Raf*_*ios · Tags: java, hadoop, hive, hue

We have been trying to create a simple Hive UDF to mask some fields in a Hive table. We use an external file (stored on HDFS) to read a piece of text that serves as a salt for the masking. Everything seems to be in order, but when we try to create the function, Hive throws this error:

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask

This is our code for the UDF:

package co.company;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.commons.codec.digest.DigestUtils;

@Description( 
        name = "masker",
        value = "_FUNC_(str) - mask a string",      
        extended = "Example: \n" +
                " SELECT masker(column) FROM hive_table; "      
        )
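public class Mask extends UDF {

    // HDFS path of the file whose first line is used as the salt
    private static final String arch_clave = "/user/username/filename.dat";
    private static String clave = null;

    // Returns the first line of the given HDFS file, or null if it cannot be read
    public static String getFirstLine(String arch) {
        try {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream in = fs.open(new Path(arch));
            // try-with-resources closes the reader (and the HDFS stream) even if readLine() fails
            try (BufferedReader br = new BufferedReader(new InputStreamReader(in))) {
                return br.readLine();
            }
        } catch (Exception e) {
            System.out.println("out: Error Message: " + arch + " exc: " + e.getMessage());
            return null;
        }
    }

    public Text evaluate(Text s) {
        // Keep SQL NULLs as NULL instead of hashing the string "null"
        if (s == null) {
            return null;
        }
        // Note: the salt file is re-read from HDFS on every call
        clave = getFirstLine(arch_clave);
        // SHA-1 hex digest of the input concatenated with the salt
        Text to_value = new Text(DigestUtils.shaHex(s + clave));
        return to_value;
    }
}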

We are uploading the jar file and creating the UDF through Hue's interface (sadly, we don't have console access to the Hadoop cluster yet).

On Hue's Hive Interface, our commands are:

add jar hdfs:///user/my_username/myJar.jar

Then, to create the function, we execute:

CREATE TEMPORARY FUNCTION masker as 'co.company.Mask';

Unfortunately, the error thrown when we try to create the UDF is not very helpful. This is the log from creating the UDF. Any help is greatly appreciated. Thank you very much.

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.FunctionTask. Could not initialize class co.company.Mask
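For context on why the message is so terse: in plain Java, "Could not initialize class X" is the message of a `NoClassDefFoundError` that the JVM throws when a class's static initialization already failed on an earlier attempt. The root cause is reported only once, as an `ExceptionInInitializerError`, and later accesses see only the short message. The Hive error above most likely wraps this JVM error, but that is our assumption. The standalone sketch below has nothing to do with Hive; `InitFailureDemo` and `Bad` are hypothetical names. It reproduces the message:

```java
public class InitFailureDemo {

    // Hypothetical class whose static initializer always fails
    static class Bad {
        static final int VALUE = compute(); // not a compile-time constant, so access triggers class init

        static int compute() {
            throw new IllegalStateException("static init failed");
        }
    }

    // Touches Bad.VALUE and reports which error (if any) was thrown
    static String tryAccess() {
        try {
            return "value=" + Bad.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        // First access: ExceptionInInitializerError carrying the real cause
        System.out.println(tryAccess());
        // Every later access: NoClassDefFoundError "Could not initialize class InitFailureDemo$Bad"
        System.out.println(tryAccess());
    }
}
```

Seeing this message usually means the interesting stack trace (the first `ExceptionInInitializerError`) was logged earlier, or not at all, rather than alongside the failing statement.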

Answer (Raf*_*ios):

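This problem has been solved, but it was unrelated to the code. The code above reads a file in HDFS from a Hive UDF just fine (it is very inefficient, since it reads the file every time the evaluate function is called, but it does manage to read the file).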
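Since `evaluate` re-reads the salt file on every call, one option is to load the salt once and cache it. The minimal sketch below makes a few assumptions so it runs without a Hadoop cluster. The salt loader is a plain `Supplier<String>`; in the real UDF it would wrap something like `getFirstLine(arch_clave)`. The SHA-1 hex digest comes from `java.security.MessageDigest` instead of commons-codec. `CachedSaltMasker`, `mask`, and `loadCount` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Supplier;

// Hypothetical helper: salts and hashes values, loading the salt only once
public class CachedSaltMasker {
    private final Supplier<String> saltLoader; // in a real UDF, e.g. () -> Mask.getFirstLine(path)
    private String salt;                       // cached salt; null until a load succeeds
    private int loadCount = 0;                 // number of times the loader was invoked

    public CachedSaltMasker(Supplier<String> saltLoader) {
        this.saltLoader = saltLoader;
    }

    // Loads the salt on first use; a null result is not cached, so a failed read is retried
    private synchronized String salt() {
        if (salt == null) {
            loadCount++;
            salt = saltLoader.get();
        }
        return salt;
    }

    public synchronized int loadCount() {
        return loadCount;
    }

    // SHA-1 hex digest of value + salt, or null for a null value
    public String mask(String value) {
        if (value == null) {
            return null;
        }
        return sha1Hex(value + salt());
    }

    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        CachedSaltMasker m = new CachedSaltMasker(() -> "c");
        System.out.println(m.mask("ab"));   // prints a9993e364706816aba3e25717850c26c9cd0d89d (SHA-1 of "abc")
        System.out.println(m.mask("xyz"));
        System.out.println("loads: " + m.loadCount()); // prints loads: 1
    }
}
```

As far as we know, this yields the same lowercase hex string as `DigestUtils.shaHex` for the same UTF-8 input. The `synchronized` accessors are a conservative choice, because whether a UDF instance is shared across threads depends on the execution engine.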

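It turns out that when you create a Hive UDF through Hue, you upload the jar and then create the function. However, if you change the function and upload the jar again, Hive still keeps the previous definition of the function.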

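We defined the same UDF class in a different package in the jar, dropped the original function in Hive, and then created the function again through Hue (using the new class):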

add jar hdfs:///user/my_username/myJar2.jar;
drop function if exists masker;
create temporary function masker as 'co.company.otherpackage.Mask';

It seems that Hive (or Hue? or Thrift?) deserves a bug report; I still need to better understand which part of the system is at fault.
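To narrow down which component keeps the stale definition, one diagnostic is to log, from inside the UDF, which jar and class loader the class was actually loaded from. This is our suggestion and not something the answer tried. The standalone sketch uses a hypothetical `whereLoadedFrom` helper built on standard `Class`/`CodeSource` calls. Bootstrap classes such as `java.lang.String` have no code source and a null class loader.

```java
import java.net.URL;
import java.security.CodeSource;

public class WhereLoaded {

    // Describes where a class came from: its jar/directory location and its class loader
    static String whereLoadedFrom(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        URL location = (src == null) ? null : src.getLocation();
        String where = (location == null) ? "<no code source>" : location.toString();
        return c.getName() + " from " + where + " via " + c.getClassLoader();
    }

    public static void main(String[] args) {
        // In a UDF, one might log whereLoadedFrom(Mask.class) from evaluate()
        System.out.println(whereLoadedFrom(WhereLoaded.class));
        System.out.println(whereLoadedFrom(String.class));
    }
}
```

If the logged location still points at the old jar after re-uploading, the stale class is being served by a class loader that outlived the upload.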

I hope this helps someone in the future.