Using a Python UDF in Apache Pig

Gre*_*een 2 python apache-pig udf

I am new to Apache Pig and Python. When I try to register a Python function in Pig, I get some Jython-related errors. My Python script udf1.py converts any string to uppercase.

from pig_util import outputSchema

@outputSchema('output_field_name:chararray')
def charupper(x):
    b = x.upper()
    return b

c = charupper('bbbb')  # quick local check of the function defined above

print(c)

When I try to register it from the Grunt shell in Pig local mode, it throws the following error:

grunt> REGISTER '/home/cloudera/PycharmProjects/Project1/udf1.py' USING jython as pyudf                      
2015-04-06 22:31:45,792 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2015-04-06 22:31:45,793 [main] WARN  org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2015-04-06 22:31:45,836 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - pig.cmd.args.remainders is empty. This is not expected unless on testing.
2015-04-06 22:31:45,842 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: encodings, /usr/lib/pig/lib/jython-standalone-2.5.2.jar/Lib/encodings/__init__.py
2015-04-06 22:31:45,842 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: encodings.utf_8, /usr/lib/pig/lib/jython-standalone-2.5.2.jar/Lib/encodings/utf_8.py
2015-04-06 22:31:45,842 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: types, /usr/lib/pig/lib/jython-standalone-2.5.2.jar/Lib/types.py
2015-04-06 22:31:45,842 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: encodings.aliases, /usr/lib/pig/lib/jython-standalone-2.5.2.jar/Lib/encodings/aliases.py
2015-04-06 22:31:45,842 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: codecs, /usr/lib/pig/lib/jython-standalone-2.5.2.jar/Lib/codecs.py
2015-04-06 22:31:46,026 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1121: Python Error. Traceback (most recent call last):
  File "/home/cloudera/PycharmProjects/Project1/udf1.py", line 3, in <module>
    from pig_util import outputSchema
ImportError: No module named pig_util

Details at logfile: /home/cloudera/pig_1428381449281.log

I have already imported pig_util.py. Do I need to install anything Jython-related on my CDH? I cannot figure out what is causing the error.

Pig version: Apache Pig version 0.11.0-cdh4.7.0

Python script created with PyCharm Community Edition 4.0.4

Python version: Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)

Hem*_*van 5

I ran into the same problem. Here is what I did.

I downloaded the pig_util.py file from here. Then I placed pig_util.py in the same directory as my Python UDF and ran it again. That resolved the import error.

Note: this has nothing to do with Jython.