I followed the suggestions in http://spark.apache.org/docs/1.1.1/submitting-applications.html
Spark version is 1.1.0:
./spark/bin/spark-submit --py-files /home/hadoop/loganalysis/parser-src.zip \
/home/hadoop/loganalysis/ship-test.py
and the conf in the code:
conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("LogAnalysis")
        .set("spark.executor.memory", "1g")
        .set("spark.executor.cores", "4")
        .set("spark.executor.num", "2")
        .set("spark.driver.memory", "4g")
        .set("spark.kryoserializer.buffer.mb", "128"))
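As a side note, `spark.executor.num` is not a standard Spark property; on YARN the executor count is normally set with `spark.executor.instances` (or `--num-executors` on `spark-submit`). A sketch of the same conf using that key:

```python
from pyspark import SparkConf

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("LogAnalysis")
        .set("spark.executor.memory", "1g")
        .set("spark.executor.cores", "4")
        .set("spark.executor.instances", "2")  # standard key for executor count on YARN
        .set("spark.driver.memory", "4g")
        .set("spark.kryoserializer.buffer.mb", "128"))
```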
and the slave nodes complain with an ImportError:
14/12/25 05:09:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-172-31-10-8.cn-north-1.compute.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark/python/pyspark/worker.py", line 75, in main
    command = pickleSer._read_with_length(infile)
  File "/home/hadoop/spark/python/pyspark/serializers.py", line 150, in _read_with_length
    return self.loads(obj)
ImportError: No module named parser
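For context on what `--py-files` does on the worker side: PySpark puts each shipped zip onto `sys.path` on the executor and imports modules straight out of it via zipimport, which is why the module file has to sit at the root of the archive. A minimal local sketch of that mechanism (the module name `logparser` and its contents are made up for illustration):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny stand-in zip. The key point: the .py file must sit at the
# ROOT of the archive, not nested under a source directory, or the import
# fails on the workers with exactly this kind of ImportError.
tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "src.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("logparser.py", "def parse(line):\n    return line.split()\n")

# PySpark workers do the equivalent of this for every --py-files argument:
# the zip goes onto sys.path and modules are loaded from it via zipimport.
sys.path.insert(0, zip_path)
import logparser

print(logparser.parse("GET /index.html"))  # → ['GET', '/index.html']
```

Checking that the zip imports this way (through `sys.path`, not just from an unpacked directory) reproduces what the executors actually do.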
and parser-src.zip was tested locally:
[hadoop@ip-172-31-10-231 ~]$ python
Python 2.7.8 (default, Nov 3 2014, 10:17:30)
[GCC 4.8.2 20140120 …

The short version of the code is this:
class Word(Base):
    __tablename__ = 'word'
    eng = Column(String(32), primary_key=True)
    chinese = Column(String(128))

word = Word(eng='art', chinese=[u'??', u'??'])
session.add(word)
session.commit()
I'm trying to store word.chinese as a string, while in Python it is a list. When I write the SQL myself, I can call str(word.chinese) before inserting into the database, and then simply eval(result) when reading it back to recover the original Python object. But since I store my objects through SQLAlchemy, I'd like to know where to hook in to achieve this.
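One common place to hook in is a custom `TypeDecorator`: it serializes the list on the way into the database and parses it back on the way out, so the model attribute stays a real list. A sketch using JSON instead of `str()`/`eval()` (the sample values are placeholders, and SQLite is used only to make the snippet self-contained):

```python
import json

from sqlalchemy import Column, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.types import TypeDecorator

Base = declarative_base()

class JSONEncodedList(TypeDecorator):
    """Store a Python list as a JSON string in a VARCHAR column."""
    impl = String(128)
    cache_ok = True

    def process_bind_param(self, value, dialect):
        # list -> string, applied automatically on INSERT/UPDATE
        return None if value is None else json.dumps(value)

    def process_result_value(self, value, dialect):
        # string -> list, applied automatically on SELECT
        return None if value is None else json.loads(value)

class Word(Base):
    __tablename__ = 'word'
    eng = Column(String(32), primary_key=True)
    chinese = Column(JSONEncodedList)

engine = create_engine('sqlite://')  # in-memory DB just for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Word(eng='art', chinese=[u'fine art', u'craft']))
    session.commit()
    print(session.get(Word, 'art').chinese)  # round-trips as a real list
```

JSON is usually preferred over `eval()` here because it round-trips lists and unicode safely without executing arbitrary strings from the database.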