The current directory structure for my test scripts is as follows:
project/
    script/
        __init__.py
        map.py
    test/
        __init.py__
        test_map.py
My map.py is defined as follows:
def add(x, y):
    return x + y

def map_add(df):
    result = df.map(lambda x: (x.key, x.value)).reduceByKey(add)
    return result
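For reference, the result I expect from map_add can be sketched in plain Python without Spark. This is only an illustration of the intended semantics: `Row` and `local_map_add` are hypothetical names that are not part of my project, and a real `reduceByKey` returns an RDD of (key, value) pairs with no guaranteed order rather than a dict.

```python
from collections import namedtuple

# Hypothetical stand-in for a DataFrame row with `key` and `value` fields.
Row = namedtuple("Row", ["key", "value"])


def add(x, y):
    return x + y


def local_map_add(rows):
    """Emulate map(lambda x: (x.key, x.value)).reduceByKey(add) locally."""
    # Step 1: the map step turns each row into a (key, value) pair.
    pairs = [(r.key, r.value) for r in rows]
    # Step 2: the reduceByKey step merges all values that share a key.
    totals = {}
    for key, value in pairs:
        totals[key] = add(totals[key], value) if key in totals else value
    return totals


rows = [Row("a", 1), Row("b", 2), Row("a", 3)]
print(local_map_add(rows))  # {'a': 4, 'b': 2}
```

A test for map_add would presumably assert the same totals after collecting the RDD and sorting the pairs.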
test_map.py looks like this:
def add_pyspark_path():
    """
    Add PySpark to the PYTHONPATH
    """
    import sys
    import os
    try:
        sys.path.append(os.path.join(os.environ['SPARK_HOME'], "python"))
        # Spark 1.6
        sys.path.append(os.path.join(os.environ['SPARK_HOME'],
                                     "python", "lib", "py4j-0.9-src.zip"))
    except KeyError:
        print("SPARK_HOME not set")
        sys.exit(1)

# To import pyspark
add_pyspark_path()

import unittest
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
from script.map import map_add

class PySparkTestCase(unittest.TestCase):
    def setUp(self): …
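The path setup above hardcodes the py4j archive name that ships with Spark 1.6. A possible variation, shown here only as a sketch, looks the archive up with a glob so the helper does not depend on one py4j version. `pyspark_paths` is a hypothetical helper name, and the sketch assumes the archive still lives under `python/lib` and follows the `py4j-*-src.zip` naming seen above.

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return the sys.path entries PySpark needs under the given SPARK_HOME."""
    python_dir = os.path.join(spark_home, "python")
    # Match whichever py4j source zip is bundled (py4j-0.9-src.zip in Spark 1.6).
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    return [python_dir] + sorted(glob.glob(pattern))


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME not set")
    else:
        for path in pyspark_paths(spark_home):
            print(path)
```

The returned entries could then be appended to `sys.path` before `import pyspark`, in the same place where add_pyspark_path() is called.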