use*_*463 3 apache-spark pyspark aws-glue
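I am trying to convert my Spark DataFrame to a DynamicFrame so I can write it out as glueparquet files, but I get this error:
'DataFrame' object has no attribute 'fromDF'
My code uses Spark DataFrames heavily. Is there a way to convert a Spark DataFrame to a DynamicFrame so that I can write out glueparquet? If so, could you provide an example and point out what I am doing wrong below?
Code: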
# importing libraries
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
glueContext = GlueContext(SparkContext.getOrCreate())
# updated 11/19/19 for error caused in error logging function
spark = glueContext.spark_session
from pyspark.sql import Window
from pyspark.sql.functions import col
from pyspark.sql.functions import first
from pyspark.sql.functions import date_format
from pyspark.sql.functions import lit,StringType
from pyspark.sql.types import *
from pyspark.sql.functions import substring, length, min,when,format_number,dayofmonth,hour,dayofyear,month,year,weekofyear,date_format,unix_timestamp
base_pth='s3://test/'
bckt_pth1=base_pth+'test_write/glueparquet/'
test_df=glueContext.create_dynamic_frame.from_catalog(
database='test_inventory',
table_name='inventory_tz_inventory').toDF()
test_df.fromDF(test_df, glueContext, "test_nest")
glueContext.write_dynamic_frame.from_options(frame = test_nest,
connection_type = "s3",
connection_options = {"path": bckt_pth1+'inventory'},
format = "glueparquet")
Error:
'DataFrame' object has no attribute 'fromDF'
Traceback (most recent call last):
File "/mnt/yarn/usercache/livy/appcache/application_1574556353910_0001/container_1574556353910_0001_01_000001/pyspark.zip/pyspark/sql/dataframe.py", line 1300, in __getattr__
"'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'fromDF'
小智 17
fromDF is a class method of DynamicFrame, not a method of DataFrame. Here is how you convert a DataFrame to a DynamicFrame:
from awsglue.dynamicframe import DynamicFrame
# Assign the result: the write_dynamic_frame call in the question expects test_nest
test_nest = DynamicFrame.fromDF(test_df, glueContext, "test_nest")
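To make the distinction concrete without needing a Glue environment, here is a minimal sketch in plain Python. The `DataFrame` and `DynamicFrame` classes below are hypothetical stand-ins written only for illustration, not the real pyspark or awsglue classes; the `fromDF(dataframe, glue_ctx, name)` parameter order mirrors the call above, and `None` is passed where a real job would pass `glueContext`.

```python
# Hypothetical stand-ins for pyspark's DataFrame and awsglue's DynamicFrame.
# They only model where fromDF and toDF live; they are NOT the real classes.
class DataFrame:
    def __init__(self, rows):
        self.rows = rows


class DynamicFrame:
    def __init__(self, rows, name):
        self.rows = rows
        self.name = name

    @classmethod
    def fromDF(cls, dataframe, glue_ctx, name):
        # Builds a DynamicFrame from a DataFrame; glue_ctx is unused in this sketch.
        return cls(dataframe.rows, name)

    def toDF(self):
        # Converts back to a DataFrame, as in the question's from_catalog(...).toDF().
        return DataFrame(self.rows)


test_df = DataFrame([{"id": 1}])

# Calling fromDF on the DataFrame instance fails, as in the question.
try:
    test_df.fromDF(test_df, None, "test_nest")
except AttributeError as exc:
    error_message = str(exc)

# Calling it on the DynamicFrame class works; the result has to be assigned
# so that a later write step can refer to test_nest.
test_nest = DynamicFrame.fromDF(test_df, None, "test_nest")
```

The same two changes apply to the code in the question: import `DynamicFrame` from `awsglue.dynamicframe`, call `fromDF` on that class, and keep the returned frame in `test_nest` before passing it to `write_dynamic_frame.from_options`.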
To round out the answer for Scala users, here is how to convert a Spark DataFrame to a DynamicFrame (the fromDF method does not exist in the Scala API of DynamicFrame):
import com.amazonaws.services.glue.DynamicFrame
val dynamicFrame = DynamicFrame(df, glueContext)
I hope it helps!