I found an example of boto + MFA:
http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa_sample-code.html
But I can't find an example of how to do this with boto3. Is there an equivalent boto3 example?
Thanks!
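For reference, a minimal sketch of a boto3 equivalent, assuming an IAM user with a virtual MFA device; the MFA serial ARN and token code below are placeholders, and the helper name `get_mfa_session` is mine:

```python
def get_mfa_session(mfa_serial, token_code):
    """Return a boto3 Session backed by MFA-authenticated temporary credentials.

    mfa_serial: ARN of the MFA device (placeholder in the example below).
    token_code: the 6-digit code currently shown on the device.
    """
    import boto3  # imported inside so the sketch reads fine without boto3 installed

    sts = boto3.client("sts")
    response = sts.get_session_token(
        SerialNumber=mfa_serial,
        TokenCode=token_code,
        DurationSeconds=3600,  # temporary credentials valid for 1 hour
    )
    creds = response["Credentials"]
    # Build a new session from the temporary credentials
    return boto3.session.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

# Example usage (placeholder ARN and code):
# session = get_mfa_session("arn:aws:iam::123456789012:mfa/user", "123456")
# s3 = session.client("s3")
```

This mirrors the boto sample's flow: call STS `get_session_token` with the MFA serial and current token, then use the returned temporary credentials for subsequent clients.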
I have an executable written in C, but no source code. I can call it from Python (on Ubuntu Linux):
from subprocess import Popen, PIPE
# variable 'output' is the string I need.
p = Popen(["./executable", par1, par2], stdin=PIPE, stdout=PIPE, stderr=PIPE)
output, err = p.communicate()
rc = p.returncode
This works fine for me, except that I need to do it millions of times, and the runs take a lot of time. Is there a way to make this run faster?
Thanks!
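Assuming the calls are independent of each other, one way to cut wall-clock time is to run them concurrently: `subprocess` releases the GIL while waiting on the child, so a thread pool parallelizes the fork/exec overhead. A sketch, using `echo` as a stand-in for the real executable:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(args):
    """Run one external command and return its stdout as a stripped string."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_many(arg_lists, workers=8):
    """Run many independent commands concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, arg_lists))


# "echo" stands in for "./executable" here
outputs = run_many([["echo", "a"], ["echo", "b"], ["echo", "c"]])
```

If the executable happens to support a batch mode (reading many inputs from stdin and writing many outputs), feeding all inputs to one long-lived process via `p.stdin` would avoid the per-call process-startup cost entirely, which usually dominates; whether that's possible depends on the binary.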
I have a very simple pyspark program that uses a dataframe to query data from a set of ORC files. I am using Anaconda Python on Windows with pyspark installed on top of it.
The program looks like this:
from pyspark.sql import SparkSession
spark_session = SparkSession.builder.appName("test").getOrCreate()
df_orc = spark_session.read.orc("./raw_data/")
df_orc.createOrReplaceTempView("orc")
This works fine:
spark_session.sql("select count(*) from orc").show()
But this raises an error:
spark_session.sql("select count(*) from orc").collect()
The error message is:
WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
Py4JJavaError: An error occurred while calling o81.collectToPython.
: java.lang.IllegalArgumentException
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.sca
at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.sca
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103) …

I am developing SQL queries against a Spark dataframe built from a set of ORC files. The program looks like this:
from pyspark.sql import SparkSession
spark_session = SparkSession.builder.appName("test").getOrCreate()
sdf = spark_session.read.orc("../data/")
sdf.createOrReplaceTempView("test")
Now I have a table named "test". If I do something like:
spark_session.sql("select count(*) from test")
then the result is fine. But I need more columns in the query, including some fields inside an array.
In [8]: sdf.take(1)[0]["person"]
Out[8]:
[Row(name='name', value='tom'),
Row(name='age', value='20'),
Row(name='gender', value='m')]
I have tried something like:
spark_session.sql("select person.age, count(*) from test group by person.age")
But it doesn't work. My question is: how do I access the fields inside the "person" array?
Thanks!
EDIT:
The result of sdf.printSchema():
In [3]: sdf.printSchema()
root
|-- person: integer (nullable = true)
|-- customtags: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- name: string (nullable = true)
| | |-- value: string (nullable = true)
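Going by the schema in the edit, the array of name/value structs is `customtags` (the schema shows `person` as an integer), so accessing one tag per row requires exploding the array first. A sketch, assuming a SparkSession with the "test" view registered as above; the helper name `count_by_tag` is mine, and the tag name "age" is taken from the example rows:

```python
def count_by_tag(spark_session, tag_name):
    """Group rows by the value of one name/value tag in the customtags array.

    LATERAL VIEW explode(...) turns each array element into its own row,
    after which the struct's fields are reachable as t.name / t.value.
    Sketch only: assumes the "test" view exists on the calling session.
    """
    query = (
        "SELECT t.value, COUNT(*) AS n "
        "FROM test "
        "LATERAL VIEW explode(customtags) tags AS t "
        "WHERE t.name = '{}' "
        "GROUP BY t.value".format(tag_name)
    )
    return spark_session.sql(query)

# Example usage:
# count_by_tag(spark_session, "age").show()
```

The equivalent DataFrame-API route would be `sdf.select(explode("customtags"))` followed by a filter on `col.name`, but the SQL form above stays closest to the queries in the question.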