This is my current dataset:
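```python
from pyspark.sql import SparkSession, Window
import pyspark.sql.functions as psf
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("2", "1", 1),
     ("3", "1", 2)],  # both rows share Source "1", matching the output below
    schema=StructType([StructField("Data", StringType()),
                       StructField("Source", StringType()),
                       StructField("Date", IntegerType())]))

# display() is the Databricks notebook helper; use .show() elsewhere
display(df.withColumn("Result", psf.collect_set("Data").over(Window.partitionBy("Source").orderBy("Date"))))
```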
Output:
| Data | Source | Date | Result |
|---|---|---|---|
| 2 | 1 | 1 | ["2"] |
| 3 | 1 | 2 | ["2", "3"] |
Why is the value 3 missing from the first row of the Result column when I use collect_set over an ordered window?
I also tried collect_list, but got the same result.
My desired output is:
| Data | Source | Date | Result |
|---|---|---|---|
| 2 | 1 | 1 | ["2", "3"] |
| 3 | 1 | 2 | ["2", "3"] |
where the order of the values in Result is preserved: first the value where Date = 1, then the value where Date = 2.
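A likely explanation: per the PySpark `Window` documentation, once a window has an `orderBy`, its default frame grows from the start of the partition up to the current row (`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`), so the row with Date = 1 cannot yet see the row with Date = 2. Widening the frame with `.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)` lets every row see the whole partition. The sketch below does not use Spark. It is a plain-Python model of the two frames, built around a hypothetical helper `collect_over_window`, so the difference can be checked without a cluster.

```python
from itertools import groupby
from operator import itemgetter

# (Data, Source, Date) rows, both in Source "1" as in the output tables above
ROWS = [("2", "1", 1), ("3", "1", 2)]


def collect_over_window(rows, full_frame):
    """Hypothetical model of collect_list over partitionBy(Source).orderBy(Date).

    full_frame=False models Spark's default frame for an ordered window
    (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    full_frame=True models rowsBetween(unboundedPreceding, unboundedFollowing).
    """
    result = []
    # Sort by partition key (Source), then by the window's order key (Date)
    ordered = sorted(rows, key=itemgetter(1, 2))
    for _, partition in groupby(ordered, key=itemgetter(1)):
        partition = list(partition)
        for data, source, date in partition:
            if full_frame:
                frame = partition
            else:
                # RANGE frame: all rows with Date <= current Date (ties included)
                frame = [r for r in partition if r[2] <= date]
            result.append((data, source, date, [r[0] for r in frame]))
    return result


print(collect_over_window(ROWS, full_frame=False))
# [('2', '1', 1, ['2']), ('3', '1', 2, ['2', '3'])]
print(collect_over_window(ROWS, full_frame=True))
# [('2', '1', 1, ['2', '3']), ('3', '1', 2, ['2', '3'])]
```

Because `collect_set` makes no promise about element order, `collect_list` over the ordered, full-frame window is the usual choice when Result must follow the Date order; treat that ordering as observed behaviour rather than a documented guarantee.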
I'm using the Pythonista app on iOS and can't get the Flask module to work, even with the most basic code.
My code is:
```python
from flask import Flask

Flaskapp = Flask(__name__)

@Flaskapp.route('/')
def helloWorld():
    return 'Woala'

Flaskapp.run(debug=True)
```
But I keep getting:
```
* Serving Flask app "app" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
* Restarting with stat
Traceback (most recent call last):
File "/private/var/mobile/Library/Mobile Documents/iCloud~com~omz-software~Pythonista3/Documents/app.py", line 9, in <module> …
```