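I did some googling but did not find anything relevant. Any help is appreciated. I tried this on a bare VM to make sure there were no Node.js installation or dependency issues.
Christian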
sudo apt-get install nodejs
sudo apt-get install nodejs-legacy
sudo npm install -g phonegap
sudo npm install -g cordova
sudo apt-get install ant
chris@mint16 ~/project/dev $ phonegap create my-app
[phonegap] create called with the options /home/chris/project/dev/my-app com.phonegap.helloworld HelloWorld
[phonegap] Customizing default config.xml file
[phonegap] created project at /home/chris/project/dev/my-app
chris@mint16 ~/project/dev $ cd my-app/
chris@mint16 ~/project/dev/my-app $ phonegap run android
[phonegap] detecting Android SDK environment...
[phonegap] using the local environment
[phonegap] adding the Android platform...
/home/chris/.cordova/lib/android/cordova/3.5.0/bin/node_modules/q/q.js:126
throw e;
^
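Error: …

Using "spark.sql.warehouse.dir" works within a single Jupyter session (without Databricks). But after restarting the kernel in Jupyter, the catalog's databases and tables are no longer recognized. Is it possible to get session independence by using the metastore logic with Delta Lake outside of Databricks (I am aware of the option of using paths)?
Thanks, Christian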
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tmp")
    .config("spark.jars.packages", "io.delta:delta-core_2.12:1.0.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .config("spark.sql.warehouse.dir", "/home/user/data")
    .getOrCreate()
)
df = spark.range(100)
df.write.format("delta").mode("overwrite").saveAsTable("rnd")
spark.sql("Select * from rnd").show()
spark.catalog.listDatabases()
spark.catalog.listTables()
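
One possible direction, offered as a sketch rather than a confirmed answer: by default Spark keeps its catalog in memory, which would explain why the table names disappear when the kernel restarts while the files under the warehouse directory remain. The snippet below reuses the settings from the question and adds "spark.sql.catalogImplementation" set to "hive", the same setting that `enableHiveSupport()` applies. The helper names `delta_session_conf` and `build_session` are hypothetical, and whether this works with the Delta version shown here is an assumption that has not been verified.

```python
def delta_session_conf(warehouse_dir):
    """Return the settings from the question plus a persistent catalog (assumption)."""
    return {
        "spark.jars.packages": "io.delta:delta-core_2.12:1.0.0",
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
        "spark.sql.warehouse.dir": str(warehouse_dir),
        # Assumption: switch from the default in-memory catalog to the Hive catalog,
        # so that database and table definitions are written to a metastore on disk.
        "spark.sql.catalogImplementation": "hive",
    }


def build_session(conf, app_name="tmp"):
    """Create (or reuse) a SparkSession from a plain dict of settings."""
    # Imported lazily so the config helper can be inspected without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

After a kernel restart one would call `spark = build_session(delta_session_conf("/home/user/data"))` and then query `rnd` by name again. With no further configuration the Hive catalog uses an embedded Derby metastore stored in a `metastore_db` directory under the current working directory, so the notebook would probably have to be started from the same directory each time.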
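
Is there a faster way to do conditional selection? Would it be better to convert the data.frame to another type? In this test version I have about 700k rows, but it could grow to millions.
I am puzzled by the benchmark, since everything is in memory. An alternative might be to go through a database, at the cost of some extra work (DDL, indexes).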
> str(df.test)
'data.frame': 694118 obs. of 4 variables:
$ uid : chr "ZyVOZrPOXwkuGSPv" "qBwuxhbrszRcISSRmIlYaQXHRUZE" "azCESULsUinrAeFkGIjEZpOLhrJcnB" "yLXPfpGlnLrtKmCRERj" ...
$ g1 : chr "group_70" "group_85" "group_150" "group_32" ...
$ g2 : chr "D" "A" "A" "C" ...
$ value: num 0.7756 0.1389 0.8924 0.2278 0.0709 ...
> df.test[200,]
uid g1 g2 value
200 appoBThmLxqFTyjFWyAqzsyJh group_2 E 0.604
>
> benchmark(replications = 100,df.test[(df.test$uid=='appoBThmLxqFTyjFWyAqzsyJh') &
+ (df.test$g1 == 'group_2') &
+ (df.test$g2 == 'E'),'value'])
test replications elapsed relative user.self sys.self user.child sys.child …