Get the first value from a spark.sql.Row

use*_*807 2 apache-spark apache-spark-sql

I have JSON in the following format:

{"Request": {"TrancheList": {"Tranche": [{"TrancheId": "500192163","OwnedAmt": "26500000",    "Curr": "USD" }, {  "TrancheId": "500213369", "OwnedAmt": "41000000","Curr": "USD"}]},"FxRatesList": {"FxRatesContract": [{"Currency": "CHF","FxRate": "0.97919983706115"},{"Currency": "AUD", "FxRate": "1.2966804979253"},{ "Currency": "USD","FxRate": "1"},{"Currency": "SEK","FxRate": "8.1561012531034"},{"Currency": "NOK", "FxRate": "8.2454981641398"},{"Currency": "JPY","FxRate": "111.79999785344"},{"Currency": "HKD","FxRate": "7.7568025218916"},{"Currency": "GBP","FxRate": "0.69425159677867"}, {"Currency": "EUR","FxRate": "0.88991723769689"},{"Currency": "DKK", "FxRate": "6.629598372301"}]},"isExcludeDeals": "true","baseCurrency": "USD"}}

I read the JSON from HDFS:

import org.apache.spark.sql.functions.explode
import spark.implicits._  // provides $"..." and the encoder for .map (spark-shell imports this automatically)

val hdfsRequest = spark.read.json("hdfs://localhost/user/request.json")
val baseCurrency = hdfsRequest.select("Request.baseCurrency").map(_.getString(0)).collect.headOption
val fxRates = hdfsRequest.select("Request.FxRatesList.FxRatesContract")
val fxRatesDF = fxRates.select(explode(fxRates("FxRatesContract"))).toDF("FxRatesContract").select("FxRatesContract.Currency", "FxRatesContract.FxRate").filter($"Currency"===baseCurrency.get)
fxRatesDF.show()

The output I get for fxRatesDF is:

fxRatesDF: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Currency: string, FxRate: string]
+--------+------+
|Currency|FxRate|
+--------+------+
|     USD|     1|
+--------+------+

How do I get the value of the FxRate column from the first row?

aba*_*hel 19

You can use

fxRatesDF.select(col("FxRate")).first.getString(0)

  • This requires an import like "from pyspark.sql.functions import col", correct? (2 upvotes)
  • getString cannot be found in functions (2 upvotes)
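Since the comments mix up the Python and Scala APIs, here is a minimal Scala sketch of the answer above. It assumes Scala 2.12 or 2.13 with Spark on the classpath (for example in spark-shell or as a script), and it uses a hypothetical one-row DataFrame built inline in place of the HDFS-backed fxRatesDF. In Scala, `col` comes from `org.apache.spark.sql.functions`, and `getString` is a method on `Row`, not on the functions object.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session and inline data stand in for the question's HDFS source.
val spark = SparkSession.builder().master("local[1]").appName("first-value-sketch").getOrCreate()
import spark.implicits._

val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

// Select only the FxRate column, take the first Row, and read position 0 as a String.
val byPosition: String = fxRatesDF.select(col("FxRate")).first.getString(0)

// Alternative: keep the whole Row and look the field up by column name.
val byName: String = fxRatesDF.first.getAs[String]("FxRate")

println(s"byPosition=$byPosition byName=$byName")

// Stops the session; skip this line inside spark-shell, where the session is shared.
spark.stop()
```

Both forms return the FxRate as a String because the column is a string in the schema shown in the question; a numeric column would need `getDouble`, `getAs[Double]`, or a cast first.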

Thi*_*dim 9

Here is the function you need to use.

Use it like this:

fxRatesDF.first().FxRate

  • I tried that before. fxRatesDF.first() gives the output [USD,1], and when you run fxRatesDF.first().FxRate it says FxRate is not a member of org.apache.spark.sql.Row. (2 upvotes)
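Attribute-style access such as `.FxRate` works on a PySpark Row, but a Scala `Row` has no such member, which matches the error in the comment. One possible way to get the same syntax in Scala is to convert the DataFrame to a typed Dataset first. The sketch below assumes Scala 2.12 or 2.13; the case class `FxRateEntry` and the inline data are hypothetical stand-ins, not part of the original question.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class; its field names mirror the DataFrame's column names.
case class FxRateEntry(Currency: String, FxRate: String)

// Assumption: a local session and inline data stand in for the question's HDFS source.
val spark = SparkSession.builder().master("local[1]").appName("typed-row-sketch").getOrCreate()
import spark.implicits._

val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

// as[FxRateEntry] turns Dataset[Row] into Dataset[FxRateEntry],
// so the first element exposes FxRate as an ordinary field.
val firstEntry: FxRateEntry = fxRatesDF.as[FxRateEntry].first()
val fxRate: String = firstEntry.FxRate

println(fxRate)

// Stops the session; skip this line inside spark-shell, where the session is shared.
spark.stop()
```

If defining a case class is more than the task needs, `fxRatesDF.first().getAs[String]("FxRate")` reads the same field by name without one.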

Anonymous 7

I know this is an old post, but I got it working this way: fxRatesDF.first()[0]


Val*_*Val 6

Maybe like this:

fxRatesDF.take(1)[0][1]

Or

fxRatesDF.collect()[0][1]

Or

fxRatesDF.first()[1]
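The square-bracket indexing in these snippets appears to be PySpark syntax. Since the question is written in Scala, here is a rough sketch of the Scala counterparts, where indexing uses parentheses. It assumes Scala 2.12 or 2.13 and a hypothetical one-row DataFrame built inline; index 1 is the FxRate column and index 0 would be Currency.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session and inline data stand in for the question's HDFS source.
val spark = SparkSession.builder().master("local[1]").appName("indexed-access-sketch").getOrCreate()
import spark.implicits._

val fxRatesDF = Seq(("USD", "1")).toDF("Currency", "FxRate")

// Row.apply(i) returns Any, so these positional forms need a cast or a typed getter afterwards.
val viaTake: Any = fxRatesDF.take(1)(0)(1)
val viaCollect: Any = fxRatesDF.collect()(0)(1)
val viaFirst: Any = fxRatesDF.first()(1)

// take(1).headOption yields None on an empty DataFrame, where first() would throw instead.
val safe: Option[String] = fxRatesDF.take(1).headOption.map(_.getString(1))

println(s"$viaTake $viaCollect $viaFirst $safe")

// Stops the session; skip this line inside spark-shell, where the session is shared.
spark.stop()
```

Note that `collect()` brings every row to the driver before indexing, so `take(1)` or `first()` is usually the lighter choice when only one row is needed.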