I saw a solution here, but when I tried it, it did not work for me.
First I import the cars.csv file:
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/usr/local/spark/cars.csv")
It looks like this:
+----+-----+-----+--------------------+-----+
|year| make|model|             comment|blank|
+----+-----+-----+--------------------+-----+
|2012|Tesla|    S|          No comment|     |
|1997| Ford| E350|Go get one now th...|     |
|2015|Chevy| Volt|                null| null|
+----+-----+-----+--------------------+-----+
Then I do this:
df.na.fill("e",Seq("blank"))
But the null values do not change.
Can anyone help me?
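A likely explanation, offered as a hedged sketch rather than a definitive fix: `na.fill` does not modify `df` in place. DataFrames are immutable, so `df.na.fill("e", Seq("blank"))` returns a new DataFrame that has to be assigned (or shown directly) before the change is visible. In addition, `na.fill` only replaces nulls; judging from the output above, the first two rows of `blank` look like empty strings rather than `null`, so they would stay empty even after the fill. The sketch assumes a local `SparkSession` and replaces the CSV read with a small in-memory stand-in for cars.csv; the object name `FillBlankExample` and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, trim, when}

object FillBlankExample {
  // Local session for the sketch; in spark-shell the existing session would be used instead
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("na-fill-sketch")
    .getOrCreate()
  import spark.implicits._

  // Hypothetical stand-in for cars.csv: "blank" is an empty string in two rows and null in one
  lazy val df: DataFrame = Seq[(String, String, String, String, String)](
    ("2012", "Tesla", "S", "No comment", ""),
    ("1997", "Ford", "E350", "Go get one now they are going fast", ""),
    ("2015", "Chevy", "Volt", null, null)
  ).toDF("year", "make", "model", "comment", "blank")

  // na.fill leaves df untouched and returns a new DataFrame, so the result must be kept
  lazy val filledNulls: DataFrame = df.na.fill("e", Seq("blank"))

  // na.fill only replaces nulls; empty or whitespace-only strings need an explicit rule
  lazy val filledAll: DataFrame = df.withColumn(
    "blank",
    when(col("blank").isNull || trim(col("blank")) === "", lit("e"))
      .otherwise(col("blank"))
  )

  def main(args: Array[String]): Unit = {
    df.show()          // original DataFrame, unchanged
    filledNulls.show() // only the null in "blank" became "e"
    filledAll.show()   // nulls and empty strings in "blank" both became "e"
    spark.stop()
  }
}
```

Applied to the original code, the minimal change would be `val filled = df.na.fill("e", Seq("blank"))` followed by `filled.show()`; the `when`/`otherwise` variant is only needed if the empty strings should be replaced as well.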
I want to remove from col1 the string that is present in col2:
val df = spark.createDataFrame(Seq(
("Hi I heard about Spark", "Spark"),
("I wish Java could use case classes", "Java"),
("Logistic regression models are neat", "models")
)).toDF("sentence", "label")
Using regexp_replace or translate (ref: Spark functions API):
val res = df.withColumn("sentence_without_label",
  regexp_replace(col("sentence"), "(?????)", ""))
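One possible approach, sketched under stated assumptions: the Scala overload `regexp_replace(e: Column, pattern: String, replacement: String)` takes the pattern as a single fixed string, so it cannot use a different `label` for each row. The SQL form of `regexp_replace`, called through `expr`, accepts a column as the pattern and evaluates it row by row. This assumes the labels contain no regex metacharacters (true for "Spark", "Java" and "models"); labels containing characters such as `.` or `(` would need escaping first. The spaces around the removed word are left in place. The object name `RemoveLabelExample` and the local `SparkSession` setup are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object RemoveLabelExample {
  // Local session for the sketch; in spark-shell the existing `spark` would be used instead
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("remove-label-sketch")
    .getOrCreate()

  lazy val df: DataFrame = spark.createDataFrame(Seq(
    ("Hi I heard about Spark", "Spark"),
    ("I wish Java could use case classes", "Java"),
    ("Logistic regression models are neat", "models")
  )).toDF("sentence", "label")

  // SQL regexp_replace reads the pattern from the `label` column of the same row
  lazy val res: DataFrame = df.withColumn(
    "sentence_without_label",
    expr("regexp_replace(sentence, label, '')")
  )

  def main(args: Array[String]): Unit = {
    res.show(false) // show full strings without truncation
    spark.stop()
  }
}
```

If the leftover spaces matter, the result could additionally be passed through `trim` or a second `regexp_replace` that collapses repeated spaces.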
So that res looks like this: