Converting an unstructured CSV file into a data frame

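I am learning R for text mining. I have a TV programme schedule in CSV format. Programmes usually start at 06:00 and run until 05:00 the next morning; this span is called a broadcast day. For example, the schedule for 15/11/2015 starts at 06:00 and ends at 05:00 the following morning.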

Here is some sample code that shows what the schedule looks like:

 read.table(textConnection("Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|"), header = FALSE, sep = "|", stringsAsFactors = FALSE)

Its output looks like this:

  V1|V2
Sunday |  
01-Nov-15 |       
6 | Tom  
some information about the program |       
23.3 |  Jerry …
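
One possible way to reshape this layout, offered only as a minimal sketch and not as the accepted answer: it assumes that every weekday row is followed directly by its date row, and that every programme row (a numeric start time plus a title) is followed by exactly one description row. The object names `raw`, `schedule` and `after_midnight` are made up for illustration. The `after_midnight` flag follows the broadcast-day description in the question, where start times before 06:00 fall on the next calendar day.

```r
# Read the sample; strip.white = TRUE drops the leading spaces in each field.
raw <- read.table(
  text = "Sunday|\n 01-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|\nMonday|\n 02-Nov-15|\n 6|Tom\n some information about the program|\n 23.3|Jerry\n some information about the program|\n 5|Avatar\n some information about the program|",
  header = FALSE, sep = "|", stringsAsFactors = FALSE, strip.white = TRUE
)

v1 <- trimws(raw$V1)
v2 <- trimws(raw$V2)

# Classify rows: weekday headers and programme rows (numeric start time).
weekday_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                   "Friday", "Saturday", "Sunday")
is_weekday <- v1 %in% weekday_names
is_time    <- grepl("^[0-9]+(\\.[0-9]+)?$", v1)

# Each row belongs to the most recent weekday header above it.
day_id <- cumsum(is_weekday)
dates  <- v1[which(is_weekday) + 1]  # assumption: date row follows weekday row
prog   <- which(is_time)

schedule <- data.frame(
  weekday        = v1[is_weekday][day_id[prog]],
  date           = dates[day_id[prog]],   # broadcast-day date, kept as text
  start          = v1[prog],
  title          = v2[prog],
  description    = v1[prog + 1],          # assumption: one description row
  after_midnight = as.numeric(v1[prog]) < 6,
  stringsAsFactors = FALSE
)

print(schedule)
```

The date is kept as text here because parsing `01-Nov-15` with `as.Date(..., format = "%d-%b-%y")` depends on the locale's month abbreviations.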

r reshape dataframe

Nav*_*MSN

2017 12-05

Votes: 5 | Answers: 2 | Views: 1468

R - Best practices for chaining data.table operations

I have searched everywhere but found no general guidelines for laying out chained data.table code that may span multiple lines for readability.

Take, for example, this toy snippet (for illustration only):

iris.dt[sepal.length > 5 & sepal.width > 3 & petal.length > 2 & petal.width > 2 & species == "virginica"]

Since all of this belongs to the same argument (dt[i]), splitting it across multiple lines is easy. I would simply write:

iris.dt[sepal.length > 5 & 
        sepal.width  > 3 & 
        petal.length > 2 & 
        petal.width  > 2 & 
        species == "virginica"]

or

iris.dt[sepal.length > 5 & 
          sepal.width  > 3 & 
          petal.length > 2 & 
          petal.width  > 2 & 
          species == "virginica"]

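But take something like the following. How would you clean up this snippet, and where would you indent or break lines? Note: this is only a toy example of what a long block of chained data.table code looks like in practice.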

    iris.dt[, id := 1:.N, by = species][, comb_area_sepal := (sepal.length * …
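
For illustration, here is one layout that is sometimes used for such chains: each `[...]` step goes on its own indented line and the `][` join sits alone on a line. This is a sketch of one convention, not an established standard. The construction of `iris.dt` with lowercase column names is an assumption inferred from the snippets above, and the steps after the first (`id := ...`) are hypothetical stand-ins, since the original chain is truncated.

```r
library(data.table)

# Assumption: iris.dt is the built-in iris data with lowercase column names.
iris.dt <- as.data.table(iris)
setnames(iris.dt, tolower(names(iris.dt)))

result <- iris.dt[
  # Step 1 (from the question): row counter within each species.
  , id := seq_len(.N), by = species
][
  # Step 2 (hypothetical): filter rows, one condition per line.
  sepal.length > 5 &
    sepal.width > 3
][
  # Step 3 (hypothetical): aggregate per species.
  , .(mean_petal_length = mean(petal.length)), by = species
]

print(result)
```

Keeping one operation per bracket pair makes it easy to comment out or reorder a single step. Note that the `:=` step still modifies `iris.dt` by reference, regardless of how the chain is laid out.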

r data.table

aim*_*r21

2021 02-14

Votes: 4 | Answers: 1 | Views: 71
