How can I extract data from a text file using R or PowerShell?

jra*_*ara 6 powershell text-processing r powershell-2.0

I have a text file containing data like this:

This is just text
-------------------------------
Username:          SOMETHI           C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEGG            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30

This is just text
-------------------------------
Username:          SOMEYY            C:                 [Text]
Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91
Process ID:        202223DB          Start time:        15-DEC-2010 20:27:15.30

Is there a way to extract the Username, Finish time, and Start time from this kind of data? I am looking for a starting point using R or PowerShell.

Vin*_*ynd 8

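R may not be the best tool for processing text files, but you can proceed as follows: identify the two columns by reading the file as a fixed-width file, separate each field from its value by splitting the strings on the colon, add an "id" column, and reshape everything back into one row per record.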
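To make the first two steps concrete, here is a minimal sketch of what they do to a single line. It is an illustration only: the line is rebuilt with `sprintf` on the assumption that the left field/value pair occupies exactly 37 characters, as in the sample data, and the variable names are made up for this example.

```r
# Rebuild one sample line with the assumed layout:
# left pair padded to 19 + 18 = 37 characters, right pair after that
line <- sprintf("%-19s%-18s%-20s%s",
                "Account:", "DFAG", "Finish time:", "1-JAN-2011 00:31:58.91")

# Step 1: cut the line at the fixed column boundary
left  <- substr(line, 1, 37)
right <- substr(line, 38, nchar(line))

# Step 2: split each half into field and value on "colon followed by whitespace";
# the colons inside the time are not followed by whitespace, so the time stays intact
parts <- strsplit(c(left, right), ":\\s+")

# The left value still carries its padding, so trim it
left_pair  <- trimws(parts[[1]])
right_pair <- parts[[2]]
```

The full pipeline below applies the same two steps to every line of the file at once.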

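# Read the file as fixed width: the first 37 characters hold the left
# field/value pair, the rest of the line holds the right one
d <- read.fwf("A.txt", c(37,100), stringsAsFactors=FALSE)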

# Separate fields and values
d <- d[grep(":", d$V1),]
d <- cbind( 
  do.call( rbind, strsplit(d$V1, ":\\s+") ), 
  do.call( rbind, strsplit(d$V2, ":\\s+") ) 
)

# Add an id column
d <- cbind( d, cumsum( d[,1] == "Username" ) )

# Stack the left and right parts
d <- rbind( d[,c(5,1,2)], d[,c(5,3,4)] )
colnames(d) <- c("id", "field", "value")
d <- as.data.frame(d)
d$value <- gsub("\\s+$", "", d$value)

# Convert to a wide data.frame (one row per id, one column per field)
library(reshape2)
d <- dcast( d, id ~ field, value.var = "value" )
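If only the three fields named in the question are needed, a regular-expression pass is a possible alternative to the fixed-width approach. The following is a minimal base-R sketch and not part of the answer above: `grab` is a hypothetical helper, the input is inlined rather than read from a file, and it assumes the labels appear exactly as in the sample and that each timestamp is a date and a time separated by a single space.

```r
# Sample input copied from the question (two records, spacing not significant here)
txt <- c(
  "This is just text",
  "-------------------------------",
  "Username:          SOMETHI           C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        2028aaB           Start time:        31-DEC-2010 20:27:15.30",
  "",
  "This is just text",
  "-------------------------------",
  "Username:          SOMEGG            C:                 [Text]",
  "Account:           DFAG              Finish time:        1-JAN-2011 00:31:58.91",
  "Process ID:        20dd33DB          Start time:        12-DEC-2010 20:27:15.30"
)

# Hypothetical helper: for every line containing "<label>:", return the text
# matched by `pattern` after the label and the whitespace that follows it
grab <- function(lines, label, pattern) {
  rx   <- paste0(label, ":\\s+(", pattern, ")")
  hits <- regmatches(lines, regexpr(rx, lines))  # keeps only the lines that match
  sub(rx, "\\1", hits)                           # reduce each match to the captured value
}

# One row per record, assuming every record has all three fields in the same order
res <- data.frame(
  Username = grab(txt, "Username",    "\\S+"),
  Finish   = grab(txt, "Finish time", "\\S+ \\S+"),
  Start    = grab(txt, "Start time",  "\\S+ \\S+"),
  stringsAsFactors = FALSE
)
```

For a real file, `txt <- readLines("A.txt")` would replace the inlined vector. The timestamps stay as character strings here; converting them with `strptime` and a format such as `"%d-%b-%Y %H:%M:%OS"` should work in an English locale, but month-name parsing is locale-dependent, so treat that as something to verify.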