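How do I read a file in R that is not in tabular format?
Some values in the data are blank, and the blanks need to be kept as values.
"About" and "Name" are the only values that are always present.
For example, the text file looks like this: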
Name
Type
Color
About
Spiderman
Marvel
Red
Swings from webs
Superman
DC
Likes to fly around
Hulk
Marvel
Green
I told you not top make him mad.
Batman
Black
He is a good fighter and detective
Martian Manhunter
DC
He is from Mars
Deadpool
Black Red
Kinda Crazy
The first entry is the header. I want to turn this into a data frame:
Name       Type    Color      About
Spiderman  Marvel  Red        Swings from webs
Superman   DC                 Likes to fly around
Hulk       Marvel  Green      I told you not top make him mad.
Batman             Black      He is a good fighter and detective
Mar...ter  DC                 He is from Mars
Deadpool           Black Red  Kinda Crazy
Use scan in multi-line mode (for irregular groups of three items separated by blank lines):
filename <- "myPath/myFile.txt"
# One field per line: sep = "\n" keeps values that contain spaces intact
inp <- scan(filename, what = as.list(rep("", 3)), sep = "\n")
dinp <- as.data.frame(inp, stringsAsFactors = FALSE)
names(dinp) <- unlist(dinp[1, ])  # use the first set as the column names
dinp <- dinp[-1, ]                # then remove it from the data
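To make the scan approach easier to try, here is a minimal self-contained sketch. The sample records and the column names `V1` to `V3` are hypothetical, and it assumes exactly three non-empty lines per record with blank lines only between records. With `sep = "\n"` each line is read as one field, and blank lines at the start of a record are skipped by default.

```r
# Hypothetical sample: three fields per record, records separated by blank lines.
sample_lines <- c("Name", "Type", "About", "",
                  "Spiderman", "Marvel", "Swings from webs", "",
                  "Hulk", "Marvel", "Gets mad")
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# A list for `what` makes scan() return one vector per field of the record.
inp <- scan(tmp, what = list(V1 = "", V2 = "", V3 = ""),
            sep = "\n", quiet = TRUE)
dinp <- as.data.frame(inp, stringsAsFactors = FALSE)

# Promote the first record to column names, then drop it from the data.
names(dinp) <- unname(unlist(dinp[1, ]))
dinp <- dinp[-1, ]
rownames(dinp) <- NULL
```

This only works when no field inside a record is blank. An empty line in the middle of a record would be read as an empty field only if it really is present in the file.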
Second attempt (different problem):
dat <- readLines(filename)
# Matrices are column-major order, hence the t(). I suppose I could have used byrow=TRUE.
mydf <- as.data.frame( t(matrix(dat, nrow=5) )[-1,-5] )
names(mydf) <- dat[1:4]
#-----------------------------
> mydf
  Name               Type    Color      About
1 Spiderman          Marvel  Red        Swings from webs
2 Superman           DC                 Likes to fly around
3 Hulk               Marvel  Green      I told you not top make him mad.
4 Batman                     Black      He is a good fighter and detective
5 Martian Manhunter  DC                 He is from Mars
6 Deadpool                   Black Red  Kinda Crazy
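The same idea can be written with `byrow = TRUE`, as the comment in the code suggests. This is only a sketch: it assumes a layout the post does not confirm, in which each record is four lines (Name, Type, Color, About), a missing value is an empty line, and one empty line separates records. The sample vector stands in for `readLines(filename)`.

```r
# Assumed layout (not confirmed by the post): 4 field lines + 1 blank separator.
dat <- c("Name", "Type", "Color", "About", "",
         "Spiderman", "Marvel", "Red", "Swings from webs", "",
         "Superman", "DC", "", "Likes to fly around", "",
         "Batman", "", "Black", "He is a good fighter and detective", "")
# In practice: dat <- readLines(filename)

# Fill row by row: one record per row, with the separator in column 5.
m <- matrix(dat, ncol = 5, byrow = TRUE)

# Drop the header row and the separator column, then name the columns.
mydf <- as.data.frame(m[-1, 1:4, drop = FALSE], stringsAsFactors = FALSE)
names(mydf) <- m[1, 1:4]

# Optionally mark the blanks explicitly as NA.
mydf[mydf == ""] <- NA
```

If the file does not end with a blank line, the length of `dat` is not a multiple of 5 and `matrix()` will warn, so padding `dat` with a trailing `""` first may be needed.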