pit*_*las 1 csv r import-from-csv
I don't understand why the header names get an "X." prefix when I import with quote="". Here is the code:
xhead = read.csv("~/Desktop/dbdump/users.txt", na.strings = "\\N", quote="", nrows = 1000)
This gives me:
names(xhead)
[1] "X.userId." "X.fullName." "X.email." "X.password."
[5] "X.activated." "X.registrationDate." "X.locale." ...
Whereas:
yhead = read.csv("~/Desktop/dbdump/users.txt", na.strings = "\\N", nrows = 1000)
names(yhead)
[1] "userId" "fullName" "email" "password"
[5] "activated" "registrationDate" "locale" ...
The reason I am using quote="" is that my records were being truncated, presumably because there is a stray quote buried somewhere in my 15,000 records.
This is what my data file looks like:
"userId", "fullName","email","password","activated","registrationDate","locale","notifyOnUpdates","lastSyncTime","plan_id","plan_period_months","plan_price","plan_exp_date","plan_is_trial","plan_is_trial_used","q_hear","q_occupation","pp_subid","pp_payments","pp_since","pp_cancelled","apikey"
"2","Adam Smith","a@mail.com","*****","1","2004-07-23 14:19:32","en_US","1","2011-04-07 07:29:17","3",\N,\N,\N,"0","1",\N,\N,\N,\N,\N,\N,"d7734dce-4ae2-102a-8951-0040ca38ff83"
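Column names are run through make.names before being returned. Double quotes are not valid characters in a column name, and with quote="" they are left in the header text as literal characters. You can see the difference by running: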
make.names(c('"userId"', "fullName"))
[1] "X.userId." "fullName"
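From the make.names help:
A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. ... The character "X" is prepended if necessary. All invalid characters are translated to ".".
One suggestion is to have read.csv skip the first line, so that it reads the bulk of the data without the header: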
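To make the quoted rule concrete, here is a small sketch that runs make.names on a few inputs. Only the first two strings come from the question; the other two ("2fast" and "plan id") are made-up examples chosen to trigger the "starts with a digit" and "invalid character" cases.

```r
# The first two names are from the question; the last two are hypothetical.
raw_names <- c('"userId"', "fullName", "2fast", "plan id")

# make.names() replaces each invalid character with "." and
# prepends "X" when the result would not start with a letter or dot.
fixed <- make.names(raw_names)

# Show the raw and repaired names side by side.
print(data.frame(raw = raw_names, valid = fixed))
```

The quoted header field is the only one from the question that changes: both of its double quotes become dots, and the leading dot-free start rule then forces the "X" prefix.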
dd <- read.csv("~/Desktop/dbdump/users.txt", na.strings = "\\N",
quote="", nrows = 1000, header = FALSE, skip = 1)
Then you can read the column names with scan (which is what read.csv ultimately calls under the hood):
names(dd) <- scan("~/Desktop/dbdump/users.txt", what = character(), nlines=1,sep =',')
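The two steps can be tried end to end on a throwaway file. This is only a sketch: the temporary file and its three columns are a hypothetical miniature of the real users.txt, not the actual data.

```r
# Hypothetical miniature of the data file, written to a temporary path.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  '"userId","fullName","email"',
  '"2","Adam Smith",\\N'
), tmp)

# With quote = "", the quote characters stay in the header text,
# so the names come back mangled by make.names().
mangled <- names(read.csv(tmp, na.strings = "\\N", quote = ""))

# Step 1: read the body only, skipping the header line.
dd <- read.csv(tmp, na.strings = "\\N", quote = "",
               header = FALSE, skip = 1)

# Step 2: read the header line with scan(), whose default quote
# handling strips the surrounding double quotes.
names(dd) <- scan(tmp, what = character(), nlines = 1,
                  sep = ",", quiet = TRUE)

print(mangled)
print(names(dd))
```

Note that quote="" applies to the data rows too, so quoted values in dd still carry their literal double quotes (the first field is read as the text "2" including the quote characters). If that matters, they can be stripped afterwards, for example with gsub on the character columns.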