How to read a text file in R and create a data frame

She*_*don 2 r text-parsing

I need to read the txt file at https://raw.githubusercontent.com/fonnesbeck/Bios6301/master/datasets/addr.txt

and convert it into an R data frame with the columns: LastName, FirstName, streetno, streetname, city, state, and zip...

I tried to separate the fields using the sep argument, but it failed.

eip*_*i10 5

Expanding on my comment, here's another approach. If your full data set has a wider range of patterns, you may need to adjust some of the code.

library(stringr) # For str_trim 

# Read string data and split into data frame
dat = readLines("addr.txt")
dat = as.data.frame(do.call(rbind, strsplit(dat, split=" {2,10}")), stringsAsFactors=FALSE)
names(dat) = c("LastName", "FirstName", "address", "city", "state", "zip")

# Separate address into number and street (if streetno isn't always numeric,
# or if you don't want it to be numeric, then just remove the as.numeric wrapper).
dat$streetno = as.numeric(gsub("([0-9]{1,4}).*","\\1",  dat$address))
dat$streetname = gsub("[0-9]{1,4} (.*)","\\1",  dat$address)

# Clean up zip
dat$zip = gsub("O","0", dat$zip)
dat$zip = str_trim(dat$zip)

dat = dat[,c(1:2,7:8,4:6)]

dat
      LastName  FirstName streetno           streetname       city state        zip
1        Bania  Thomas M.      725    Commonwealth Ave.     Boston    MA      02215
2      Barnaby      David      373        W. Geneva St.   Wms. Bay    WI      53191
3       Bausch       Judy      373        W. Geneva St.   Wms. Bay    WI      53191
...
41      Wright       Greg      791  Holmdel-Keyport Rd.    Holmdel    NY 07733-1988
42     Zingale    Michael     5640        S. Ellis Ave.    Chicago    IL      60637
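
For anyone who wants to try the splitting step without downloading the file, here is a minimal base-R sketch of the same idea. The three input lines are illustrative only: they are reconstructed from the output above with assumed spacing (two or more spaces between fields, a single space inside a field), and the letter O in the last zip plus the trailing space are made up to exercise the cleanup step. It also adds a field-count check, since `rbind` would recycle short rows with only a warning if a line did not split into six pieces.

```r
# Illustrative lines only: reconstructed from the output shown above, with
# assumed spacing. The "O" in the last zip and the trailing space are
# hypothetical, added to exercise the cleanup step.
lines <- c(
  "Bania  Thomas M.  725 Commonwealth Ave.  Boston  MA  02215",
  "Barnaby  David  373 W. Geneva St.  Wms. Bay  WI  53191",
  "Zingale  Michael  5640 S. Ellis Ave.  Chicago  IL  6O637 "
)

# Split on runs of two or more spaces, so single spaces inside a field survive
fields <- strsplit(lines, " {2,}")

# Every line should yield exactly six fields; otherwise rbind would recycle values
stopifnot(all(lengths(fields) == 6))

dat <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(dat) <- c("LastName", "FirstName", "address", "city", "state", "zip")

# Leading digits become the street number; the remainder is the street name
dat$streetno <- as.numeric(sub("^([0-9]+) .*$", "\\1", dat$address))
dat$streetname <- sub("^[0-9]+ +", "", dat$address)

# Replace a letter O typed in place of zero, then trim stray whitespace
dat$zip <- trimws(gsub("O", "0", dat$zip))

dat <- dat[, c("LastName", "FirstName", "streetno", "streetname",
               "city", "state", "zip")]
print(dat)
```

Selecting columns by name rather than by position is a small design choice here: it keeps working if the intermediate columns are created in a different order. If the `stopifnot` check fails on the real file, inspecting `lines[lengths(fields) != 6]` should show which rows need a different pattern.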