I am learning text mining from the online book at http://tidytextmining.com/. In Chapter 5 (http://tidytextmining.com/dtm.html#financial),
the following code:
library(tm.plugin.webmining)
library(purrr)
library(dplyr)  # provides data_frame(), mutate() and %>%
company <- c("Microsoft", "Apple", "Google", "Amazon", "Facebook",
             "Twitter", "IBM", "Yahoo", "Netflix")
symbol <- c("MSFT", "AAPL", "GOOG", "AMZN", "FB", "TWTR", "IBM", "YHOO", "NFLX")
download_articles <- function(symbol) {
    WebCorpus(GoogleFinanceSource(paste0("NASDAQ:", symbol)))
}
stock_articles <- data_frame(company = company,
                             symbol = symbol) %>%
    mutate(corpus = map(symbol, download_articles))
gives me this error:
StartTag: invalid element name
Extra content at the end of the document
Error: 1: StartTag: invalid element name
2: Extra content at the end of the document
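Any hints? Someone suggested removing the company and symbol for "Twitter", but it still does not work and returns the same error. Thanks in advance.
I have the same problem, but I have narrowed it down a little. This single call produces the same error: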
GoogleFinanceSource("NASDAQ:MSFT")
StartTag: invalid element name
Extra content at the end of the document
Error: 1: StartTag: invalid element name
2: Extra content at the end of the document
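I have also seen the suggestion elsewhere to remove Twitter. Since Twitter is not listed on NASDAQ, I can see why that symbol would fail. I tried the suggested "NYSE:TWTR" and got the same result.
I also tried GoogleNewsSource to see whether I would hit the same problem, and got a different error, which this GitHub issue suggests is caused by the parser. I wonder whether the two problems are related: github.com/mannau/tm.plugin.webmining/issues/14.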
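For reference, the exchange prefix can be stored per symbol instead of being hard-coded as "NASDAQ:". This is only a minimal sketch: the `listings` table and its `exchange` column are names I made up for illustration, and as noted above, changing the prefix did not make the error go away for me.

```r
# Hypothetical lookup table; `listings` and `exchange` are illustrative names,
# not part of the book's code.
listings <- data.frame(
  symbol   = c("MSFT", "AAPL", "TWTR"),
  exchange = c("NASDAQ", "NASDAQ", "NYSE"),
  stringsAsFactors = FALSE
)

# paste0() is vectorised, so each symbol gets its own exchange prefix.
listings$query <- paste0(listings$exchange, ":", listings$symbol)
listings$query
```

Each element of `listings$query` could then be passed to the source constructor in place of `paste0("NASDAQ:", symbol)`.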
GoogleNewsSource("Microsoft")
Unknown IO error
failed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"
Error: 1: Unknown IO error
2: failed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"
Nevertheless, I found a workaround that uses a modified list of ticker symbols and YahooFinanceSource, as follows:
company <- c("Microsoft", "Apple", "Google")
symbol <- c("MSFT", "AAPL", "GOOG")
download_articles <- function(symbol) {
    WebCorpus(YahooFinanceSource(symbol))
}
stock_articles <- data_frame(company = company,
                             symbol = symbol) %>%
    mutate(corpus = map(symbol, download_articles))
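If some symbols still fail, it may help to record the error for each one rather than letting the first failure abort the whole run. The following is a rough sketch in base R, not something from the book: `fake_download()` is a hypothetical stand-in for `WebCorpus(YahooFinanceSource(symbol))` so that it runs without network access, "BAD" is a made-up symbol that fails on purpose, and `try_download()` is an illustrative helper name.

```r
# Hypothetical stand-in for WebCorpus(YahooFinanceSource(symbol)); it fails
# for the made-up symbol "BAD" so the error path can be seen offline.
fake_download <- function(symbol) {
  if (symbol == "BAD") stop("StartTag: invalid element name")
  paste("articles for", symbol)
}

# Call `download` once per symbol and keep the error message instead of
# stopping at the first failure.
try_download <- function(symbols, download) {
  results <- lapply(symbols, function(s) {
    tryCatch(
      list(symbol = s, corpus = download(s), error = NA_character_),
      error = function(e) {
        list(symbol = s, corpus = NULL, error = conditionMessage(e))
      }
    )
  })
  names(results) <- symbols
  results
}

attempts <- try_download(c("MSFT", "AAPL", "BAD"), fake_download)

# Names of the symbols whose download raised an error.
failed <- names(Filter(function(a) !is.na(a$error), attempts))
failed
```

With the real downloader passed as `download`, `failed` would show which symbols to drop or retry, and each entry's `error` field would hold the parser message for that symbol.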