Text mining with the tm.plugin.webmining package using the GoogleFinanceSource function

Question

I am learning text mining from the online book at http://tidytextmining.com/. In Chapter 5 (http://tidytextmining.com/dtm.html#financial), the following code:

library(tm.plugin.webmining)
library(purrr)
library(dplyr)  # provides data_frame(), mutate() and %>%

company <- c("Microsoft", "Apple", "Google", "Amazon", "Facebook",
             "Twitter", "IBM", "Yahoo", "Netflix")
symbol <- c("MSFT", "AAPL", "GOOG", "AMZN", "FB", "TWTR", "IBM", "YHOO", "NFLX")

download_articles <- function(symbol) {
    WebCorpus(GoogleFinanceSource(paste0("NASDAQ:", symbol)))
}
stock_articles <- data_frame(company = company,
                             symbol = symbol) %>%
    mutate(corpus = map(symbol, download_articles))

gives me this error:

StartTag: invalid element name
Extra content at the end of the document
Error: 1: StartTag: invalid element name
2: Extra content at the end of the document

Any hints? Someone suggested removing the company and symbol for "Twitter", but it still does not work and returns the same error. Thanks in advance.

Answer 1

sto*_*per 5

I have the same problem, but I have narrowed it down slightly. This single call produces the same error:

GoogleFinanceSource("NASDAQ:MSFT")

StartTag: invalid element name
Extra content at the end of the document
Error: 1: StartTag: invalid element name
2: Extra content at the end of the document

I have also seen others suggest removing Twitter. Since Twitter is not listed on NASDAQ, I understand why it would fail. I tried the suggested "NYSE:TWTR" and got the same result.

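I also tried GoogleNewsSource to see whether I would hit the same problem, and got a different error, which this GitHub issue suggests is caused by the parser: github.com/mannau/tm.plugin.webmining/issues/14. I wonder whether the two problems are related.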

GoogleNewsSource("Microsoft")

Unknown IO error failed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"
Error: 1: Unknown IO error2: failed to load external entity "http://news.google.com/news?hl=en&q=Microsoft&ie=utf-8&num=100&output=rss"

Nevertheless, I have found a workaround that uses a shortened list of ticker symbols with YahooFinanceSource, as follows:

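library(tm.plugin.webmining)
library(purrr)
library(dplyr)  # provides data_frame(), mutate() and %>%

company <- c("Microsoft", "Apple", "Google")
symbol <- c("MSFT", "AAPL", "GOOG")

# YahooFinanceSource takes the bare ticker, without an exchange prefix
download_articles <- function(symbol) {
    WebCorpus(YahooFinanceSource(symbol))
}

stock_articles <- data_frame(company = company,
                             symbol = symbol) %>%
    mutate(corpus = map(symbol, download_articles))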
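
Because a single failing source aborts the whole map() call, it can help to wrap the downloader so that failures are recorded instead of raised. The following is only a minimal sketch, assuming purrr's possibly() is acceptable here; fake_download is a hypothetical stand-in (not part of tm.plugin.webmining) that fails for one symbol so the example runs offline. In practice you would wrap download_articles instead. This does not fix the underlying GoogleFinanceSource error; it only shows which symbols or sources fail.

```r
library(purrr)

# Hypothetical stand-in for download_articles(): it fails for one symbol so
# the example runs without network access. Swap in the real downloader.
fake_download <- function(symbol) {
  if (symbol == "TWTR") stop("StartTag: invalid element name")
  paste("articles for", symbol)
}

# possibly() returns a wrapped function that yields `otherwise` instead of
# raising an error, so map() continues past a failing symbol.
safe_download <- possibly(fake_download, otherwise = NULL)

symbols <- c("MSFT", "TWTR", "AAPL")
results <- map(symbols, safe_download)
names(results) <- symbols

# Symbols whose download failed (their result is NULL)
failed <- names(keep(results, is.null))

# Successful downloads only
ok <- compact(results)
```

Here failed lists the symbols to investigate, and ok can be used to build the data frame from the symbols that did download.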
