Problem with non-UTF-8 or non-ASCII characters in the twitteR package in R

Tho*_*mas 5 twitter r utf-8

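In a previous question, I asked about using the twitteR package in R to download a large number of Twitter followers (and their location, creation date, follower count, etc.) from the Haaretz Twitter feed (@haaretzcom) (see "Work around rate limit for extracting large list of user information using twitteR package in R"). The feed has over 90,000 followers, and I can download the full list of follower IDs without any problem using the code below.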

   require(twitteR)
   require(ROAuth)

   # Load the saved Twitter OAuth credentials
   load("~/Dropbox/Twitter/my_oauth")

   # Register the OAuth credentials with twitteR
   registerTwitterOAuth(my_oauth)

   # Download the full list of follower IDs
   haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit=9999999)

   # Look up every follower once. The original loop repeated this same call
   # for each follower without ever using the loop variable.
   haaretz_followers_info <- lookupUsers(haaretz_followers)

   # Convert the list of user objects to a data frame
   haaretz_followers_full <- twListToDF(haaretz_followers_info)

   # Export the data to CSV
   write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep=",")

The code works for extracting many of the users. However, whenever it reaches a certain user, I get the following error:

Error in twFromJSON(out) :
RMate stopped at line 51
Error: Malformed response from server, was not JSON.
RMate stopped at line 51
The most likely cause of this error is Twitter returning a character which
can't be properly parsed by R. Generally the only remedy is to wait long
enough for the offending character to disappear from searches (e.g. if
using searchTwitter()).
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON
Execution halted

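I get this problem even when I load the RJSONIO package after the twitteR package. From some research, it seems that the twitteR and RJSONIO packages have trouble parsing non-UTF-8 or non-ASCII characters (Arabic, etc.): http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html. Is there a way to simply ignore the non-UTF-8 or non-ASCII characters in my code and still extract all of the follower information? Any help would be much appreciated.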
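One way to approach the request above is to look the IDs up in small batches and skip any batch that raises an error, so a single unparseable response does not abort the whole run. The following is only a sketch under assumptions: `lookup_in_batches` is a hypothetical helper, not part of twitteR, and the batch size of 100 is an assumed value. It skips whole batches rather than individual characters, so the users in a failed batch are reported but not retrieved.

```r
# Hypothetical helper (not part of twitteR): call lookup_fn on ids in
# batches and skip any batch whose lookup raises an error.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100, pause = 0) {
  # Split the IDs into consecutive groups of at most batch_size elements
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- list()
  failed <- character(0)
  for (batch in batches) {
    if (pause > 0) Sys.sleep(pause)
    # A NULL result marks a batch that failed (a lookup that itself
    # returns NULL is treated the same way in this sketch)
    batch_result <- tryCatch(
      lookup_fn(batch),
      error = function(e) {
        message("Skipping batch starting at ", batch[1], ": ", conditionMessage(e))
        NULL
      }
    )
    if (is.null(batch_result)) {
      failed <- c(failed, as.character(batch))
    } else {
      results <- c(results, batch_result)
    }
  }
  list(results = results, failed = failed)
}

# Possible use with the variables from the question (needs a registered
# OAuth session, so it is left commented out here):
# out <- lookup_in_batches(haaretz_followers, twitteR::lookupUsers, pause = 5)
# haaretz_followers_full <- twitteR::twListToDF(out$results)
```

The `failed` element lists the IDs from skipped batches, which could be retried later with a smaller `batch_size` to narrow down the offending account.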

SPi*_*SPi 1

There is a package update (1.1.7) that fixes this issue. See: https://github.com/geoffjentry/twitteR/blob/master/NEWS
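To check whether the installed copy already includes that update, a version comparison along these lines may help. This is a minimal sketch: `has_min_version` is a hypothetical helper name, not something twitteR provides, and the 1.1.7 threshold is taken from the answer above.

```r
# Hypothetical helper: TRUE if pkg is installed at min_version or newer.
has_min_version <- function(pkg, min_version) {
  # packageVersion() raises an error when the package is not installed
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  !is.null(installed) && installed >= package_version(min_version)
}

if (!has_min_version("twitteR", "1.1.7")) {
  message("twitteR is missing or older than 1.1.7; updating it may help, ",
          "e.g. with install.packages(\"twitteR\")")
}
```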