gdo*_*371 21 python twitter tweepy
I found that the following code works well for viewing the standard 1% sample of the Twitter firehose in the Python shell:
import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])
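How can I add a filter so that only tweets from a specific location are parsed? I have seen people add GPS filtering to other Twitter-related Python code, but I can't find anything specific to sapi in the Tweepy module.
Any ideas?
Thanks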
Jua*_* E. 27
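The streaming API doesn't allow filtering by location AND keyword simultaneously.
"Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area."
Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
What you can do is ask the streaming API for either keyword or located tweets and then filter the resulting stream in your app by inspecting each tweet.
If you modify the code as follows, you will capture tweets from the United Kingdom, and those tweets are then filtered so that only the ones containing "manchester united" are shown: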
import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key=""
access_secret=""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if 'manchester united' in status.text.lower():
            print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(locations=[-6.38,49.87,1.77,55.81])
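Because the API combines track and locations with OR, the AND has to happen on the client. The reverse of the approach above (track by keyword, then check the location yourself) could look roughly like the sketch below. It is written as plain functions so it does not depend on a particular Tweepy version. It assumes the tweet's coordinates field is either None or a GeoJSON point with [longitude, latitude], and that the box uses the same west, south, east, north order as the locations parameter. The names in_bounding_box and wanted are illustrative and not part of Tweepy.

```python
UK_BOX = [-6.38, 49.87, 1.77, 55.81]  # west, south, east, north (same box as above)


def in_bounding_box(coordinates, box):
    """Return True if a GeoJSON point dict lies inside the box.

    coordinates: dict like {"type": "Point", "coordinates": [lon, lat]}, or None.
    box: [west, south, east, north] in degrees.
    """
    if not coordinates:
        return False  # the tweet carries no exact geotag
    lon, lat = coordinates["coordinates"]
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def wanted(text, coordinates, keyword, box):
    """Keyword AND location, both checked on the client side."""
    return keyword in text.lower() and in_bounding_box(coordinates, box)
```

Inside on_status this could be called as wanted(status.text, status.coordinates, 'manchester united', UK_BOX), assuming status.coordinates holds the raw GeoJSON dict. Tweets that only carry a place and no exact coordinates would be rejected by this check, so it is stricter than the stream's own location match.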
小智 19
Juan gave the correct answer. I'm filtering for Germany only, using this:
# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
stream.filter(locations=GEOBOX_GERMANY)
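This is a fairly crude box that includes parts of some other countries. If you want a finer grain, you can combine multiple boxes to cover the area you need.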
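Combining boxes works because the locations parameter takes one flat list whose length is a multiple of four, one west, south, east, north group per box. A minimal sketch of that, where flatten_boxes is a hypothetical helper and the two sub-boxes are made-up illustrative values, not a surveyed outline of Germany:

```python
def flatten_boxes(boxes):
    """Concatenate [west, south, east, north] boxes into one flat list."""
    flat = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box needs exactly 4 values: west, south, east, north")
        flat.extend(box)
    return flat


# Illustrative values only, chosen to show the shape of the data.
BOX_NORTH = [6.0, 51.0, 15.0, 55.0]
BOX_SOUTH = [7.5, 47.3, 13.8, 51.0]

locations = flatten_boxes([BOX_NORTH, BOX_SOUTH])
# stream.filter(locations=locations)  # 'stream' as in the snippet above
```

A tweet matches if it falls inside any one of the boxes, so several small boxes can approximate an irregular border more closely than one large rectangle.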
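It should be noted, though, that filtering by geotag limits the number of tweets quite a bit. This is from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):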
> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598
So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
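If the tweets are not in MongoDB, the same share can be estimated in plain Python. The sketch below assumes each tweet is a dict in which a missing or null coordinates field means "no geotag", which mirrors what the $ne: null query above counts. The geotag_share name and the sample data are made up for illustration.

```python
def geotag_share(tweets):
    """Fraction of tweets whose 'coordinates' field is present and not null."""
    tweets = list(tweets)
    if not tweets:
        return 0.0
    tagged = sum(1 for t in tweets if t.get("coordinates") is not None)
    return tagged / float(len(tweets))


# Made-up sample: only the second tweet carries a geotag.
sample = [
    {"text": "a", "coordinates": None},
    {"text": "b", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"text": "c"},
    {"text": "d", "coordinates": None},
]
share = geotag_share(sample)
```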