Avoid Twitter API limitation with Tweepy

4m1*_*4j1 28 python twitter tweepy python-2.7

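I saw in some questions on Stack Exchange that the limit can be a function of the number of requests per 15 minutes and can also depend on the complexity of the algorithm, except that this one is not complex.

So I use this code: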

import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)

api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',(tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url, tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
                time.sleep(60 * 15)
                continue
        break
db.commit()
db.close()

I always get the Twitter rate limit error:

Traceback (most recent call last):
  File "stream.py", line 25, in <module>
    include_entities=True).items():
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
    self.current_page = self.page_iterator.next()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
    data = self.method(max_id = max_id, *self.args, **self.kargs)
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
    return method.execute()
  File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
    raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]

Dan*_*yen 67

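For anyone who stumbles upon this on Google: tweepy 3.2+ has additional parameters for the tweepy.API class, in particular:

  • wait_on_rate_limit - whether or not to automatically wait for rate limits to replenish
  • wait_on_rate_limit_notify - whether or not to print a notification when Tweepy is waiting for rate limits to replenish

Setting these flags to True delegates the waiting to the API instance, which is good enough for most simple use cases.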

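  • This should be the accepted answer now, @4m1nh4j1. Also, your name is a pain to type. (4 upvotes)
  • With this method, will the Cursor object get different tweets after the limit replenishes, or is there a chance of getting tweets it already fetched in the previous "iteration", before the rate limit was hit? @dan-nguyen (2 upvotes)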

Aar*_*ill 24

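The problem is that your try: except: block is in the wrong place. Inserting data into the database will never raise a TweepError; iterating over Cursor.items() will. I would suggest refactoring your code to call the next method of Cursor.items() in an infinite loop. That call should be placed in the try: except: block, as it can raise an error.

Here is (roughly) what the code should look like: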

# above omitted for brevity
c = tweepy.Cursor(api.search,
                       q=search,
                       include_entities=True).items()
while True:
    try:
        tweet = c.next()
        # Insert into db
    except tweepy.TweepError:
        time.sleep(60 * 15)
        continue
    except StopIteration:
        break

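This works because when Tweepy raises a TweepError, it has not updated any of the cursor data. The next time it makes a request, it uses the same parameters as the request that triggered the rate limit, effectively repeating it until it goes through.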
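The retry behaviour described above can be sketched without Tweepy at all. The helper below is a hypothetical illustration, not part of the library: `iterate_with_retry`, `RateLimitError` and `FlakyIterator` are names made up for this example, and the sketch assumes the iterator keeps its position when it raises, as the answer says Tweepy's cursor does.

```python
import time


def iterate_with_retry(iterator, retry_on, wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`, sleeping and retrying when `retry_on` is raised.

    Hypothetical helper: it assumes the iterator does not advance when it
    raises, so the next call to next() repeats the failed request.
    """
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            return  # the iterator is exhausted
        except retry_on:
            sleep(wait_seconds)  # wait for the rate-limit window to reset
            continue  # then retry the same request
        yield item


class RateLimitError(Exception):
    """Stand-in for tweepy.TweepError in this self-contained demo."""


class FlakyIterator(object):
    """Class-based iterator that fails once before returning its second item."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0
        self._failed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._index == 1 and not self._failed:
            self._failed = True  # the position is NOT advanced on failure
            raise RateLimitError("Rate limit exceeded")
        if self._index >= len(self._items):
            raise StopIteration
        item = self._items[self._index]
        self._index += 1
        return item

    next = __next__  # Python 2 compatibility


waits = []  # record the requested sleeps instead of actually sleeping
tweets = list(iterate_with_retry(FlakyIterator(["a", "b", "c"]),
                                 RateLimitError,
                                 sleep=waits.append))
```

With the Tweepy 3.x API used in this thread, the call would presumably look like `iterate_with_retry(tweepy.Cursor(api.search, q=search).items(), tweepy.TweepError)`. Note that a plain generator function cannot be retried this way, because a generator is finished once it raises.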

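  • `wait_on_rate_limit` will stop the exceptions. Tweepy will sleep for however long is needed for the rate limit to replenish. (4 upvotes)
  • @jenn: Pass it as a keyword argument when creating the `API` instance. (2 upvotes)
  • Using `wait_on_rate_limit = True` is the right approach. If you keep hitting the rate limit and sleeping, Twitter will eventually blacklist your account. I have experienced this many times. (2 upvotes)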

Til*_*ann 17

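If you want to avoid errors and respect the rate limit, you can use the following function, which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and, if desired, waits until the rate limit has been reset.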

from datetime import datetime
from time import sleep

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    #Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))
    #Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        #Parse the UTC time
        reset = datetime.fromtimestamp(reset)
        #Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))

        if wait:
            #Determine the delay (never negative) and sleep
            delay = max((reset - datetime.now()).total_seconds(), 0) + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            #We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            #We have reached the rate limit. The user needs to handle the rate limit manually.
            return False 

    #We have not reached the rate limit
    return True

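  • Note that in the latest tweepy version the `getheader()` function was replaced by a `headers` dict, so for example `api.last_response.getheader('x-rate-limit-remaining')` needs to be replaced with `api.last_response.headers['x-rate-limit-remaining']` (2 upvotes)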
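As a minimal sketch of the same wait calculation on a `headers` mapping, the function below works on any dict-like object holding the `x-rate-limit-*` headers. The name `seconds_until_reset`, the `now` parameter and the example values are assumptions made for illustration; they are not part of Tweepy.

```python
import time


def seconds_until_reset(headers, now=None, buffer=0.1):
    """Return how many seconds to wait before sending the next request.

    `headers` is any mapping with the x-rate-limit-* response headers.
    Returns 0.0 while requests remain in the current window.
    """
    if now is None:
        now = time.time()
    remaining = int(headers["x-rate-limit-remaining"])
    if remaining > 0:
        return 0.0
    reset = int(headers["x-rate-limit-reset"])  # reset time in epoch seconds
    # Never return a negative delay, even if the reset time has passed
    return max(reset - now, 0.0) + buffer


# Made-up header values, for illustration only
example_headers = {
    "x-rate-limit-limit": "180",
    "x-rate-limit-remaining": "0",
    "x-rate-limit-reset": "1000",
}
delay = seconds_until_reset(example_headers, now=940.0, buffer=0.5)
```

Working directly with epoch seconds avoids converting to `datetime` and back. A plain dict is case-sensitive, so the header names must match exactly; the caller would pass the result to `time.sleep()` when it is greater than zero.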

小智 14

Just replace

api = tweepy.API(auth)

with

api = tweepy.API(auth, wait_on_rate_limit=True)


Mal*_*aiq 5

import tweepy
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Notifies the user when the rate limit is hit and waits by itself; no need to call sleep.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)