twitter4j: since ID is not updating

Question

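I am trying to fetch tweets through the Twitter search API using since_id. Below is my code, in which I create a map from each query term to its since ID. I set the since ID to 0 by default, and my goal is to update it every time I run the query, so that the next run does not fetch the same tweets again but starts from the last tweet.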

import java.io.{PrintWriter, StringWriter}
import java.util.Properties
import com.google.common.io.Resources
import twitter4j._
import scala.collection.JavaConversions._
// reference: http://bcomposes.com/2013/02/09/using-twitter4j-with-scala-to-access-streaming-tweets/
object Util {
    val props = Resources.getResource("twitter4j.props").openStream()
    val properties = new Properties()
    properties.load(props)

    val config = new twitter4j.conf.ConfigurationBuilder()
        .setDebugEnabled(properties.getProperty("debug").toBoolean)
        .setOAuthConsumerKey(properties.getProperty("consumerKey"))
        .setOAuthConsumerSecret(properties.getProperty("consumerSecret"))
        .setOAuthAccessToken(properties.getProperty("accessToken"))
        .setOAuthAccessTokenSecret(properties.getProperty("accessTokenSecret"))
    val tempKeys =List("Yahoo","Bloomberg","Messi", "JPM Chase","Facebook")
    val sinceIDmap : scala.collection.mutable.Map[String, Long] = collection.mutable.Map(tempKeys map { ix => s"$ix" -> 0.toLong } : _*)
    //val tweetsMap: scala.collection.mutable.Map[String, String]
    val configBuild = (config.build())
    val MAX_TWEET=100
    getTweets()

    def getTweets(): Unit ={
        sinceIDmap.keys.foreach((TickerId) => getTweets(TickerId))
    }

    def getTweets(TickerId: String): scala.collection.mutable.Map[String, scala.collection.mutable.Buffer[String]] = {
        println("Search key is:"+TickerId)
        var tweets = scala.collection.mutable.Map[String, scala.collection.mutable.Buffer[String]]()
        try {
            val twitter: Twitter = new TwitterFactory(configBuild).getInstance
            val query = new Query(TickerId)
            query.setSinceId(sinceIDmap.get(TickerId).get)
            query.setLang("en")
            query.setCount(MAX_TWEET)
            val result = twitter.search(query)
            tweets += ( TickerId -> result.getTweets().map(_.getText))

            //sinceIDmap(TickerId)=result.getSinceId
            println("-----------Since id is :"+result.getSinceId )
            //println(tweets)
        }
        catch {
            case te: TwitterException =>
                println("Failed to search tweets: " + te.getMessage)
        }
        tweets
    }
}

object StatusStreamer {
    def main(args: Array[String]) {
        Util
    }
}


Output:

Search key is:Yahoo    
log4j:WARN No appenders could be found for logger (twitter4j.HttpClientImpl).
log4j:WARN Please initialize the log4j system properly.
-----------Since id is :0
Search key is:JPM Chase
-----------Since id is :0
Search key is:Facebook
-----------Since id is :0
Search key is:Bloomberg
-----------Since id is :0
Search key is:Messi
-----------Since id is :0


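The problem is that when I print the since ID after running the query, it has the same value I set initially. Can someone point out what I am doing wrong here? Or, if my approach is wrong, can someone share another approach that would work?

Thanks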

Answer 1

xna*_*kos 5

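First of all, from your initial description of your approach, I can tell that you are using since_id incorrectly. I made the same mistake in the past and could not get it to work. Moreover, your approach is not consistent with the official "Working with Timelines" guide. The official guidelines worked for me back then, and I suggest you follow them.

Long story short, you cannot use since_id alone to move through a timeline of tweets (in your case, the timeline returned by GET search/tweets). You definitely need max_id to do what you described. In fact, I believe since_id plays a purely auxiliary, optional role (one that could also be implemented in your own code). The API documentation made me believe I could use since_id in exactly the same way as max_id, but I was wrong. When I specified only since_id, I noticed that the returned tweets were very fresh, as if since_id had been ignored completely. Here is another question that demonstrates this unexpected behavior.

As I see it, since_id is used only for pruning, not for moving through the timeline. Using since_id alone gets you the latest tweets, but restricts the returned tweets to those with an ID greater than since_id. That is not what you want. The last piece of evidence, taken from the official guide, is its graphical representation of one specific request (figure not reproduced here).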

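Not only does since_id not move you through the timeline, it also happens to be completely useless in that particular request. It will not be useless in the next request, however, because it will prune Tweet 10 (and anything before it). But the fact remains that since_id does not move you through the timeline.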
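To illustrate the pruning behavior described above, here is a minimal sketch that does not call Twitter at all. The `search` method and the in-memory `timeline` of IDs are hypothetical stand-ins that model the behavior reported in this answer (newest tweets first, restricted to IDs greater than since_id); they are not the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SinceIdAlone {
    // Hypothetical model of a search with only since_id set: returns up to
    // `count` IDs, newest first, keeping only IDs greater than sinceId.
    static List<Long> search(List<Long> timeline, long sinceId, int count) {
        List<Long> page = new ArrayList<>();
        for (long id : timeline) { // timeline is ordered newest (largest ID) first
            if (page.size() == count) break;
            if (id > sinceId) page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Long> timeline =
            Arrays.asList(110L, 109L, 108L, 107L, 106L, 105L, 104L, 103L, 102L, 101L);
        // The page still starts at the newest tweet; it does not start at 105.
        System.out.println(search(timeline, 104L, 3));
        // since_id only trims the page once it overlaps the lower bound.
        System.out.println(search(timeline, 108L, 3));
    }
}
```

Under this model, a since_id of 104 with a page size of 3 yields the three newest IDs rather than 105 to 107, which matches the "very fresh tweets" observation above.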

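In general, you need to think in terms of going from the latest tweets to the oldest ones, not the other way round. To go from the latest tweets to the oldest, you specify max_id in your requests as an inclusive upper limit on the IDs of the tweets to be returned, and you update this parameter between subsequent requests.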

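The presence of max_id in a request sets an inclusive upper limit on the IDs of the tweets to be returned. From the returned tweets, you can take the smallest ID that appears and use it as the max_id value in the subsequent request. You can decrement the smallest ID by 1 and use that value as the next request's max_id; because max_id is inclusive, this prevents you from getting the oldest tweet of the previous request again. The first request should have no max_id specified, so that the latest tweets are returned. With this approach, each request after the first takes you deeper into the past.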
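The paging rule above can be sketched without any network calls. In this minimal simulation, `search` and the in-memory `timeline` are hypothetical stand-ins for the search endpoint, and `Long.MAX_VALUE` is used as an assumed "no upper limit" value for the first request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxIdPaging {
    // Hypothetical model of a search page: up to `count` IDs, newest first,
    // with id <= maxId (max_id is inclusive).
    static List<Long> search(List<Long> timeline, long maxId, int count) {
        List<Long> page = new ArrayList<>();
        for (long id : timeline) { // timeline is ordered newest (largest ID) first
            if (page.size() == count) break;
            if (id <= maxId) page.add(id);
        }
        return page;
    }

    // Walks from the newest tweet to the oldest, one page at a time.
    static List<Long> collectAll(List<Long> timeline, int count) {
        List<Long> seen = new ArrayList<>();
        long maxId = Long.MAX_VALUE; // first request: no effective upper limit
        while (true) {
            List<Long> page = search(timeline, maxId, count);
            if (page.isEmpty()) break; // nothing older is left
            seen.addAll(page);
            // The smallest ID is the last element of a newest-first page.
            // Subtract 1 so the next page does not repeat that tweet.
            maxId = page.get(page.size() - 1) - 1;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Long> timeline = Arrays.asList(10L, 9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L);
        System.out.println(collectAll(timeline, 3)); // each ID appears exactly once
    }
}
```

Stopping on an empty page is a choice made for this sketch; a real client would also need to handle rate limits and errors.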

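since_id comes in handy when you need to limit how far back you go. Imagine that at some point in time, t0, you start searching for tweets. Suppose that the largest tweet ID from your first search is id0. After the first search, all tweet IDs in subsequent searches get smaller, since you are going back in time. After a while you will have collected about a week's worth of tweets, and your searches will return nothing more. At that point, t1, you know that your trip into the past is over. However, between t0 and t1, more tweets will have been posted. So another trip into the past should start at t1 and continue until you reach the tweet with ID id0 (which was posted before t0). This trip can be bounded by using id0 as the since_id in the trip's requests, and so on. Alternatively, you can avoid since_id if you make sure to end the trip as soon as you encounter a tweet with an ID less than or equal to id0 (bear in mind that tweets can be deleted). But I suggest you try to use since_id, for simplicity and efficiency. Remember that since_id is exclusive and max_id is inclusive.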
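The two-trip scenario above can be sketched as follows. Again this is only a simulation under stated assumptions: `search`, `trip`, and the in-memory `timeline` are hypothetical names, and the model treats since_id as an exclusive lower bound and max_id as an inclusive upper bound, as described in this answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SinceIdBoundedTrip {
    // Hypothetical model of a search page: up to `count` IDs, newest first,
    // with sinceId < id <= maxId.
    static List<Long> search(List<Long> timeline, long sinceId, long maxId, int count) {
        List<Long> page = new ArrayList<>();
        for (long id : timeline) { // timeline is ordered newest (largest ID) first
            if (page.size() == count) break;
            if (id > sinceId && id <= maxId) page.add(id);
        }
        return page;
    }

    // One trip into the past, bounded below by sinceId.
    static List<Long> trip(List<Long> timeline, long sinceId, int count) {
        List<Long> seen = new ArrayList<>();
        long maxId = Long.MAX_VALUE;
        while (true) {
            List<Long> page = search(timeline, sinceId, maxId, count);
            if (page.isEmpty()) break;
            seen.addAll(page);
            maxId = page.get(page.size() - 1) - 1;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Long> timeline =
            new ArrayList<>(Arrays.asList(105L, 104L, 103L, 102L, 101L));
        List<Long> first = trip(timeline, 0L, 2); // trip started at t0
        long id0 = first.get(0);                  // largest ID seen so far
        // Tweets posted between t0 and t1 are added at the newest end.
        timeline.addAll(0, Arrays.asList(109L, 108L, 107L, 106L));
        List<Long> second = trip(timeline, id0, 2); // trip started at t1
        System.out.println(first);
        System.out.println(second); // only IDs greater than id0
    }
}
```

Passing id0 as the lower bound means the second trip ends on its own with an empty page, so the client does not have to compare IDs against id0 itself.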

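For more information, see the official "Working with Timelines" guide. You will notice that the section "The max_id parameter" comes first and the section "Using since_id for the greatest efficiency" comes after it. The title of the latter section indicates that since_id is not meant for moving through the timeline.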

Below is an untested, rough example that uses Twitter4J in Java to print tweets, starting from the latest and going back into the past:

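// Imports needed by the members below (place them at the top of the enclosing class file).
import java.util.List;
import twitter4j.*;

// Make sure this is initialized correctly.
Twitter twitter;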

/**
 * Searches and prints tweets starting from now and going back to the past.
 * 
 * @param q
 *            the search query, e.g. "#yolo"
 */
private void searchAndPrintTweets(String q) throws TwitterException {
    // `max_id` needed by `GET search/tweets`. If it is 0 (first iteration),
    // it will not be used for the query.
    long maxId = 0;
    // Let us assume that it will run forever.
    while (true) {
        // Pass the search string; an empty Query would ignore the `q` parameter.
        Query query = new Query(q);
        query.setCount(100);
        query.setLang("en");
        // Set `max_id` as an inclusive upper limit, unless this is the
        // first iteration. If this is the first iteration (maxId == 0), the
        // freshest/latest tweets will come.
        if (maxId != 0)
            query.setMaxId(maxId);
        QueryResult qr = twitter.search(query);
        printTweets(qr.getTweets());
        // For next iteration. Decrement smallest ID by 1, so that we will
        // not get the oldest tweet of this iteration in the next iteration
        // as well, since `max_id` is inclusive.
        maxId = calculateSmallestId(qr.getTweets()) - 1;
    }
}

/**
 * Calculates the smallest ID among a list of tweets.
 * 
 * @param tweets
 *            the list of tweets
 * @return the smallest ID
 */
private long calculateSmallestId(List<Status> tweets) {
    long smallestId = Long.MAX_VALUE;
    for (Status tweet : tweets) {
        if (tweet.getId() < smallestId)
            smallestId = tweet.getId();
    }
    return smallestId;
}

/**
 * Prints the content of the tweets.
 * 
 * @param tweets
 *            the tweets
 */
private void printTweets(List<Status> tweets) {
    for (Status tweet : tweets) {
        System.out.println(tweet.getText());
    }
}


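There is no error handling, no checking for special conditions (for example, an empty list of tweets in the query result), and no use of since_id, but it should get you started.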
