Extremely slow Play Framework 2.3 request handling code

Question

rsa*_*san 1 java performance netty playframework playframework-2.0

I am seeing extremely slow performance in these two methods:

HttpRequestDecoder.unfoldAndFireMessageReceived()

and

Future$PromiseCompletingRunnable.run()

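These two methods account for roughly half the time of every transaction on the server. It happens both at low throughput and during peak usage.

For example, at 1 AM, with only me making requests to the application, I get a chart like this in New Relic:

In this transaction, these two methods alone consumed a full second, which is even slower than the database access through Hibernate! Again, there was only one user in the application.

For heavier transactions, it takes even more time:

In this case, the two methods took 2.5 seconds on average and my own code took 1.5 seconds, for a total of 4 seconds. I thought the New Relic metrics might simply be misleading: perhaps New Relic shows the name of this method while the time is actually spent in code I wrote. So I decided to record a custom metric like this: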

playController() {
    // start the timer
    // do the job
    // stop the timer and send the metric to New Relic
    // return
}

The result was that my code took 1.5 seconds. So what really consumes the rest of the time is Play's request handling.
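
The custom timing described above can be sketched in plain Java roughly as follows. This is a minimal illustration, not the code from the actual application: `ControllerTimer`, `MetricSink`, and the metric name are hypothetical names introduced here. In a real application the sink could delegate to the New Relic agent API (for example `NewRelic.recordMetric(name, value)` from `com.newrelic.api.agent`), but that wiring is an assumption and is left out so the sketch runs on its own.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class ControllerTimer {

    /** Hypothetical metric sink; it is not part of Play or of the New Relic agent. */
    public interface MetricSink {
        void record(String name, float value);
    }

    private final MetricSink sink;

    public ControllerTimer(MetricSink sink) {
        this.sink = sink;
    }

    /** Runs the action, records its wall-clock duration in milliseconds, and returns its result. */
    public <T> T timed(String metricName, Supplier<T> action) {
        long start = System.nanoTime();              // start the timer
        try {
            return action.get();                     // do the job
        } finally {
            // stop the timer and hand the elapsed time to the sink,
            // even if the action threw an exception
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            sink.record(metricName, (float) elapsedMs);
        }
    }

    public static void main(String[] args) {
        // Assumption: printing stands in for sending the metric to a monitoring tool.
        ControllerTimer timer = new ControllerTimer(
                (name, value) -> System.out.println(name + " = " + value + " ms"));
        String body = timer.timed("Custom/Controller/playController", () -> "response body");
        System.out.println(body);
    }
}
```

Measuring with `System.nanoTime()` inside the controller captures only the time spent in the application's own code, which is what makes it usable as a cross-check against the figures the agent reports.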

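Under high load, this behavior kills my application. At a throughput of about 500 requests per minute (not really a high throughput!), these two methods can consume up to 20 seconds, while my own code stays stable at 3 seconds at most.

I really don't think this is a threading problem, because it happens even with a single user, but it becomes very problematic when there are many concurrent requests. I tried changing the number of threads for "synchronous applications" (as described in the documentation), but performance did not improve, and in some cases it got worse.

I'm really worried about this issue, because there is a similar case on the Play mailing list that has gone more than two years without a single answer: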

http://grokbase.com/t/gg/play-framework/159bzf7r9p/help-to-understand-newrelic-report-for-slow-transactions-2-1-4

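There is even a similar question on StackOverflow for Play 2.1, with no answers and no visible activity:

Slow transactions in NewRelic having Play Framework as backend

Any ideas about what could be causing this behavior?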

Answer 1

rsa*_*san 5

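So, a month later, I can finally say that this issue is solved. The answer is that there was no problem at all. New Relic's default instrumentation cannot correctly report the time consumed by Play Framework 2 transactions, and I would even say by any asynchronous framework running on Netty.

To reach this conclusion, I had to add custom metrics to the most problematic transactions, only to find that the time measured by my custom instrumentation was far lower than what New Relic reported.

After that, I tested on the client side with Firebug, and the times it reported matched my custom metrics.

Just a week ago, I found this article in the New Relic forum: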

https://docs.newrelic.com/docs/agents/java-agent/frameworks/disable-scala-netty-akka-play-2-instrumentation

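After disabling all of the Netty and Akka instrumentation (along with the Play and Scala instrumentation) using these lines in the New Relic configuration file, I finally started to get real times from the default instrumentation: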

common: &default_settings

  class_transformer:
    # Disable all Akka instrumentations
    com.newrelic.instrumentation.akka-2.0:
      enabled: false
    com.newrelic.instrumentation.akka-2.1:
      enabled: false
    com.newrelic.instrumentation.akka-2.2:
      enabled: false

    # Disable all Netty instrumentations
    com.newrelic.instrumentation.netty-3.4:
      enabled: false
    com.newrelic.instrumentation.netty-3.8:
      enabled: false
    com.newrelic.instrumentation.netty-4.0.0:
      enabled: false
    com.newrelic.instrumentation.netty-4.0.8:
      enabled: false

    # Disable all Play 2 instrumentations
    com.newrelic.instrumentation.play-2.1:
      enabled: false
    com.newrelic.instrumentation.play-2.2:
      enabled: false
    com.newrelic.instrumentation.play-2.3:
      enabled: false
    # New in Release 3.22, the Play 2.4 instrumentation does not respect
    # the older play2_instrumentation configuration setting 
    com.newrelic.instrumentation.play-2.4:
      enabled: false

    # Disable all Scala-language instrumentations
    com.newrelic.instrumentation.scala-2.9.3:
      enabled: false

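The New Relic documentation says:

If you find that the metrics being reported are not of value to you, or that the instrumentation incurs more overhead than you would like, you can choose to disable some or all of this instrumentation. If you selectively disable some of the instrumentation, some segments of activity will not be reported and your total time will be understated.

But IMHO it should say:

If you want to get real metrics, you can choose to disable all of this instrumentation.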

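Why does this happen? I can only guess that Play and Netty reuse pooled threads across many transactions, and that the New Relic agent cannot correctly separate the time consumed by the database from the time consumed by Netty, so it reports double or sometimes even triple the time the application actually consumed.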
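
The arithmetic behind that guess can be illustrated with a small Java sketch. It does not show how the New Relic agent works internally. It only shows that adding up the durations of overlapping segments yields a total larger than the wall-clock time a client actually waits. `OverlapDemo`, `Segment`, the segment names, and all the timings are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public final class OverlapDemo {

    /** One timed segment; times are in milliseconds since the request started. */
    public static final class Segment {
        final String name;
        final long startMs;
        final long endMs;

        public Segment(String name, long startMs, long endMs) {
            this.name = name;
            this.startMs = startMs;
            this.endMs = endMs;
        }

        long durationMs() {
            return endMs - startMs;
        }
    }

    /** Adds up every segment's duration, so overlapping time is counted more than once. */
    public static long summedMs(List<Segment> segments) {
        long total = 0;
        for (Segment s : segments) {
            total += s.durationMs();
        }
        return total;
    }

    /** Time from the earliest start to the latest end, i.e. what the client waits for. */
    public static long wallClockMs(List<Segment> segments) {
        if (segments.isEmpty()) {
            return 0;
        }
        long first = Long.MAX_VALUE;
        long last = Long.MIN_VALUE;
        for (Segment s : segments) {
            first = Math.min(first, s.startMs);
            last = Math.max(last, s.endMs);
        }
        return last - first;
    }

    public static void main(String[] args) {
        // Hypothetical timings: the outer handler spans the whole request,
        // the controller runs inside it, and the query runs inside the controller.
        List<Segment> segments = Arrays.asList(
                new Segment("Netty handler", 0, 1500),
                new Segment("Controller", 100, 1400),
                new Segment("Hibernate query", 300, 1200));

        System.out.println("summed     = " + summedMs(segments) + " ms");    // summed     = 3700 ms
        System.out.println("wall clock = " + wallClockMs(segments) + " ms"); // wall clock = 1500 ms
    }
}
```

With these made-up numbers the summed figure is more than double the wall-clock figure, which is the same order of discrepancy described above.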

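This problem seriously misled my team (and the project's sponsors). I don't entirely blame New Relic, since it is a helpful tool, but it taught me a lesson: never trust a single tool.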

Asked:	9 years, 9 months ago
Viewed:	1,377 times
Last activity:	9 years, 8 months ago