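"could not receive data from server: Connection timed out" or "connection not open" errors when connecting to a Postgres database on a separate AWS instance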

use*_*458 5 ruby postgresql amazon-ec2 amazon-web-services

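I am using Ruby 1.9.3 on my app server, which runs on an AWS EC2 instance. I run the Postgres DB on a separate EC2 instance, but both instances are in the same security group. The Ruby code connects to the DB through the Sequel ORM gem (http://sequel.rubyforge.org/).

I have already configured the Postgres 9.1.4 DB to accept connections from the app server instance.

However, every now and then the app server's logs show that it failed to reach the Postgres DB instance, with error messages like these: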

PG::Error: could not receive data from server: Connection timed out

or

PG::Error: connection not open

So I went to the Postgres DB instance and looked at /var/log/postgresql/postgresql-9.1-main.log, where I saw a bunch of messages like this:

2012-11-07 08:15:17 UTC LOG:  could not receive data from client: Connection timed out
2012-11-07 08:15:17 UTC LOG:  unexpected EOF on client connection

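I searched online, including Stack Overflow, and made sure that SSL is not enabled in my PostgreSQL (my postgresql.conf file has "ssl = off").

At this point I am not sure what exactly is wrong in the Postgres configuration. I would rather not touch the max connections or timeout values on a production server without a well-justified reason.

The app server can connect to the DB most of the time; the problem shows up only intermittently.

On the Ruby side, here is the error trace for "connection not open" when making a Postgres call: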

PG::Error: connection not open
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `async_exec'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `block in execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/logging.rb:33:in `log_yield'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:145:in `execute_query'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:111:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

Similarly, here is the trace for "could not receive data from server":

PG::Error: could not receive data from server: Connection timed out
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `block'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `ensure in check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:124:in `check_disconnect_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:132:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:372:in `_execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block (2 levels) in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:379:in `check_database_errors'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `block in execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `block in synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/connection_pool/threaded.rb:105:in `hold'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/database/connecting.rb:229:in `synchronize'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:234:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/dataset/actions.rb:744:in `execute'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/adapters/postgres.rb:483:in `fetch_rows'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:785:in `primary_key_lookup'
/var/lib/gems/1.9.1/gems/sequel-3.38.0/lib/sequel/model/base.rb:124:in `[]'

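I have observed that if I run both the app server and the Postgres DB on the same instance, there are no connection problems, at least not so far. Maybe Postgres is less forgiving of non-local DB connections?

Please let me know what I might be missing. I would appreciate it!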

IS

Cra*_*ger 2

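The usual explanation for this is connectivity issues.

Alternatively, if it isn't a connectivity issue, it could be a protocol sync issue. It looks like both ends might be trying to read from the socket, with neither trying to write. So maybe the client expects the server to send a response while the server expects the client to send data.

If it is intermittent and occasional, it will be really hard to debug, because you can't really just run tcpdump and analyse the capture.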

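I would add more logging on the server side: log_statement = 'all', plus a log_line_prefix that shows the client IP, the backend start time, and the backend pid. That way you can start matching these failures against the session activity that preceded them, and see whether it is a particular client, a particular job, or really just random.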
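As a rough illustration of that matching step, here is a minimal Ruby sketch. It assumes a prefix of `log_line_prefix = '%t [%p] %h '` (timestamp, backend pid, client host; that particular layout is my assumption, and you could add `%s` for the backend start time) together with `log_statement = 'all'`. The sample lines, addresses, pids and the helper names `parse_line` and `failures_with_context` are all made up for the example.

```ruby
# Assumed postgresql.conf settings (the prefix layout is an illustrative choice):
#   log_statement   = 'all'
#   log_line_prefix = '%t [%p] %h '
# %t = timestamp, %p = backend pid, %h = client host.

LOG_LINE = /\A(?<time>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d \S+) \[(?<pid>\d+)\] (?<host>\S+) (?<level>[A-Z0-9]+):\s+(?<message>.*)\z/

# One record per timed-out connection; the "unexpected EOF" line that
# follows it in the log is deliberately not counted a second time.
TIMEOUT = /could not receive data from client/

# Split one log line into its fields, or return nil if it does not
# match the assumed prefix layout.
def parse_line(line)
  m = LOG_LINE.match(line.chomp)
  return nil unless m
  { time: m[:time], pid: m[:pid].to_i, host: m[:host],
    level: m[:level], message: m[:message] }
end

# For every timeout, report the client host, the backend pid, and the
# last statement that backend logged before the failure.
def failures_with_context(lines)
  last_statement = {}
  failures = []
  lines.each do |line|
    entry = parse_line(line)
    next unless entry
    if entry[:message].start_with?('statement: ')
      last_statement[entry[:pid]] = entry[:message].sub(/\Astatement: /, '')
    elsif entry[:message] =~ TIMEOUT
      failures << { time: entry[:time], host: entry[:host], pid: entry[:pid],
                    last_statement: last_statement[entry[:pid]] }
    end
  end
  failures
end

# Made-up sample lines in the assumed format.
SAMPLE_LOG = [
  '2012-11-07 08:10:02 UTC [4321] 10.0.0.5 LOG:  statement: SELECT * FROM users WHERE id = 1',
  '2012-11-07 08:10:03 UTC [4400] 10.0.0.6 LOG:  statement: SELECT 1',
  '2012-11-07 08:15:17 UTC [4321] 10.0.0.5 LOG:  could not receive data from client: Connection timed out',
  '2012-11-07 08:15:17 UTC [4321] 10.0.0.5 LOG:  unexpected EOF on client connection'
]

failures_with_context(SAMPLE_LOG).each do |f|
  puts "#{f[:time]} host=#{f[:host]} pid=#{f[:pid]} last=#{f[:last_statement].inspect}"
end
```

If the same host or the same kind of statement keeps showing up in front of the timeouts, that points at a specific client or job rather than at random network trouble.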

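Does this Sequel ORM gem use libpq under the hood, or is it its own implementation of the PostgreSQL protocol? If the latter, that is quite likely to be the culprit.

Update: It looks like it can use the pg gem (libpq-based), the postgres gem, or possibly postgres-pr (whatever that is). It will prefer pg if it is installed.

Since you already seem to be using the pg gem, you will probably need to do some diagnostic work to track down where the problem appears (particular queries, particular clients, and so on) and try to find a way to reproduce it. PostgreSQL's csvlog may be useful here, since it lets you load and analyse the logs more easily.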
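For instance, with `log_destination = 'csvlog'` and `logging_collector = on`, a few lines of Ruby using the standard `csv` library can count timeouts per client. This is only a sketch: the column positions below follow the csvlog layout documented for PostgreSQL 9.1 as I understand it, so check them against the documentation for your version, and the helper name `timeouts_per_client` is made up.

```ruby
require 'csv'

# 0-based column positions, assuming the PostgreSQL 9.1 csvlog layout
# (log_time, user_name, database_name, process_id, connection_from, ...).
CSVLOG_CONNECTION_FROM = 4   # "host:port" of the client
CSVLOG_MESSAGE         = 13  # the log message text

# Count "could not receive data from client" entries per client host.
# Note: the naive split on ':' assumes IPv4 addresses or host names.
def timeouts_per_client(csv_text)
  counts = Hash.new(0)
  CSV.parse(csv_text) do |row|
    next unless row[CSVLOG_MESSAGE].to_s.include?('could not receive data from client')
    host = row[CSVLOG_CONNECTION_FROM].to_s.split(':').first || 'unknown'
    counts[host] += 1
  end
  counts
end
```

You would call it as `timeouts_per_client(File.read(path))`, where `path` is wherever your server writes its .csv log file. Loading the same file into a table with COPY and querying it in SQL is another option.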