Aus*_*ers 6 server-room electrical-power surge-protection
We have been experiencing a very strange problem in our new office's server room across all the power outlets.
Specifically, when all the equipment is up and running (i.e. the air conditioning system, 2x rack mounted servers, 5x 48-port PoE switches and also the door access system, which has its backup batteries and main control circuits based inside the server room), we occasionally see the servers spontaneously reboot, the door access system reboot, and the PoE switches lurch into a non-functional state for 20 minutes or more at a time. When this happens, all three systems are affected simultaneously. All three systems are on the same circuit.
The servers and switches are running on a UPS device and the card access system also has a backup battery of its own - so a simple momentary loss of power would not explain this as everything should just continue to run from the UPS without interruption. We've disconnected the UPS from the wall and have seen the servers continue to run, as expected - so the UPS seems to be working properly as far as power outages are concerned.
None of the circuit breakers have ever tripped or needed to be reset.
The air conditioning system is apparently on a separate circuit to the servers and network equipment; however, its power cables share a conduit with the power cables which run to wall outlets used by the servers etc. Could there be a risk of a voltage being induced from one circuit to the other when the AC switches on or off as they are parallel to each other for quite a few metres?
I talked to one of the electricians who was trying to work out what was happening and he said that, although the air conditioning unit is on a separate circuit to the servers and other systems, the two circuits actually share a common neutral - something he thought could potentially be causing problems. Is this a normal configuration or would it be considered bad practice to have something like an AC unit share a neutral with sensitive equipment in a server room?
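For now, the problem has subsided on its own. The servers have stopped rebooting spontaneously and the switches are back online, but no real changes were made, so the underlying problem is still there and will resurface sooner or later.
Given that we see multiple systems with independent backup battery units reboot during these incidents, what explanation could there be other than a power surge or spike?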
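While this isn't the direct "here is your problem" answer you were hoping for, here is my advice.
It seems that your quest to find out what is going wrong, while noble, isn't going to be resolved any time soon.
You could, as others have suggested, try logging anything that can be logged and hope that a pattern emerges.
I like DeRobert's suggestion of hiring someone to measure the power quality...
However, here is my actual advice, some of which you have already done: hand it over to an electrician.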
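Seriously. A qualified electrician (even if you have to outsource) should be able to give you the root cause, whether or not it turns out to be electrical in nature. They can test each circuit to make sure it isn't overloaded (especially during spikes/startup), and they can make sure the wiring is adequate and the circuits are sized appropriately for what you have connected, and so on.
Most of the time, IT departments don't have a qualified electrician of their own, and we often like to just "plug things in" without checking whether we're using the right circuit, balancing the circuits, etc.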
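If your UPS supports log collection, I would do that, if only to help prove the problem. While your UPS may not be high-end enough to compensate properly (or quickly enough) for spikes and dips, that doesn't mean it is the root cause. This sounds like an electrical problem to me. If you are running a decent online UPS, and it appears to be compensating for the incoming voltage correctly (according to its logs), then it would be strange for all of the IT equipment plugged into it and the card reader system to reboot at the same time.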
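If you do end up with timestamps from several sources (UPS logs, server boot times, a note of when the AC kicks in), a few lines of Python can help you look for the pattern. This is only a rough sketch: the function name `events_near_incidents`, the 30-second window and every timestamp below are made up for illustration, and you would need to fill the lists from whatever your own equipment actually records.

```python
from datetime import datetime, timedelta


def events_near_incidents(incidents, events, window=timedelta(seconds=30)):
    """For each incident time, return the logged events that fall within
    `window` before or after it, sorted by time."""
    matches = {}
    for incident in incidents:
        nearby = [
            (when, label)
            for when, label in events
            if abs(when - incident) <= window
        ]
        matches[incident] = sorted(nearby)
    return matches


# Hypothetical data: times at which the servers/switches/door system went down.
incidents = [
    datetime(2024, 1, 8, 14, 2, 10),
    datetime(2024, 1, 9, 9, 45, 0),
]

# Hypothetical data: anything else you managed to log, as (time, label) pairs.
events = [
    (datetime(2024, 1, 8, 14, 2, 5), "AC compressor start"),
    (datetime(2024, 1, 8, 16, 30, 0), "UPS self-test"),
    (datetime(2024, 1, 9, 9, 44, 50), "AC compressor start"),
]

for incident, nearby in events_near_incidents(incidents, events).items():
    labels = [label for _, label in nearby] or ["nothing logged nearby"]
    print(incident.isoformat(), "->", ", ".join(labels))
```

If the same label keeps turning up next to every incident, that is something concrete to hand to the electrician. It proves nothing by itself, but it narrows down where they should look first.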
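Talk to your boss and explain that the problem needs a professional electrician to diagnose it. It isn't fair to expect an electrician to set up BGP routing, and likewise a sysadmin shouldn't be expected to be a qualified electrician.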