Question by Mic*_*eem (score 5). Tags: c#, azure-service-fabric
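We have a class that broadcasts messages to a Service Fabric stateless service. This stateless service has a single partition but many replicas, and every message must be delivered to all replicas in the system. To do that, we query FabricClient for the single partition and for all replicas of that partition. We use plain HTTP communication with a shared HttpClient instance (the stateless service has a communication listener with a self-hosted OWIN listener using WebListener/HttpSys). During load tests we get many errors while sending messages. Note that other services in the same application also communicate (via WebListener/HttpSys, ServiceProxy and ActorProxy).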
This is the code where we see the exception (the stack trace follows the code sample):
private async Task SendMessageToReplicas(string actionName, string message)
{
    var fabricClient = new FabricClient();
    var eventNotificationHandlerServiceUri = new Uri(ServiceFabricSettings.EventNotificationHandlerServiceName);
    var promises = new List<Task>();

    // There is only one partition of this service, but there are many replicas
    Partition partition = (await fabricClient.QueryManager.GetPartitionListAsync(eventNotificationHandlerServiceUri).ConfigureAwait(false)).First();
    string continuationToken = null;
    do
    {
        var replicas = await fabricClient.QueryManager.GetReplicaListAsync(partition.PartitionInformation.Id, continuationToken).ConfigureAwait(false);
        foreach (Replica replica in replicas)
        {
            promises.Add(SendMessageToReplica(replica, actionName, message));
        }
        continuationToken = replicas.ContinuationToken;
    } while (continuationToken != null);

    await Task.WhenAll(promises).ConfigureAwait(false);
}

private async Task SendMessageToReplica(Replica replica, string actionName, string message)
{
    if (replica.TryGetEndpoint(out Uri replicaUrl))
    {
        Uri requestUri = UriUtility.Combine(replicaUrl, actionName);
        using (var response = await _httpClient.PostAsync(requestUri, message == null ? null : new JsonContent(message)).ConfigureAwait(false))
        {
            string responseContent = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
            if (!response.IsSuccessStatusCode)
            {
                throw new Exception();
            }
        }
    }
    else
    {
        throw new Exception();
    }
}
The following exception is thrown:
System.Fabric.FabricTransientException: Could not ping any of the provided Service Fabric gateway endpoints. ---> System.Runtime.InteropServices.COMException: Exception from HRESULT: 0x80071C49
at System.Fabric.Interop.NativeClient.IFabricQueryClient9.EndGetPartitionList2(IFabricAsyncOperationContext context)
at System.Fabric.FabricClient.QueryClient.GetPartitionListAsyncEndWrapper(IFabricAsyncOperationContext context)
at System.Fabric.Interop.AsyncCallOutAdapter2`1.Finish(IFabricAsyncOperationContext context, Boolean expectedCompletedSynchronously)
--- End of inner exception stack trace ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Company.ServiceFabric.ServiceFabricEventNotifier.<SendMessageToReplicas>d__7.MoveNext() in c:\work\ServiceFabricEventNotifier.cs:line 138
During the same period, we also see this exception being thrown:
System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.) ---> System.ComponentModel.Win32Exception (0x80004005): An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
at System.Data.SqlClient.SqlConnection.OpenAsync(CancellationToken cancellationToken)
The event logs on the machines in the cluster show the following warnings:
Event ID: 4231
Source: Tcpip
Level: Warning
A request to allocate an ephemeral port number from the global TCP port space has failed due to all such ports being in use.
Event ID: 4227
Source: Tcpip
Level: Warning
TCP/IP failed to establish an outgoing connection because the selected local endpoint was recently used to connect to the same remote endpoint. This error typically occurs when outgoing connections are opened and closed at a high rate, causing all available local ports to be used and forcing TCP/IP to reuse a local port for an outgoing connection. To minimize the risk of data corruption, the TCP/IP standard requires a minimum time period to elapse between successive connections from a given local endpoint to a given remote endpoint.
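To check whether a node really is running out of ephemeral ports, it can help to count its TCP connections per remote endpoint and state. Thousands of TimeWait entries towards the same endpoint (for example a gateway on port 19000) would be consistent with the Tcpip 4227 and 4231 warnings. The following is a diagnostic sketch only: TcpPortDiagnostics, Snapshot and Print are hypothetical names, and it relies solely on the standard System.Net.NetworkInformation API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;

// Hypothetical diagnostic helper (not part of Service Fabric): summarizes this
// machine's TCP connections by remote endpoint and state, most frequent first.
public static class TcpPortDiagnostics
{
    public static List<(IPEndPoint Remote, TcpState State, int Count)> Snapshot()
    {
        TcpConnectionInformation[] connections =
            IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();

        return connections
            .GroupBy(c => (Remote: c.RemoteEndPoint, State: c.State))
            .Select(g => (g.Key.Remote, g.Key.State, Count: g.Count()))
            .OrderByDescending(e => e.Count)
            .ToList();
    }

    // Prints groups with at least 'threshold' connections, e.g. a pile-up of
    // TimeWait sockets towards a single gateway endpoint.
    public static void Print(int threshold = 100)
    {
        foreach (var (remote, state, count) in Snapshot().Where(e => e.Count >= threshold))
        {
            Console.WriteLine($"{count,6} {state,-12} {remote}");
        }
    }
}
```

Running TcpPortDiagnostics.Print() on an affected node during a load test should show whether one remote endpoint dominates the connection table.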
Finally, the Microsoft-Service Fabric Admin log shows hundreds of warnings similar to these:
Event 4121
Source Microsoft-Service-Fabric
Level: Warning
client-02VM4.company.nl:19000/192.168.10.36:19000: error = 2147942452, failureCount=160522. Filter by (type~Transport.St && ~"(?i)02VM4.company.nl:19000") to get listener lifecycle. Connect failure is expected if listener was never started, or listener/its process was stopped before/during connecting.
Event 4097
Source Microsoft-Service-Fabric
Level: Warning
client-02VM4.company.nl:19000 : connect failed, having tried all addresses
After a while, the warnings turn into errors:
Event 4096
Source Microsoft-Service-Fabric
Level: Error
client-02VM4.company.nl:19000 failed to bind to local port for connecting: 0x80072747
Can anyone tell us why this happens and what we can do to fix it? Are we doing something wrong?
Answer by an anonymous user (score 4)
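We (I work with the OP) have been testing this, and it turned out to be the FabricClient, as Esben Bach suggested: the code in the question creates a new FabricClient on every broadcast instead of reusing one.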
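The FabricClient documentation also states:
It is highly recommended that you share FabricClients as much as possible. This is because the FabricClient has multiple optimizations such as caching and batching that you would not be able to fully utilize otherwise.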
FabricClient seems to behave like the HttpClient class: you should share instances, and if you don't, you run into the same problem, namely port exhaustion.
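However, the documentation on common exceptions when using FabricClient also says that when a FabricObjectClosedException occurs, you should:
Dispose of the FabricClient object you are using and instantiate a new FabricClient object.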
Sharing the FabricClient fixed the port exhaustion problem.
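Both recommendations (share one client, but replace it after a FabricObjectClosedException) can be combined in a small holder. The sketch below is a minimal, hypothetical helper, not a Service Fabric SDK API: RecreatableClient and RunAsync are illustrative names, and it is kept generic so it does not depend on the SDK. With Service Fabric you would presumably create it once per process as RecreatableClient<FabricClient, FabricObjectClosedException> with the factory () => new FabricClient(), and have SendMessageToReplicas run its QueryManager calls through it instead of calling new FabricClient() on every broadcast.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: holds one shared client for the whole process and swaps in
// a fresh instance when an operation fails with TClosedException
// (for FabricClient that would be FabricObjectClosedException).
public sealed class RecreatableClient<TClient, TClosedException>
    where TClient : class, IDisposable
    where TClosedException : Exception
{
    private readonly Func<TClient> _factory;
    private readonly object _gate = new object();
    private TClient _current;

    public RecreatableClient(Func<TClient> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _current = factory();
    }

    public TClient Current
    {
        get { lock (_gate) { return _current; } }
    }

    // Runs the operation on the shared client. If the client reports that it was
    // closed, replace it (only once, even under concurrency) and retry one time.
    public async Task<TResult> RunAsync<TResult>(Func<TClient, Task<TResult>> operation)
    {
        TClient client = Current;
        try
        {
            return await operation(client).ConfigureAwait(false);
        }
        catch (TClosedException)
        {
            Replace(client);
            return await operation(Current).ConfigureAwait(false);
        }
    }

    private void Replace(TClient broken)
    {
        lock (_gate)
        {
            // Another caller may already have replaced this instance.
            if (!ReferenceEquals(_current, broken)) return;
            _current = _factory();
        }
        broken.Dispose();
    }
}
```

The design keeps the number of live clients at one in the normal case, which is what avoids opening a new gateway connection per broadcast. It retries only once, and an operation still running on the old instance when it is disposed may fail, so treat it as a starting point rather than a hardened implementation.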