您如何以编程方式为多播发现机制配置hazelcast?

Dav*_*Far 22 java configuration multicast hazelcast

您如何以编程方式为多播发现机制配置hazelcast?


细节:

文档仅提供TCP/IP的示例并且已过时:它使用不再存在的Config.setPort().

我的配置看起来像这样,但发现不起作用(即我得到输出"Members: 1":

        Config cfg = new Config();                  
        NetworkConfig network = cfg.getNetworkConfig();
        network.setPort(PORT_NUMBER);

        JoinConfig join = network.getJoin();
        join.getTcpIpConfig().setEnabled(false);
        join.getAwsConfig().setEnabled(false);
        join.getMulticastConfig().setEnabled(true);

        join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
        join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
        join.getMulticastConfig().setMulticastTimeoutSeconds(200);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
        System.out.println("Members: "+hazelInst.getCluster().getMembers().size());
Run Code Online (Sandbox Code Playgroud)

更新1,考虑到asimarslan的答案

如果我偶然发现MulticastTimeout,我要么得到"Members: 1"或者

2013年12月5日下午8:50:42 com.hazelcast.nio.ReadHandler警告:[192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0关闭套接字到端点地址[192.168.0.7] :4446,原因:java.io.EOFException:远程套接字已关闭!2013年12月5日下午8:57:24 com.hazelcast.instance.Node严重:[192.168.0.9]:4446 [dev]无法加入群集,关闭!com.hazelcast.core.HazelcastException:300秒内无法加入!


更新2,采取pveentjer关于使用tcp/ip的答案

如果我将配置更改为以下内容,我仍然只能获得1个成员:

Config cfg = new Config();                  
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();

join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember("192.168.0.1").addMember("192.168.0.2").
addMember("192.168.0.3").addMember("192.168.0.4").
addMember("192.168.0.5").addMember("192.168.0.6").
addMember("192.168.0.7").addMember("192.168.0.8").
addMember("192.168.0.9").addMember("192.168.0.10").
addMember("192.168.0.11").setRequiredMember(null).setEnabled(true);

//this sets the allowed connections to the cluster? necessary for multicast, too?
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("debug: joined via "+join+" with "+hazelInst.getCluster()
.getMembers().size()+" members.");
Run Code Online (Sandbox Code Playgroud)

更确切地说,此运行产生输出

debug:通过JoinConfig加入{multicastConfig = MulticastConfig [enabled = false,multicastGroup = 224.2.2.3,multicastPort = 54327,multicastTimeToLive = 32,multicastTimeoutSeconds = 2,trustedInterfaces = []],tcpIpConfig = TcpIpConfig [enabled = true,connectionTimeoutSeconds = 5, members = [192.168.0.1,192.168.0.2,192.168.0.3,192.168.0.4,192.168.0.5,192.168.0.6,192.168.0.7,19​​2.168.0.8,192.168.0.9,192.168.0.10,192.168.0.11],requiredMember = null],awsConfig = AwsConfig {enabled = false,region ='us-east-1',securityGroupName ='null',tagKey ='null',tagValue ='null',hostHeader ='ec2.amazonaws.com',connectionTimeoutSeconds = 5}}有1名成员.

我的非hazelcast实现使用UDP多播并且工作正常.防火墙真的可以成为问题吗?


更新3,考虑pveentjer关于检查网络的答案

由于我没有iptables的权限或安装iperf,我com.hazelcast.examples.TestApp用来检查我的网络是否正常工作,如第2章"直接显示"中的Hazelcast入门中所述:

我打电话java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp给192.168.0.1并获得输出

...Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node
INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.0.1]:5701 [dev] 

Members [1] {
    Member [192.168.0.1]:5701 this
}

Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED
Run Code Online (Sandbox Code Playgroud)

然后我调用java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp192.168.0.2并获取输出

...Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node
INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector
INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.0.2]:5701 [dev] 

Members [2] {
    Member [192.168.0.1]:5701
    Member [192.168.0.2]:5701 this
}

Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED
Run Code Online (Sandbox Code Playgroud)

所以多播发现通常在我的集群上工作,对吗?5701也是发现的端口吗?是38476在最后输出的ID或端口?

加入仍然不适用于我自己的程序配置代码:(


更新4,考虑pveentjer关于使用默认配置的答案

修改后的TestApp提供输出

joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, 
multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, 
trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, 
connectionTimeoutSeconds=5, members=[], requiredMember=null], 
awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', 
tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Run Code Online (Sandbox Code Playgroud)

并且在几秒钟之后检测其他成员(在每个实例一次只列出自己作为成员之后,如果所有成员同时启动),而

myProgram给出了输出

joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multica\
stTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSecond\
s=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='nu\
ll', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.
Run Code Online (Sandbox Code Playgroud)

并且在大约1分钟的运行时间内没有检测到成员(我大约每5秒计算一次成员).

但是,如果至少有一个TestApp实例在集群上并发运行,则会检测所有TestApp实例和所有myProgram实例,并且我的程序运行正常.如果我启动TestApp一次然后myProgram并行两次,TestApp会提供以下输出:

java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node
INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.180.240]:5701 [dev] 


Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService
INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
hazelcast[default] > Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [3] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.8]:5701
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [2] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 
Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 
Run Code Online (Sandbox Code Playgroud)

我在TestApp的配置中看到的唯一区别是

config.getManagementCenterConfig().setEnabled(true);
    config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-"+version);

    for(int k=1;k<= LOAD_EXECUTORS_COUNT;k++){
        config.addExecutorConfig(new ExecutorConfig("e"+k).setPoolSize(k));
    }
Run Code Online (Sandbox Code Playgroud)

所以我也拼命地尝试了myProgram.但它并没有解决问题 - 仍然每个实例仅在整个运行期间检测到自己为成员.


更新myProgram运行的时间

可能是程序运行时间不够长(正如pveentjer所说的那样)?

我的实验似乎证实了这一点:如果时间tHazelcast.newHazelcastInstance(cfg);和初始化之间cleanUp()(即不再通过hazelcast进行通信而不再检查成员数量)是

  • 不到30秒,没有沟通和 members: 1
  • 超过30秒:找到所有成员并进行通信(奇怪的是,发生的时间远远超过t - 30秒).

30秒是一个真实的时间跨度,榛树群需要,还是有一些奇怪的事情发生?以下是同时运行的4个myPrograms的日志(查找hazelcast-members重叠30秒,例如1和实例3):

instance 1: 2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100  
2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model  SymToSim is about to\  exit

instance 2: 2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100 
2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model  SymToSim is about to\  exit

instance 3: 2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100  
2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model  SymToSim is about to\  exit

INSTANCE 4: 2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100  
2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model  SymToSim is about to\  exit
Run Code Online (Sandbox Code Playgroud)

只有在hazelcast集群中有足够的成员后,如何才能最好地启动我的实际分布式算法?我可以通过hazelcast.initial.min.cluster.size编程设置吗?https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A听起来会阻塞,Hazelcast.newHazelcastInstance(cfg);直到达到initial.min.cluster.size.正确?不同实例如何同步(在哪个时间范围内)解除阻塞?

pve*_*jer 15

问题显然是集群启动(并停止)并且不等到集群中有足够的成员.您可以设置hazelcast.initial.min.cluster.size属性,以防止发生这种情况.

您可以使用以下方式以编程方式设置'hazelcast.initial.min.cluster.size':

Config config = new Config(); 
config.setProperty("hazelcast.initial.min.cluster.size","3");
Run Code Online (Sandbox Code Playgroud)


小智 6

您的配置是正确的但是您设置了很长的多播超时200秒,默认值为2秒.设置较小的值将解决它.

来自Hazelcast Java API Doc: MulticastConfig.html#setMulticastTimeoutSeconds(int)

指定节点在将自身声明为主节点并创建其自己的集群之前应等待来自网络中运行的另一节点的有效多播响应的时间(以秒为单位).这仅适用于尚未分配主节点的节点的启动.如果指定一个较高的值,例如60秒,则意味着在选择主控器之前,每个节点将在继续之前等待60秒,因此请小心提供高值.如果该值设置得太低,则可能是节点过早放弃并将创建自己的集群.