Posts by hab*_*ats

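Broker architectural pattern in plain English

Can someone explain the Broker pattern to me in plain English? Possibly in terms of Java or a real-life analogy.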

java design-patterns broker

hab*_*ats

lucky-day

32
votes

1
answer

10k
views

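Perform an operation on n random distinct elements from a Collection using the Streams API

I'm trying to retrieve n unique random elements from a Collection for further processing, using the Streams API in Java 8, but so far without much luck.

More precisely, I'd like something like this: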

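Set<Integer> subList = new HashSet<>();
// a List is used here so that get(index) is available
List<Integer> collection = new ArrayList<>(Arrays.asList(1,2,3,4,5,6,7,8,9));
Random random = new Random();
int n = 4;
while (subList.size() < n) {
  // pick a random valid index; the Set silently drops duplicates
  subList.add(collection.get(random.nextInt(collection.size())));
}
// doSomethingFancy() is a placeholder for the further processing
subList.forEach(v -> v.doSomethingFancy());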

I'd like to do this as efficiently as possible.

Can this be done?

Edit: My second attempt, although not quite what I was aiming for:

List<Integer> sublist = new ArrayList<>(collection);
Collections.shuffle(sublist);
sublist.stream().limit(n).forEach(v -> v.doSomethingFancy());

Edit: Third attempt (inspired by Holger), which removes most of the shuffle overhead if coll.size() is huge and n is small:

int n = // unique element count
List<Integer> sublist = new ArrayList<>(collection);   
Random r = new Random();
for(int i = 0; i < n; i++)
    Collections.swap(sublist, i, i + …
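
The idea described for the third attempt resembles a partial Fisher-Yates shuffle: only the first n positions are randomized instead of the whole list. The sketch below is an illustration of that technique under that assumption, not a reconstruction of the truncated snippet; `RandomSample` and `pickDistinct` are hypothetical names, and "distinct" here means distinct positions in the source collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomSample {
    // Returns n elements taken from n distinct random positions of source.
    static <T> List<T> pickDistinct(Collection<T> source, int n, Random random) {
        List<T> copy = new ArrayList<>(source);
        if (n < 0 || n > copy.size()) {
            throw new IllegalArgumentException("n must be between 0 and " + copy.size());
        }
        for (int i = 0; i < n; i++) {
            // Swap position i with a random position in [i, size): n swaps in total.
            Collections.swap(copy, i, i + random.nextInt(copy.size() - i));
        }
        return new ArrayList<>(copy.subList(0, n));
    }

    public static void main(String[] args) {
        List<Integer> collection = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // The sampled elements can then be processed with the Streams API.
        pickDistinct(collection, 4, new Random()).stream().forEach(System.out::println);
    }
}
```

Copying the collection is still O(size), but the random work is limited to n swaps rather than a full shuffle.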

java collections java-8

hab*_*ats

2017-05-23

12
votes

1
answer

2547
views

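Incorrect memory allocation for Yarn/Spark after automatic setup of a Dataproc cluster

I'm trying to run Spark jobs on a Dataproc cluster, but Spark will not start because Yarn is misconfigured.

I get the following error when running "spark-shell" from the shell (locally on the master), as well as when submitting a job through the web GUI and through the gcloud command-line utility from my local machine: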

15/11/08 21:27:16 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (38281+2679 MB) is above the max threshold (20480 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.

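I tried changing the value in /etc/hadoop/conf/yarn-site.xml, but that changed nothing. I don't think the configuration is read from that file.

I've tried multiple cluster combinations at multiple sites (mainly in Europe), and I only got this to work with the low-memory configuration (4 cores, 15 GB memory).

In other words, this is only a problem on nodes configured with more memory than the Yarn default allows.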
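
The numbers in the error message can be checked by hand: the requested container size is the executor memory plus the overhead, and it has to fit under the Yarn maximum. The snippet below only redoes that arithmetic with the values from the log; `YarnMemoryCheck` and its method names are hypothetical and not part of Spark or Yarn.

```java
class YarnMemoryCheck {
    // Total container request in MB: executor memory plus memory overhead.
    static int requestedMb(int executorMemoryMb, int overheadMb) {
        return executorMemoryMb + overheadMb;
    }

    // True if the request does not exceed yarn.scheduler.maximum-allocation-mb.
    static boolean fits(int requestedMb, int maxAllocationMb) {
        return requestedMb <= maxAllocationMb;
    }

    public static void main(String[] args) {
        int requested = requestedMb(38281, 2679);     // values from the log above
        System.out.println(requested);                // prints 40960
        System.out.println(fits(requested, 20480));   // prints false
    }
}
```

The request (40960 MB) is exactly twice the reported threshold (20480 MB).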

hadoop google-cloud-platform google-cloud-dataproc

hab*_*ats

lucky-day

10
votes

1
answer

8678
views

How to connect remotely with JMX to a Spark worker on Dataproc

I can connect to the driver just fine by adding the following:

spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
                              -Dcom.sun.management.jmxremote.port=9178 \
                              -Dcom.sun.management.jmxremote.authenticate=false \
                              -Dcom.sun.management.jmxremote.ssl=false

But doing...

spark.executor.extraJavaOptions=-Dcom.sun.management.jmxremote \
                                -Dcom.sun.management.jmxremote.port=9178 \
                                -Dcom.sun.management.jmxremote.authenticate=false \
                                -Dcom.sun.management.jmxremote.ssl=false

...only yields a bunch of errors on the driver...

Container id: container_1501548048292_0024_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)


Container exited with a non-zero exit code 1

...which finally crashes the job.

There is no error on the worker; it simply exits with:

[org.apache.spark.util.ShutdownHookManager] - Shutdown hook called

This is Spark v2.2.0 on a simple cluster with 1 master and 2 workers, and my jobs run without issues when the executor parameters are omitted.
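
For context, a JMX client can attach to a port opened with the options above using the standard `javax.management.remote` API. This is a minimal sketch only: `JmxProbe` is a hypothetical class name, the host is supplied by the caller, and the port 9178 is taken from the configuration shown in the post.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxProbe {
    // Builds the usual RMI-based JMX service URL for host:port.
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host or port", e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: JmxProbe <host>");
            return;
        }
        // Connects without credentials, matching authenticate=false and ssl=false.
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(args[0], 9178))) {
            System.out.println("MBeans: " + connector.getMBeanServerConnection().getMBeanCount());
        } catch (IOException e) {
            System.out.println("could not connect: " + e.getMessage());
        }
    }
}
```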

hadoop-yarn apache-spark google-cloud-dataproc

hab*_*ats

2017-08-01

7
votes

1
answer

2970
views

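Efficiently extracting WikiData entities from text

I have a lot of texts (millions), ranging from 100 to 4000 words. The texts are formatted as written work, with punctuation and grammar. Everything is in English.

The problem is simple: how do I extract every WikiData entity from a given text?

An entity is defined as every noun, proper or common, i.e. names of people, organizations and locations, as well as things like chair, potatoes etc.

So far I've tried the following:

Tokenize the text with OpenNLP and use the pre-trained models to extract people, locations, organizations and regular nouns.
Apply Porter stemming where applicable.
Match all extracted nouns against the wmflabs API to retrieve a potential WikiData ID.

This works, but I feel I can do better. One obvious improvement would be to cache the relevant WikiData locally, which I plan on doing. Before I do that, however, I want to check whether there are other solutions.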
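
The local caching mentioned above could look roughly like the following. It is a sketch under assumptions: `EntityIdCache` is a hypothetical name, the remote lookup is passed in as a plain function rather than a real API client, and Java is used here although the job itself runs on Spark with Scala.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class EntityIdCache {
    // Maps a noun to its looked-up ID; an empty Optional records a known miss.
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> remoteLookup;

    EntityIdCache(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Calls the remote lookup only the first time a given noun is seen.
    Optional<String> idFor(String noun) {
        return cache.computeIfAbsent(noun.trim(), remoteLookup);
    }

    public static void main(String[] args) {
        // Stand-in lookup; a real one would query the remote service.
        EntityIdCache cache = new EntityIdCache(noun -> Optional.of("id-for-" + noun));
        System.out.println(cache.idFor("potato"));
        System.out.println(cache.idFor("potato")); // served from the cache
    }
}
```

Storing `Optional` values means nouns without a match are cached too, so they are not looked up again.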

Suggestions?

I tagged the question with Scala because I'm using Spark for the task.

scala information-retrieval machine-learning wikidata wikidata-api

hab*_*ats

lucky-day

6
votes

1
answer

578
views

SimpleDateFormat inconsistent parsing errors

The sample code says it all:

private void parse() throws ParseException{
        SimpleDateFormat sdf = new SimpleDateFormat("MMM/dd/yyyy");

        Date started = sdf.parse("Sep/22/2004");
        // this triggers: java.text.ParseException: Unparseable date: "May/23/2010"
        Date ended = sdf.parse("May/23/2010");
}

Not sure what else I can add. I'm trying to parse "MMM/dd/yyyy" dates, and the exceptions I get are inconsistent. It feels like I'm missing something obvious.
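
One explanation consistent with this symptom, assuming (the post does not say so) that the JVM's default locale is not English: `new SimpleDateFormat(pattern)` resolves month names in the default locale, so an abbreviation that exists in both languages, such as "Sep", can parse while "May" does not. The sketch below pins the locale explicitly; `MonthNameParsing` and `parseEnglish` are hypothetical names.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

class MonthNameParsing {
    // Parses "MMM/dd/yyyy" using English month abbreviations regardless of the default locale.
    static Date parseEnglish(String text) throws ParseException {
        return new SimpleDateFormat("MMM/dd/yyyy", Locale.ENGLISH).parse(text);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseEnglish("Sep/22/2004"));
        System.out.println(parseEnglish("May/23/2010"));
    }
}
```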

java simpledateformat

hab*_*ats

2013-06-20

5
votes

1
answer

1498
views

Summing objects based on a field value using the Streams API

I'm wondering whether the following can be done in a single line with the new Streams API:

List<MyItem> arr = new ArrayList<>();

// MyItem has a single field, which is a value
arr.add(new MyItem(3));
arr.add(new MyItem(5));

// the following operation is the one I want to do without iterating over the array
int sum = 0;
for(MyItem item : arr){
  sum += item.getValue();
}

If the array only contained ints, I could do something like this:

int sum = array.stream().mapToInt(Integer::intValue).sum();

But can I apply the same idea to a list of arbitrary objects?
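
The same `mapToInt` idea carries over if the mapping function is the getter instead of `Integer::intValue`. A minimal sketch, assuming `MyItem` exposes `getValue()` as in the loop above; the wrapper class `SumByField` and the method `total` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class SumByField {
    // Stand-in for the MyItem class from the question: a single int field.
    static final class MyItem {
        private final int value;
        MyItem(int value) { this.value = value; }
        int getValue() { return value; }
    }

    // Maps each item to its int value and sums the resulting IntStream.
    static int total(List<MyItem> items) {
        return items.stream().mapToInt(MyItem::getValue).sum();
    }

    public static void main(String[] args) {
        List<MyItem> arr = Arrays.asList(new MyItem(3), new MyItem(5));
        System.out.println(total(arr)); // prints 8
    }
}
```

The stream still visits every element internally; the explicit loop is just replaced by a single expression.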

java collections java-8

hab*_*ats

2014-11-25

0
votes

1
answer

488
views

Tag statistics

java ×4

collections ×2

google-cloud-dataproc ×2

java-8 ×2

apache-spark ×1

broker ×1

design-patterns ×1

google-cloud-platform ×1

hadoop ×1

hadoop-yarn ×1

information-retrieval ×1

machine-learning ×1

scala ×1

simpledateformat ×1

wikidata ×1

wikidata-api ×1
