Is it possible to use a regular expression with the PHP function array_key_exists()?
For example:
$exp = "my regex";
array_key_exists($exp, $array);
Thanks!
I am new to Scala/Java and I am having trouble working out these two questions.
From reading the Scala documentation, I understand that ArrayBuffer is mutable and supports operations such as append, insert, and prepend.
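For reference, a minimal sketch of the operations I mean (assuming scala.collection.mutable.ArrayBuffer):
import scala.collection.mutable.ArrayBuffer
val buf = ArrayBuffer(1, 2, 3)
buf += 4          // append a single element
buf.insert(1, 10) // insert 10 at index 1
buf.prepend(0)    // prepend a single element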
1) What are the basic implementation differences?
2) Is there a performance difference between the two?
Thanks.
I am using Spark 1.4.0 / Hadoop 2.6.0 (for HDFS only), and when I run the Scala SparkPageRank example (examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala) I hit the following error:
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.elapsedMillis()J
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:245)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:329)
at org.apache.spark.rdd.RDD$$anonfun$distinct$2.apply(RDD.scala:329)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.distinct(RDD.scala:328)
at org.apache.spark.examples.SparkPageRank$.main(SparkPageRank.scala:60)
at org.apache.spark.examples.SparkPageRank.main(SparkPageRank.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:621)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at …
Note: I am running Spark on YARN.
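In case it is relevant, here is a small check (my own idea, not part of the example) that should show which jar the conflicting Guava class is loaded from on the driver; as far as I know, Stopwatch.elapsedMillis() was removed in newer Guava releases, so a newer Guava shadowing the one Hadoop expects would explain the NoSuchMethodError:
// print the jar that provides com.google.common.base.Stopwatch on the driver classpath
// (getCodeSource can be null for classes loaded from the bootstrap classpath)
println(classOf[com.google.common.base.Stopwatch].getProtectionDomain.getCodeSource.getLocation)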
I have been experimenting with the metrics system implemented in Spark. I enabled the ConsoleSink and the CsvSink, and enabled the JvmSource for all four instances (driver, master, executor, worker). However, I only get driver output; there is no worker/executor/master data either on the console or in the CSV target directory.
After reading this question, I wonder whether I have to ship something to the executors when submitting a job.
My submit command:
./bin/spark-submit --class org.apache.spark.examples.SparkPi lib/spark-examples-1.5.0-hadoop2.6.0.jar 10
Below is my metric.properties file:
# Enable JmxSink for all instances by class name
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
# Enable ConsoleSink for all instances by class name
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
# Polling period for ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
#######################################
# worker instance overlap polling period
worker.sink.console.period=5
worker.sink.console.unit=seconds
#######################################
# Master instance overlap polling period
master.sink.console.period=15
master.sink.console.unit=seconds
# Enable CsvSink for all instances
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
#driver.sink.csv.class=org.apache.spark.metrics.sink.CsvSink …
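One thing I am considering trying (an assumption on my part, not something I have confirmed): pointing every JVM at the metrics file explicitly through spark.metrics.conf, and shipping the file to the executors (for example with spark-submit --files), instead of relying on the default location. A sketch of that mechanism as I understand it, if the job were my own code (the app name is just a placeholder):
import org.apache.spark.{SparkConf, SparkContext}
// assumption: spark.metrics.conf must point at a file readable by each JVM,
// so the executors may also need the file shipped to their working directory
val conf = new SparkConf()
  .setAppName("MetricsTest")
  .set("spark.metrics.conf", "metric.properties")
val sc = new SparkContext(conf)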
I have a case class that represents a person.
case class Person(firstName: String, lastName: String)
I need to compare Person instances by first name and last name in a case-insensitive way, for example:
Person("John", "Doe") == Person("john", "Doe") // should return true
Or within a Seq:
Seq(Person("John", "Doe")).contains(Person("john", "Doe"))
The simplest approach would be to override the equals and hashCode methods in the Person case class, but if overriding equals and hashCode in a case class is not an option, what is the best way to do this cleanly?
Can someone recommend an idiomatic way to solve this case-sensitivity problem?
Thanks, Suriyanto
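For what it is worth, this is the kind of workaround I can think of (the helper name is my own; it avoids touching equals/hashCode, but it also means I cannot use == or contains directly):
case class Person(firstName: String, lastName: String) {
  // hypothetical helper: case-insensitive name comparison without overriding equals/hashCode
  def sameNameAs(other: Person): Boolean =
    firstName.equalsIgnoreCase(other.firstName) && lastName.equalsIgnoreCase(other.lastName)
}
Seq(Person("John", "Doe")).exists(_.sameNameAs(Person("john", "Doe"))) // true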
Edit: after the accepted answer solved my problem, I found What is Scala's yield? (especially the second, most upvoted answer) very enlightening.
I have a HashMap that I want to iterate over, using a for loop to create new objects for each key.
I am trying to get a list of these new objects, but I always end up with a sequence of empty "Unit" values. I would like to understand the behavior of my code better.
import scala.collection.mutable.HashMap

case class MyObject(one: String, two: String, three: Int)

val hm = new HashMap[String,Int]
hm += ("key" -> 3)
hm += ("key2" -> 4)

val newList = hm.map { case (key, value) =>
  for (i <- 0 until value) {
    new MyObject(key, "a string", i)
  }
}.toSeq
Result:
newList:Seq[Unit] = ArrayBuffer((), ())
If I do not use a for loop inside .map(), I get the type of structure I expect:
val newList = hm.map { case (key, value) =>
  new MyObject(key, "a string", value)
}.toSeq
The result is:
newList:Seq[MyObject] = ArrayBuffer(MyObject(key,host,3), MyObject(key2,host,4))
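After reading the yield question mentioned in the edit above, this is the variant I believe produces the structure I wanted (a sketch on my part: for ... yield returns the collection it builds instead of Unit, and flatMap flattens the per-key collections):
val newList2 = hm.flatMap { case (key, value) =>
  // for ... yield builds an IndexedSeq[MyObject] rather than returning Unit
  for (i <- 0 until value) yield new MyObject(key, "a string", i)
}.toSeq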
I downloaded and built the dpdk-stable-16.11.4 release (using the x86_64-native-linuxapp-gcc target). I am running Ubuntu 16.04.3 LTS. After setting up hugepages according to http://dpdk.org/doc/quick-start or http://dpdk.org/doc/guides-16.04/linux_gsg/sys_reqs.html:
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge
echo 64 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
I can see the hugepages just fine:
cat /proc/meminfo | grep Huge
AnonHugePages: 284672 kB
ShmemHugePages: 0 kB
HugePages_Total: 64
HugePages_Free: 64
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
But when I run the helloworld example, it complains that there are no free hugepages; see below.
./build/helloworld -l 0-3 -n 2
EAL: Detected 4 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
hello from core 1
hello from core 2
hello from core …
I seem to have a problem similar to Sorting delimited data with Spark, but the accepted solution there does not solve my problem.
I am trying to apply combineByKey to a simple RDD:
package foo

import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object HelloTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(sparkConf)
    val input = sc.textFile("/path/to/test.txt")
    val result = input.combineByKey(
      (v) => (v, 1),
      (acc: (Int, Int), v) => (acc._1 + v, acc._2 + 1),
      (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
    ).map { case (key, value) => (key, value._1 / value._2.toFloat) }
    result.collectAsMap().map(println(_))
    sc.stop()
  }
}
When compiling, I get the (only) following error: …
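My current guess (an assumption, since I have not confirmed it) is that combineByKey is only available on RDDs of key/value pairs, whereas sc.textFile gives an RDD[String]. A sketch of the parsing step I think is missing, using a hypothetical comma-separated "key,value" line format:
// hypothetical: turn each line "key,value" into a (String, Int) pair first
val pairs = input.map { line =>
  val Array(k, v) = line.split(",")
  (k, v.toInt)
}
val averages = pairs.combineByKey(
  (v: Int) => (v, 1),
  (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
  (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
).map { case (key, value) => (key, value._1 / value._2.toFloat) }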