Here is some example code:
public class TestIO {
    public static void main(String[] str) {
        TestIO t = new TestIO();
        t.fOne();
        t.fTwo();
        t.fOne();
        t.fTwo();
    }

    public void fOne() {
        long t1, t2;
        t1 = System.nanoTime();
        int i = 10;
        int j = 10;
        int k = j*i;
        System.out.println(k);
        t2 = System.nanoTime();
        System.out.println("Time taken by 'fOne' ... " + (t2-t1));
    }

    public void fTwo() {
        long t1, t2;
        t1 = System.nanoTime();
        int i = 10;
        int j = 10;
        int k = j*i;
        System.out.println(k);
        t2 = System.nanoTime();
        System.out.println("Time taken …
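Timing a method this small with a single pair of System.nanoTime() calls mostly measures class loading, JIT compilation, and the println itself rather than the multiplication. A minimal JMH-style sketch of the same computation, assuming JMH is on the classpath (the class name MultiplyBenchmark and its fields are made up for illustration):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class MultiplyBenchmark {
    int i = 10;
    int j = 10;

    @Benchmark
    public int multiply() {
        // returning the product keeps the JIT from optimizing the work away
        return i * j;
    }
}

JMH then reports a per-operation time instead of one noisy nanoTime difference.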
I am measuring the CPU time and wall-clock time of a sorting algorithm on Linux. I use getrusage to measure the CPU time and clock_gettime with CLOCK_MONOTONIC to get the wall-clock time. However, I noticed that the CPU time is larger than the wall time. Is that correct? I always assumed that CPU time must be smaller than wall time. My example results:

3.000187 seconds [CPU]
3.000001 seconds [WALL]
There is some debate that in certain cases Fortran can be faster than C, for example when aliasing is involved, and I often hear that it auto-vectorizes better than C (there is some good discussion of this here).
However, for simple functions that compute Fibonacci numbers and the Mandelbrot iteration for some complex numbers, using a straightforward solution without any tricks or extra hints/keywords for the compiler, I would expect them to perform essentially the same.
The C implementation:
#include <complex.h>   /* for double complex and cabs() used in mandel() */

int fib(int n) {
    return n < 2 ? n : fib(n-1) + fib(n-2);
}

int mandel(double complex z) {
    int maxiter = 80;
    double complex c = z;
    for (int n = 0; n < maxiter; ++n) {
        if (cabs(z) > 2.0) {
            return n;
        }
        z = z*z + c;
    }
    return maxiter;
}
The Fortran implementation:
integer, parameter :: dp=kind(0.d0) ! double precision
integer recursive function fib(n) result(r)
integer, intent(in) :: n
if (n < 2) then
r = n …

I have a method like this:
@GenerateMicroBenchmark
public static void calculateArraySummary(String[] args) {
    // create a random data set
    /* PROBLEM HERE:
     * now I measure not only pool.invoke(finder) time,
     * but also generateRandomArray method time
     */
    final int[] array = generateRandomArray(1000000);
    // submit the task to the pool
    final ForkJoinPool pool = new ForkJoinPool(4);
    final ArraySummator finder = new ArraySummator(array);
    System.out.println(pool.invoke(finder));
}

private static int[] generateRandomArray(int length) {
    final int[] array = new int[length];
    final Random random = new Random();
    for (int i …
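One way to keep generateRandomArray out of the measured region is JMH's @State/@Setup mechanism, so the data set is built once per trial and only the fork/join invocation is timed. A rough sketch under that assumption, reusing the ArraySummator class from the question (the benchmark class name and seed are hypothetical):

import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ArraySummaryBenchmark {
    int[] array;
    ForkJoinPool pool;

    @Setup(Level.Trial)
    public void prepare() {
        // executed before measurement, so its cost is not attributed to the benchmark
        array = new int[1_000_000];
        Random random = new Random(42);
        for (int i = 0; i < array.length; i++) {
            array[i] = random.nextInt();
        }
        pool = new ForkJoinPool(4);
    }

    @Benchmark
    public Object sumArray() {
        // only the pool.invoke() call is measured; returning the result avoids dead-code elimination
        return pool.invoke(new ArraySummator(array));
    }
}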
I am testing my program's performance with JMH and cannot get the heap size configured. I would like to know why it is not working.

The question:
why is the heap size setting not picked up when it is passed via the jvmArgs method? The error:
# Run progress: 0.00% complete, ETA 00:04:30
# VM invoker: /usr/lib/jvm/java-8-oracle/jre/bin/java
# VM options: -Xms2048m -Xmx2048m -XX:MaxDirectMemorySize=512M
# Fork: 1 of 1
Invalid initial heap size: -Xms2048m -Xmx2048m -XX:MaxDirectMemorySize=512M
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
<forked VM failed with exit code 1>
The main method:
public static void main(String... args) throws RunnerException, IOException {
Options opt = new OptionsBuilder()
.include(".*" + ArraySummatorBenchmarking.class.getSimpleName() + ".*") …
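The line "Invalid initial heap size: -Xms2048m -Xmx2048m -XX:MaxDirectMemorySize=512M" suggests that the forked VM received all three flags as one single token. If that is indeed the cause (an assumption on my part, not confirmed in the question), passing each flag as its own string to jvmArgs should avoid it, roughly:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// inside main():
Options opt = new OptionsBuilder()
        .include(".*" + ArraySummatorBenchmarking.class.getSimpleName() + ".*")
        // one string per JVM flag, not one space-separated string
        .jvmArgs("-Xms2048m", "-Xmx2048m", "-XX:MaxDirectMemorySize=512M")
        .forks(1)
        .build();
new Runner(opt).run();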
I had the idea of using the conditional operator to convert some of my if blocks into one-liners. However, I wondered whether there is a speed difference, so I ran the following test:

static long startTime;
static long elapsedTime;
static String s;
public static void main(String[] args) {
    startTime = System.nanoTime();
    s = "";
    for (int i = 0; i < 1000000000; i++) {
        if (s.equals("")) {
            s = "";
        }
    }
    elapsedTime = System.nanoTime() - startTime;
    System.out.println("Type 1 took this long: " + elapsedTime + " ns");

    startTime = System.nanoTime();
    s = "";
    for (int i = 0; i < 1000000000; i++) {
        s = (s.equals("") ? "" : s);
    }
    elapsedTime = …
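A hand-rolled loop around System.nanoTime() like the one above is easily distorted by JIT compilation and dead-code elimination, so the two timings say little about the branch itself. A minimal JMH sketch of the same comparison, assuming JMH is available (the class name is made up):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class TernaryVsIfBenchmark {
    String s = "";

    @Benchmark
    public String withIf() {
        if (s.equals("")) {
            s = "";
        }
        return s;   // returning the value keeps the branch from being eliminated
    }

    @Benchmark
    public String withTernary() {
        s = (s.equals("") ? "" : s);
        return s;
    }
}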
While trying out the JDK 8 streaming features, I decided to run a parallel/serial stream performance test. I try to estimate the value of pi by throwing random darts at the unit square and checking how many land inside the unit circle. I found the example from apache-spark.
Here is the code.
package org.sample;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@State(Scope.Benchmark)
public class MyBenchmark {

    @Param({
            "1000000",
            "10000000"
    }) int MAX_COUNT;

    @Benchmark
    public double parallelPiTest() {
        long count = IntStream.range(1, MAX_COUNT).parallel().filter(i -> {
            double x = Math.random();
            double y = Math.random(); …
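When comparing the parallel and sequential variants it may be worth isolating the random-number source, since Math.random() is backed by one shared generator that all worker threads contend on. Below is a sketch of the same dart-throwing estimate with a per-thread generator, written as an extra method for the same class; this substitution is my assumption, not part of the original question, and it needs an additional import of java.util.concurrent.ThreadLocalRandom.

@Benchmark
public double parallelPiThreadLocalRandom() {
    long count = IntStream.range(1, MAX_COUNT).parallel().filter(i -> {
        // one generator per thread instead of the shared one behind Math.random()
        double x = ThreadLocalRandom.current().nextDouble();
        double y = ThreadLocalRandom.current().nextDouble();
        // no sqrt needed: x*x + y*y < 1 is the same test as sqrt(x*x + y*y) < 1
        return x * x + y * y < 1.0;
    }).count();
    return 4.0 * count / MAX_COUNT;
}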
I have a few questions about microbenchmark and autoplot.

Suppose this is my code:
library("microbenchmark")
library("ggplot2")
tm <- microbenchmark(rchisq(100, 0),rchisq(100, 1),rchisq(100, 2),rchisq(100, 3),rchisq(100, 5), times=1000L)
autoplot(tm)
Thanks!
How can I use the list argument of the microbenchmark function? I want to microbenchmark the same function with different inputs:
microbenchmark(j1 = {sample(1e5)},
               j2 = {sample(2e5)},
               j3 = {sample(3e5)})
The following will no longer work, because the list will then only contain vectors rather than unevaluated expressions:
microbenchmark(list = list(j1 = {sample(1e5)},
                           j2 = {sample(2e5)},
                           j3 = {sample(3e5)}))
I would also like to generate the list with, for example, lapply.
Why is my SIMD vector4 length function 3 times slower than the naive vector length method?
The SIMD vector4 length function:
__extern_always_inline float vec4_len(const float *v) {
    __m128 vec1 = _mm_load_ps(v);
    __m128 xmm1 = _mm_mul_ps(vec1, vec1);
    __m128 xmm2 = _mm_hadd_ps(xmm1, xmm1);
    __m128 xmm3 = _mm_hadd_ps(xmm2, xmm2);
    return sqrtf(_mm_cvtss_f32(xmm3));
}
The naive implementation:
sqrtf(V[0] * V[0] + V[1] * V[1] + V[2] * V[2] + V[3] * V[3])
The SIMD version took 16110 ms for one billion iterations. The naive version was about 3 times faster, taking only 4746 ms.
#include <math.h>
#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>
static float vec4_len(const float *v) {
    __m128 vec1 = _mm_load_ps(v);
    __m128 xmm1 = _mm_mul_ps(vec1, vec1);
    __m128 xmm2 = _mm_hadd_ps(xmm1, xmm1);
    __m128 …
java ×5
benchmarking ×4
c ×3
jmh ×2
r ×2
fortran ×1
if-statement ×1
java-8 ×1
java-stream ×1
julia ×1
jvm ×1
simd ×1
sse ×1