Tag: benchmarking

Negative cycles per byte? rdtsc

I wrote some code to measure CPU cycles per byte. I am getting a negative cpb and don't know why... It tells me cpb = -0.855553 cycles/byte

My pseudocode:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

uint64_t rdtsc(){
    unsigned int lo,hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}

int main()
{
    long double inputsSize = 1024;
    long double counter = 1;

    long double cpuCycleStart = rdtsc();

        while(counter < 3s)
            function(args);

    long double cpuCycleEnd = rdtsc();

        long double cpb = ((cpuCycleEnd - cpuCycleStart) / (counter *  inputsSize));

    printf("%Lf cycles/byte\n", cpb);

    return …
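The cycles-per-byte arithmetic from the pseudocode can be sketched in Go as follows. This only illustrates the formula; it does not diagnose the negative result in the question. The function name cyclesPerByte and the numbers in main are made up for illustration, and the timestamps are assumed to be unsigned 64-bit counter readings that are subtracted before any conversion to floating point.

```go
package main

import (
	"errors"
	"fmt"
)

// cyclesPerByte divides the elapsed cycle count by the total number of
// bytes processed (calls * bytesPerCall). The name is hypothetical.
func cyclesPerByte(start, end, calls, bytesPerCall uint64) (float64, error) {
	if end < start {
		// Unsigned subtraction would wrap around, so report it instead.
		return 0, errors.New("end timestamp is earlier than start timestamp")
	}
	total := calls * bytesPerCall
	if total == 0 {
		return 0, errors.New("no bytes processed")
	}
	return float64(end-start) / float64(total), nil
}

func main() {
	// Made-up readings: 2,048,000 cycles over 1000 calls of 1024 bytes each.
	cpb, err := cyclesPerByte(1000000, 3048000, 1000, 1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.3f cycles/byte\n", cpb)
}
```

Checking end against start makes an inconsistent pair of readings visible as an error rather than as a nonsensical number.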


c performance benchmarking cpu-usage

nul*_*ter

2013 07-31

0
votes

1
answer

678
views

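How is the fio IOPS log file interpreted?

I am using fio for storage benchmarking and fio2gnuplot to plot the graphs. Every time I run a test and look at the IOPS log file, the second column, which is the IOPS value, is always 1, so the graph is just a straight line perpendicular to the Y axis. This makes no sense. I have tried various iodepths and ioengines, to no avail. Am I using any parameter (option) incorrectly?

Below is my job file.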

[global]

rw=randwrite
size=128m
thread=1
iodepth=2
ioengine=libaio
per_job_logs=0
directory=/home/fio



[job_512]
write_bw_log=logfiles_libaio/fio-test_512
write_iops_log=logfiles_libaio/fio-test_512
write_lat_log=logfiles_libaio/fio-test_512
bs=512b


Here is the log file:

1, 1, 0, 512
2, 1, 1, 512
18, 1, 1, 512
19, 1, 0, 512
31, 1, 1, 512
53, 1, 1, 512
55, 1, 1, 512
56, 1, 0, 512
59, 1, 1, 512
63, 1, 1, 512
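One way to turn a log like this into a rate is to add up the value column per time window. The Go sketch below does that under stated assumptions: the first column is taken to be a timestamp in milliseconds and the second the logged value, which the question describes as the IOPS value. Whether each line stands for a single completed I/O is not confirmed by the question. The name sumPerWindow is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumPerWindow adds up the second column of every log line, grouped into
// consecutive windows of windowMs milliseconds and keyed by window index.
// Assumption: column 1 is a time in milliseconds, column 2 is the value.
func sumPerWindow(log string, windowMs int64) (map[int64]int64, error) {
	if windowMs <= 0 {
		return nil, fmt.Errorf("windowMs must be positive, got %d", windowMs)
	}
	sums := make(map[int64]int64)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		fields := strings.Split(line, ",")
		if len(fields) < 2 {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		t, err := strconv.ParseInt(strings.TrimSpace(fields[0]), 10, 64)
		if err != nil {
			return nil, err
		}
		v, err := strconv.ParseInt(strings.TrimSpace(fields[1]), 10, 64)
		if err != nil {
			return nil, err
		}
		sums[t/windowMs] += v
	}
	return sums, sc.Err()
}

func main() {
	// The first five lines of the log shown in the question.
	const sample = "1, 1, 0, 512\n2, 1, 1, 512\n18, 1, 1, 512\n19, 1, 0, 512\n31, 1, 1, 512\n"
	sums, err := sumPerWindow(sample, 1000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sum in first 1000 ms window:", sums[0])
}
```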


linux io benchmarking storage

Author

lucky-day

0
votes

1
answer

3488
views

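Go benchmark doesn't measure only the loop?

In my Go benchmark I have some initialization code that sets up test data, followed by the benchmark loop shown below. The output seems to measure the running time of the whole function, not just what is inside the loop. That is not useful information to me. Is there a way to force it to measure only the running time of the loop body, since that is what I care about? Shouldn't this be obvious?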

func BenchmarkXXX(b *testing.B) {
    // Some test data init code..

    for i := 0; i < b.N; i++ {
        // benchmarking code..
    }
}
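The standard testing package provides b.ResetTimer(), which zeroes the elapsed benchmark time, so calling it after the initialization code keeps the setup out of the reported figure. The sketch below is a minimal illustration, not the asker's code: makeData and benchmarkSort are hypothetical names, and testing.Benchmark is used only so the example runs as an ordinary program outside go test.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// makeData is a hypothetical stand-in for the test data init code.
func makeData(n int) []int {
	data := make([]int, n)
	for i := range data {
		data[i] = n - i // descending order, so sorting has work to do
	}
	return data
}

// benchmarkSort measures only the loop: the timer is reset after setup.
func benchmarkSort(b *testing.B) {
	data := makeData(10000)           // setup, not meant to be measured
	scratch := make([]int, len(data)) // reused buffer, also part of setup

	b.ResetTimer() // discard the time spent so far

	for i := 0; i < b.N; i++ {
		copy(scratch, data)
		sort.Ints(scratch)
	}
}

func main() {
	res := testing.Benchmark(benchmarkSort)
	fmt.Println("iterations:", res.N)
	fmt.Println("ns/op:", res.NsPerOp())
}
```

In a real _test.go file the function would be named BenchmarkXxx and run with go test -bench. The framework calls the benchmark function several times with growing b.N, so the setup runs on each call, but only the time after b.ResetTimer() is counted.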


benchmarking go

pra*_*pan

lucky-day

0
votes

1
answer

424
views

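Interpreting the results of the TensorFlow benchmark tool

TensorFlow has several benchmarking tools:

for .pb models and for .tflite models

I have a few questions about the parameters of the .pb benchmark tool:

Does num_threads relate to the number of single-threaded experiments run in parallel, or to the internal threads used by TensorFlow?
Is it possible to use the GPU when the tool is built for desktop, i.e. not for mobile? If so, how can I make sure the GPU is not used?

I also have a few questions about interpreting the results:

What is count in the result output? How does Timings (microseconds): count= relate to the --max_num_runs parameter?

Example: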

Run --num_threads=-1 --max_num_runs=1000:
    2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1000 first=3608 curr=3873 min=3566 max=8009 avg=3766.49 std=202
    2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1000 curr=3301344(all same)
    2019-03-20 14:30:33.253591: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
    2019-03-20 14:30:33.253597: I tensorflow/core/util/stat_summarizer.cc:85]
    2019-03-20 14:30:33.378352: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
    2019-03-20 14:30:33.378390: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 46.30B

Run --num_threads=1 --max_num_runs=1000:
    2019-03-20 …


benchmarking tensorflow tensorflow-lite

mrg*_*oom

lucky-day

0
votes

1
answer

1023
views

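C++: getting run time and memory usage

I have been practicing C++ programming on LeetCode, and whenever I submit a solution it tells me how long my program ran and how much memory it used.

I am on a Mac, using VSCode with g++ to compile my programs locally. I would like to find a tool or method that gives me the same information about my program's run time and memory usage, so that I can try tweaking it and see the effect on performance.

Is there a compiler option, or something like a command-line tool or a VSCode extension, that can run my program, or do I have to add code to the program to track time and memory itself?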

c++ performance benchmarking g++ performance-testing

Dev*_*man

lucky-day

0
votes

1
answer

2293
views

Why is strconv.ParseUint slower than strconv.Atoi?

I am benchmarking unmarshaling from string to int and uint using the following code:

package main

import (
    "strconv"
    "testing"
)

func BenchmarkUnmarshalInt(b *testing.B) {
    for i := 0; i < b.N; i++ {
        UnmarshalInt("123456")
    }
}

func BenchmarkUnmarshalUint(b *testing.B) {
    for i := 0; i < b.N; i++ {
        UnmarshalUint("123456")
    }
}

func UnmarshalInt(v string) int {
    i, _ := strconv.Atoi(v)
    return i
}

func UnmarshalUint(v string) uint {
    i, _ := strconv.ParseUint(v, 10, 64)
    return uint(i)
}


Results:

Running tool: C:\Go\bin\go.exe test -benchmem -run=^$ myBench/main …


performance benchmarking type-conversion go microbenchmark

Fre*_*ors

2020 12-04

0
votes

1
answer

224
views

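How can I time something in Go that takes less than a nanosecond?

If I want to compare the timing of two functions, but those functions take less than a nanosecond, how should I proceed?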

t := time.Now()
_ = fmt.Sprint("Hello, World!")
d := time.Since(t)
d.Round(0)
fmt.Println(d.Nanoseconds()) // Prints 0


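I could run the function several times and divide the time by the number of executions, but I would prefer a way to time a single execution. Is there a way to do that?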
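The repeat-and-divide approach mentioned in the question can be sketched as below. This is only an illustration of that idea, not a way to time a single execution. The helper name averageNs is hypothetical, and the result includes loop and call overhead.

```go
package main

import (
	"fmt"
	"time"
)

// averageNs runs f n times and returns the mean wall-clock time per call
// in nanoseconds. The name is hypothetical; n must be positive.
func averageNs(n int, f func()) float64 {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	elapsed := time.Since(start)
	return float64(elapsed.Nanoseconds()) / float64(n)
}

func main() {
	sink := "" // keeps the result alive so the call is not trivially removable
	avg := averageNs(1000000, func() {
		sink = fmt.Sprint("Hello, World!")
	})
	fmt.Printf("about %.2f ns per call (last result: %q)\n", avg, sink)
}
```

Because the total is divided as a floating-point number, the average can resolve fractions of a nanosecond even when a single measurement prints 0.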

benchmarking duration go

Mar*_*ndt

lucky-day

0
votes

1
answer

766
views

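Fast C++ sign function

In my code I perform a sign check on a double many times inside a loop, and that loop typically runs millions of times during execution.

My sign check is a very basic calculation using fabs(), so I figured there must be faster alternatives, since "division is slow". I came across a template function and copysign(), and wrote a simple program to compare their speed. I tested three possible solutions with the code below.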

// C++ program to find out the execution time of functions
#include <chrono>
#include <iostream>
#include <math.h>
#include <string>

using namespace std; 
using namespace std::chrono; 

template<typename Clock>

void printResult(const std::string name, std::chrono::time_point<Clock> start, std::chrono::time_point<Clock> stop, const int iterations)
{
    // Get duration. 
    std::chrono::duration my_duration = duration_cast<nanoseconds>(stop - start); 
    my_duration /= iterations;

    cout << "Time taken by "<< name <<" function: " << my_duration.count() << " ns avg. for " << iterations << " iterations." …


c++ benchmarking timing

jpm*_*orr

2021 06-04

0
votes

1
answer

106
views

JMH: lower-level benchmarking

I am using JMH to benchmark JUnit tests.

My benchmarks:

import org.openjdk.jmh.annotations.*;

public class Benchmarks {
        @Benchmark
        public void bmUnitTest1() {
                UnitTests.UnitTest1();
        }

        @Benchmark
        public void bmUnitTest2() {
                UnitTests.UnitTest2();
        }
}


My benchmark runner:

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.TimeValue;

import java.util.concurrent.TimeUnit;

public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(Benchmarks.class.getSimpleName())
                .mode(Mode.SingleShotTime)
                .resultFormat(ResultFormatType.CSV)
                .result("target/test-classes/benchmarkcsv/BM " + System.currentTimeMillis() + ".csv")
                .timeUnit(TimeUnit.MILLISECONDS)
                .warmupIterations(3)
                .warmupTime(TimeValue.seconds(1))
                .measurementIterations(3)
                .measurementTime(TimeValue.seconds(1))
                .timeout(TimeValue.seconds(5))
                .forks(1)
                .warmupForks(1)
                .threads(1)
                .build();

        new Runner(opt).run(); …


java performance benchmarking jmh

Dim*_*rie

2022 04-22

0
votes

1
answer

602
views

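Performance difference in run-time polymorphism between C and C++

I know that benchmarking is a very delicate topic, and that simple, poorly thought-out benchmarks are mostly meaningless for performance comparisons, but what I have right now is a very small and contrived example that I think should be easy to explain. So even if this question does not seem helpful, it will at least help me understand benchmarking.

So, here I go.

I was experimenting with a simple API design in C that gets run-time polymorphism via void *. I then compared it with the same thing implemented in C++ using regular virtual functions. Here is the code: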

#include <cstdlib>
#include <cstdio>
#include <cstring>

int dummy_computation()
{
    return 64 / 8;
}

/* animal library, everything is prefixed with al for namespacing */
#define AL_SUCCESS 0;
#define AL_UNKNOWN_ANIMAL 1;
#define AL_IS_TYPE_OF(animal, type) \
    strcmp(((type *)animal)->animal_type, #type) == 0\

typedef struct {
    const char* animal_type;
    const char* name;
    const char* sound;
} al_dog;

inline int make_dog(al_dog** d) {
    *d = (al_dog*) malloc(sizeof(al_dog));
    (*d)->animal_type = "al_dog";
    (*d)->name …


c c++ benchmarking run-time-polymorphism

meg*_*uli

2022 09-07

0
votes

1
answer

256
views