ram*_*owl 10 c++ optimization go
在各种情况下,我观察到链表迭代在C++中始终比在Go中慢10-15%.我在Stack Overflow上解决这个谜团的第一次尝试就在这里.我编码的例子有问题,因为:
1)由于堆分配,内存访问是不可预测的,并且
2)因为没有实际工作,一些人的编译器正在优化主循环.
为解决这些问题,我有一个新程序,包含C++和Go中的实现.C++版本需要1.75秒,而Go版本需要1.48秒.这次,我在计时开始之前做了一个大的堆分配,并用它来操作一个对象池,我从该对象池中释放并获取链表的节点.这样,内存访问应该在两个实现之间完全类似.
希望这使得神秘更具可重复性!
C++:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/timer.hpp>
using namespace std;
struct Node {
Node *next; // 8 bytes
int age; // 4 bytes
};
// Object pool, where every free slot points to the previous free slot
template<typename T, int n>
struct ObjPool
{
typedef T* pointer;
typedef pointer* metapointer;
ObjPool() :
_top(NULL),
_size(0)
{
pointer chunks = new T[n];
for (int i=0; i < n; i++) {
release(&chunks[i]);
}
}
// Giver an available pointer to the object pool
void release(pointer ptr)
{
// Store the current pointer at the given address
*(reinterpret_cast<metapointer>(ptr)) = _top;
// Advance the pointer
_top = ptr;
// Increment the size
++_size;
}
// Pop an available pointer off the object pool for program use
pointer acquire(void)
{
if(_size == 0){throw std::out_of_range("");}
// Pop the top of the stack
pointer retval = _top;
// Step back to the previous address
_top = *(reinterpret_cast<metapointer>(_top));
// Decrement the size
--_size;
// Return the next free address
return retval;
}
unsigned int size(void) const {return _size;}
protected:
pointer _top;
// Number of free slots available
unsigned int _size;
};
Node *nodes = nullptr;
ObjPool<Node, 1000> p;
void processAge(int age) {
// If the object pool is full, pop off the head of the linked list and release
// it from the pool
if (p.size() == 0) {
Node *head = nodes;
nodes = nodes->next;
p.release(head);
}
// Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
Node *node = nodes;
Node *prev = nullptr;
while (true) {
if (node == nullptr || age < node->age) {
Node *newNode = p.acquire();
newNode->age = age;
newNode->next = node;
if (prev == nullptr) {
nodes = newNode;
} else {
prev->next = newNode;
}
return;
}
prev = node;
node = node->next;
}
}
int main() {
Node x = {};
std::cout << "Size of struct: " << sizeof(x) << "\n"; // 16 bytes
boost::timer t;
for (int i=0; i<1000000; i++) {
processAge(i);
}
std::cout << t.elapsed() << "\n";
}
Run Code Online (Sandbox Code Playgroud)
走:
package main
import (
"time"
"fmt"
"unsafe"
)
type Node struct {
next *Node // 8 bytes
age int32 // 4 bytes
}
// Every free slot points to the previous free slot
type NodePool struct {
top *Node
size int
}
func NewPool(n int) NodePool {
p := NodePool{nil, 0}
slots := make([]Node, n, n)
for i := 0; i < n; i++ {
p.Release(&slots[i])
}
return p
}
func (p *NodePool) Release(l *Node) {
// Store the current top at the given address
*((**Node)(unsafe.Pointer(l))) = p.top
p.top = l
p.size++
}
func (p *NodePool) Acquire() *Node {
if p.size == 0 {
fmt.Printf("Attempting to pop from empty pool!\n")
}
retval := p.top
// Step back to the previous address in stack of addresses
p.top = *((**Node)(unsafe.Pointer(p.top)))
p.size--
return retval
}
func processAge(age int32) {
// If the object pool is full, pop off the head of the linked list and release
// it from the pool
if p.size == 0 {
head := nodes
nodes = nodes.next
p.Release(head)
}
// Insert the new Node with given age in global linked list. The linked list is sorted by age, so this requires iterating through the nodes.
node := nodes
var prev *Node = nil
for true {
if node == nil || age < node.age {
newNode := p.Acquire()
newNode.age = age
newNode.next = node
if prev == nil {
nodes = newNode
} else {
prev.next = newNode
}
return
}
prev = node
node = node.next
}
}
// Linked list of nodes, in ascending order by age
var nodes *Node = nil
var p NodePool = NewPool(1000)
func main() {
x := Node{};
fmt.Printf("Size of struct: %d\n", unsafe.Sizeof(x)) // 16 bytes
start := time.Now()
for i := 0; i < 1000000; i++ {
processAge(int32(i))
}
fmt.Printf("Time elapsed: %s\n", time.Since(start))
}
Run Code Online (Sandbox Code Playgroud)
输出:
clang++ -std=c++11 -stdlib=libc++ minimalPool.cpp -O3; ./a.out
Size of struct: 16
1.7548
go run minimalPool.go
Size of struct: 16
Time elapsed: 1.487930629s
Run Code Online (Sandbox Code Playgroud)
aba*_*ert 13
您的两个程序之间的最大区别在于,您的Go代码会忽略错误(如果您很幸运,如果清空池,则会出现恐慌或段错误),而C++代码会通过异常传播错误.相比:
if p.size == 0 {
fmt.Printf("Attempting to pop from empty pool!\n")
}
Run Code Online (Sandbox Code Playgroud)
与
if(_size == 0){throw std::out_of_range("");}
Run Code Online (Sandbox Code Playgroud)
至少有三种方法1,使比较公平:
panic/ abort出错.那么,让我们做所有这些并比较结果3:
所以:
为什么?在您的测试运行中实际上不会发生此异常,因此实际的错误处理代码永远不会以任何一种语言运行.但clang无法证明它不会发生.并且,由于您从未catch在任何地方发生异常,这意味着它必须为堆栈中的每个非删除帧发出异常处理程序和堆栈unwinders.所以,它做更多的工作,每个函数调用和返回,没有多少更多的工作,但那么你的函数是做了不必要的额外工作增加了那么一点实际的工作.
1.您还可以更改C++版本以执行C样式的错误处理,或使用Option类型,以及可能的其他可能性.
2.这当然需要更多的变化:您需要导入errors,改变的返回类型Acquire来(*Node, error),改变的返回类型processAge来error,改变所有的return报表,并添加至少两个if err != nil { … }检查.但这对Go来说应该是一件好事,对吗?
3.当我参与其中时,我用你的遗产取代boost::timer了boost::auto_cpu_timer,所以我们现在看到挂钟时间(和Go一样)以及CPU时间.
我不会试图解释原因,因为我不明白.从快速浏览一下装配,它显然优化了一些检查,但我不明白为什么它不能优化那些相同的检查没有panic.