这是一段看似非常特殊的C++代码.出于某种奇怪的原因,奇迹般地对数据进行排序使得代码几乎快了六倍.
#include <algorithm>
#include <ctime>
#include <iostream>
int main()
{
// Generate data
const unsigned arraySize = 32768;
int data[arraySize];
for (unsigned c = 0; c < arraySize; ++c)
data[c] = std::rand() % 256;
// !!! With this, the next loop runs faster.
std::sort(data, data + arraySize);
// Test
clock_t start = clock();
long long sum = 0;
for (unsigned i = 0; i < 100000; ++i)
{
// Primary loop
for (unsigned c = 0; c < arraySize; ++c) …Run Code Online (Sandbox Code Playgroud) 我正在运行一个循环嵌套字典的基本脚本,从每个记录中抓取数据,并将其附加到Pandas DataFrame.数据看起来像这样:
data = {"SomeCity": {"Date1": {record1, record2, record3, ...}, "Date2": {}, ...}, ...}
Run Code Online (Sandbox Code Playgroud)
总共有几百万条记录.脚本本身看起来像这样:
city = ["SomeCity"]
df = DataFrame({}, columns=['Date', 'HouseID', 'Price'])
for city in cities:
for dateRun in data[city]:
for record in data[city][dateRun]:
recSeries = Series([record['Timestamp'],
record['Id'],
record['Price']],
index = ['Date', 'HouseID', 'Price'])
FredDF = FredDF.append(recSeries, ignore_index=True)
Run Code Online (Sandbox Code Playgroud)
然而,这种情况非常缓慢.在我寻找一种并行化的方法之前,我只是想确保我没有遗漏一些明显会让它表现得更快的东西,因为我对Pandas来说还是一个新手.