当从不同大小的分布中随机抽样时,我惊讶地发现执行时间似乎主要是根据从中采样的数据集的大小而不是被采样的值的数量来缩放.例:
import pandas as pd
import numpy as np
import time as tm
#generate a small and a large dataset
testSeriesSmall = pd.Series(np.random.randn(10000))
testSeriesLarge = pd.Series(np.random.randn(10000000))
sampleSize = 10
tStart = tm.time()
currSample = testSeriesLarge.sample(n=sampleSize).values
print('sample %d from %d values: %.5f s' % (sampleSize, len(testSeriesLarge), (tm.time() - tStart)))
tStart = tm.time()
currSample = testSeriesSmall.sample(n=sampleSize).values
print('sample %d from %d values: %.5f s' % (sampleSize, len(testSeriesSmall), (tm.time() - tStart)))
sampleSize = 1000
tStart = tm.time()
currSample = testSeriesLarge.sample(n=sampleSize).values
print('sample %d from %d …Run Code Online (Sandbox Code Playgroud)