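I have a column with 10 million strings. The characters in each string need to be rearranged in a fixed way.
Original string: AAA01188P001
Shuffled string: 188A1A0AP001
Right now I am running a for loop that takes each string and repositions each character, but it takes hours to complete. Is there a faster way to get this result?
This is the for loop: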
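for i in range(0, len(OrderProduct)):
    s = list(OrderProduct['OrderProductId'][i])
    # swap positions 1 and 7
    a = s[1]
    s[1] = s[7]
    s[7] = a
    # swap positions 3 and 6
    a = s[3]
    s[3] = s[6]
    s[6] = a
    # swap positions 2 and 3
    a = s[2]
    s[2] = s[3]
    s[3] = a
    # swap positions 5 and 0
    a = s[5]
    s[5] = s[0]
    s[0] = a
    # write the rearranged string back to row i
    OrderProduct['OrderProductId'][i] = ''.join(s)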
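I ran some performance tests using different approaches.
Here are the results I got for 1,000,000 shuffles (shuffled string, function name, total time in seconds):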
188A1AA0P001 usefString 0.518183742
188A1AA0P001 useMap 1.415851829
188A1AA0P001 useConcat 0.5654986979999999
188A1AA0P001 useFormat 0.800639699
188A1AA0P001 useJoin 0.5488918539999998
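Based on this, the f-string with hard-coded slices appears to be the fastest, with plain concatenation and join close behind.
Here is the code I used for testing: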
def usefString(s): return f"{s[5:8]}{s[0]}{s[4]}{s[1:4]}{s[8:]}"
posMap = [5,6,7,0,4,1,2,3,8,9,10,11]
def useMap(s): return "".join(map(lambda i:s[i], posMap))
def useConcat(s): return s[5:8]+s[0]+s[4]+s[1:4]+s[8:]
def useFormat(s): return '{}{}{}{}{}'.format(s[5:8],s[0],s[4],s[1:4],s[8:])
def useJoin(s): return "".join([s[5:8],s[0],s[4],s[1:4],s[8:]])
from timeit import timeit
count = 1000000
s = "AAA01188P001"
t = timeit(lambda:usefString(s),number=count)
print(usefString(s),"usefString",t)
t = timeit(lambda:useMap(s),number=count)
print(useMap(s),"useMap",t)
t = timeit(lambda:useConcat(s),number=count)
print(useConcat(s),"useConcat",t)
t = timeit(lambda:useFormat(s),number=count)
print(useFormat(s),"useFormat",t)
t = timeit(lambda:useJoin(s),number=count)
print(useJoin(s),"useJoin",t)
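One thing worth checking before using the slices above on real data: on the sample string they return 188A1AA0P001, while the swaps in the question's loop yield 188A1A0AP001. The following is a minimal sketch, not part of the original answer, that derives the position map directly from the swaps and hard-codes it the same way. The names `SWAPS`, `derive_pos_map`, `shuffle_by_swaps` and `shuffle_by_index` are made up for illustration, and the sketch assumes every string has at least 8 characters.

```python
# Swaps performed by the loop in the question, in the order they happen.
SWAPS = [(1, 7), (3, 6), (2, 3), (5, 0)]

def derive_pos_map(swaps, length=12):
    """Return pos_map such that result[k] == original[pos_map[k]]."""
    pos = list(range(length))
    for i, j in swaps:
        pos[i], pos[j] = pos[j], pos[i]
    return pos

def shuffle_by_swaps(s):
    """Reference version: apply the same swaps as the question's loop."""
    chars = list(s)
    for i, j in SWAPS:
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)

def shuffle_by_index(s):
    """Hard-coded equivalent of the swaps, built with a single f-string."""
    return f"{s[5]}{s[7]}{s[6]}{s[2]}{s[4]}{s[0]}{s[3]}{s[1]}{s[8:]}"

print(derive_pos_map(SWAPS))             # [5, 7, 6, 2, 4, 0, 3, 1, 8, 9, 10, 11]
print(shuffle_by_swaps("AAA01188P001"))  # 188A1A0AP001
print(shuffle_by_index("AAA01188P001"))  # 188A1A0AP001
```

If the order produced by the loop is the one you need, `shuffle_by_index` can stand in for `usefString`. I have not timed it, so treat its speed relative to the slice-based versions as unknown.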
Run Code Online (Sandbox Code Playgroud)
Performance: (added by @jezrael)
import pandas as pd
N = 1000000
OrderProduct = pd.DataFrame({'OrderProductId':['AAA01188P001'] * N})
In [331]: %timeit [f'{s[5:8]}{s[0]}{s[4]}{s[1:4]}{s[8:]}' for s in OrderProduct['OrderProductId']]
527 ms ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [332]: %timeit [s[5:8]+s[0]+s[4]+s[1:4]+s[8:] for s in OrderProduct['OrderProductId']]
610 ms ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [333]: %timeit ['{}{}{}{}{}'.format(s[5:8],s[0],s[4],s[1:4],s[8:]) for s in OrderProduct['OrderProductId']]
954 ms ± 76.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [334]: %timeit ["".join([s[5:8],s[0],s[4],s[1:4],s[8:]]) for s in OrderProduct['OrderProductId']]
594 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
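The timings above only build the new list. To actually update the column, the whole list can be assigned back in one step instead of writing row by row as the original loop does. A minimal sketch, assuming every string has at least 8 characters. `shuffle_all` is a hypothetical helper name, and the slices are the ones from the benchmark, so substitute whichever index order you actually need:

```python
def shuffle_all(values):
    """Rebuild every string in a single pass and return a new list."""
    return [f"{s[5:8]}{s[0]}{s[4]}{s[1:4]}{s[8:]}" for s in values]

# Works on any iterable of strings; a plain list is used here for illustration.
ids = ["AAA01188P001", "BCD23456Q789"]
shuffled = shuffle_all(ids)
print(shuffled)  # ['188A1AA0P001', '456B3CD2Q789']

# With the DataFrame from the timings (assumed usage), assign the list back once:
# OrderProduct['OrderProductId'] = shuffle_all(OrderProduct['OrderProductId'])
```

Assigning the complete list back avoids the per-row `OrderProduct['OrderProductId'][i] = ...` writes of the original loop.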