hao*_*mao 9 python string algorithm
有一个像它的字符串
dxabcabcyyyydxycxcxz
Run Code Online (Sandbox Code Playgroud)
我想将它合并到
dxabcydxycxz
Run Code Online (Sandbox Code Playgroud)
其他例子: ddxddx - > dxdx,abbab - > abab.
规则是:
if (adjacent and same): merge
# Such as 'abc',they are same and , so I will delete one of them .
# Although 'dx' is same as 'dx',they are nonadjacent,so I do not delete any of them
# If one character has been deleted, we don't delete any sub-string include it
Run Code Online (Sandbox Code Playgroud)
我在python的代码中完成了它,但是当它在一个长字符串中时它很慢.
# original string
mystr = "dxabcabcyyyydxycxcxz"
str_len = len(mystr)
vis = [1] *str_len #Use a list to mark which char is deleted
# enumerate the size of sub-str
for i in range(1,str_len):
# enumerate the begin of the sub-str
for j in range(0, str_len):
offset = 2 #the size of sub-str + 1
current_sub_str = mystr[j:j+i]
s_begin = j+i*(offset-1)
s_end = j+(i*offset)
# delete all of the same char
while((j+(i*offset) <= str_len) and current_sub_str == mystr[s_begin:s_end]
and 0 not in vis[s_begin:s_end] and 0 not in vis[j:j+i]):
vis[s_begin:s_end] = [0] * (s_end - s_begin) #if I deleted it ,mark it as 0
offset += 1
s_begin = j + i * (offset - 1)
s_end = j + (i * offset)
res = []
for i in range(0,str_len):
if(vis[i]!=0): res.append(mystr[i])
print "".join(res)
Run Code Online (Sandbox Code Playgroud)
有没有更快的方法来解决它?
2017年4月29日更新
对不起,它似乎是一个XY问题.另一方面,它可能不是.有内容
我正在编写一个网络蜘蛛编码,并获得了许多像这样的标记路径
ul/li/a
ul/li/div/div/div/a/span
ul/li/div/div/div/a/span
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
ul/li/ul/li/a
Run Code Online (Sandbox Code Playgroud)
正如你所看到的,有一些'tag-path'以同样的方式做了,所以我想折叠它们以发现是否有任何其他'tag-path具有相同的结构.折叠后,我得到这样的"标记路径".
ul/li/a
ul/li/div/div/div/a/span
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
ul/li/a
ul/li/ul/li/a
Run Code Online (Sandbox Code Playgroud)
这只是我的想法,我不知道这样做是否合适.(尝试之后,我选择了另一种方式来做到这一点.
然而,有一个有趣的问题,如ACM问题.
因此,我将一个"标记路径"简化为一个角色并寻求帮助.因为我自己并没有快速做到这一点.实际上,这个问题有许多我不介意的角落案例,感谢大家帮助我完成它.
谢谢大家.
jas*_*per 16
看看正则表达的力量:
>>> import re
>>> re.sub(r"(.+?)\1+", r"\1", "dxabcabcyyyydxycxcxz")
'dxabcydxycxz'
>>> re.sub(r"(.+?)\1+", r"\1", "ddxddx")
'dxdx'
>>> re.sub(r"(.+?)\1+", r"\1", "abbab")
'abab'
Run Code Online (Sandbox Code Playgroud)
这将查找一个包含1个或多个任意字符的序列(.+?)
(作为非贪婪匹配,以便首先尝试更短的序列),然后重复匹配序列的一次或多次\1+
,并将其全部替换为匹配的序列\1
.
这可以是一个开始:
for i in range(len(string)):
for j in range(i + 1, len(string)):
while string[i:j] == string[j:j + j - i]:
string = string[:j] + string[j + j - i:]
Run Code Online (Sandbox Code Playgroud)
所提供示例的结果:
abbab -> abab
ddxddx -> dxdx
abcabcabc -> abc
dxabcabcyyyydxycxcxz -> dxabcydxycxz
Run Code Online (Sandbox Code Playgroud)