use*_*394 3 python string parsing split
Note: I'm a chemist and have very little coding experience.
I would like to split every string in a list by a particular delimiter, then add each parsed string to a new list. It would be very helpful for me to keep track of where the splits occurred by adding a character to the end of one string and the beginning of the next string for every parse. For example,
TestString = 'AAABA'
parsed = TestString.split('B')
print(parsed)
Will output:
['AAA','A']
I would like the output to be:
['AAAx', 'xA']
I'm looking for a solution that will likewise work for strings that contain only my delimiter. My end goal will involve parsing a large list of strings (1-10 million strings per list with a string length varying from 1 to 1000). Another example:
TestList = ['A', 'B', 'AB', 'BA', 'BBB','ABA', 'AAA']
Parsed = []
for i in range(len(TestList)):
    parsed = TestList[i].split('B')
    Parsed.extend(parsed)
print(Parsed)
Will output:
['A', '', '', 'A', '', '', 'A', '', '', '', '', 'A', 'A', 'AAA']
I would like the output to look like:
['A','x', 'x', 'Ax', 'x', 'x', 'xA', 'x', 'xx', 'xx', 'x', 'Ax', 'xA', 'AAA']
'B' is split into ['x', 'x']; 'BB' is split into ['x', 'xx', 'x'], etc.
Are there any simple ways to accomplish this? I've looked into regex a little bit, but I was having trouble applying (or compiling) my patterns to the variable string lengths in my dataset. I've searched Stack Overflow and elsewhere online, but couldn't find anything directly useful. I know I can add characters to the beginning and end of strings, but any solution must never add a character to a string that was not split. I've considered adding a series of 'if' statements to cover all the possibilities, but that seems like a pain, and I would prefer a simpler solution if one is out there.
EDIT: In response to the comments, I'm really trying to take my strings and "cut the 'B' units in half" and add a character that represents half of a 'B' unit.
In the image below, the black lines represent a string of 'A's. Left of the arrow: a single string with only the 'B' units explicitly drawn. Right of the arrow: after parsing on 'B', 3 strings result. Each 'B' unit is cut in half and replaced by an 'x'.

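Thinking about the example 'BBB' stepwise, the first parse will result in ['x', 'xBB']. The second step results in ['x', 'xx', 'xB']. The final step results in ['x', 'xx', 'xx', 'x'].
After parsing, there will be no 'B' units remaining. This is a chemistry-related problem, and 'xx' and 'B' are actually different entities (even though 'x' is "half of a 'B' unit"). It's also worth noting that parsing 'AAABAAA' into ['AAAx', 'xAAA'] is important, because neither of these strings is (or should be) equivalent to the string 'AAA', which never contained a 'B' unit to begin with.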
小智 5
Try this:
First, replace each 'B' with 'x-x'.
Then split the string on the delimiter '-'.
string = 'AAABA'
string = string.replace('B', 'x-x')
print( string.split('-') )
OUT: ['AAAx', 'xA']
string = 'AABBA'
string = string.replace('B', 'x-x')
print( string.split('-') )
OUT: ['AAx', 'xx', 'xA']
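To apply the same idea to a whole list, here is a minimal sketch. It is a variant of the approach above, not the same code: it splits on 'B' directly and attaches the 'x' markers afterwards, so it does not rely on '-' being absent from the input strings. The function name `split_with_markers` and its parameters are made up for illustration.

```python
def split_with_markers(strings, delimiter="B", marker="x"):
    """Split every string on `delimiter` and mark both sides of each cut with `marker`."""
    result = []
    for s in strings:
        pieces = s.split(delimiter)
        last = len(pieces) - 1
        for i, piece in enumerate(pieces):
            # A piece gets a leading marker unless it is the first piece,
            # and a trailing marker unless it is the last piece.
            prefix = marker if i > 0 else ""
            suffix = marker if i < last else ""
            result.append(prefix + piece + suffix)
    return result


test_list = ['A', 'B', 'AB', 'BA', 'BBB', 'ABA', 'AAA']
print(split_with_markers(test_list))
```

A string with no 'B' yields a single piece that is both first and last, so it is passed through unchanged, which matches the requirement that unsplit strings never gain a marker. The work is a single pass over each string, so it should scale roughly linearly with the total number of characters, though this has not been benchmarked on lists of millions of strings.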