Splitting a string into list elements based on keywords

ohb*_*sme 6 python regex

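I'm trying to write a function (in Python) that takes an input (a chemical formula) and splits it into a list. For example, if the input is "HC2H3O2", it should be converted into: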

molecule_list = ['H', 1, 'C', 2, 'H', 3, 'O', 2]

This works well so far, but if I enter an element with a two-letter symbol, such as sodium (Na), it gets split into:

['N', 'a']

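I'm looking for a way to have my function scan the string for keys in a dictionary named elements. I'm also considering using a regular expression, but I'm not sure how to implement it. This is my function right now: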

def split_molecule(inputted_molecule):
    """Take the input and split it into a list
    e.g. CO2 => ['C', 1, 'O', 2]
    """
    # step 1: convert inputted_molecule to a list
    # step 2a: if there are two periodic elements next to each other, insert a '1'
    # step 2b: if the last element is an element, append a '1'
    # step 3: convert all numbers in list to ints

    # step 1:
    # problem: it splits Na into 'N', 'a'
    # it needs to split by periodic elements
    molecule_list = list(inputted_molecule)

    # because at most, the list can double when "1" is inserted
    max_length_of_molecule_list = 2*len(molecule_list)
    # step 2a:
    for i in range(0, max_length_of_molecule_list):
        try:
            if (molecule_list[i] in elements) and (molecule_list[i+1] in elements):
                molecule_list.insert(i+1, "1")
        except IndexError:
            break
    # step 2b:
    if (molecule_list[-1] in elements):
        molecule_list.append("1")

    # step 3:
    for i in range(0, len(molecule_list)):
        if molecule_list[i].isdigit():
            molecule_list[i] = int(molecule_list[i])

    return molecule_list

geo*_*org 5

How about:

import re
print(re.findall(r'[A-Z][a-z]?|[0-9]+', 'Na2SO4MnO4'))

Result:

['Na', '2', 'S', 'O', '4', 'Mn', 'O', '4']

The regular expression, explained:

Find everything that is either

    [A-Z]   # A,B,...Z, ie. an uppercase letter
    [a-z]   # followed by a,b,...z, ie. a lowercase letter
    ?       # which is optional
    |       # or
    [0-9]   # 0,1,2...9, ie a digit
    +       # and perhaps some more of them
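The pattern explained above returns strings only, while the question asks for a list with integer counts and an explicit 1 wherever no number follows a symbol. A minimal sketch of one way to get there, assuming a variation of the pattern that captures each symbol together with its optional count (the name `PAIR_RE` is made up for illustration):

```python
import re

# Variation of the pattern above: a symbol followed by an optional run of digits.
# The two groups make findall() return (symbol, count) tuples.
PAIR_RE = re.compile(r"([A-Z][a-z]?)([0-9]*)")


def split_molecule(inputted_molecule):
    """Split a simple formula into [symbol, count, symbol, count, ...]."""
    molecule_list = []
    for symbol, count in PAIR_RE.findall(inputted_molecule):
        # An empty count means no digits followed the symbol, so it counts once.
        molecule_list += [symbol, int(count) if count else 1]
    return molecule_list


print(split_molecule("HC2H3O2"))
print(split_molecule("Na2SO4MnO4"))
```

For "HC2H3O2" this gives ['H', 1, 'C', 2, 'H', 3, 'O', 2], which is the list requested in the question. Like the original pattern, it still accepts any capitalized "symbol" and silently skips characters it cannot match.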

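This expression is pretty dumb, since it accepts arbitrary "elements" such as "Xy". You can improve it by replacing the [A-Z][a-z]? part with an actual list of element symbols separated by |, e.g. Ba|Na|Mn...|C|O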
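As a rough sketch of that idea, the alternation can be built from the keys of the `elements` dictionary mentioned in the question. The dictionary below is a small hypothetical stand-in (symbol to atomic number), not the asker's actual data. Two-letter symbols are placed first because Python's `re` tries alternatives left to right, so "N" listed before "Na" would match first.

```python
import re

# Hypothetical stand-in for the asker's `elements` dictionary
# (symbol -> atomic number); a real one would cover the whole periodic table.
elements = {"H": 1, "C": 6, "N": 7, "O": 8, "Na": 11, "S": 16, "Mn": 25}

# Longest symbols first, so that "Na" is tried before "N".
symbols = sorted(elements, key=len, reverse=True)
ELEMENT_RE = re.compile("|".join(map(re.escape, symbols)) + "|[0-9]+")


def tokenize(formula):
    """Return the known element symbols and numbers found in formula."""
    return ELEMENT_RE.findall(formula)


print(tokenize("Na2SO4MnO4"))
```

Note that `findall` silently skips anything it cannot match, so an unknown symbol such as "Xy" simply disappears from the result. If that matters, compare the joined tokens with the input and raise an error when they differ.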

Of course, regular expressions can handle only very simple formulas. To parse something like

  8(NH4)3P4Mo12O40 + 64NaNO3 + 149NH4NO3 + 135H2O

you will need a real parser, e.g. pyparsing (be sure to check out "Chemical Formulas" under "Examples"). Good luck!
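To give a feel for what a parser adds over a flat regex, here is a small hand-written sketch that uses only the standard library (it is not pyparsing, and the function name `count_atoms` is invented for this example). It assumes a single formula with optional parenthesized groups, and it deliberately rejects leading coefficients, "+" and spaces, so it covers one term of the equation above, not the whole line.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in one formula, honoring parenthesized groups."""
    tokens = re.findall(r"[A-Z][a-z]?|[0-9]+|[()]", formula)
    if "".join(tokens) != formula:
        raise ValueError("unexpected characters in %r" % formula)

    stack = [Counter()]  # one Counter per currently open group
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            stack.append(Counter())
            continue
        if tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            group = stack.pop()
        elif tok.isdigit():
            raise ValueError("number without a preceding symbol or group")
        else:
            group = Counter({tok: 1})

        # An optional number multiplies the symbol or group just read.
        multiplier = 1
        if i < len(tokens) and tokens[i].isdigit():
            multiplier = int(tokens[i])
            i += 1
        for symbol, n in group.items():
            stack[-1][symbol] += n * multiplier

    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return dict(stack[0])


print(count_atoms("(NH4)3P4Mo12O40"))
```

The stack is what a single regular expression lacks: each "(" opens a fresh counter, and the matching ")" folds it back into the enclosing one, scaled by the number that follows.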