Determine the total number of atoms in a chemical formula

Asked by Daniel (score 2) · tags: regex, excel, vba

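I have a list of thousands of chemical formulas, which may contain the symbol of any element. For each formula I want to determine the total number of atoms of all elements. Examples include: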

  • CH3NO3
  • SeSe2
  • C 22
  • C2Cl2O2
  • C2Cl3F
  • C2H2BrF3
  • C2H2Br2
  • C2H3Cl3Si

I want the total number of atoms in a single formula, so for the first example (CH3NO3) the answer is 8 (1 carbon + 3 hydrogen + 1 nitrogen + 3 oxygen).

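I found code by Pᴇʜ (from "Extract numbers from chemical formula") that uses a regular expression to count how many atoms of a specific element a formula contains.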

Can this be adapted to give the total number of atoms?

Public Function ChemRegex(ChemFormula As String, Element As String) As Long
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With
    
    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"
    
    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)
    
    Dim m As Match
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            ChemRegex = ChemRegex + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        End If
    Next m
    
    'second pattern finds parentheses and multiplies the elements within
    regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        ChemRegex = ChemRegex + ChemRegex(m.SubMatches(1), Element) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m
End Function

Answer by Pᴇʜ (score 6)

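You can do this by looping through all the characters: count every uppercase letter, and for every digit add that digit minus 1 (the letter before it was already counted once). The result is the total number of atoms.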

Option Explicit

Public Function ChemCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim c As Long
    For c = 1 To Len(ChemFormula)
        Dim Char As String
        Char = Mid$(ChemFormula, c, 1)

        If Char Like "[0-9]" Then
            'a digit n adds n - 1, because its element was already counted once
            RetVal = RetVal + CLng(Char) - 1
        ElseIf Char Like "[A-Z]" Then
            'every uppercase letter starts a new element symbol
            '(UCase(Char) = Char would also count "(", ")" and spaces)
            RetVal = RetVal + 1
        End If
    Next c

    ChemCountTotalElements = RetVal
End Function

Note that this does not handle parentheses! It also does not check whether an element actually exists, so XYZ2 would be counted as 4.

Also, this only handles counts below 10. If you have counts of 10 or more, use the regex solution below, which handles them.
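
Because the loop above adds each digit separately, C12 comes out as 1 + 0 + 1 = 2 instead of 12. A minimal sketch of a non-regex alternative, assuming flat formulas (no parentheses) and the default Option Compare Binary (so that Like "[A-Z]" is case-sensitive), collects whole digit runs before adding them. ChemCountAtomsMultiDigit and CountOrOne are hypothetical names, and, like the loop above, it does not check that a symbol is a real element.

```vba
' Sketch: counts atoms in a flat formula (no parentheses), reading whole digit
' runs so that counts of 10 or more (e.g. C22) work. Symbols are not validated.
' Assumes the default Option Compare Binary, so Like "[A-Z]" is case-sensitive.
Public Function ChemCountAtomsMultiDigit(ByVal ChemFormula As String) As Long
    Dim RetVal As Long
    Dim Number As String      ' digits collected after the current element symbol
    Dim InElement As Boolean  ' True once at least one element symbol has been seen
    Dim c As Long
    Dim Char As String

    For c = 1 To Len(ChemFormula)
        Char = Mid$(ChemFormula, c, 1)
        If Char Like "[0-9]" Then
            Number = Number & Char
        ElseIf Char Like "[A-Z]" Then
            ' a new element starts: add the count of the previous one first
            If InElement Then RetVal = RetVal + CountOrOne(Number)
            Number = vbNullString
            InElement = True
        End If
        ' lowercase letters continue the current symbol; anything else is ignored
    Next c

    If InElement Then RetVal = RetVal + CountOrOne(Number)
    ChemCountAtomsMultiDigit = RetVal
End Function

' Hypothetical helper: an empty digit string means a count of 1
Private Function CountOrOne(ByVal Digits As String) As Long
    If Digits = vbNullString Then
        CountOrOne = 1
    Else
        CountOrOne = CLng(Digits)
    End If
End Function
```

For example, =ChemCountAtomsMultiDigit("C12H22O11") should give 45 (12 + 22 + 11).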

It also recognizes formulas with prefixes, such as Ca(OH)₂.
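
If your data stores subscripts as the Unicode characters ₀ to ₉ (U+2080 to U+2089, as in Ca(OH)₂) rather than as plain digits, note that a regex class such as [0-9] only matches the ASCII digits 0 to 9. A minimal sketch, assuming that is the only difference in your data, converts the subscripts before counting; NormalizeSubscripts is a hypothetical name.

```vba
' Sketch: replaces Unicode subscript digits (U+2080 to U+2089) with ASCII digits
' so the formula can be passed to a counting function that expects 0-9.
Public Function NormalizeSubscripts(ByVal ChemFormula As String) As String
    Dim d As Long
    For d = 0 To 9
        ' ChrW(&H2080 + d) is the subscript form of digit d
        ChemFormula = Replace(ChemFormula, ChrW(&H2080 + d), CStr(d))
    Next d
    NormalizeSubscripts = ChemFormula
End Function
```

In a worksheet you would then wrap the input, e.g. =ChemRegexCountTotalElements(NormalizeSubscripts(A1)).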

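If you need a more precise approach (one that checks whether each element actually exists) that also recognizes parentheses, you need RegEx again.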

Since VBA does not support regular expressions out of the box, you first need to add a reference to a Windows library:

  1. In the VBA editor, open Tools > References to add a reference for regular expressions.

  2. Select Microsoft VBScript Regular Expressions 5.5.

  3. Add this function to a module:

    Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long
        Dim RetVal As Long

        Dim regEx As New RegExp
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
        End With

        'first pattern matches every element once
        regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"

        Dim Matches As MatchCollection
        Set Matches = regEx.Execute(ChemFormula)

        Dim m As Match
        For Each m In Matches
            'SubMatches(1) is the count after the symbol; an empty count means 1
            RetVal = RetVal + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        Next m

        'second pattern finds parentheses and multiplies the elements within
        regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
        Set Matches = regEx.Execute(ChemFormula)
        For Each m In Matches
            RetVal = RetVal + ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
        Next m

        ChemRegexCountTotalElements = RetVal
    End Function
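
As an alternative to adding the library reference described above (an assumption on my part, not part of the original answer), the same regex object can be created with late binding through CreateObject("VBScript.RegExp"). This needs no reference but is still Windows-only, and you lose IntelliSense because the variables become Object. NewChemRegExp is a hypothetical helper name.

```vba
' Sketch: late-bound RegExp, so no reference to
' "Microsoft VBScript Regular Expressions 5.5" is required (Windows only).
Public Function NewChemRegExp(ByVal Pattern As String) As Object
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True        ' find all matches, not just the first
    regEx.IgnoreCase = False   ' element symbols are case-sensitive
    regEx.Pattern = Pattern
    Set NewChemRegExp = regEx
End Function
```

To use late binding in ChemRegexCountTotalElements, you would declare regEx, Matches and m As Object and create regEx with Set regEx = CreateObject("VBScript.RegExp") instead of Dim regEx As New RegExp; the rest of the function stays the same.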

Although this code also recognizes parentheses, note that it does not handle nested parentheses.
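
If you do need nested groups, one option is to swap the regex for a small hand-written scanner that keeps a stack of running totals: "(" pushes a new total, ")" pops it and multiplies it by the count that follows. This is a minimal sketch under stated assumptions, not part of the original answer: parentheses must be balanced, any uppercase letter followed by lowercase letters is accepted as a symbol (no element validation), and ChemCountAtomsNested and ReadCount are hypothetical names.

```vba
' Sketch: counts atoms with a stack of running totals, so nested groups such as
' Mg3(Fe(CN)6)2 and counts of 10 or more work. Assumes balanced parentheses and
' does not check that symbols are real elements.
Public Function ChemCountAtomsNested(ByVal ChemFormula As String) As Long
    Dim Totals() As Long   ' Totals(Depth) = atoms counted so far at that nesting level
    Dim Depth As Long
    Dim Pos As Long
    Dim Char As String
    Dim Inner As Long

    ReDim Totals(0 To Len(ChemFormula))
    Pos = 1
    Do While Pos <= Len(ChemFormula)
        Char = Mid$(ChemFormula, Pos, 1)
        If Char = "(" Then
            Depth = Depth + 1              ' open a new group
            Totals(Depth) = 0
            Pos = Pos + 1
        ElseIf Char = ")" Then
            Inner = Totals(Depth)          ' close the group...
            Depth = Depth - 1
            Pos = Pos + 1
            ' ...and add it to the enclosing level, times its multiplier
            Totals(Depth) = Totals(Depth) + Inner * ReadCount(ChemFormula, Pos)
        ElseIf Char Like "[A-Z]" Then
            Pos = Pos + 1
            ' skip the lowercase rest of the symbol; Mid$ returns "" past the end
            Do While Mid$(ChemFormula, Pos, 1) Like "[a-z]"
                Pos = Pos + 1
            Loop
            Totals(Depth) = Totals(Depth) + ReadCount(ChemFormula, Pos)
        Else
            Pos = Pos + 1                  ' ignore anything else (spaces, dots, ...)
        End If
    Loop
    ChemCountAtomsNested = Totals(0)
End Function

' Hypothetical helper: reads the digits starting at Pos, moves Pos past them,
' and returns their value, or 1 if there are no digits.
Private Function ReadCount(ByVal ChemFormula As String, ByRef Pos As Long) As Long
    Dim Digits As String
    Do While Mid$(ChemFormula, Pos, 1) Like "[0-9]"
        Digits = Digits & Mid$(ChemFormula, Pos, 1)
        Pos = Pos + 1
    Loop
    If Digits = vbNullString Then
        ReadCount = 1
    Else
        ReadCount = CLng(Digits)
    End If
End Function
```

For Mg3(Fe(CN)6)2 this gives 3 + 2 × (1 + 6 × 2) = 29. If you also need the element check, you could first run the element pattern from the regex answer over the formula and reject it when the matched symbols do not cover every letter.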

  • Nice, but what if an atom count is 10 or more? Then again, I'm not a chemist, so I'm not even sure that's possible. (2 upvotes)