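I have a list of several thousand chemical formulas, which may contain the symbol of any element. I want to determine the total number of atoms, of any element, in each formula. Examples include: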
I want the total number of atoms in a single formula, so for the first example (CH3NO3) the answer is 8 (1 carbon + 3 hydrogen + 1 nitrogen + 3 oxygen).
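I found code by PEH (Extract numbers from chemical formula) that uses a regular expression to extract the number of instances of one specific element in a chemical formula.
Can this be adapted to give the total number of atoms?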
Public Function ChemRegex(ChemFormula As String, Element As String) As Long
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"

    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)

    Dim m As Match
    For Each m In Matches
        If m.SubMatches(0) = Element Then
            'a missing number after the symbol means one atom
            ChemRegex = ChemRegex + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
        End If
    Next m

    'second pattern finds parentheses and multiplies the elements within
    '([0-9]+) captures the whole multiplier, not just its last digit
    regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        ChemRegex = ChemRegex + ChemRegex(m.SubMatches(1), Element) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m
End Function
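You can do this by looping through all characters: count every uppercase letter (each one starts an element symbol, so it stands for one atom) and, for every digit, add its value minus 1 (the symbol itself was already counted once). The sum is the total number of atoms. For CH3NO3 that is 4 uppercase letters plus (3 - 1) + (3 - 1), which gives 8.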
Option Explicit

Public Function ChemCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim c As Long
    For c = 1 To Len(ChemFormula)
        Dim Char As String
        Char = Mid$(ChemFormula, c, 1)

        If IsNumeric(Char) Then
            'the symbol before the digit was already counted once
            RetVal = RetVal + CLng(Char) - 1
        ElseIf Char Like "[A-Z]" Then
            'an uppercase letter starts a new element symbol
            '(Like "[A-Z]" instead of Char = UCase(Char), which is also True for "(" and ")")
            RetVal = RetVal + 1
        End If

    Next c

    ChemCountTotalElements = RetVal
End Function
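Note that this does not handle parentheses, and it does not check whether an element actually exists, so XYZ2 is counted as 4. It also only handles counts below 10, because each digit is evaluated on its own. If you have counts of 10 and above, use the RegEx solution below, which handles them.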
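If a regular expression is not wanted, the character loop can also be extended to read multi-digit counts. The following is only a sketch under that assumption; the function name `ChemCountTotalAtomsMultiDigit` is hypothetical, and like the loop above it ignores parentheses and does not validate element symbols.

```vba
Option Explicit

' Hypothetical variant of the character loop that reads multi-digit counts.
Public Function ChemCountTotalAtomsMultiDigit(ByVal ChemFormula As String) As Long
    Dim RetVal As Long
    Dim Digits As String   ' digits of the count currently being read
    Dim c As Long

    ' Loop one position past the end so that a trailing count is flushed too
    For c = 1 To Len(ChemFormula) + 1
        Dim Char As String
        Char = Mid$(ChemFormula, c, 1)   ' empty string once c is past the end

        If Char Like "[0-9]" Then
            Digits = Digits & Char
        Else
            If Len(Digits) > 0 Then
                ' the symbol itself was already counted once, hence - 1
                RetVal = RetVal + CLng(Digits) - 1
                Digits = vbNullString
            End If
            ' an uppercase letter starts a new element symbol
            If Char Like "[A-Z]" Then RetVal = RetVal + 1
        End If
    Next c

    ChemCountTotalAtomsMultiDigit = RetVal
End Function
```

With this sketch, C12H22O11 should give 45 and CH3NO3 should still give 8.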
If you need a more precise method that checks whether the elements exist and also recognizes parentheses, you need to use RegEx again.
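Because VBA does not support regular expressions out of the box, we first need to reference a Windows library: in the VBA editor, open Tools > References and tick "Microsoft VBScript Regular Expressions 5.5".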
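If setting that reference is not practical (for example when the workbook is shared between machines), late binding through `CreateObject` is a common alternative. The following is a minimal sketch, assuming the VBScript regular expression library is available on the machine; the function name `CountSymbolTokens` and its simplified pattern are hypothetical and only show how the object is created and used.

```vba
Option Explicit

' Hypothetical helper: counts tokens that look like element symbols
' (one uppercase letter, optionally followed by one lowercase letter).
' It does not check the tokens against real element symbols.
Public Function CountSymbolTokens(ByVal ChemFormula As String) As Long
    ' Late binding: no reference has to be set under Tools > References
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")

    With regEx
        .Global = True        ' find all matches, not only the first
        .IgnoreCase = False   ' "Co" and "CO" must stay different
        .Pattern = "[A-Z][a-z]?"
    End With

    CountSymbolTokens = regEx.Execute(ChemFormula).Count
End Function
```

With late binding the variables are declared `As Object` instead of `As RegExp`, `As MatchCollection` and `As Match`; the rest of the code stays the same.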
Add this function to a module:
Public Function ChemRegexCountTotalElements(ByVal ChemFormula As String) As Long
    Dim RetVal As Long

    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    'first pattern matches every element once
    regEx.Pattern = "([A][cglmrstu]|[B][aehikr]?|[C][adeflmnorsu]?|[D][bsy]|[E][rsu]|[F][elmr]?|[G][ade]|[H][efgos]?|[I][nr]?|[K][r]?|[L][airuv]|[M][cdgnot]|[N][abdehiop]?|[O][gs]?|[P][abdmortu]?|[R][abefghnu]|[S][bcegimnr]?|[T][abcehilms]|[U]|[V]|[W]|[X][e]|[Y][b]?|[Z][nr])([0-9]*)"

    Dim Matches As MatchCollection
    Set Matches = regEx.Execute(ChemFormula)

    Dim m As Match
    For Each m In Matches
        RetVal = RetVal + IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
    Next m

    'second pattern finds parentheses and multiplies the elements within
    regEx.Pattern = "(\((.+?)\)([0-9]+)+)+?"
    Set Matches = regEx.Execute(ChemFormula)
    For Each m In Matches
        RetVal = RetVal + ChemRegexCountTotalElements(m.SubMatches(1)) * (m.SubMatches(2) - 1) '-1 because all elements were already counted once in the first pattern
    Next m

    ChemRegexCountTotalElements = RetVal
End Function
Although this code recognizes parentheses, note that it does not recognize nested parentheses.
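For formulas with nested parentheses, one possible approach is to drop the regular expression and walk the formula once while keeping a running total per open parenthesis level. The following is only a sketch of that idea, not part of the code above: the names `ChemCountAtomsNested` and `ReadCount` are hypothetical, and element symbols are not validated (any uppercase letter with an optional lowercase letter is accepted).

```vba
Option Explicit

' Hypothetical sketch: counts all atoms in a formula that may contain
' nested parentheses, e.g. Ca(Al(OH)4)2.
Public Function ChemCountAtomsNested(ByVal ChemFormula As String) As Long
    Dim Totals() As Long   ' running total for each open parenthesis level
    Dim Depth As Long      ' index of the current level (0 = outermost)
    Dim Pos As Long
    Dim Multiplier As Long

    ReDim Totals(0 To Len(ChemFormula))   ' depth can never exceed the length
    Pos = 1

    Do While Pos <= Len(ChemFormula)
        Dim Char As String
        Char = Mid$(ChemFormula, Pos, 1)
        Pos = Pos + 1

        If Char = "(" Then
            ' open a new level with an empty total
            Depth = Depth + 1
            Totals(Depth) = 0
        ElseIf Char = ")" Then
            If Depth = 0 Then Err.Raise vbObjectError + 513, , "Unbalanced ')' in " & ChemFormula
            ' close the level: multiply its total and add it to the parent level
            Multiplier = ReadCount(ChemFormula, Pos)
            Totals(Depth - 1) = Totals(Depth - 1) + Totals(Depth) * Multiplier
            Depth = Depth - 1
        ElseIf Char Like "[A-Z]" Then
            ' skip the optional lowercase letter of a two-letter symbol
            If Mid$(ChemFormula, Pos, 1) Like "[a-z]" Then Pos = Pos + 1
            Totals(Depth) = Totals(Depth) + ReadCount(ChemFormula, Pos)
        End If
        ' any other character (spaces, stray digits) is ignored
    Loop

    If Depth <> 0 Then Err.Raise vbObjectError + 514, , "Unbalanced '(' in " & ChemFormula
    ChemCountAtomsNested = Totals(0)
End Function

' Reads the digits starting at Pos and advances Pos past them.
' Returns 1 when no digits follow, because a missing count means one.
Private Function ReadCount(ByVal Source As String, ByRef Pos As Long) As Long
    Dim Digits As String
    Do While Mid$(Source, Pos, 1) Like "[0-9]"
        Digits = Digits & Mid$(Source, Pos, 1)
        Pos = Pos + 1
    Loop
    If Len(Digits) = 0 Then ReadCount = 1 Else ReadCount = CLng(Digits)
End Function
```

From a worksheet this could be called as `=ChemCountAtomsNested(A1)`. For Ca(Al(OH)4)2 it should return 19 (1 Ca + 2 Al + 8 O + 8 H). Multi-digit counts are handled as well, because `ReadCount` reads all consecutive digits.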