How does Clang's "did you mean ...?" variable name correction algorithm work?

Mee*_*ies 5 c++ compiler-errors clang

I am compiling C++ code with Clang (Apple clang version 12.0.5 (clang-1205.0.22.11)).

If you misspell a variable name, Clang can offer a hint:

#include <iostream>

int main() {
    int my_int;
    std::cout << my_it << std::endl;
}
spellcheck-test.cpp:5:18: error: use of undeclared identifier 'my_it'; did you mean 'my_int'?
    std::cout << my_it << std::endl;
                 ^~~~~
                 my_int
spellcheck-test.cpp:4:9: note: 'my_int' declared here
    int my_int;
        ^
1 error generated.

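My question is:

What criteria does Clang use to decide when to suggest another variable?

My experiments suggest that it is fairly sophisticated:

  • If there is another similarly named variable that you might have meant (e.g. int my_in;), it gives no suggestion
  • If the suggested variable has the wrong type for the operation (e.g. trying to print my_it.size() instead), it gives no suggestion
  • Whether it gives a suggestion depends on a non-trivial comparison of the variable names: it allows both deletions and insertions of characters, and longer variable names allow more insertions/deletions while still being considered "similar".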
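The first and third observations are consistent with an edit-distance comparison that has a length-dependent budget. Here is a minimal sketch of that idea, not Clang's actual code: `editDistance`, `allowedEdits` and `suggest` are hypothetical names, the "one edit per three characters" budget is an assumption chosen for illustration, and treating a tie between two equally close names as "no suggestion" is likewise assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions and substitutions that turn `a` into `b`.
std::size_t editDistance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical budget: about one edit per three characters of the typo.
// This formula is an assumption for illustration, not quoted from Clang.
std::size_t allowedEdits(const std::string &typo) {
  return (typo.size() + 2) / 3;
}

// Returns the single closest declared name within budget. Returns "" when
// nothing is close enough or when the best distance is shared by several
// names (assumed here to mean "too ambiguous to suggest").
std::string suggest(const std::string &typo,
                    const std::vector<std::string> &declared) {
  const std::size_t budget = allowedEdits(typo);
  std::size_t best = budget + 1;
  std::string bestName;
  bool ambiguous = false;
  for (const std::string &name : declared) {
    const std::size_t d = editDistance(typo, name);
    if (d == 0 || d > budget)
      continue;
    if (d < best) {
      best = d;
      bestName = name;
      ambiguous = false;
    } else if (d == best) {
      ambiguous = true;
    }
  }
  return ambiguous ? std::string() : bestName;
}
```

With only `my_int` declared, `suggest("my_it", ...)` returns `my_int` (distance 1, budget 2). Once `my_in` is declared as well, both names are at distance 1 and the sketch returns an empty string, which mirrors the first bullet.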

dfr*_*fri 6

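You will likely not find it documented, but as Clang is open-source you can turn to the source code to try to figure it out.

Clangd?

The particular diagnostic (from DiagnosticSemaKinds.td):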

def err_undeclared_var_use_suggest : Error<
  "use of undeclared identifier %0; did you mean %1?">;

is only ever referenced from clang-tools-extra/clangd/IncludeFixer.cpp:

  // Try to fix unresolved name caused by missing declaration.
  // E.g.
  //   clang::SourceManager SM;
  //          ~~~~~~~~~~~~~
  //          UnresolvedName
  //   or
  //   namespace clang {  SourceManager SM; }
  //                      ~~~~~~~~~~~~~
  //                      UnresolvedName
  // We only attempt to recover a diagnostic if it has the same location as
  // the last seen unresolved name.
  if (DiagLevel >= DiagnosticsEngine::Error &&
      LastUnresolvedName->Loc == Info.getLocation())
    return fixUnresolvedName();

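Now, clangd is a language server, and to be honest I don't know whether the Clang compiler frontend actually uses it to produce certain diagnostics, but you are free to continue down the rabbit hole and tie these details together. The fixUnresolvedName above eventually performs a fuzzy search: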

if (llvm::Optional<const SymbolSlab *> Syms = fuzzyFindCached(Req))
    return fixesForSymbols(**Syms);

If you want to dig into the details, I would recommend starting with the fuzzyFindCached function:

llvm::Optional<const SymbolSlab *>
IncludeFixer::fuzzyFindCached(const FuzzyFindRequest &Req) const {
  auto ReqStr = llvm::formatv("{0}", toJSON(Req)).str();
  auto I = FuzzyFindCache.find(ReqStr);
  if (I != FuzzyFindCache.end())
    return &I->second;

  if (IndexRequestCount >= IndexRequestLimit)
    return llvm::None;
  IndexRequestCount++;

  SymbolSlab::Builder Matches;
  Index.fuzzyFind(Req, [&](const Symbol &Sym) {
    if (Sym.Name != Req.Query)
      return;
    if (!Sym.IncludeHeaders.empty())
      Matches.insert(Sym);
  });
  auto Syms = std::move(Matches).build();
  auto E = FuzzyFindCache.try_emplace(ReqStr, std::move(Syms));
  return &E.first->second;
}
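Two things stand out in fuzzyFindCached: results are memoized per serialized request, and the callback keeps only symbols whose name equals the query exactly, so this path looks more like "find the header that declares this name" than like spelling correction. The caching pattern on its own can be sketched as follows. This is a simplified stand-in, not clangd code: `CachedLookup`, `Backend` and `requestsMade` are hypothetical names, and a plain string replaces the JSON-serialized request.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive symbol index query.
using Backend = std::function<std::vector<std::string>(const std::string &)>;

class CachedLookup {
public:
  CachedLookup(Backend backend, std::size_t limit)
      : backend_(std::move(backend)), limit_(limit) {}

  // Returns the cached result if there is one. Otherwise queries the backend,
  // unless the request budget is used up, in which case nullptr is returned.
  const std::vector<std::string> *find(const std::string &query) {
    auto it = cache_.find(query);
    if (it != cache_.end())
      return &it->second;
    if (requests_ >= limit_)
      return nullptr;
    ++requests_;
    std::vector<std::string> exact;
    for (const std::string &name : backend_(query))
      if (name == query) // keep exact-name matches only, as in the excerpt
        exact.push_back(name);
    // std::map never relocates its elements, so the pointer stays valid.
    return &cache_.emplace(query, std::move(exact)).first->second;
  }

  std::size_t requestsMade() const { return requests_; }

private:
  Backend backend_;
  std::size_t limit_;
  std::size_t requests_ = 0;
  std::map<std::string, std::vector<std::string>> cache_;
};
```

A repeated `find` with the same query returns the same pointer without touching the backend again, and a new query after the budget is spent yields `nullptr`, much like the `llvm::None` branch above.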

along with the type of its single function parameter, FuzzyFindRequest, from clang/index/Index.h:

struct FuzzyFindRequest {
  /// A query string for the fuzzy find. This is matched against symbols'
  /// un-qualified identifiers and should not contain qualifiers like "::".
  std::string Query;
  /// If this is non-empty, symbols must be in at least one of the scopes
  /// (e.g. namespaces) excluding nested scopes. For example, if a scope "xyz::"
  /// is provided, the matched symbols must be defined in namespace xyz but not
  /// namespace xyz::abc.
  ///
  /// The global scope is "", a top level scope is "foo::", etc.
  std::vector<std::string> Scopes;
  /// If set to true, allow symbols from any scope. Scopes explicitly listed
  /// above will be ranked higher.
  bool AnyScope = false;
  /// The number of top candidates to return. The index may choose to
  /// return more than this, e.g. if it doesn't know which candidates are best.
  llvm::Optional<uint32_t> Limit;
  /// If set to true, only symbols for completion support will be considered.
  bool RestrictForCodeCompletion = false;
  /// Contextually relevant files (e.g. the file we're code-completing in).
  /// Paths should be absolute.
  std::vector<std::string> ProximityPaths;
  /// Preferred types of symbols. These are raw representation of `OpaqueType`.
  std::vector<std::string> PreferredTypes;

  bool operator==(const FuzzyFindRequest &Req) const {
    return std::tie(Query, Scopes, Limit, RestrictForCodeCompletion,
                    ProximityPaths, PreferredTypes) ==
           std::tie(Req.Query, Req.Scopes, Req.Limit,
                    Req.RestrictForCodeCompletion, Req.ProximityPaths,
                    Req.PreferredTypes);
  }
  bool operator!=(const FuzzyFindRequest &Req) const { return !(*this == Req); }
};

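Another rabbit hole?

The following commit could be another starting point:

[Frontend] Allow attaching an external sema source to compiler instance and extra diags to TypoCorrections

This can be used to append alternative typo corrections to an existing diag. include-fixer can use it to suggest includes to be added.

Differential Revision: https://reviews.llvm.org/D26745

From this we may end up in clang/include/clang/Sema/TypoCorrection.h, which sounds like a feature the compiler frontend is more likely to use than the one in (the clang extra tool) clangd. E.g.: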

  /// Gets the "edit distance" of the typo correction from the typo.
  /// If Normalized is true, scale the distance down by the CharDistanceWeight
  /// to return the edit distance in terms of single-character edits.
  unsigned getEditDistance(bool Normalized = true) const {
    if (CharDistance > MaximumDistance || QualifierDistance > MaximumDistance ||
        CallbackDistance > MaximumDistance)
      return InvalidDistance;
    unsigned ED =
        CharDistance * CharDistanceWeight +
        QualifierDistance * QualifierDistanceWeight +
        CallbackDistance * CallbackDistanceWeight;
    if (ED > MaximumDistance)
      return InvalidDistance;
    // Half the CharDistanceWeight is added to ED to simulate rounding since
    // integer division truncates the value (i.e. round-to-nearest-int instead
    // of round-to-zero).
    return Normalized ? NormalizeEditDistance(ED) : ED;
  }
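To make the arithmetic in getEditDistance concrete, here is a standalone sketch. The constant names follow the excerpt, but their numeric values are placeholders I am assuming for illustration (check the header for the real ones), and `normalize` and `weightedDistance` are hypothetical free functions. The rounding follows the comment in the excerpt: half of CharDistanceWeight is added before the integer division.

```cpp
#include <limits>

// Names follow the excerpt; the VALUES are assumptions for illustration.
constexpr unsigned InvalidDistance = std::numeric_limits<unsigned>::max();
constexpr unsigned MaximumDistance = 10000;
constexpr unsigned CharDistanceWeight = 100;
constexpr unsigned QualifierDistanceWeight = 110;
constexpr unsigned CallbackDistanceWeight = 150;

// Scale a weighted distance back to "single-character edits", rounding to
// the nearest integer instead of truncating.
unsigned normalize(unsigned weighted) {
  if (weighted > MaximumDistance)
    return InvalidDistance;
  return (weighted + CharDistanceWeight / 2) / CharDistanceWeight;
}

// Combine the three components the way the excerpt does: reject anything
// out of range, otherwise return the weighted sum (optionally normalized).
unsigned weightedDistance(unsigned chars, unsigned qualifiers,
                          unsigned callback, bool normalized = true) {
  if (chars > MaximumDistance || qualifiers > MaximumDistance ||
      callback > MaximumDistance)
    return InvalidDistance;
  const unsigned ed = chars * CharDistanceWeight +
                      qualifiers * QualifierDistanceWeight +
                      callback * CallbackDistanceWeight;
  if (ed > MaximumDistance)
    return InvalidDistance;
  return normalized ? normalize(ed) : ed;
}
```

Under these assumed weights, one character edit plus one qualifier change gives a raw distance of 210, which normalizes to 2, so a correction that also needs a different qualifier ranks behind a pure one-character fix.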

which is used in clang/lib/Sema/SemaDecl.cpp:

// Callback to only accept typo corrections that have a non-zero edit distance.
// Also only accept corrections that have the same parent decl.
class DifferentNameValidatorCCC final : public CorrectionCandidateCallback {
 public:
  DifferentNameValidatorCCC(ASTContext &Context, FunctionDecl *TypoFD,
                            CXXRecordDecl *Parent)
      : Context(Context), OriginalFD(TypoFD),
        ExpectedParent(Parent ? Parent->getCanonicalDecl() : nullptr) {}

  bool ValidateCandidate(const TypoCorrection &candidate) override {
    if (candidate.getEditDistance() == 0)
      return false;
     // ...
  }
  // ...
};
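The callback above hints at how candidates get vetoed after the distance has been computed. Below is a minimal sketch of that validator pattern with simplified, hypothetical types (`Candidate`, `Validator`, `filterCandidates`); it does not use the real CorrectionCandidateCallback interface. Whether Clang implements the type-based rejection observed in the question (`my_it.size()`) in exactly this way is an assumption on my part.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified candidate: a name, its edit distance from the
// typo, and whether the declaration supports the attempted operation.
struct Candidate {
  std::string name;
  unsigned editDistance;
  bool supportsOperation;
};

using Validator = std::function<bool(const Candidate &)>;

// Keep only the candidates that every validator accepts.
std::vector<Candidate> filterCandidates(
    const std::vector<Candidate> &all,
    const std::vector<Validator> &validators) {
  std::vector<Candidate> kept;
  for (const Candidate &c : all) {
    bool ok = true;
    for (const Validator &v : validators) {
      if (!v(c)) {
        ok = false;
        break;
      }
    }
    if (ok)
      kept.push_back(c);
  }
  return kept;
}

// Mirrors the excerpt: a "correction" with zero edit distance is rejected.
bool differentName(const Candidate &c) { return c.editDistance != 0; }

// Assumed context check, e.g. "can .size() be called on this declaration?".
bool usableInContext(const Candidate &c) { return c.supportsOperation; }
```

For example, with candidates `my_int` (distance 1, no `.size()`), `my_it` (distance 0) and `my_list` (distance 2, has `.size()`), applying both validators leaves only `my_list`.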