Clang Source Code Analysis (2)

The previous part analyzed the initialization flow; this part moves on to parsing.

The base class of every action is FrontendAction. Its Execute method looks like this:

bool FrontendAction::Execute() {
  CompilerInstance &CI = getCompilerInstance();

  if (CI.hasFrontendTimer()) {
    llvm::TimeRegion Timer(CI.getFrontendTimer());
    ExecuteAction();
  } else {
    ExecuteAction();
  }

  // ....

  return true;
}

Control passes to ExecuteAction. The action currently in use is EmitLLVMAction:

class EmitLLVMAction : public CodeGenAction {
  virtual void anchor();
public:
  EmitLLVMAction(llvm::LLVMContext *_VMContext = nullptr);
};

EmitLLVMAction is just a subclass of CodeGenAction and does not override ExecuteAction itself, so the logic has to be looked up in CodeGenAction:

void CodeGenAction::ExecuteAction() {
  // If this is an IR file, we have to treat it specially.
  if (getCurrentFileKind() == IK_LLVM_IR) {
    // other codes.
  }

  // Otherwise follow the normal AST path.
  this->ASTFrontendAction::ExecuteAction();
}

So, for ordinary source files, CodeGenAction::ExecuteAction simply falls back on the implementation in ASTFrontendAction::ExecuteAction. Here is that function:

void ASTFrontendAction::ExecuteAction() {
  CompilerInstance &CI = getCompilerInstance();

  // ....

  ParseAST(CI.getSema(), CI.getFrontendOpts().ShowStats,
           CI.getFrontendOpts().SkipFunctionBodies);
}
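
The call chain so far relies on ordinary virtual dispatch plus one explicitly qualified base-class call. The following is a minimal stand-alone sketch of that shape, not Clang's code: the class names mirror the ones above, but the Trace member and the IsIRFile flag are made up purely for illustration.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for the Clang classes; only the dispatch shape is modelled.
struct FrontendAction {
  std::vector<std::string> Trace;   // hypothetical: records which bodies ran
  virtual ~FrontendAction() = default;

  // Non-virtual driver: always funnels into the virtual ExecuteAction().
  bool Execute() {
    Trace.push_back("FrontendAction::Execute");
    ExecuteAction();
    return true;
  }

protected:
  virtual void ExecuteAction() = 0;
};

struct ASTFrontendAction : FrontendAction {
protected:
  void ExecuteAction() override {
    // In Clang, ParseAST would be called from here.
    Trace.push_back("ASTFrontendAction::ExecuteAction");
  }
};

struct CodeGenAction : ASTFrontendAction {
  bool IsIRFile = false;            // hypothetical stand-in for the IK_LLVM_IR check
protected:
  void ExecuteAction() override {
    Trace.push_back("CodeGenAction::ExecuteAction");
    if (IsIRFile)
      return;                       // the IR path is handled separately
    // Qualified call: bypasses virtual dispatch and runs the base implementation.
    this->ASTFrontendAction::ExecuteAction();
  }
};

// EmitLLVMAction adds nothing to the control flow; it inherits everything.
struct EmitLLVMAction : CodeGenAction {};
```

Calling Execute() on an EmitLLVMAction object in this sketch records three entries in Trace, in the same order as the walk-through: Execute, then the CodeGenAction override, then the ASTFrontendAction implementation.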

Only the important part of the code is shown here. Following ParseAST, again as an excerpt:

void clang::ParseAST(Sema &S, bool PrintStats, bool SkipFunctionBodies) {
  ASTConsumer *Consumer = &S.getASTConsumer();

  std::unique_ptr<Parser> ParseOP(
      new Parser(S.getPreprocessor(), S, SkipFunctionBodies));
  Parser &P = *ParseOP.get();

  S.getPreprocessor().EnterMainSourceFile();
  P.Initialize();

  // ... (the declarations of ADecl and External are elided in this excerpt)
  if (P.ParseTopLevelDecl(ADecl)) {
    if (!External && !S.getLangOpts().CPlusPlus)
      P.Diag(diag::ext_empty_translation_unit);
  } else {
    do {
      // If we got a null return and something *was* parsed, ignore it.  This
      // is due to a top-level semicolon, an action override, or a parse error
      // skipping something.
      if (ADecl && !Consumer->HandleTopLevelDecl(ADecl.get()))
        return;
    } while (!P.ParseTopLevelDecl(ADecl));
  }

  // Process any TopLevelDecls generated by #pragma weak.
  for (Decl *D : S.WeakTopLevelDecls())
    Consumer->HandleTopLevelDecl(DeclGroupRef(D));
  
  Consumer->HandleTranslationUnit(S.getASTContext());
}
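
To make the shape of this driver loop easier to see, here is a small self-contained sketch. It is not Clang code: ToyParser, ToyConsumer and ToyParseAST are hypothetical names, declarations are plain strings, and an empty string stands in for a null declaration. The only thing carried over from the excerpt is the convention that ParseTopLevelDecl returns true once the end of input is reached.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical consumer: just records what it is handed.
struct ToyConsumer {
  std::vector<std::string> Seen;
  bool Finished = false;
  bool HandleTopLevelDecl(const std::string &D) {
    Seen.push_back(D);
    return true;                 // returning false would abort parsing
  }
  void HandleTranslationUnit() { Finished = true; }
};

// Hypothetical parser over pre-split "declarations". An empty string models
// a null result (for example a stray top-level ';').
class ToyParser {
  std::vector<std::string> Decls;
  std::size_t Pos = 0;
public:
  explicit ToyParser(std::vector<std::string> D) : Decls(std::move(D)) {}

  // Same convention as in the excerpt: true means end of input.
  bool ParseTopLevelDecl(std::string &Result) {
    Result.clear();
    if (Pos == Decls.size())
      return true;
    Result = Decls[Pos++];
    return false;
  }
};

void ToyParseAST(ToyParser &P, ToyConsumer &C) {
  std::string ADecl;
  if (!P.ParseTopLevelDecl(ADecl)) {
    do {
      // Skip null results; stop early if the consumer asks for it.
      if (!ADecl.empty() && !C.HandleTopLevelDecl(ADecl))
        return;
    } while (!P.ParseTopLevelDecl(ADecl));
  }
  // Only after every top-level declaration has been delivered.
  C.HandleTranslationUnit();
}
```

The consumer sees declarations one at a time, as soon as each is parsed, and gets a single notification at the end for whole-translation-unit work.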

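The parser obtains each Decl through ParseTopLevelDecl, and the ASTConsumer then processes it in HandleTopLevelDecl. Setting aside the parts that are not of interest right now, ParseTopLevelDecl starts by calling ParseExternalDeclaration, and inside ParseExternalDeclaration only the following line matters to us: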

return ParseDeclarationOrFunctionDefinition(attrs, DS);

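This handles a declaration or a function definition, and it delegates to an internal helper, ParseDeclOrFunctionDefInternal. The work there splits into two parts: ParseDeclarationSpecifiers, and, at the end, ParseDeclGroup. The first collects the declaration specifiers (such as the type specifier); the second handles the declarators themselves. The second part is the one to look at closely: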

Parser::DeclGroupPtrTy Parser::ParseDeclGroup(ParsingDeclSpec &DS,
                                              unsigned Context,
                                              SourceLocation *DeclEnd,
                                              ForRangeInit *FRI) {
  // Parse the first declarator.
  ParsingDeclarator D(*this, DS, static_cast<Declarator::TheContext>(Context));
  ParseDeclarator(D);

  // Check to see if we have a function *definition* which must have a body.
  if (D.isFunctionDeclarator() && !isDeclarationAfterDeclarator()) {
    Decl *TheDecl =
        ParseFunctionDefinition(D, ParsedTemplateInfo(), &LateParsedAttrs);
    return Actions.ConvertDeclToDeclGroup(TheDecl);
  }

  SmallVector<Decl *, 8> DeclsInGroup;
  Decl *FirstDecl = ParseDeclarationAfterDeclaratorAndAttributes(
      D, ParsedTemplateInfo(), FRI);

  // If we don't have a comma, it is either the end of the list (a ';') or an
  // error, bail out.
  SourceLocation CommaLoc;
  while (TryConsumeToken(tok::comma, CommaLoc)) {
    ParseDeclarator(D);
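    if (!D.isInvalidType()) {
      Decl *ThisDecl = ParseDeclarationAfterDeclarator(D);
      // ... (collecting ThisDecl into DeclsInGroup is elided)
    }
  }
  // ... (the final return of the declaration group is elided)
}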

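The overall structure of ParseDeclGroup is shown above. It first calls ParseDeclarator to read one declarator; for int i; that is i. It then checks whether this is a function declarator that is not followed by the rest of an ordinary declaration, which means a function body must follow. If so, it calls ParseFunctionDefinition to parse the function definition; otherwise it keeps calling ParseDeclarator for as long as commas follow, collecting every declarator in the group.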
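
The decision made in ParseDeclGroup can be sketched with a toy parser. Everything here is hypothetical and heavily simplified: tokens are plain strings, the input is assumed to start right after the declaration specifiers, a declarator is just an identifier optionally followed by an empty parameter list, and the test for a function definition is reduced to "a function declarator followed by {".

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct ToyDeclGroup {
  bool IsFunctionDefinition = false;
  std::vector<std::string> Names;
};

// Hypothetical parser. Input is the token stream that follows the
// declaration specifiers, e.g. {"a", ",", "b", ";"}.
class ToyDeclParser {
  std::vector<std::string> Toks;
  std::size_t Pos = 0;

  const std::string &Peek() const {
    static const std::string Eof = "<eof>";
    return Pos < Toks.size() ? Toks[Pos] : Eof;
  }
  bool TryConsume(const std::string &T) {
    if (Peek() != T)
      return false;
    ++Pos;
    return true;
  }

  // Declarator := identifier [ '(' ')' ]. Returns the name and reports
  // whether a parameter list followed it.
  std::string ParseDeclarator(bool &IsFunction) {
    std::string Name = Peek();
    ++Pos;
    IsFunction = TryConsume("(") && TryConsume(")");
    return Name;
  }

public:
  explicit ToyDeclParser(std::vector<std::string> T) : Toks(std::move(T)) {}

  ToyDeclGroup ParseDeclGroup() {
    ToyDeclGroup G;
    bool IsFunction = false;
    G.Names.push_back(ParseDeclarator(IsFunction));

    // A function declarator followed by '{' is a definition, not a declaration.
    if (IsFunction && Peek() == "{") {
      G.IsFunctionDefinition = true;
      return G;                  // the body would be parsed here
    }

    // Otherwise keep reading declarators while commas separate them.
    while (TryConsume(","))
      G.Names.push_back(ParseDeclarator(IsFunction));
    TryConsume(";");
    return G;
  }
};
```

With this sketch, the tokens a , b , c ; yield a group of three names, f ( ) ; yields an ordinary declaration, and f ( ) { } is classified as a function definition.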

For a declaration, ParseDeclarationAfterDeclarator is what combines the type with the declarator into a complete declaration. ParseDeclarationAfterDeclarator calls ParseDeclarationAfterDeclaratorAndAttributes, which in turn calls Actions.ActOnDeclarator, and ActOnDeclarator calls HandleDeclarator.

HandleDeclarator does three things. First it calls GetTypeForDeclarator to obtain the type information; since each type exists as a single unique instance, a mapping step is needed here. Second, it calls one of the following:

ActOnTypedefDeclarator
ActOnFunctionDeclarator
ActOnVariableDeclarator

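Finally, it calls PushOnScopeChains to store the declaration. The stored information can then be used the next time a declarator is encountered, to decide whether that declarator is valid.

Now back to ParseFunctionDefinition. An excerpt: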
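
Two of the ideas above, one instance per type and a store of earlier declarations that later ones are checked against, can be illustrated with a small stand-alone sketch. This is not how Sema is implemented: ToyType, ToyTypeTable and ToyScopeChain are hypothetical, and the duplicate-name check is folded into Push only to keep the example short.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

struct ToyType { std::string Spelling; };

// Uniquing table: every distinct spelling maps to exactly one ToyType object,
// so types can be compared by pointer.
class ToyTypeTable {
  std::map<std::string, std::unique_ptr<ToyType>> Types;
public:
  const ToyType *Get(const std::string &Spelling) {
    std::unique_ptr<ToyType> &Slot = Types[Spelling];
    if (!Slot)
      Slot.reset(new ToyType{Spelling});
    return Slot.get();
  }
};

// A stack of scopes; each maps a declared name to its type.
class ToyScopeChain {
  std::vector<std::map<std::string, const ToyType *>> Scopes;
public:
  ToyScopeChain() : Scopes(1) {}          // start with the file scope
  void Enter() { Scopes.emplace_back(); }
  void Exit() {
    if (Scopes.size() > 1)
      Scopes.pop_back();
  }

  // Returns false when the name already exists in the innermost scope.
  bool Push(const std::string &Name, const ToyType *T) {
    return Scopes.back().emplace(Name, T).second;
  }

  // Searches from the innermost scope outwards; nullptr if not found.
  const ToyType *Lookup(const std::string &Name) const {
    for (auto It = Scopes.rbegin(); It != Scopes.rend(); ++It) {
      auto Found = It->find(Name);
      if (Found != It->end())
        return Found->second;
    }
    return nullptr;
  }
};
```

In this sketch a second declaration of the same name in the same scope is rejected, while a declaration in a nested scope shadows the outer one until the scope is exited.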

Decl *Parser::ParseFunctionDefinition(ParsingDeclarator &D,
                                      const ParsedTemplateInfo &TemplateInfo,
                                      LateParsedAttrList *LateParsedAttrs) {
  // Enter a scope for the function body.
  ParseScope BodyScope(this, Scope::FnScope|Scope::DeclScope);

  // Tell the actions module that we have entered a function definition with the
  // specified Declarator for the function.
  Decl *Res = Actions.ActOnStartOfFunctionDef(getCurScope(), D,
                                              TemplateInfo.TemplateParams
                                                  ? *TemplateInfo.TemplateParams
                                                  : MultiTemplateParamsArg(),
                                              &SkipBody);

  return ParseFunctionStatementBody(Res, BodyScope);
}

It first enters the function scope and runs the corresponding action, then calls ParseFunctionStatementBody to start parsing the function body.

Decl *Parser::ParseFunctionStatementBody(Decl *Decl, ParseScope &BodyScope) {
  // Do not enter a scope for the brace, as the arguments are in the same scope
  // (the function body) as the body itself.  Instead, just read the statement
  // list and put it into a CompoundStmt for safe keeping.
  StmtResult FnBody(ParseCompoundStatementBody());

  BodyScope.Exit();
  return Actions.ActOnFinishFunctionBody(Decl, FnBody.get());
}

The most important line in ParseFunctionStatementBody is:

StmtResult FnBody(ParseCompoundStatementBody());

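After that it exits the scope and runs the corresponding action. Next, follow ParseCompoundStatementBody.

In ParseCompoundStatementBody, ignoring the __extension__ case (kw___extension__) for now, the actual call is to ParseStatementOrDeclaration, which in turn calls ParseStatementOrDeclarationAfterAttributes. That function is where the real analysis happens: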
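
The scope handling follows a familiar RAII pattern: the scope is entered when the guard object is constructed and left either by an explicit Exit() or, at the latest, when the guard is destroyed. Below is a minimal sketch of that pattern. ToyParserState, ToyParseScope and ToyParseFunctionBody are hypothetical names, and unlike the excerpt the guard is created and exited in the same function to keep the example short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parser state: just a stack of scope names.
struct ToyParserState {
  std::vector<std::string> ScopeStack;
};

// RAII guard in the spirit of ParseScope: entering happens in the
// constructor, leaving either in Exit() or, at the latest, in the destructor.
class ToyParseScope {
  ToyParserState *State;
public:
  ToyParseScope(ToyParserState &S, const std::string &Name) : State(&S) {
    State->ScopeStack.push_back(Name);
  }
  ToyParseScope(const ToyParseScope &) = delete;
  ToyParseScope &operator=(const ToyParseScope &) = delete;

  // Leave the scope early; safe to call more than once.
  void Exit() {
    if (State) {
      State->ScopeStack.pop_back();
      State = nullptr;
    }
  }
  ~ToyParseScope() { Exit(); }
};

// Same order as in ParseFunctionStatementBody: work on the body inside the
// scope, leave the scope, then finish up outside of it.
std::size_t ToyParseFunctionBody(ToyParserState &S) {
  ToyParseScope BodyScope(S, "function");
  std::size_t DepthInsideBody = S.ScopeStack.size();  // body parsing goes here
  BodyScope.Exit();
  return DepthInsideBody;
}
```

Because the destructor also calls Exit(), an early return or an error path inside the body cannot leave a stale scope on the stack.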

StmtResult
Parser::ParseStatementOrDeclarationAfterAttributes(StmtVector &Stmts,
          AllowedContsructsKind Allowed, SourceLocation *TrailingElseLoc,
          ParsedAttributesWithRange &Attrs) {
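  // ... (the declarations of Res, SemiError and Kind, and the Retry label,
  // are elided in this excerpt)
  switch (Kind) {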
  case tok::identifier: {
    Token Next = NextToken();
    if (Next.is(tok::colon)) { // C99 6.8.1: labeled-statement
      // identifier ':' statement
      return ParseLabeledStatement(Attrs);
    }

    // Look up the identifier, and typo-correct it to a keyword if it's not
    // found.
    if (Next.isNot(tok::coloncolon)) {
      // Try to limit which sets of keywords should be included in typo
      // correction based on what the next token is.
      if (TryAnnotateName(/*IsAddressOfOperand*/ false,
                          llvm::make_unique<StatementFilterCCC>(Next)) ==
          ANK_Error) {
        // Handle errors here by skipping up to the next semicolon or '}', and
        // eat the semicolon if that's what stopped us.
        SkipUntil(tok::r_brace, StopAtSemi | StopBeforeMatch);
        if (Tok.is(tok::semi))
          ConsumeToken();
        return StmtError();
      }

      // If the identifier was typo-corrected, try again.
      if (Tok.isNot(tok::identifier))
        goto Retry;
    }

    // Fall through
  }

  default: {
    if ((getLangOpts().CPlusPlus || Allowed == ACK_Any) &&
        isDeclarationStatement()) {
      SourceLocation DeclStart = Tok.getLocation(), DeclEnd;
      DeclGroupPtrTy Decl = ParseDeclaration(Declarator::BlockContext,
                                             DeclEnd, Attrs);
      return Actions.ActOnDeclStmt(Decl, DeclStart, DeclEnd);
    }

    if (Tok.is(tok::r_brace)) {
      Diag(Tok, diag::err_expected_statement);
      return StmtError();
    }

    return ParseExprStatement();
  }

  case tok::kw_case:                // C99 6.8.1: labeled-statement
    return ParseCaseStatement();
  case tok::kw_default:             // C99 6.8.1: labeled-statement
    return ParseDefaultStatement();

  case tok::l_brace:                // C99 6.8.2: compound-statement
    return ParseCompoundStatement();
  case tok::semi: {                 // C99 6.8.3p3: expression[opt] ';'
    bool HasLeadingEmptyMacro = Tok.hasLeadingEmptyMacro();
    return Actions.ActOnNullStmt(ConsumeToken(), HasLeadingEmptyMacro);
  }

  case tok::kw_if:                  // C99 6.8.4.1: if-statement
    return ParseIfStatement(TrailingElseLoc);
  case tok::kw_switch:              // C99 6.8.4.2: switch-statement
    return ParseSwitchStatement(TrailingElseLoc);

  case tok::kw_while:               // C99 6.8.5.1: while-statement
    return ParseWhileStatement(TrailingElseLoc);
  case tok::kw_do:                  // C99 6.8.5.2: do-statement
    Res = ParseDoStatement();
    SemiError = "do/while";
    break;
  case tok::kw_for:                 // C99 6.8.5.3: for-statement
    return ParseForStatement(TrailingElseLoc);

  case tok::kw_goto:                // C99 6.8.6.1: goto-statement
    Res = ParseGotoStatement();
    SemiError = "goto";
    break;
  case tok::kw_continue:            // C99 6.8.6.2: continue-statement
    Res = ParseContinueStatement();
    SemiError = "continue";
    break;
  case tok::kw_break:               // C99 6.8.6.3: break-statement
    Res = ParseBreakStatement();
    SemiError = "break";
    break;
  case tok::kw_return:              // C99 6.8.6.4: return-statement
    Res = ParseReturnStatement();
    SemiError = "return";
    break;
  }

  return Res;
}

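That is the key code. For each leading keyword the corresponding Parse function is called. The trace stops here; interested readers can dig deeper.

One thing worth noting is that the actions are interleaved with the parsing code: once parsing reaches a given point, the corresponding semantic action is invoked to perform its checks. This avoids a separate traversal of the syntax tree after parsing is finished, and it keeps the code concise and easy to write.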
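
The dispatch itself is simple enough to sketch in isolation. The function below is a hypothetical, reduced model: it covers only a handful of token kinds, returns the name of the routine that would be selected instead of calling it, and collapses the language-option and isDeclarationStatement checks into a single LooksLikeDeclaration flag.

```cpp
#include <string>

enum class TokKind { Identifier, LBrace, Semi, KwIf, KwWhile, KwReturn, Other };

// Hypothetical classifier: names the routine that the first token of a
// statement selects, given a little information about what follows it.
std::string SelectStatementParser(TokKind First, bool NextIsColon,
                                  bool LooksLikeDeclaration) {
  switch (First) {
  case TokKind::Identifier:
    if (NextIsColon)
      return "ParseLabeledStatement";   // identifier ':' statement
    break;                              // otherwise handled like the default case
  case TokKind::LBrace:   return "ParseCompoundStatement";
  case TokKind::Semi:     return "ActOnNullStmt";
  case TokKind::KwIf:     return "ParseIfStatement";
  case TokKind::KwWhile:  return "ParseWhileStatement";
  case TokKind::KwReturn: return "ParseReturnStatement";
  case TokKind::Other:    break;
  }
  // Default path: a declaration if it looks like one, else an expression.
  return LooksLikeDeclaration ? "ParseDeclaration" : "ParseExprStatement";
}
```

One token of lookahead (here the NextIsColon flag) is enough to tell a label from an expression that merely starts with an identifier.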
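
As a rough illustration of this interleaving, the sketch below runs its checks the moment each construct is recognized and never builds a tree at all. ToyActions and ToyParse are hypothetical, and the input format (a list of keyword and name pairs) is invented for the example.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical semantic-action object: checks happen the moment the parser
// reports a construct, so no later walk over a finished tree is needed.
class ToyActions {
  std::set<std::string> Declared;
public:
  std::vector<std::string> Diags;

  void ActOnDeclaration(const std::string &Name) {
    if (!Declared.insert(Name).second)
      Diags.push_back("redefinition of '" + Name + "'");
  }
  void ActOnUse(const std::string &Name) {
    if (!Declared.count(Name))
      Diags.push_back("use of undeclared identifier '" + Name + "'");
  }
};

// Input is a list of (keyword, name) pairs such as {"decl", "x"}.
void ToyParse(const std::vector<std::pair<std::string, std::string>> &Input,
              ToyActions &Actions) {
  for (const auto &Item : Input) {
    if (Item.first == "decl")
      Actions.ActOnDeclaration(Item.second);   // action fires during parsing
    else if (Item.first == "use")
      Actions.ActOnUse(Item.second);
  }
}
```

Diagnostics come out in source order as a side effect of parsing, which is the property the paragraph above describes.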