Original source: medium.com/@huidou/lets-understand-chrome-v..

Welcome to other chapters of Let’s Understand Chrome V8

In this article, we’ll talk about the source code and important data structures of V8’s JS parser. Our test case is the same as in Chapter 4.

1. Parser

In V8, the parser is the next stage of the scanner, and the token output by the scanner is the input of the parser. During compiling JS, the parser will frequently call the scanner to generate the token. With the help of our case, let’s analyze the code of the parser. The following is DoParseProgram which is our start.

1.  FunctionLiteral* Parser::DoParseProgram(Isolate* isolate, ParseInfo* info) {
2.    DCHECK_EQ(parsing_on_main_thread_, isolate != nullptr);
3.    DCHECK_NULL(scope_);
4.    ParsingModeScope mode(this, allow_lazy_ ? PARSE_LAZILY : PARSE_EAGERLY);
5.    ResetFunctionLiteralId();
6.    FunctionLiteral* result = nullptr;
7.    {
8.  //omit.....................
9.      if (flags().is_module()) {
10.        DCHECK(flags().is_module());
11.        PrepareGeneratorVariables();
12.        Expression* initial_yield =
13.            BuildInitialYield(kNoSourcePosition, kGeneratorFunction);
14.        body.Add(
15.            factory()->NewExpressionStatement(initial_yield, kNoSourcePosition));
16.        if (flags().allow_harmony_top_level_await()) {
17.          BlockT block = impl()->NullBlock();
18.          {
19.            StatementListT statements(pointer_buffer());
20.            ParseModuleItemList(&statements);
21.            if (function_state.suspend_count() > 1) {
22.              scope->set_is_async_module();
23.              block = factory()->NewBlock(true, statements);
24.            } else {
25.              statements.MergeInto(&body);
26.            }
27.          }
28.          if (IsAsyncModule(scope->function_kind())) {
29.            impl()->RewriteAsyncFunctionBody(
30.                &body, block, factory()->NewUndefinedLiteral(kNoSourcePosition));
31.          }
32.        } else {
33.          ParseModuleItemList(&body);
34.        }
35.        if (!has_error() &&
36.            !module()->Validate(this->scope()->AsModuleScope(),
37.                                pending_error_handler(), zone())) {
38.          scanner()->set_parser_error();
39.        }
40.      } else if (info->is_wrapped_as_function()) {
41.        DCHECK(parsing_on_main_thread_);
42.        ParseWrapped(isolate, info, &body, scope, zone());
43.      } else if (flags().is_repl_mode()) {
44.        ParseREPLProgram(info, &body, scope);
45.      } else {
46.        this->scope()->SetLanguageMode(info->language_mode());
47.        ParseStatementList(&body, Token::EOS);
48.      }
49.  //omit............................................
50.    }
51.    info->set_max_function_literal_id(GetLastFunctionLiteralId());
52.    if (has_error()) return nullptr;
53.    RecordFunctionLiteralSourceRange(result);
54.    return result;
55.  }

The 6th line defines a result, which wil save the AST tree. After DoParseProgram() exit, the AST is generated. In our use case, the following method will be called.

1.  void ParserBase<Impl>::ParseStatementList(StatementListT* body,
2.                                            Token::Value end_token) {
3.    while (peek() == Token::STRING) {
4.  //omit.....
5.      StatementT stat = ParseStatementListItem();
6.      if (impl()->IsNull(stat)) return;
7.      body->Add(stat);
8.      if (!impl()->IsStringLiteral(stat)) break;
9.      if (use_strict) {
10.        RaiseLanguageMode(LanguageMode::kStrict);
11.        if (!scope()->HasSimpleParameters()) {
12.          impl()->ReportMessageAt(token_loc,
13.                                  MessageTemplate::kIllegalLanguageModeDirective,
14.                                  "use strict");
15.          return;
16.        }
17.  //omit................
18.      } else {
19.        RaiseLanguageMode(LanguageMode::kSloppy);
20.      }
21.    }
22.    while (peek() != end_token) {
23.      StatementT stat = ParseStatementListItem();
24.      if (impl()->IsNull(stat)) return;
25.      if (stat->IsEmptyStatement()) continue;
26.      body->Add(stat);
27.    }
28.  }

ParseStatementList() starts parsing program statements. In the third line of code, peek() takes out the type of the token word. For our case, the type obtained by the peek is Token::FUNCTION, so the result of while is false, and then jumps to line 22 and executes ParseStatementListItem(), which is defined below.

1.  ParserBase<Impl>::ParseHoistableDeclaration(
2.      ZonePtrList<const AstRawString>* names, bool default_export) {
3.    Consume(Token::FUNCTION);//token cache
4.    int pos = position();
5.    ParseFunctionFlags flags = ParseFunctionFlag::kIsNormal;
6.    if (Check(Token::MUL)) {
7.      flags |= ParseFunctionFlag::kIsGenerator;
8.    }
9.    return ParseHoistableDeclaration(pos, flags, names, default_export);
10.  }

Line 3, Consume() is the token cache we mentioned in the third article. It takes out a token from the cache, and when the cache is missing, it calls the Scanner to generate tokens.

For our case, after knowing that the token type is a function, we need to determine which FunctionKind the function is. The specific code of FunctionKind is as follows:

enum FunctionKind : uint8_t {
  // BEGIN constructable functions
  kNormalFunction,
  kModule,
  kAsyncModule,
//.................................
//omit
//.................................
  // END concise methods 1
  kAsyncGeneratorFunction,
  // END async functions
  kGeneratorFunction,
  // BEGIN concise methods 2
  kConciseGeneratorMethod,
  kStaticConciseGeneratorMethod,
  // END generators
  kConciseMethod,
  kStaticConciseMethod,
  kClassMembersInitializerFunction,
  kClassStaticInitializerFunction,
  // END concise methods 2
  kInvalid,

  kLastFunctionKind = kClassStaticInitializerFunction,
};

Note: Don’t confuse FunctionKind with Token::FUNCTION, the token is a stuff in compilation technology, but FunctionKind belongs to ECMA specification. In our case, the FunctionKind of the function is KnormalFunction, so the parser will analyze the Token::IDENTIFIER of this function, the code is bellow.

const AstRawString* Scanner::CurrentSymbol(
    AstValueFactory* ast_value_factory) const {
  if (is_literal_one_byte()) {
    return ast_value_factory->GetOneByteString(literal_one_byte_string());
  }
  return ast_value_factory->GetTwoByteString(literal_two_byte_string());
}

CurrentSymbol judges whether it is single-byte or double-byte. In our case, JsPrint is single byte. Figure 1 is the call stack of CurrentSymbol. At this point, Parse is complete. In our case, the three most important things Parse does:

In the JS source code, when Parser meets the function token, it will know that the next token is a function;
Judging FunctionKind, is it asynchronous or something else? Our case is kNormalFunction.
Take out the next token JsPrint, and parse it.
2. Lazy Parser
What is delay parser? It is an optimization technique in V8, that is, the code is not parsed until it is executed. As we all know, because of control conditions, not all code will be executed. Based on this, V8 uses delay parser and delay compilation to improve efficiency.

In our case, the JsPrint’s type is kNormalFunction, which is the function to be executed immediately, so we need to parse it immediately.

FunctionLiteral* Parser::ParseFunctionLiteral(
    const AstRawString* function_name, Scanner::Location function_name_location,
    FunctionNameValidity function_name_validity, FunctionKind kind,
    int function_token_pos, FunctionSyntaxKind function_syntax_kind,
    LanguageMode language_mode,
    ZonePtrList<const AstRawString>* arguments_for_wrapped_function) {
  bool is_wrapped = function_syntax_kind == FunctionSyntaxKind::kWrapped;
  DCHECK_EQ(is_wrapped, arguments_for_wrapped_function != nullptr);
  int pos = function_token_pos == kNoSourcePosition ? peek_position()
                                                    : function_token_pos;
  DCHECK_NE(kNoSourcePosition, pos);
  bool should_infer_name = function_name == nullptr;

  if (should_infer_name) {
    function_name = ast_value_factory()->empty_string();
  }
  FunctionLiteral::EagerCompileHint eager_compile_hint =
      function_state_->next_function_is_likely_called() || is_wrapped
          ? FunctionLiteral::kShouldEagerCompile
          : default_eager_compile_hint();
  DCHECK_IMPLIES(parse_lazily(), info()->flags().allow_lazy_compile());
  DCHECK_IMPLIES(parse_lazily(), has_error() || allow_lazy_);
  DCHECK_IMPLIES(parse_lazily(), extension() == nullptr);

  const bool is_lazy =
      eager_compile_hint == FunctionLiteral::kShouldLazyCompile;
  const bool is_top_level = AllowsLazyParsingWithoutUnresolvedVariables();
  const bool is_eager_top_level_function = !is_lazy && is_top_level;
  const bool is_lazy_top_level_function = is_lazy && is_top_level;
  const bool is_lazy_inner_function = is_lazy && !is_top_level;

  RCS_SCOPE(runtime_call_stats_, RuntimeCallCounterId::kParseFunctionLiteral,
            RuntimeCallStats::kThreadSpecific);
  base::ElapsedTimer timer;
  if (V8_UNLIKELY(FLAG_log_function_events)) timer.Start();
  const bool should_preparse_inner = parse_lazily() && is_lazy_inner_function;
  bool should_post_parallel_task =
      parse_lazily() && is_eager_top_level_function &&
      FLAG_parallel_compile_tasks && info()->parallel_tasks() &&
      scanner()->stream()->can_be_cloned_for_parallel_access();

  // This may be modified later to reflect preparsing decision taken
  bool should_preparse = (parse_lazily() && is_lazy_top_level_function) ||
                         should_preparse_inner || should_post_parallel_task;
  ScopedPtrList<Statement> body(pointer_buffer());
  int expected_property_count = 0;
  int suspend_count = -1;
  int num_parameters = -1;
  int function_length = -1;
  bool has_duplicate_parameters = false;
  int function_literal_id = GetNextFunctionLiteralId();
  ProducedPreparseData* produced_preparse_data = nullptr;
  Zone* parse_zone = should_preparse ? &preparser_zone_ : zone();
  DeclarationScope* scope = NewFunctionScope(kind, parse_zone);
  SetLanguageMode(scope, language_mode);
#ifdef DEBUG
  scope->SetScopeName(function_name);
#endif
  if (!is_wrapped && V8_UNLIKELY(!Check(Token::LPAREN))) {
    ReportUnexpectedToken(Next());
    return nullptr;
  }
  scope->set_start_position(position());

  bool did_preparse_successfully =
      should_preparse &&
      SkipFunction(function_name, kind, function_syntax_kind, scope,
                   &num_parameters, &function_length, &produced_preparse_data);
  if (!did_preparse_successfully) {
    if (should_preparse) Consume(Token::LPAREN);
    should_post_parallel_task = false;
    ParseFunction(&body, function_name, pos, kind, function_syntax_kind, scope,
                  &num_parameters, &function_length, &has_duplicate_parameters,
                  &expected_property_count, &suspend_count,
                  arguments_for_wrapped_function);
  }
  if (V8_UNLIKELY(FLAG_log_function_events)) {
    double ms = timer.Elapsed().InMillisecondsF();
    const char* event_name =
        should_preparse
            ? (is_top_level ? "preparse-no-resolution" : "preparse-resolution")
            : "full-parse";
    logger_->FunctionEvent(
        event_name, flags().script_id(), ms, scope->start_position(),
        scope->end_position(),
        reinterpret_cast<const char*>(function_name->raw_data()),
        function_name->byte_length(), function_name->is_one_byte());
  }
#ifdef V8_RUNTIME_CALL_STATS
  if (did_preparse_successfully && runtime_call_stats_ &&
      V8_UNLIKELY(TracingFlags::is_runtime_stats_enabled())) {
    runtime_call_stats_->CorrectCurrentCounterId(
        RuntimeCallCounterId::kPreParseWithVariableResolution,
        RuntimeCallStats::kThreadSpecific);
  }
#endif  // V8_RUNTIME_CALL_STATS

  language_mode = scope->language_mode();
  CheckFunctionName(language_mode, function_name, function_name_validity,
                    function_name_location);

  if (is_strict(language_mode)) {
    CheckStrictOctalLiteral(scope->start_position(), scope->end_position());
  }

  FunctionLiteral::ParameterFlag duplicate_parameters =
      has_duplicate_parameters ? FunctionLiteral::kHasDuplicateParameters
                               : FunctionLiteral::kNoDuplicateParameters;
  FunctionLiteral* function_literal = factory()->NewFunctionLiteral(
      function_name, scope, body, expected_property_count, num_parameters,
      function_length, duplicate_parameters, function_syntax_kind,
      eager_compile_hint, pos, true, function_literal_id,
      produced_preparse_data);
  function_literal->set_function_token_position(function_token_pos);
  function_literal->set_suspend_count(suspend_count);

  RecordFunctionLiteralSourceRange(function_literal);

  if (should_post_parallel_task) {
    // Start a parallel parse / compile task on the compiler dispatcher.
    info()->parallel_tasks()->Enqueue(info(), function_name, function_literal);
  }

  if (should_infer_name) {
    fni_.AddFunction(function_literal);
  }
  return function_literal;
}

After analyzing the function name (JsPrint), ParseFunctionLiteral will be called, which is responsible for parsing the function body.

Figure 2 is the case, you can see that JsPrint will not be executed immediately, meeting the delay parse condition. We can also get the same conclusion from the JS code: console.log() is executed first, and then console.log() calls JsPrint, so it satisfies the delay parse condition.

Debugging the program is the best way to verify the above conclusions. By debugging the ParseFunctionLiteral(), and watching the is_lazy and is_top_level members, you will also agree with the above conclusion. Figure 3 is the ParseFunctionLiteral’s call stack.

Figure 4 is the abstract syntax tree of JsPrint.

Okay, that wraps it up for this share. I’ll see you guys next time, take care!

Please reach out to me if you have any issues. WeChat: qq9123013 Email: v8blink@outlook.com

Let’s Understand Chrome V8 — Chapter 5: Compilation Parser

1. Parser

2. Lazy Parser