Age | Commit message (Collapse) | Author |
|
The Lexer constructor calls consume() once, which initializes m_position
to be > 0 and sets m_character. consume() calls is_line_terminator(),
which wasn't accounting for this state.
|
|
B.1.3 HTML-like Comments
The syntax and semantics of 11.4 is extended as follows except that this
extension is not allowed when parsing source code using the goal symbol
Module:
Syntax (only relevant part included)
SingleLineHTMLCloseComment ::
LineTerminatorSequence HTMLCloseComment
HTMLCloseComment ::
WhiteSpaceSequence[opt] SingleLineDelimitedCommentSequence[opt] --> SingleLineCommentChars[opt]
Fixes #3810.
|
|
|
|
|
|
This allows us to communicate details about invalid tokens to the parser
without having to invent a bunch of specific invalid tokens like
TokenType::InvalidNumericLiteral.
|
|
https://tc39.es/ecma262/#sec-line-terminators
|
|
|
|
i.e. "1e" "0x" "0b" "0o" used to be parsed as valid literals.
They now produce invalid tokens. Fixes #3716
|
|
It would be cool to solve this in a general way so that looking up
a string literal or StringView in a HashMap with String keys avoids
creating a temp string.
For now, this patch simply addresses the issue in JS::Lexer.
This is a 2-3% speed-up on test-js.
|
|
TC39 proposal, stage 4 as of 2020-07.
https://tc39.es/proposal-logical-assignment/
|
|
This broke in case of unterminated regular expressions, causing goofy location
numbers, and 'source_location_hint' to eat up all memory:
Unexpected token UnterminatedRegexLiteral. Expected statement (line: 2, column: 4294967292)
|
|
This prevents a regex such as /=/ from lexing into TokenType::SlashEquals,
preventing the regex logic from working.
|
|
|
|
|
|
|
|
This adds regex parsing/lexing, as well as a relatively empty
RegExpObject. The purpose of this patch is to allow the engine to not
get hung up on parsing regexes. This will aid in finding new syntax
errors (say, from google or twitter) without having to replace all of
their regexes first!
|
|
- initializing m_line_column to 1 in the lexer results in incorrect
column values in tokens on the first line of input.
- not incrementing m_line_column when EOF is reached results in
an incorrect column value on the last token.
|
|
Giving the lexer the ability to generate errors adds unnecessary
complexity - also it only calls its syntax_error() function in one place
anyway ("unterminated string literal"). But since the lexer *also* emits
tokens like Eof or UnterminatedStringLiteral, it should be up to the
consumer of these tokens to decide what to do.
Also remove the option to not print errors to stderr as that's not
relevant anymore.
|
|
Some of these are required for syntax we have not implemented yet, some
are future reserved words in strict mode.
|
|
|
|
|
|
Adds fully functioning template literals. Because template literals
contain expressions, most of the work has to be done in the Lexer rather
than the Parser. And because of the complexity of template literals
(expressions, nesting, escapes, etc), the Lexer needs to have some
template-related state.
When entering a new template literal, a TemplateLiteralStart token is
emitted. When inside a literal, all text will be parsed up until a '${'
or '`' (or EOF, but that's a syntax error) is seen, and then a
TemplateLiteralExprStart token is emitted. At this point, the Lexer
proceeds as normal, however it keeps track of the number of opening
and closing curly braces it has seen in order to determine the close
of the expression. Once it finds a matching curly brace for the '${',
a TemplateLiteralExprEnd token is emitted and the state is updated
accordingly.
When the Lexer is inside of a template literal, but not an expression,
and sees a '`', this must be the closing grave: a TemplateLiteralEnd
token is emitted.
The state required to correctly parse template strings consists of a
vector (for nesting) of two pieces of information: whether or not we
are in a template expression (as opposed to a template string); and
the count of the number of unmatched open curly braces we have seen
(only applicable if the Lexer is currently in a template expression).
TODO: Add support for template literal newlines in the JS REPL (this will
cause a syntax error currently):
> `foo
> bar`
'foo
bar'
|
|
|
|
Implement the syntax and behavor necessary to support array literals
such as [...[1, 2, 3]]. A type error is thrown if the target of the
spread operator does not evaluate to an array (though it should
eventually just check for an iterable).
Note that the spread token's name is TripleDot, since the '...' token is
used for two features: spread and rest. Calling it anything involving
'spread' or 'rest' would be a bit confusing.
|
|
This is required for template literals - we're not quite there yet, but at
least the parser can now tell us when this token is encountered -
currently this yields "Unexpected token Invalid". Not really helpful.
The character is a "backtick", but as we already have
TokenType::{StringLiteral,RegexLiteral} this seemed like a fitting name.
This also enables syntax highlighting for template literals in the js
REPL and LibGUI's JSSyntaxHighlighter.
|
|
|
|
|
|
|
|
|
|
|
|
While debugging test failures, it's pretty frustrating to have to go do
printf debugging to figure out what test is failing right now. While
watching your JS Raytracer stream it seemed like this was pretty
furstrating as well. So I wanted to start working on improving the
diagnostics here.
In the future I hope we can eventually be able to plumb the info down
to the Error classes so any thrown exceptions will contain enough
metadata to know where they came from.
|
|
|
|
This is very hackish and should definitely be improved. :^)
|
|
There is no such thing as a "undefined literal" in JS - undefined is
just a property on the global object with a value of undefined.
This is pretty similar to NaN.
var undefined = "foo"; is a perfectly fine AssignmentExpression :^)
|
|
|
|
|
|
You can now throw an expression to the nearest catcher! :^)
To support throwing arbitrary values, I added an Exception class that
sits as a wrapper around whatever is thrown. In the future it will be
a logical place to store a call stack.
|
|
|
|
|
|
|
|
|
|
Obey precedence and associativity rules when parsing expressions
with chained operators.
|
|
|
|
Before this commit the last character in a file would be swallowed.
This also fixes parsing of empty files which would previously ASSERT.
|
|
|
|
|
|
|
|
This still includes the double-quote characters (") but at least the
AST comes out right.
|
|
This adds a basic Javascript lexer and parser. It can parse the
currently existing demo programs. More work needs to be done to
turn it into a complete parser than can parse arbitrary JS Code.
The lexer outputs tokens with preceeding whitespace and comments
in the trivia member. This should allow us to generate the exact
source code by concatenating the generated tokens.
The parser is written in a way that it always returns a complete
syntax tree. Error conditions are represented as nodes in the
tree. This simplifies the code and allows it to be used as an
early stage parser, e.g for parsing JS documents in an IDE while
editing the source code.:
|