path: root/Libraries/LibJS/Token.cpp
Age | Commit message | Author
2021-01-12 | Libraries: Move to Userland/Libraries/ | Andreas Kling
2020-10-29 | LibJS: Use GenericLexer for Token::string_value() | Linus Groh
This is, and I can't stress this enough, a lot better than all the manual bounds checking and indexing that was going on before. Also fixes a small bug where "\u{}" wouldn't get rejected as an invalid Unicode escape sequence.
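The difference between manual indexing and a cursor-style helper can be sketched as follows. This is only an illustration: `Cursor`, `hex_value`, and `parse_braced_unicode_escape` are hypothetical names and do not reproduce the AK::GenericLexer API. The sketch assumes the input starts at the backslash and only covers the braced `\u{...}` form.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a lexer helper; NOT the AK::GenericLexer API.
// Every read goes through a bounds check, so callers never index manually.
class Cursor {
public:
    explicit Cursor(std::string_view input) : m_input(input) {}

    bool is_eof() const { return m_index >= m_input.size(); }

    // Consumes `expected` if it is the next character, otherwise does nothing.
    bool consume_specific(char expected)
    {
        if (is_eof() || m_input[m_index] != expected)
            return false;
        ++m_index;
        return true;
    }

    // Consumes and returns the (possibly empty) run of hex digits at the cursor.
    std::string_view consume_while_hex_digit()
    {
        std::size_t start = m_index;
        while (!is_eof() && std::isxdigit(static_cast<unsigned char>(m_input[m_index])))
            ++m_index;
        return m_input.substr(start, m_index - start);
    }

private:
    std::string_view m_input;
    std::size_t m_index { 0 };
};

// Only called on characters that std::isxdigit accepted.
static std::uint32_t hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return static_cast<std::uint32_t>(c - '0');
    if (c >= 'a' && c <= 'f')
        return static_cast<std::uint32_t>(c - 'a' + 10);
    return static_cast<std::uint32_t>(c - 'A' + 10);
}

// Parses text of the form \u{XXXX}; returns the code point or nothing.
std::optional<std::uint32_t> parse_braced_unicode_escape(std::string_view text)
{
    Cursor cursor(text);
    if (!cursor.consume_specific('\\') || !cursor.consume_specific('u') || !cursor.consume_specific('{'))
        return std::nullopt;
    std::string_view digits = cursor.consume_while_hex_digit();
    // An empty digit run ("\u{}") or a missing '}' is an invalid escape.
    if (digits.empty() || !cursor.consume_specific('}'))
        return std::nullopt;
    std::uint32_t value = 0;
    for (char c : digits) {
        value = value * 16 + hex_value(c);
        if (value > 0x10FFFF) // beyond the Unicode range
            return std::nullopt;
    }
    return value;
}
```

With this shape, `parse_braced_unicode_escape("\\u{}")` yields no value, while `"\\u{1f41e}"` yields 0x1F41E.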
2020-10-28 | LibJS: Don't parse numeric literal containing 8 or 9 as octal | Linus Groh
If the value has a leading zero (allowed in non-strict mode) but contains the digits 8 or 9, it can't be an octal number.
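A minimal sketch of that rule, assuming the literal consists only of decimal digits. The helper names `is_legacy_octal_literal` and `legacy_integer_value` are hypothetical and are not taken from LibJS.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper: decides whether a literal with a leading zero should
// be read as legacy octal (non-strict mode only).
bool is_legacy_octal_literal(std::string_view literal)
{
    assert(!literal.empty());
    // Needs a leading zero followed by at least one more character.
    if (literal.size() < 2 || literal[0] != '0')
        return false;
    for (char c : literal.substr(1)) {
        // An 8 or 9 (or any non-octal character) rules out octal.
        if (c < '0' || c > '7')
            return false;
    }
    return true;
}

// Interprets a literal made only of decimal digits, using base 8 when the
// literal qualifies as legacy octal and base 10 otherwise.
long legacy_integer_value(std::string_view literal)
{
    int base = is_legacy_octal_literal(literal) ? 8 : 10;
    long value = 0;
    for (char c : literal) {
        assert(c >= '0' && c <= '9');
        value = value * base + (c - '0');
    }
    return value;
}
```

Under these assumptions `017` reads as 15, while `018` and `09` fall back to decimal 18 and 9.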
2020-10-25 | LibJS: Allow all line terminators to be used for line continuations | Linus Groh
2020-10-25 | LibJS: Parse line continuations in string literals properly | Marcin Gasperowicz
Newlines following a line continuation were being inserted into string literals. This patch makes the parser ignore the newline after \ and ensures that a "use strict" containing a line continuation is not treated as a valid "use strict" directive.
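The line-continuation handling can be sketched like this, assuming UTF-8 source text and covering the terminators LF, CR, CRLF, LS (U+2028), and PS (U+2029). The function names are hypothetical, and other escape sequences are deliberately copied through untouched.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Returns the byte length of the line terminator at the start of `s`, or 0.
std::size_t line_terminator_length(std::string_view s)
{
    if (s.substr(0, 2) == "\r\n")
        return 2;
    if (!s.empty() && (s[0] == '\n' || s[0] == '\r'))
        return 1;
    // U+2028 (LS) and U+2029 (PS) encoded as UTF-8.
    if (s.substr(0, 3) == "\xE2\x80\xA8" || s.substr(0, 3) == "\xE2\x80\xA9")
        return 3;
    return 0;
}

// Removes every backslash that is directly followed by a line terminator,
// together with that terminator. All other text is copied unchanged.
std::string strip_line_continuations(std::string_view body)
{
    std::string out;
    std::size_t i = 0;
    while (i < body.size()) {
        if (body[i] == '\\') {
            std::size_t n = line_terminator_length(body.substr(i + 1));
            if (n > 0) {
                i += 1 + n; // drop the backslash and the terminator
                continue;
            }
            // Some other escape: copy the backslash and the escaped character
            // so that an escaped backslash is not mistaken for a continuation.
            out += body[i++];
            if (i < body.size())
                out += body[i++];
            continue;
        }
        out += body[i++];
    }
    return out;
}
```

A directive check would compare the raw source text rather than this processed value, which is why a "use strict" written with a continuation does not count as the directive.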
2020-10-24 | LibJS: Support LegacyOctalEscapeSequence in string literals | Linus Groh
https://tc39.es/ecma262/#sec-additional-syntax-string-literals

The syntax and semantics of 11.8.4 is extended as follows except that this extension is not allowed for strict mode code:

Syntax

EscapeSequence ::
    CharacterEscapeSequence
    LegacyOctalEscapeSequence
    NonOctalDecimalEscapeSequence
    HexEscapeSequence
    UnicodeEscapeSequence

LegacyOctalEscapeSequence ::
    OctalDigit [lookahead ∉ OctalDigit]
    ZeroToThree OctalDigit [lookahead ∉ OctalDigit]
    FourToSeven OctalDigit
    ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
    0 1 2 3

FourToSeven :: one of
    4 5 6 7

NonOctalDecimalEscapeSequence :: one of
    8 9

This definition of EscapeSequence is not used in strict mode or when parsing TemplateCharacter.

Note: It is possible for string literals to precede a Use Strict Directive that places the enclosing code in strict mode, and implementations must take care to not use this extended definition of EscapeSequence with such literals. For example, attempting to parse the following source text must fail:

    function invalid() { "\7"; "use strict"; }
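The LegacyOctalEscapeSequence productions above amount to "take up to three octal digits if the first is 0 to 3, otherwise up to two". A sketch of that reading follows; `OctalEscape` and `parse_legacy_octal_escape` are hypothetical names, the input is assumed to start right after the backslash, and the strict-mode rejection is left to the caller.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

struct OctalEscape {
    unsigned value;     // numeric value denoted by the escape
    std::size_t length; // number of digits consumed after the backslash
};

static bool is_octal_digit(char c) { return c >= '0' && c <= '7'; }

// Greedily matches a legacy octal escape at the start of `after_backslash`.
std::optional<OctalEscape> parse_legacy_octal_escape(std::string_view after_backslash)
{
    if (after_backslash.empty() || !is_octal_digit(after_backslash[0]))
        return std::nullopt; // includes the NonOctalDecimalEscapeSequence digits 8 and 9
    // ZeroToThree may be followed by two more digits, FourToSeven by one.
    std::size_t max_digits = after_backslash[0] <= '3' ? 3 : 2;
    unsigned value = 0;
    std::size_t length = 0;
    while (length < max_digits && length < after_backslash.size()
        && is_octal_digit(after_backslash[length])) {
        value = value * 8 + static_cast<unsigned>(after_backslash[length] - '0');
        ++length;
    }
    return OctalEscape { value, length };
}
```

For example, `101` is consumed whole and gives 65, whereas `477` stops after `47` (value 39) so the largest result stays at 255.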
2020-10-22 | LibJS: Support all line terminators (LF, CR, LS, PS) | Linus Groh
https://tc39.es/ecma262/#sec-line-terminators
2020-10-04 | LibJS: Unify syntax highlighting | Linus Groh
So far we have three different syntax highlighters for LibJS:

- js's Line::Editor stylization
- JS::MarkupGenerator
- GUI::JSSyntaxHighlighter

This not only caused repetition of most token types in each highlighter but also a lot of inconsistency regarding the styling of certain tokens:

- JSSyntaxHighlighter was considering TokenType::Period to be an operator whereas MarkupGenerator categorized it as punctuation.
- MarkupGenerator was considering TokenType::{Break,Case,Continue,Default,Switch,With} control keywords whereas JSSyntaxHighlighter just disregarded them.
- MarkupGenerator considered some future reserved keywords invalid and others not. JSSyntaxHighlighter and js disregarded most.

Adding a new token type meant adding it to ENUMERATE_JS_TOKENS as well as each individual highlighter's switch/case construct.

I added a TokenCategory enum, and each TokenType is now associated to a certain category, which the syntax highlighters then can use for styling rather than operating on the token type directly. This also makes changing a token's category everywhere easier, should we need to do that (e.g. I decided to make TokenType::{Period,QuestionMarkPeriod} TokenCategory::Operator for now, but we might want to change them to Punctuation).
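The idea of tying each token type to a category in one place can be sketched with an X-macro. The token list below is a small illustrative subset, `ENUMERATE_DEMO_TOKENS`, `category_of`, and `style_for` are hypothetical names, and the set of categories and style strings is assumed rather than copied from LibJS.

```cpp
#include <string_view>

enum class TokenCategory { Invalid, Number, String, Punctuation, Operator, Keyword, ControlKeyword, Identifier };

// Single source of truth: each entry names a token type and its category.
// (Illustrative subset only; the real list is much longer.)
#define ENUMERATE_DEMO_TOKENS(X)    \
    X(Break, ControlKeyword)        \
    X(Identifier, Identifier)       \
    X(NumericLiteral, Number)       \
    X(Period, Operator)             \
    X(QuestionMarkPeriod, Operator) \
    X(Semicolon, Punctuation)       \
    X(StringLiteral, String)        \
    X(Var, Keyword)

enum class TokenType {
#define DEMO_TOKEN(t, c) t,
    ENUMERATE_DEMO_TOKENS(DEMO_TOKEN)
#undef DEMO_TOKEN
};

// Generated from the same list, so a new token type cannot be forgotten here.
constexpr TokenCategory category_of(TokenType type)
{
    switch (type) {
#define DEMO_TOKEN(t, c) \
    case TokenType::t:   \
        return TokenCategory::c;
        ENUMERATE_DEMO_TOKENS(DEMO_TOKEN)
#undef DEMO_TOKEN
    }
    return TokenCategory::Invalid;
}

// A highlighter only needs to switch over the (much smaller) category enum.
constexpr std::string_view style_for(TokenCategory category)
{
    switch (category) {
    case TokenCategory::Number: return "number";
    case TokenCategory::String: return "string";
    case TokenCategory::Operator: return "operator";
    case TokenCategory::Punctuation: return "punctuation";
    case TokenCategory::Keyword: return "keyword";
    case TokenCategory::ControlKeyword: return "control-keyword";
    case TokenCategory::Identifier: return "identifier";
    default: return "plain";
    }
}
```

Moving `Period` from Operator to Punctuation would then be a one-word change in the list, picked up by every highlighter.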
2020-08-14 | LibJS: Add missing reserved words to Token::is_identifier_name() | Linus Groh
This is being used in match_identifier_name(), for example when parsing property keys - the list was incomplete, likely as some token types were added later, leading to some unexpected syntax errors:

    > var e = {};
    undefined
    > e.extends = "a";
    e.extends = "a";
      ^
    Uncaught exception: [SyntaxError]: Unexpected token Extends. Expected IdentifierName (line: 1, column: 3)

Fixes #3128.
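The distinction at play is between an Identifier (usable as a binding name) and an IdentifierName (also usable as a property key, and including reserved words). A minimal sketch, with a deliberately tiny and hypothetical `TokenType` enum rather than the real one:

```cpp
// Hypothetical, heavily reduced token set for illustration only.
enum class TokenType { Identifier, Class, Extends, If, NumericLiteral, Period };

// Binding names: only plain identifiers qualify.
constexpr bool is_identifier(TokenType type)
{
    return type == TokenType::Identifier;
}

// IdentifierName: identifiers plus every reserved word. Forgetting a reserved
// word here is what makes `e.extends` fail to parse.
constexpr bool is_identifier_name(TokenType type)
{
    switch (type) {
    case TokenType::Identifier:
    case TokenType::Class:
    case TokenType::Extends:
    case TokenType::If:
        return true;
    default:
        return false;
    }
}
```

A property-key parser would accept any token for which `is_identifier_name` is true, while a `var` declaration would insist on `is_identifier`.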
2020-08-05 | Unicode: Try s/codepoint/code_point/g again | Nico Weber
This time, without trailing 's'. Ran: git grep -l 'codepoint' | xargs sed -ie 's/codepoint/code_point/g'
2020-08-05 | Revert "Unicode: s/codepoint/code_point/g" | Nico Weber
This reverts commit ea9ac3155d1774f13ac4e9a96605c0e85a8f299e. It replaced "codepoint" with "code_points", not "code_point".
2020-08-03 | Unicode: s/codepoint/code_point/g | Andreas Kling
Unicode calls them "code points" so let's follow their style.
2020-07-22 | LibJS: Fix \x escapes of bytes with high bit set | Nico Weber
With this, typing `"\xff"` into Browser's console no longer makes the app crash. While here, also make the \u handler call append_codepoint() instead of calling an overload where it's not immediately clear which overload is getting called. This has no behavior change.
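The log does not show the exact cause of the crash, but one common pitfall with bytes such as 0xff is sign extension when a `char` is widened to a code point. The sketch below illustrates only that pitfall; `byte_to_code_point` is a hypothetical helper and is not claimed to be the fix that was applied.

```cpp
#include <cstdint>

// Widens a raw byte to a code point without sign extension.
// Where `char` is signed, '\xff' has the value -1; converting that directly
// to a 32-bit unsigned value would give 0xFFFFFFFF, which is not a valid
// code point. Going through `unsigned char` first keeps it at 0xFF.
constexpr std::uint32_t byte_to_code_point(char byte)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(byte));
}
```

The resulting value (U+00FF here) can then be appended as a code point rather than as a raw byte, so the string stays well-formed.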
2020-06-01 | LibJS: Fix out-of-bounds read when parsing escape sequences | Sergey Bugaev
We cannot look at i+1'th character until we verify it's there.
2020-05-18 | LibJS: Handle hex and unicode escape sequences in string literals | Matthew Olsson
Introduces the following syntax: '\x55' '\u26a0' '\u{1f41e}'
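To make the three examples concrete: they denote the code points U+0055, U+26A0, and U+1F41E. Assuming the string value is stored as UTF-8 (an assumption for this sketch), appending such a code point could look like the following; `encode_utf8` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encodes a single Unicode code point as UTF-8.
std::string encode_utf8(std::uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

So `'\x55'` becomes the single byte `U`, `'\u26a0'` becomes the bytes E2 9A A0, and `'\u{1f41e}'` becomes F0 9F 90 9E.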
2020-05-04 | LibJS: Add template literals | mattco98
Adds fully functioning template literals. Because template literals contain expressions, most of the work has to be done in the Lexer rather than the Parser. And because of the complexity of template literals (expressions, nesting, escapes, etc), the Lexer needs to have some template-related state.

When entering a new template literal, a TemplateLiteralStart token is emitted. When inside a literal, all text will be parsed up until a '${' or '`' (or EOF, but that's a syntax error) is seen, and then a TemplateLiteralExprStart token is emitted. At this point, the Lexer proceeds as normal, however it keeps track of the number of opening and closing curly braces it has seen in order to determine the close of the expression. Once it finds a matching curly brace for the '${', a TemplateLiteralExprEnd token is emitted and the state is updated accordingly.

When the Lexer is inside of a template literal, but not an expression, and sees a '`', this must be the closing grave: a TemplateLiteralEnd token is emitted.

The state required to correctly parse template strings consists of a vector (for nesting) of two pieces of information: whether or not we are in a template expression (as opposed to a template string); and the count of the number of unmatched open curly braces we have seen (only applicable if the Lexer is currently in a template expression).

TODO: Add support for template literal newlines in the JS REPL (this will cause a syntax error currently):

    > `foo
    > bar`
    'foo
    bar'
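The state machine described above can be sketched roughly as follows. This is a simplification under stated assumptions: ordinary code is collapsed into a placeholder "Code" token, a "TemplateLiteralString" token name is made up for the text runs, and string literals or comments containing braces inside an expression are not handled. `TemplateState` and `lex_template_tokens` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// One entry per currently open template literal (the vector handles nesting).
struct TemplateState {
    bool in_expr = false;            // inside ${ ... } rather than template text
    unsigned open_bracket_count = 0; // unmatched '{' seen inside the expression
};

std::vector<std::string> lex_template_tokens(std::string_view src)
{
    std::vector<std::string> tokens;
    std::vector<TemplateState> states;
    std::size_t i = 0;
    auto starts_expr = [&](std::size_t at) {
        return src[at] == '$' && at + 1 < src.size() && src[at + 1] == '{';
    };
    while (i < src.size()) {
        bool in_template_text = !states.empty() && !states.back().in_expr;
        if (in_template_text) {
            if (src[i] == '`') {
                tokens.push_back("TemplateLiteralEnd");
                states.pop_back();
                ++i;
            } else if (starts_expr(i)) {
                tokens.push_back("TemplateLiteralExprStart");
                states.back().in_expr = true;
                states.back().open_bracket_count = 0;
                i += 2;
            } else {
                // Consume text up to the next '`' or "${", skipping escapes.
                while (i < src.size() && src[i] != '`' && !starts_expr(i)) {
                    if (src[i] == '\\' && i + 1 < src.size())
                        ++i;
                    ++i;
                }
                tokens.push_back("TemplateLiteralString");
            }
            continue;
        }
        if (src[i] == '`') {
            tokens.push_back("TemplateLiteralStart");
            states.emplace_back();
            ++i;
            continue;
        }
        if (!states.empty()) { // inside a template expression
            TemplateState& state = states.back();
            if (src[i] == '{') {
                ++state.open_bracket_count;
            } else if (src[i] == '}') {
                if (state.open_bracket_count == 0) {
                    tokens.push_back("TemplateLiteralExprEnd");
                    state.in_expr = false;
                    ++i;
                    continue;
                }
                --state.open_bracket_count;
            }
        }
        // Everything else is ordinary code; merge adjacent code characters.
        if (tokens.empty() || tokens.back() != "Code")
            tokens.push_back("Code");
        ++i;
    }
    return tokens;
}
```

The brace counter is what lets an object literal such as `{a: 1}` appear inside `${ ... }` without its closing brace being mistaken for the end of the expression.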
2020-04-24 | LibJS: Add TokenType::TemplateLiteral | Linus Groh
This is required for template literals - we're not quite there yet, but at least the parser can now tell us when this token is encountered - currently this yields "Unexpected token Invalid". Not really helpful. The character is a "backtick", but as we already have TokenType::{StringLiteral,RegexLiteral} this seemed like a fitting name. This also enables syntax highlighting for template literals in the js REPL and LibGUI's JSSyntaxHighlighter.
2020-04-18 | LibJS: Allow reserved words as keys in object expressions. | Stephan Unverwerth
2020-04-05 | LibJS: Add numeric literal parsing for different bases and exponents | Stephan Unverwerth
2020-04-04 | LibJS: Hack the lexer to allow numbers with decimals | Andreas Kling
This is very hackish and should definitely be improved. :^)
2020-03-30 | LibJS: Use some macro magic to avoid duplicating all the token types | Andreas Kling
2020-03-29 | LibJS: Lexer and parser support for "switch" statements | Andreas Kling
2020-03-21 | LibJS: Parse object expressions | 0xtechnobabble
2020-03-16 | LibJS: Implement null and undefined literals | 0xtechnobabble
2020-03-14 | LibJS: Unescape strings in Token::string_value() | Stephan Unverwerth
2020-03-14 | LibJS: Strip double-quote characters from StringLiteral tokens | Andreas Kling
This is very hackish since I'm just doing it to make progress on something else. :^)
2020-03-14 | LibJS: Lex single quote strings, escaped chars and unterminated strings | Stephan Unverwerth
2020-03-14 | LibJS: Add missing tokens to name() | Oriko
2020-03-14 | LibJS: Add operator precedence parsing | Stephan Unverwerth
Obey precedence and associativity rules when parsing expressions with chained operators.
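One common way to honor both precedence and associativity is precedence climbing. The sketch below uses that technique on a toy grammar (non-negative integers with `+`, `-`, `*`, and the right-associative `**`) and evaluates directly instead of building a syntax tree. It is not taken from the LibJS parser; `Operator` and `Evaluator` are hypothetical, and well-formed input is assumed.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

struct Operator {
    char symbol;        // '^' stands for the two-character "**"
    std::size_t length; // characters to consume
    int precedence;     // higher binds tighter
    bool right_associative;
};

class Evaluator {
public:
    explicit Evaluator(std::string_view source) : m_source(source) {}

    long parse_expression(int min_precedence = 0)
    {
        long lhs = parse_number();
        Operator op;
        while (peek_operator(op) && op.precedence >= min_precedence) {
            m_position += op.length;
            // Left-associative operators require a strictly tighter operator
            // on their right; right-associative ones accept the same level.
            int next_min = op.right_associative ? op.precedence : op.precedence + 1;
            long rhs = parse_expression(next_min);
            lhs = apply(op.symbol, lhs, rhs);
        }
        return lhs;
    }

private:
    void skip_spaces()
    {
        while (m_position < m_source.size() && m_source[m_position] == ' ')
            ++m_position;
    }

    long parse_number()
    {
        skip_spaces();
        long value = 0;
        while (m_position < m_source.size()
            && std::isdigit(static_cast<unsigned char>(m_source[m_position])))
            value = value * 10 + (m_source[m_position++] - '0');
        return value;
    }

    // Looks at the next operator without consuming it.
    bool peek_operator(Operator& op)
    {
        skip_spaces();
        if (m_position >= m_source.size())
            return false;
        char c = m_source[m_position];
        bool is_power = c == '*' && m_position + 1 < m_source.size()
            && m_source[m_position + 1] == '*';
        if (is_power)
            op = { '^', 2, 3, true };
        else if (c == '*')
            op = { '*', 1, 2, false };
        else if (c == '+' || c == '-')
            op = { c, 1, 1, false };
        else
            return false;
        return true;
    }

    static long apply(char symbol, long lhs, long rhs)
    {
        switch (symbol) {
        case '+': return lhs + rhs;
        case '-': return lhs - rhs;
        case '*': return lhs * rhs;
        default: {
            long result = 1; // integer power by repeated multiplication
            for (long i = 0; i < rhs; ++i)
                result *= lhs;
            return result;
        }
        }
    }

    std::string_view m_source;
    std::size_t m_position { 0 };
};
```

With this scheme `10 - 4 - 3` groups as `(10 - 4) - 3`, while `2 ** 3 ** 2` groups as `2 ** (3 ** 2)`.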
2020-03-12 | LibJS: Add Javascript lexer and parser | Stephan Unverwerth
This adds a basic JavaScript lexer and parser. It can parse the currently existing demo programs. More work needs to be done to turn it into a complete parser that can parse arbitrary JS code. The lexer outputs tokens with preceding whitespace and comments in the trivia member. This should allow us to regenerate the exact source code by concatenating the generated tokens. The parser is written in a way that it always returns a complete syntax tree. Error conditions are represented as nodes in the tree. This simplifies the code and allows it to be used as an early stage parser, e.g. for parsing JS documents in an IDE while editing the source code.
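The trivia idea can be illustrated with a toy lexer that keeps whitespace and `//` comments attached to the token that follows them. This is a sketch under assumptions: the `Token` struct, `lex`, and `reconstruct` are hypothetical, only word-like tokens and single-character punctuation are recognized, and trailing trivia is carried by a final token with an empty value.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

struct Token {
    std::string trivia; // whitespace and comments preceding the token
    std::string value;  // the token text itself (empty for the final token)
};

static bool is_word_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

std::vector<Token> lex(std::string_view source)
{
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < source.size()) {
        std::size_t trivia_start = i;
        for (;;) {
            if (i < source.size() && std::isspace(static_cast<unsigned char>(source[i]))) {
                ++i;
            } else if (source.substr(i, 2) == "//") {
                while (i < source.size() && source[i] != '\n')
                    ++i;
            } else {
                break;
            }
        }
        std::size_t value_start = i;
        if (i < source.size()) {
            if (is_word_char(source[i])) {
                while (i < source.size() && is_word_char(source[i]))
                    ++i;
            } else {
                ++i; // single-character punctuation
            }
        }
        tokens.push_back({ std::string(source.substr(trivia_start, value_start - trivia_start)),
            std::string(source.substr(value_start, i - value_start)) });
    }
    return tokens;
}

// Concatenating trivia and value of every token yields the original source.
std::string reconstruct(const std::vector<Token>& tokens)
{
    std::string out;
    for (const Token& token : tokens) {
        out += token.trivia;
        out += token.value;
    }
    return out;
}
```

Because no character is ever dropped, `reconstruct(lex(source))` returns `source` unchanged, which is the property the commit message relies on.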