Age | Commit message (Collapse) | Author |
|
This prevents us from needing a sv suffix, and potentially reduces the
need to run generic code for a single character (as contains,
starts_with, ends_with etc. for a char will be just a length and
equality check).
No functional changes.
|
|
Each of these strings would previously rely on StringView's char const*
constructor overload, which would call __builtin_strlen on the string.
Since we now have operator ""sv, we can replace these with much simpler
versions. This opens the door to being able to remove
StringView(char const*).
No functional changes.
|
|
Accessing last() if there are no elements makes WebContent crash :^)
|
|
|
|
|
|
|
|
|
|
These are part of HTML, not CSS, so let's not confuse things.
|
|
Since this macro was created we gained a couple more parsers in the
system :^)
|
|
HTMLObjectElement will need to be both a FormAssociatedElement and a
BrowsingContextContainer. Currently, both of these classes inherit from
HTMLElement. This can work in C++, but is generally frowned upon, and
doesn't play particularly well with the rest of LibWeb.
Instead, we can essentially revert commit 3bb5c62 to remove HTMLElement
from FormAssociatedElement's hierarchy. This means that objects such as
HTMLObjectElement individually inherit from FormAssociatedElement and
HTMLElement now.
Some caveats are:
* FormAssociatedElement still needs to know when the HTMLElement is
inserted into and removed from the DOM. This hook is automatically
injected via a macro now, while still allowing classes like
HTMLInputElement to also know when the element is inserted.
* Casting from a DOM::Element to a FormAssociatedElement is now a
sideways cast, rather than directly following an inheritance chain.
This means static_cast cannot be used here; but we can safely use
dynamic_cast since the only 2 instances of this already use RTTI to
verify the cast.
|
|
This ripples down to LibWeb's HTML and XHR decoders, which therefore
become less allocation heavy.
|
|
The HTML Specification is quite tricky in this case.
Usually "have a particular element in <x> scope" mentions
"consisting of the following element types:", but in this case it's
"consisting of all element types except the following:"
Thanks to @AtkinsSJ for spotting this difference
|
|
This gets us 2 points on html5test.com :^)
- Before: https://html5te.st/4cf57659bc08272e (208)
- After: https://html5te.st/fb8a9259bda1c115 (210)
|
|
We shouldn't delay the load event for scripts that we're completely
refusing to run anyway. Also, for scripts that have inline text content,
we don't need to delay them either, as they will become ready before
returning from "prepare script".
This makes the "load" event finally fire on lots of websites, including
Wikipedia. :^)
|
|
We previously had a bug where markup with unclosed script tags caused
the document load event to be delayed indefinitely. Fix this by only
marking script elements as delaying the load event once we encounter
the script end tag.
|
|
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#cother-other-default-operation-rules
"The compiler is more likely to get the default semantics right and
you cannot implement these functions better than the compiler."
|
|
This makes JS exception line numbers meaningful for inline script tags.
|
|
The Window object is part of the HTML spec. :^)
https://html.spec.whatwg.org/multipage/window-object.html
|
|
We were storing double-quoted system ID's in the public ID field.
1% progression on ACID3. :^)
|
|
This makes it available for all form associated elements and not just
select and input elements. It also makes it more spec compliant,
especially around the form attribute.
The main thing missing is re-associating form elements with a form
attribute when the form attribute changes or an element with an ID
is inserted/removed or has its ID changed.
|
|
This necessitated making HTMLParser ref-counted, and having it register
itself with Document when created. That makes it possible for scripts to
add new input at the current parser insertion point.
There is now a reference cycle between Document and HTMLParser. This
cycle is explicitly broken by calling Document::detach_parser() at the
end of HTMLParser::run().
This is a huge progression on ACID3, from 31% to 49%! :^)
|
|
This implements basic support for dynamic markup insertion, adding
* Document::open()
* Document::write(Vector<String> const&)
* Document::writeln(Vector<String> const&)
* Document::close()
The HTMLParser is modified to make it possible to create a
script-created parser which initially only contains a HTMLTokenizer
without any data. Aditionally the HTMLParser::run method gains an
overload which does not modify the Document and does not run
HTMLParser::the_end() so that we can reenter the parser at a later time.
Furthermore all FIXMEs that consern the insertion point are implemented
wich is defined in the HTMLTokenizer. Additionally the following
member-variables of the HTMLParser are now exposed by getter funcions:
* m_tokenizer
* m_aborted
* m_script_nesting_level
The HTMLTokenizer is modified so that it contains an insertion
point which keeps track of where the next input from the Document::write
functions will be inserted. The insertion point is implemented as the
charakter offset into m_decoded_input and a boolean describing if the
insertion point is defined. Functions to update, check and {re}store the
insertion point are also added.
The function HTMLTokenizer::insert_eof is added to tell a script-created
parser that document::close was called and HTMLParser::the_end() should
be called.
Lastly an explicit default constructor is added to HTMLTokenizer to
create a empty HTMLTokenizer into which data can be inserted.
|
|
Also, update the expected hash in the LibWeb TestHTMLTokenizer
regression test.
This is due to the "This comment has a few too many dashes." comment
token being updated.
|
|
Newline normalization will replace \r and \r\n with \n.
The spec specifically states
> Before the tokenization stage, the input stream must be preprocessed
> by normalizing newlines.
wheras this is implemented the processing during the tokenization
itself.
This should still exhibit the same behaviour, while keeping the
tokenization logic in the same place.
|
|
In 'NamedCharacterReference' we attempt to lookup the code point by a
identifier, eg apos; becomes '
This is done by passing the entire rest of the document to the
`HTML::code_points_from_entity` function.
However, before this change we didn't sent the final character which
meant if the document ended in a named character reference the lookup
would fail.
|
|
The entry we get from the active formatting elements list during the
Rewind step of "reconstruct the active formatting elements" can be a
marker. Previously we assumed it was not a marker, which can trigger
an assertion failure with certain malformed HTML.
If the entry in this step is a marker, the spec simply ignores it.
This is step 6 of the algorithm.
This also makes the index unsigned, as this algorithm is a no-op if
the list is empty.
Additionally, this also adds spec comments to this algorithm.
Fixes #12668.
|
|
This avoids constantly reallocating the Vector<HTMLToken>.
|
|
Pages such as https://html5test.com are testing all sorts of weird,
incomplete, and wrong HTML but can be useful or at least interesting for
development - let's try to avoid crashing the process.
|
|
|
|
This is needed to access the 'adjusted current node' in the 'Markup
declaration open state'. We don't want to create a full parser for
something like syntax highlighting, so it's optional (null) by default.
|
|
There is no implementation of this function:
HTMLParser::stack_of_open_elements_has_element_with_tag_name_in_scope
|
|
I didn't add full spec comments this time, but this is better than
nothing :^)
|
|
|
|
This matches the spec terminology around the "stack of open elements".
|
|
|
|
Emitting tokens on EOF caused an infinite loop, freezing the app, which
could be a bit annoying when writing an HTML comment at the end of
the file in Text Editor. :^)
|
|
Commit b193351a99 caused the HTML comments to flash when changing
the text cursor. Also, when double-clicking on a comment, the selection
started from the beginning of the file instead.
The following message was displaying when `TOKENIZER_TRACE_DEBUG`
was enabled:
(Tokenizer::nth_last_position) Invalid position requested: 4th-last
of 4. Returning (0-0).
Changing the `nth_last_position` to 3 fixes this. I'm guessing that's
because the parser is at that moment on the second hyphen of the `<!--`
string, so it has to go back only by three characters.
|
|
The difference should be between m_utf8_iterator and the
the new position, if m_prev_utf8_iterator is used one fewer
source position is popped than required.
This issue was not apparent on most pages since restore_to
used for tokens such <!doctype> that are normally
followed by a newline that resets the column to zero,
but it can be seen on pages with minified HTML.
|
|
The environment settings object is effectively the context a piece of
script is running under, for example, it contains the origin,
responsible document, realm, global object and event loop for the
current context. This effectively replaces ScriptExecutionContext, but
it cannot be removed in this commit as EventTarget still depends on it.
https://html.spec.whatwg.org/multipage/webappapis.html#environment-settings-object
|
|
This fixes #11166
|
|
|
|
|
|
Note our Attribute class is what the spec refers to as just "Attr". The
main differences between the existing implementation and the spec are
just that the spec defines more fields.
Attributes can contain namespace URIs and prefixes. However, note that
these are not parsed in HTML documents unless the document content-type
is XML. So for now, these are initialized to null. Web pages are able to
set the namespace via JavaScript (setAttributeNS), so these fields may
be filled in when the corresponding APIs are implemented.
The main change to be aware of is that an attribute is a node. This has
implications on how attributes are stored in the Element class. Nodes
are non-copyable and non-movable because these constructors are deleted
by the EventTarget base class. This means attributes cannot be stored in
a Vector or HashMap as these containers assume copyability / movability.
So for now, the Vector holding attributes is changed to hold RefPtrs to
attributes instead. This might change when attribute storage is
implemented according to the spec (by way of NamedNodeMap).
|
|
|
|
This particularly implements these two points:
- "If the adjusted current node is an HTML integration point and the
token is a start tag"
- "If the adjusted current node is an HTML integration point and the
token is a character token"
This also adds spec comments to the tree constructor.
|
|
We now fire "pageshow" events at the appropriate time during document
loading (done by the parser.)
Note that there are no corresponding "pagehide" events yet.
|
|
This will be used to determine whether "pageshow" and "pagehide" events
are appropriate. We won't actually make use of it until we implement
more of history traversal and document unloading.
|
|
The only difference from what we were already doing is that setting the
same ready state twice no longer fires a "readystatechange" event.
I don't think that could happen in practice though.
|
|
|
|
We will now spin in "the end" until there are no more "things delaying
the load event". Of course, nothing actually uses this yet, and there
are a lot of things that need to.
|