Parses the XML information and builds up the DOM tree in memory
providing a Tcl object command to this DOM document object. Example:
dom parse $xml doc
$doc documentElement root
parses the XML in the variable xml, creates the DOM tree in memory,
makes a reference to the document object, visible in Tcl as a document object
command, and assigns this new object name to the variable doc. When doc gets
freed, the DOM tree and the associated Tcl command objects (document and all
node objects) are freed automatically.
set document [dom parse $xml]
set root [$document documentElement]
parses the XML in the variable xml, creates the DOM tree in memory,
makes a reference to the document object, visible in Tcl as a document object
command, and returns this new object name, which is then stored in
document. To free the underlying DOM tree and the associated Tcl
object commands (document + nodes + fragment nodes) the document object command
has to be explicitly deleted by:
$document delete
or
rename $document ""
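The two calling styles above can be compared side by side. This is a minimal sketch; the sample XML and the proc name rootName are made up for illustration and are not part of tDOM.

```tcl
package require tdom

# Sample input (hypothetical data, for illustration only).
set xml {<order id="42"><item>tea</item></order>}

# Style 1: the document command is bound to the variable "doc".
# When the proc returns and "doc" goes out of scope, the tree is freed.
proc rootName {xml} {
    dom parse $xml doc
    $doc documentElement root
    return [$root nodeName]
}
puts [rootName $xml]

# Style 2: the command name is returned and stored by the caller,
# so the tree lives until it is deleted explicitly.
set document [dom parse $xml]
set root [$document documentElement]
set orderId [$root getAttribute id]
puts $orderId
$document delete
```

The first style suits short-lived trees inside a proc; the second suits documents that have to outlive the scope that parsed them.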
The valid options are:
- -simple
- If -simple is specified, a simple but
fast parser is used (it does not fully conform to the XML
recommendation). That should double parsing and DOM
generation speed. The encoding of the data is not
transformed inside the parser. The simple parser does
not respect any encoding information in the XML
declaration. It skips over the internal DTD subset and
ignores any information in it. Therefore, it doesn't
include defaulted attribute values in the tree, even
if the corresponding attribute declaration is in the
internal subset. It also doesn't expand internal or
external entity references other than the predefined
entities and character references.
- -html
- If -html is specified, a fast HTML parser
is used, which tries to parse even badly formed HTML
into a DOM tree. If the HTML document given to parse
does not have a single root element (as was legal
up to HTML 4.01) and the -forest option is not used,
then an html node will be inserted as document element,
with the top-level elements of the HTML input data as
children.
- -html5
- This option is only available if tDOM was built
with --enable-html5. Use the featureinfo method
if you need to know if this feature is built in. If
-html5 is specified, the gumbo lib html5 parser
(https://github.com/google/gumbo-parser) is used to
build the DOM tree. This is, as far as it goes, XML
namespace-aware (which means for example that all HTML
elements are in the html5 namespace). Since this
probably isn't wanted by a lot of users and only adds
a burden with no benefit in many use cases, -html5
can be combined with -ignorexmlns, in which
case no node or attribute in the DOM tree is
in an XML namespace. All tag and attribute names in
the DOM tree will be lower case, even for foreign
elements not in the xhtml, svg or mathml namespace.
The DOM tree may include nodes that the parser
inserted because they are implied by the context (such as
<head>, <tbody>, etc.). Input longer than
4 GByte is not supported by the underlying
gumbo parser.
- -json
- If -json is specified, the data is
expected to be a valid JSON string (according to RFC
7159). The command returns an ordinary DOM document
with the nesting tokens inside the JSON data translated
into a tree hierarchy. If a JSON array value is itself
an object or array then container element nodes named
(in a default build) arraycontainer or
objectcontainer, respectively, are inserted into the
tree. The JSON serialization of this document (with
the domDoc method asJSON) is the same JSON
information as the data, preserving JSON
datatypes, allowing non-unique member names of objects
while preserving their order and the full range of
JSON string values. JSON datatype handling is done
with an additional property "sticking" at the doc and
tree nodes. This property isn't contained in an XML
serialization of the document. If you need to store
the JSON data represented by a document, store the
JSON serialization and parse it back from there. Apart
from this JSON type information the returned doc
command or handle is an ordinary DOM doc, which may be
investigated or modified with the full range of the
doc and node methods. Please note that the element
node names and the text node values within the tree
may be outside of what the appropriate XML productions
allow.
- -jsonroot <document element name>
- If given, makes the given element name the
document element of the resulting doc. The parsed
content of the JSON string will be the children of
this document element node.
-
-jsonmaxnesting integer
- This option only has effect if used together
with the -json option. The current
implementation uses a recursive descent JSON parser.
In order to avoid using excess stack space, any JSON
input that has more than a certain number of nesting levels
is considered invalid. The default maximum nesting is
2000. The option -jsonmaxnesting allows the user to
adjust that.
- --
- The option -- marks the end of options.
Giving this option isn't strictly necessary even in
the case of JSON parsing, for which valid data may
start with a "-". If JSON is parsed and the second to
last or last argument starts with a "-" and isn't a
known option name, it will be treated as JSON
data.
- -keepEmpties
- If -keepEmpties is
specified then text nodes which contain only whitespace will be part of the
resulting DOM tree. By default (-keepEmpties not given) those empty
text nodes are removed at parsing time.
- -keepCDATA
- If -keepCDATA is
specified then CDATA sections aren't added to the tree as text nodes
(and, if necessary, combined with sibling text nodes into one text
node), as they are without this option, but are added as CDATA_SECTION_NODEs to
the tree. Please note that the resulting tree isn't prepared for XPath
selects or to be the source or the stylesheet of an XSLT
transformation. If not combined with -keepEmpties, only CDATA
sections that are not whitespace-only will be added to the resulting DOM
tree.
-
-channel <channel-ID>
- If -channel <channel-ID> is specified, the
input to be parsed is read from the specified channel. The encoding setting of
the channel (via fconfigure -encoding) is respected, i.e. the data read from the
channel is converted to UTF-8 according to the encoding settings before the
data is parsed.
-
-baseurl <baseURI>
- If -baseurl <baseURI> is specified,
the baseURI is used as the base URI of the document.
External entity references in the document are
resolved relative to this base URI. This base URI is
also stored within the DOM tree.
-
-feedbackAfter <#bytes>
- If -feedbackAfter <#bytes> is
specified, the Tcl command given by
-feedbackcmd is evaluated at the first element
start within the document (or an external entity)
once #bytes have been read since the start of the
document or external entity, or since the last such
call. For backward
compatibility, if no -feedbackcmd is given but there is
a Tcl proc named ::dom::domParseFeedback, this proc is
used as -feedbackcmd. If there isn't such a proc and
-feedbackAfter is used, it is an error to not also use
-feedbackcmd. If the called script raises an error,
parsing is aborted and the dom parse call
returns an error with the script's error message.
If the called script returns -code break,
parsing is aborted and the dom parse call
returns the empty string.
-
-feedbackcmd <script>
- If -feedbackcmd <script> is specified, the
script script is evaluated at the first
element start within the document (or an external entity) once the
#bytes value given by the -feedbackAfter option has been read since the
start of the document or external entity, or since the last such call. If
-feedbackAfter isn't given, using this option
doesn't have any effect. If the called
script raises an error, parsing is aborted and the
dom parse call returns an error with the script's
error message. If the called script returns
-code break, parsing is aborted and the dom
parse call returns the empty string.
-
-externalentitycommand <script>
- If -externalentitycommand <script> is
specified, the specified Tcl script is called to resolve any external entities
of the document. The actual evaluated command consists of this option followed
by three arguments: the base URI, the system identifier of the entity and the
public identifier of the entity. The base URI and the public identifier may be
the empty list. The script has to return a Tcl list consisting of three
elements. The first element of this list signals how the external entity is
returned to the processor. Currently the two allowed types are "string"
and "channel". The second element of the list has to be the (absolute) base URI
of the external entity to be parsed. The third element of the list is the data:
either the data already read from the external entity as a string, in the case
of type "string", or the name of a Tcl channel, in the case of type
"channel". Note that if the script returns a Tcl channel, it will not be closed
by the processor. It must be closed separately if it is no longer
needed.
-
-useForeignDTD <boolean>
- If
<boolean> is true and the document does not have
an external subset, the parser will call the
-externalentitycommand script with empty values for
the systemId and publicID arguments. Please note that
if the document also doesn't have an internal subset,
the -startdoctypedeclcommand and
-enddoctypedeclcommand scripts, if set, are not
called.
-
-paramentityparsing <always|never|notstandalone>
- The -paramentityparsing option controls
whether the parser tries to resolve the external entities
(including the external DTD subset) of the document
while building the DOM tree.
-paramentityparsing requires an argument, which
must be either "always", "never", or "notstandalone".
The value "always" means that the parser tries to
resolve (recursively) all external entities of the
XML source. This is the default in case
-paramentityparsing is omitted. The value
"never" means that only the given XML source is
parsed and no external entity (including the external
subset) will be resolved and parsed. The value
"notstandalone" means that all external entities will
be resolved and parsed, with the exception of
documents that explicitly state standalone="yes" in
their XML declaration.
- -forest
- If this option is given, there is no need for a
single root; any sequence of well-formed, balanced
subtrees will be parsed into a DOM tree. This works
for the expat DOM builder, the simple XML parser
enabled with -simple and the simple HTML parser
enabled with -html. If used together with
-json or -html5 this option is ignored.
- -ignorexmlns
- It is recommended that you only use this option
with the -html5 option. If this option is
given, no node within the created DOM tree will be
internally marked as placed into an XML namespace,
even if there is a default namespace in scope for
un-prefixed elements or even if the element has a
defined namespace prefix. One consequence is that
XPath node expressions on such a DOM tree don't work
as may be expected. Prefixed element nodes can't be
selected naively and element nodes without a prefix will
be seen by XPath expressions as if they are not in any
namespace (no matter whether they in fact should be in
a default namespace). If you need to inject prefixed
node names into an XPath expression, use the '%' syntax
described in the documentation of the
domNode command method
selectNodes.
-
-billionLaughsAttackProtectionMaximumAmplification <float>
- This option together with
-billionLaughsAttackProtectionActivationThreshold
gives control over the parser limits that protect
against billion laughs attacks
(https://en.wikipedia.org/wiki/Billion_laughs_attack).
This option expects a float >= 1.0 as argument. You
should never need to use this option, because the
default value (100.0) should work for any real data.
If you ever need to increase this value for a non-attack
payload, please report it.
-
-billionLaughsAttackProtectionActivationThreshold <long>
- This option together with
-billionLaughsAttackProtectionMaximumAmplification
gives control over the parser limits that protect
against billion laughs attacks
(https://en.wikipedia.org/wiki/Billion_laughs_attack).
This option expects a positive integer as argument. You
should never need to use this option, because the
default value (8388608) should work for any real data.
If you ever need to increase this value for a non-attack
payload, please report it.
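A few of the options above are easier to judge from a short script. The following is a sketch, not a reference: the input data, the proc name resolveEntity and the file:///virtual/ URIs are invented for illustration, and the exact serialization printed by asJSON and asXML may vary with the tDOM version.

```tcl
package require tdom

# -json: JSON in, DOM tree out; asJSON serializes the tree back to JSON.
set doc [dom parse -json {{"name":"tea","tags":["hot","green"],"stock":null}}]
puts [$doc asJSON]
$doc delete

# -jsonroot: the parsed JSON becomes the content of a document element "data".
set doc [dom parse -json -jsonroot data {{"a":1,"b":"two"}}]
set jsonRootName [[$doc documentElement] nodeName]
puts [$doc asXML]
$doc delete

# -keepEmpties: whitespace-only text nodes are dropped unless requested.
set xml "<a>\n  <b/>\n</a>"
set doc [dom parse $xml]
set withoutEmpties [llength [[$doc documentElement] childNodes]]
$doc delete
set doc [dom parse -keepEmpties $xml]
set withEmpties [llength [[$doc documentElement] childNodes]]
$doc delete
puts "$withoutEmpties child node(s) by default, $withEmpties with -keepEmpties"

# -externalentitycommand: the script receives base URI, system identifier
# and public identifier, and returns a three-element list
# {type baseURI data}. Here the entity content is served from a string
# (hypothetical resolver; a real one would fetch or open the resource).
proc resolveEntity {baseUri systemId publicId} {
    return [list string file:///virtual/ext.xml "<note>resolved</note>"]
}
set xml {<!DOCTYPE doc [<!ENTITY ext SYSTEM "ext.xml">]><doc>&ext;</doc>}
set doc [dom parse -baseurl file:///virtual/doc.xml \
             -externalentitycommand resolveEntity $xml]
set entityText [[$doc documentElement] selectNodes string(note)]
puts [$doc asXML]
$doc delete
```

A resolver that returns type "channel" instead of "string" would have to close that channel itself after parsing, as noted in the description of -externalentitycommand.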
This method creates Tcl commands, which in turn create
tDOM nodes. Tcl commands created by this command are only
available inside a script given to the domNode methods
appendFromScript or insertBeforeFromScript. If
a command created with createNodeCmd is invoked in
any other context, it will return an error. The created command
commandName replaces any existing command or
procedure with that name. If the commandName includes
any Tcl namespace qualifiers, it is created in the specified
namespace. The -tagName option is only allowed for
the elementNode type. The -jsonType option is only
allowed for elementNode and textNode types.
If such a command is invoked inside a script given as argument to the
domNode method appendFromScript or
insertBeforeFromScript it creates a new node and appends this
node at the end of the child list of the invoking element node. If the
option -returnNodeCmd was given, the command returns the
created node as Tcl command. If this option was omitted, the command
returns nothing. Each command always creates the same type of node.
Which type of node is created by the command is determined by the
first argument to createNodeCmd. The syntax of the created
command depends on the type of the node it creates.
If the command type to create is elementNode, the created
command will create an element node when called. Without the
-tagName option the tag name of the created node is
commandName without Tcl namespace qualifiers. If the
-tagName option was given then the created elements will have
the value of this option as tag name. If the -jsonType option
was given then the created element nodes will have the given JSON
type. If the -namespace option is given the created element
node will be XML namespaced and in the namespace given by the option.
The element name will be used literally as given either by the command name
or the -tagName option, if that was given. An appropriate XML
namespace declaration will be automatically added, to bind the prefix
(if the element name has one) or the default namespace (if the element
name doesn't have a prefix) to the namespace if such a binding isn't in
scope.
The syntax of the created command is:
elementNodeCmd ?attributeName attributeValue ...? ?script?
elementNodeCmd ?-attributeName attributeValue ...? ?script?
elementNodeCmd name_value_list script
The command syntax allows three different ways to specify the attributes of
the resulting element. These can be specified with attributeName
attributeValue argument pairs, in an "option style" way with
-attributeName attributeValue argument pairs (the '-' character is only
syntactic sugar and will be stripped off) or as a Tcl list with elements
interpreted as attribute name and the corresponding attribute value.
The attribute name elements in the list may have a leading '-' character, which
will be stripped off.
Every elementNodeCmd accepts an optional Tcl script as last
argument. This script is evaluated as recursive appendFromScript script
with the node created by the elementNodeCmd as parent of all nodes
created by the script.
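The three attribute styles and the nested script argument can be sketched as follows. The element and attribute names (catalog, order, item, sku, qty) and the command name Item are made up for illustration.

```tcl
package require tdom

# "order" creates <order> elements; "Item" creates <item> elements
# because -tagName overrides the command name.
dom createNodeCmd elementNode order
dom createNodeCmd -tagName item elementNode Item

set doc [dom createDocument catalog]
set root [$doc documentElement]

$root appendFromScript {
    # name/value pairs followed by a script for the child nodes
    order id 1 {
        # option style: the leading '-' is stripped from the name
        Item -sku A7
        # list style: a name/value list, which requires a script argument
        Item {sku B2 qty 3} {}
    }
}

puts [$doc asXML]
```

Calling order or Item outside of an appendFromScript or insertBeforeFromScript script is expected to fail, as described above.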
If the first argument of the method is textNode, the command
will create a text node. If the -jsonType option was given then
the created text node will have that JSON type. The syntax of the
created command is:
textNodeCmd ?-disableOutputEscaping? ?data?
If the JSON type of the created text node is NULL, TRUE or FALSE
then the data argument is optional, otherwise this argument
must be given.
If the optional flag -disableOutputEscaping is given, the
escaping of the ampersand character (&) and the left angle bracket (<)
inside the data is disabled. You should use this flag carefully.
If the first argument of the method is commentNode or
cdataNode the command will create a comment node or CDATA section
node. The syntax of the created command is:
nodeCmd data
If the first argument of the method is piNode, the command will
create a processing instruction node. The syntax of the created
command is:
piNodeCmd target data
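A short sketch of the non-element node commands described above. The command names (para, text, comment, procInst) and the content are arbitrary choices for this example; any command names would do.

```tcl
package require tdom

dom createNodeCmd elementNode para
dom createNodeCmd textNode    text
dom createNodeCmd commentNode comment
dom createNodeCmd piNode      procInst

set doc [dom createDocument body]
set root [$doc documentElement]

$root appendFromScript {
    comment " generated content "
    procInst xml-stylesheet {href="style.css"}
    # "&" is escaped when the tree is serialized
    para { text "Tom & Jerry" }
    # with -disableOutputEscaping the markup characters are written as-is
    para { text -disableOutputEscaping "<br/>" }
}

puts [$doc asXML]
```

Because -disableOutputEscaping writes the data unchanged, it can produce output that is not well-formed if the data is not itself valid markup.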
Besides the node commands created with dom createNodeCmd
calls, there are two more commands which automatically
insert nodes into the tree inside an appendFromScript
script.
tdom::fsnewNode ?-jsonType <jsonType>? ?-namespace <namespace>? tagName ?attributes? ?script?
If called inside a fromScript context this command creates a new
node tagName, in the XML namespace namespace if the
-namespace option was given and with the JSON type
jsonType if the -jsonType option was given, and appends
this node at the end of the child list of the invoking element node.
The attributes and script arguments will be processed as
if given to an element creating node command. If called outside a
fromScript context this command will raise an error.
tdom::fsinsertNode node
If called inside a fromScript context this command, instead of
creating a new node, appends the node given as argument at the end of
the child list of the invoking element node. The node is unlinked from
its previous place. If called outside a fromScript context this
command will raise an error.
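Both commands can be tried in one script. This sketch assumes a tDOM version that provides tdom::fsnewNode and tdom::fsinsertNode; the element names and the variable spare are illustrative only.

```tcl
package require tdom

set doc [dom createDocument list]
set root [$doc documentElement]

# A node created outside the script; it is not yet part of the tree.
set spare [$doc createElement entry]
$spare setAttribute id 2

$root appendFromScript {
    # creates <entry id="1"/> without a prior createNodeCmd call
    tdom::fsnewNode entry id 1
    # moves the existing node to the end of the child list
    tdom::fsinsertNode $spare
}

puts [$doc asXML]
```

tdom::fsnewNode is convenient when tag names are only known at run time, since no node command has to be created for each name beforehand.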