Index: CHANGES ================================================================== --- CHANGES +++ CHANGES @@ -1,6 +1,37 @@ +2017-08-21 Ashok Nadkarni + Windows build system (VC and mingw) modernised. + +2017-08-17 Rolf Ade + + New feature "creating real FQ nodes with *fromScript methods", + by adding option -namespace to [dom createNodeCmd]. + +2017-08-14 Rolf Ade + + Updated TEA. + +2017-07-29 Rolf Ade + + Removed hacky check on [load] time if the tclsh and tDOM are + built with incompatible TCL_UTF_MAX (because it did not work + anymore with recent tcl because of changes in core). + +2017-07-28 Rolf Ade + + Added JSON support. New -json option to [dom parse]. New doc + method asJSON. New node method jsonType. New option -jsonType + of [dom createNodeCmd]. New option -tagName of [dom + createNodeCmd]. New option -jsonType to dom method + createDocumentNode. + +2017-04-06 Rolf Ade + + Added HTML5 parser (new -html5 option to [dom parse]). Requires + gumbo lib and must be enabled at configure time. + 2016-10-01 Rolf Ade Updated to expat 2.2.0. 2015-09-11 Rolf Ade Index: ChangeLog ================================================================== --- ChangeLog +++ ChangeLog @@ -1,9 +1,8 @@ NOTICE: This file isn't kept up to date anymore. Look at the timeline -of the leading fossil repository -(https://46.163.78.80/cgi-bin/repros/tdom/timeline) or at the backup +of the leading fossil repository (http://tdom.org) or at the backup repository at https://core.tcl.tk/tdom/timeline for detailed lists of code changes. User interface changes/enhancements and other important changes will still be documented in the CHANGES file. 
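The JSON support listed in the CHANGES entries above can be tried in a short Tcl session. This is only a minimal sketch, assuming a tDOM build containing these changes is installed and loadable; the sample JSON text and the variable names are illustrative, and the exact serialisation returned by asJSON may differ in whitespace from the input.

```tcl
package require tdom

# Illustrative input; any JSON text could be used here.
set json {{"name": "tDOM", "formats": ["XML", "HTML", "JSON"]}}

# -json selects the JSON parser. The -- marks the end of the options,
# which matters when the JSON text itself starts with "-".
set doc [dom parse -json -- $json]

# Serialise the DOM tree back into a JSON string.
set out [$doc asJSON]
puts $out

# Free the document once it is no longer needed.
$doc delete
```

Passing `--` is not strictly necessary for this input, but it keeps the call safe for data such as a bare negative number.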
Index: README ================================================================== --- README +++ README @@ -1,83 +1,116 @@ - tDOM - a XML/DOM/XPath/XSLT implementation for Tcl - (Version 0.8.4) + tDOM - a XML/DOM/XPath/XSLT/HTML/JSON implementation for Tcl + (Version 0.9.0) - Jochen Loewer (loewerj@hotmail.com) - Rolf Ade (rolf@pointsman.de) - with some contributions by: +This directory contains a freely distributable thread-safe extension +to Tcl/Tk called tDOM. - Zoran Vasiljevic (zv@archiware.com) - - -This directory contains a freely distributable (under the Mozilla Public -License) thread-safe extension to Tcl/Tk called tDOM. +tDOM was started by Jochen Loewer (loewerj@hotmail.com) and developed +by Jochen and Rolf Ade (rolf@pointsman.de) with contributions by Zoran +Vasiljevic (zv@archiware.com). For more than a decade it has been +maintained and developed by Rolf Ade. tDOM contains: - * the newest version of Expat, the XML parser from James Clark, - including namespace and DTD support. + * for convenience expat 2.2.0, the XML parser originating from + James Clark, although you're able to link tDOM with other + expat versions or the library provided by the system. - * a modified version of Steve Ball's Tclexpat, the Tcl interface to - expat, for event-like (SAX-like) XML parsing. The modifications - are for performance improvements, to make the newest Expat - features (XML namespace) available and for some additional features. + * building a DOM tree from XML in one go implemented in C for + maximum performance and minimum memory usage, and DOM I and II + methods to work on such a tree using either an OO-like or a + handle syntax. - * a (partial) DOM I and II implementation in C for maximum - performance and minimum memory need following the W3C DOM Core - Level 1 recommendation using a OO-like syntax. + * a Tcl interface to expat for event-like (SAX-like) XML parsing. 
- * a very complete, compliant and fast XPath implementation in C - following the November 99 W3C recommendation. + * a complete, compliant and fast XPath implementation in C + following the November 99 W3C recommendation for navigating and + data extraction. * a fast XSLT implementation in C following the W3C Recommendation 16 November 1999. - * a (partial) implementation in C of the XPointer (97) navigational - functions. + * optional DTD validation. - * UTF-8 to 8 bit encoding back conversion functionality to support - Tcl version < 8.1x + * a JSON parser which parses any possible JSON input into a DOM + tree without losing information. - * optional DTD validation + * an efficient and Tcl'ish way to create XML and HTML documents + and JSON string. - * additional convenience methods + * as build option an interface to the gumbo HTML5 parser, which + also digests almost any other HTML. + + * an even faster simple XML parser for trusted XML input. + + * additional convenience methods. - * documentation in TMML, HTML and nroff format + * and more. + + +DOCUMENTATION + + The documentation is included into the source distribution in HTML + and man format. Alternatively, read it online starting at + http://tdom.org/index.html/doc/tdom-0-9-0/doc/index.html + + +GETTING THE CODE + + The development repository is hosted at http://tdom.org and is + mirrored at http://core.tcl.tk/tdom. You are encouraged to use + trunk. + + If you insist on using an older tDOM with lesser features and + probably more bugs, you should use the latest release 0.9.0. Get + the source code release from + http://tdom.org/downloads/tdom-0.9.0-src.tgz or + http://tdom.org/downloads/tdom-0.9.0-src.zip + + Windows binaries (32 bit as well as 64 bit) of the 0.9.0 release + are also available. Get it from + http://tdom.org/downloads/tdom-0.9-windows-x64.zip and + http://tdom.org/downloads/tdom-0.9-windows-x86.zip + + The provided windows binaries include (statically linked) the + HTML5 parser. 
-COMPILING/USING tDOM +COMPILING tDOM - Depending on your platform, (unix or win) go to the corresponding - directory and invoke the configure script: + Depending on your platform (unix/mac or win), go to the + corresponding directory and invoke the configure script: ../configure make make test make install Alternatively, you can build the tDOM package in just about any directory elsewhere on the fileystem (since TEA-compatible). - - Don't build against Tcl 8.6.2 (or Tcl 8.5.16). This tcl releases - had bugs in the I/O system, that may bite you while using tDOM. - You might also want to do "../configure --help" to get list of all - supported options of the configure script. In the "unix" directory - there is a "CONFIG" file containing some examples on how to invoke - the "configure" script for some common cases. You can peek - there. This file also includes a short description of the tDOM - specific configure options. + You might also want to do "../configure --help" to get a list of + all supported options of the configure script. In the "unix" + directory there is a "CONFIG" file containing some examples on how + to invoke the "configure" script for some common cases. You can + peek there. This file also includes a short description of the + tDOM specific configure options. Since tDOM is TEA-compatible you should be able to build it using the MinGW build environment for Windows. There is also the MSVC nmake file so you can compile the package with Microsoft tools. + Refer to the README in the win directory for more details about + building on Windows. The compile process will build the tDOM shared library suitable for loading into the Tcl shell using standard "package require" mechanism. -Have fun! + +REPORTING BUGS -- EOF - + Please head to http://tdom.org/index.html/ticket and click on "New + Ticket". Log in as anonymous and report your findings. If you + prefer to have an individual login write Rolf a mail. 
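After the build and install steps described in the README, a quick check from tclsh can confirm that the package loads through the standard "package require" mechanism. This is a minimal sketch, assuming tDOM is installed where Tcl can find it; the featureinfo keys used here (expatversion, dtd, ns) are the ones described in doc/dom.html.

```tcl
package require tdom

# Report the version of the tdom package that was actually loaded.
puts "tDOM version: [package present tdom]"

# Report the expat library tDOM is linked against and whether
# the optional DTD and namespace features were compiled in.
puts "expat:        [dom featureinfo expatversion]"
puts "DTD support:  [dom featureinfo dtd]"
puts "NS support:   [dom featureinfo ns]"
```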
Index: README.AOL ================================================================== --- README.AOL +++ README.AOL @@ -1,9 +1,9 @@ tDOM - a XML/DOM/XPath/XSLT implementation for Tcl - (Version 0.8.3) + (Version 0.9.0) Jochen Loewer (loewerj@hotmail.com) Rolf Ade (rolf@pointsman.de) with some contributions by: Index: configure ================================================================== --- configure +++ configure @@ -1,8 +1,8 @@ #! /bin/sh # Guess values for system-dependent variables and create Makefiles. -# Generated by GNU Autoconf 2.69 for tdom 0.8.4. +# Generated by GNU Autoconf 2.69 for tdom 0.9.0. # # # Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc. # # @@ -575,12 +575,12 @@ MAKEFLAGS= # Identity of this package. PACKAGE_NAME='tdom' PACKAGE_TARNAME='tdom' -PACKAGE_VERSION='0.8.4' -PACKAGE_STRING='tdom 0.8.4' +PACKAGE_VERSION='0.9.0' +PACKAGE_STRING='tdom 0.9.0' PACKAGE_BUGREPORT='' PACKAGE_URL='' # Factoring default headers for most tests. ac_includes_default="\ @@ -1307,11 +1307,11 @@ # if test "$ac_init_help" = "long"; then # Omit some internal or obsolete options to make the list less imposing. # This message is too long to be a string in the A/UX 3.1 sh. cat <<_ACEOF -\`configure' configures tdom 0.8.4 to adapt to many kinds of systems. +\`configure' configures tdom 0.9.0 to adapt to many kinds of systems. Usage: $0 [OPTION]... [VAR=VALUE]... To assign environment variables (e.g., CC, CFLAGS...), specify them as VAR=VALUE. See below for descriptions of some of the useful variables. 
@@ -1368,11 +1368,11 @@ _ACEOF fi if test -n "$ac_init_help"; then case $ac_init_help in - short | recursive ) echo "Configuration of tdom 0.8.4:";; + short | recursive ) echo "Configuration of tdom 0.9.0:";; esac cat <<\_ACEOF Optional Features: --disable-option-checking ignore unrecognized --enable/--with options @@ -1479,11 +1479,11 @@ fi test -n "$ac_init_help" && exit $ac_status if $ac_init_version; then cat <<\_ACEOF -tdom configure 0.8.4 +tdom configure 0.9.0 generated by GNU Autoconf 2.69 Copyright (C) 2012 Free Software Foundation, Inc. This configure script is free software; the Free Software Foundation gives unlimited permission to copy, distribute and modify it. @@ -1844,11 +1844,11 @@ } # ac_fn_c_check_header_mongrel cat >config.log <<_ACEOF This file contains any messages produced by compilers while running configure, to aid debugging if configure makes a mistake. -It was created by tdom $as_me 0.8.4, which was +It was created by tdom $as_me 0.9.0, which was generated by GNU Autoconf 2.69. Invocation command line was $ $0 $@ _ACEOF @@ -2210,16 +2210,16 @@ $as_echo_n "checking for correct TEA configuration... " >&6; } if test x"${PACKAGE_NAME}" = x ; then as_fn_error $? " The PACKAGE_NAME variable must be defined by your TEA configure.ac" "$LINENO" 5 fi - if test x"3.9" = x ; then + if test x"3.10" = x ; then as_fn_error $? " TEA version not specified." 
"$LINENO" 5 - elif test "3.9" != "${TEA_VERSION}" ; then - { $as_echo "$as_me:${as_lineno-$LINENO}: result: warning: requested TEA version \"3.9\", have \"${TEA_VERSION}\"" >&5 -$as_echo "warning: requested TEA version \"3.9\", have \"${TEA_VERSION}\"" >&6; } + elif test "3.10" != "${TEA_VERSION}" ; then + { $as_echo "$as_me:${as_lineno-$LINENO}: result: warning: requested TEA version \"3.10\", have \"${TEA_VERSION}\"" >&5 +$as_echo "warning: requested TEA version \"3.10\", have \"${TEA_VERSION}\"" >&6; } else { $as_echo "$as_me:${as_lineno-$LINENO}: result: ok (TEA ${TEA_VERSION})" >&5 $as_echo "ok (TEA ${TEA_VERSION})" >&6; } fi @@ -5549,11 +5549,15 @@ if test "$HAVEGUMBO" = "1" ; then { $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5 $as_echo "yes" >&6; } $as_echo "#define TDOM_HAVE_GUMBO 1" >>confdefs.h - HTML5_LIBS="`pkg-config --cflags --libs gumbo`" + if test "${TEA_PLATFORM}" = "windows" ; then + HTML5_LIBS="-Wl,-Bstatic `pkg-config --static --cflags --libs gumbo` -Wl,-Bdynamic" + else + HTML5_LIBS="`pkg-config --cflags --libs gumbo`" + fi else as_fn_error $? "The required lib gumbo not found" "$LINENO" 5 fi else { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5 @@ -5699,12 +5703,10 @@ # A few miscellaneous platform-specific items: # TEA_ADD_* any platform specific compiler/build info here. #-------------------------------------------------------------------- if test "${TEA_PLATFORM}" = "windows" ; then - $as_echo "#define BUILD_tdom 1" >>confdefs.h - CLEANFILES="pkgIndex.tcl *.lib *.dll *.exp *.ilk *.pdb vc*.pch" #TEA_ADD_SOURCES([win/winFile.c]) #TEA_ADD_INCLUDES([-I\"$(${CYGPATH} ${srcdir}/win)\"]) else CLEANFILES="pkgIndex.tcl tdomConfig.sh tdom.tcl tcldomsh" @@ -9696,11 +9698,11 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1 # Save the log message, to keep $0 and so on meaningful, and to # report actual input values of CONFIG_FILES etc. instead of their # values after options handling. 
ac_log=" -This file was extended by tdom $as_me 0.8.4, which was +This file was extended by tdom $as_me 0.9.0, which was generated by GNU Autoconf 2.69. Invocation command line was CONFIG_FILES = $CONFIG_FILES CONFIG_HEADERS = $CONFIG_HEADERS CONFIG_LINKS = $CONFIG_LINKS @@ -9749,11 +9751,11 @@ _ACEOF cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" ac_cs_version="\\ -tdom config.status 0.8.4 +tdom config.status 0.9.0 configured by $0, generated by GNU Autoconf 2.69, with options \\"\$ac_cs_config\\" Copyright (C) 2012 Free Software Foundation, Inc. This config.status script is free software; the Free Software Foundation Index: configure.in ================================================================== --- configure.in +++ configure.in @@ -17,19 +17,19 @@ # so you can encode the package version directly into the source files. # This will also define a special symbol for Windows (BUILD_sample in # this case) so that we create the export library with the dll. #----------------------------------------------------------------------- -AC_INIT([tdom], [0.8.4]) +AC_INIT([tdom], [0.9.0]) #-------------------------------------------------------------------- # Call TEA_INIT as the first TEA_ macro to set up initial vars. # This will define a ${TEA_PLATFORM} variable == "unix" or "windows" # as well as PKG_LIB_FILE and PKG_STUB_LIB_FILE. #-------------------------------------------------------------------- -TEA_INIT([3.9]) +TEA_INIT([3.10]) AC_CONFIG_AUX_DIR(tclconfig) #-------------------------------------------------------------------- # Load the tclConfig.sh file @@ -127,11 +127,10 @@ # A few miscellaneous platform-specific items: # TEA_ADD_* any platform specific compiler/build info here. 
#-------------------------------------------------------------------- if test "${TEA_PLATFORM}" = "windows" ; then - AC_DEFINE(BUILD_tdom) CLEANFILES="pkgIndex.tcl *.lib *.dll *.exp *.ilk *.pdb vc*.pch" #TEA_ADD_SOURCES([win/winFile.c]) #TEA_ADD_INCLUDES([-I\"$(${CYGPATH} ${srcdir}/win)\"]) else CLEANFILES="pkgIndex.tcl tdomConfig.sh tdom.tcl tcldomsh" Index: doc/category-index.html ================================================================== --- doc/category-index.html +++ doc/category-index.html @@ -1,19 +1,19 @@ -tDOM manual: Index +tDOM manual: Index

tDOM manual: Index

+Tcl commands · C functions ·
Index: doc/dom.html ================================================================== --- doc/dom.html +++ doc/dom.html @@ -1,22 +1,22 @@ -tDOM manual: dom +tDOM manual: dom
-

NAME

+

NAME

dom -
Create an in-memory DOM tree from XML

-

SYNOPSIS

package require tdom
+  

SYNOPSIS

package require tdom
 
 dom method ?arg arg ...?
-

DESCRIPTION

This command provides the creation of DOM trees in memory. In +

DESCRIPTION

This command provides the creation of DOM trees in memory. In the usual case a string containing a XML information is parsed and converted into a DOM tree. Other possible parse input may be HTML or JSON. The method indicates a specific subcommand.

The valid methods are:

@@ -57,11 +57,11 @@
If -simple is specified, a simple but fast parser is used (conforms not fully to XML recommendation). That should double parsing and DOM generation speed. The encoding of the data is not transformed inside the parser. The simple parser does not respect any encoding information in the XML declaration. It skips over -the internal DTD subset and ignores any information in it. Therefor it doesn't +the internal DTD subset and ignores any information in it. Therefore it doesn't include defaulted attribute values into the tree, even if the according attribute declaration is in the internal subset. It also doesn't expand internal or external entity references other than the predefined entities and character references.
@@ -72,25 +72,25 @@ used, which tries to even parse badly formed HTML into a DOM tree.
-html5
-
This option is only available, if tDOM was build +
This option is only available if tDOM was build with --enable-html5. Try the featureinfo method - if you need to know, if this feature is build in. If + if you need to know if this feature is build in. If -html5 is specified, the gumbo lib html5 parser - (https://github.com/google/gumbo-parser) is used, to + (https://github.com/google/gumbo-parser) is used to build the DOM tree. This is, as far as it goes, XML - namespace aware. Since this probably isn't wanted by a + namespace-aware. Since this probably isn't wanted by a lot of users and adds only burden for no good in a lot of use cases -html5 can be combined with -ignorexmlns, in which case all nodes and attributes in the DOM tree are not in an XML namespace. All tag and attribute names in the DOM tree will be lower case, even for foreign elements not in the xhtml, svg or mathml namespace. The DOM tree may - include nodes, that the parser inserted, because they + include nodes, that the parser inserted because they are implied by the context (as <head>, <tbody>, etc.).
@@ -120,15 +120,32 @@ doc and node methods. Please note that the element node names and the text node values within the tree may be outside of what the appropriate XML productions allow. + +
+-jsonmaxnesting integer +
+ +
This option only has an effect if used together + with the -json option. The current implementation uses a recursive descent JSON parser. In order to avoid using excess stack space, any JSON input that has more than a certain number of nesting levels is considered invalid. The default maximum nesting is 2000. The option -jsonmaxnesting allows the user to adjust that.
+ + + +
--
+
The option -- marks the end of options. + While respected in general, this option is only needed + when parsing JSON data, which may start with a + "-".
+ +
-keepEmpties
If -keepEmpties is -specified, text nodes, which contain only whitespaces, will be part of the +specified then text nodes which contain only whitespaces will be part of the resulting DOM tree. In default case (-keepEmpties not given) those empty text nodes are removed at parsing time.
@@ -137,43 +154,47 @@
If -channel <channel-ID> is specified, the input to be parsed is read from the specified channel. The encoding setting of the channel (via fconfigure -encoding) is respected, ie the data read from the -channel are converted to UTF-8 according to the encoding settings, befor the +channel are converted to UTF-8 according to the encoding settings before the data is parsed.
-baseurl <baseURI>
-
If -baseurl <baseURI> is specified, the -baseURI is used as the base URI of the document. External entities referenced -in the document are resolved relative to this base URI. This base URI is also -stored within the DOM tree.
+
If -baseurl <baseURI> is specified, + the baseURI is used as the base URI of the document. + External entities referenced in the document are + resolved relative to this base URI. This base URI is + also stored within the DOM tree.
-feedbackAfter <#bytes>
-
If -feedbackAfter <#bytes> is specified, the -tcl command given by -feedbackcmd is evaluated at the first -element start within the document (or an external entity) after the -start of the document or external entity or the last such call after -#bytes.For backward compatibility, if no -feedbackcmd is given, but -there is a tcl proc named ::dom::domParseFeedback then this proc is -used as -feedbackcmd. If there isn't such a proc and -feedbackAfter is -used, it is an error to not also use -feedbackcmd. If the called -script raises error, then parsing will be aborted, the -dom parse call returns error, with the script -error msg as error msg. If the called script return --code break, the parsing will abort and the dom -parse call will return the empty string.
+
If -feedbackAfter <#bytes> is + specified, the tcl command given by + -feedbackcmd is evaluated at the first element + start within the document (or an external entity) + after the start of the document or external entity or + the last such call after #bytes. For backward + compatibility, if no -feedbackcmd is given but there is + a tcl proc named ::dom::domParseFeedback, this proc is + used as -feedbackcmd. If there isn't such a proc and + -feedbackAfter is used, it is an error to not also use + -feedbackcmd. If the called script raises an error, then + parsing will be aborted and the dom parse call + returns an error, with the script error msg as error msg. + If the called script returns -code break, the + parsing will abort and the dom parse call will + return the empty string.
-feedbackcmd <script> @@ -202,29 +223,29 @@ specified, the specified tcl script is called to resolve any external entities of the document. The actual evaluated command consists of this option followed by three arguments: the base uri, the system identifier of the entity and the public identifier of the entity. The base uri and the public identifier may be the empty list. The script has to return a tcl list consisting of three -elements. The first element of this list signals, how the external entity is -returned to the processor. At the moment, the two allowed types are "string" +elements. The first element of this list signals how the external entity is +returned to the processor. Currently the two allowed types are "string" and "channel". The second element of the list has to be the (absolute) base URI of the external entity to be parsed. The third element of the list are data, either the already read data out of the external entity as string in the case of type "string", or the name of a tcl channel, in the case of type "channel". Note that if the script returns a tcl channel, it will not be closed by the processor. It must be closed separately if it is no longer -required. +needed.
-useForeignDTD <boolean>
If <boolean> is true and the document does not have an external subset, the parser will call the -externalentitycommand script with -empty values for the systemId and publicID arguments. Pleace notice, that, if +empty values for the systemId and publicID arguments. Please note that if the document also doesn't have an internal subset, the -startdoctypedeclcommand and -enddoctypedeclcommand scripts, if set, are not called. The -useForeignDTD respects
@@ -231,39 +252,44 @@
-paramentityparsing <always|never|notstandalone>
-
The -paramentityparsing option controls, if the -parser tries to resolve the external entities (including the external DTD -subset) of the document, while building the DOM -tree. -paramentityparsing requires an argument, which must be either -"always", "never", or "notstandalone". The value "always" means, that the -parser tries to resolves (recursively) all external entities of the XML -source. This is the default, in case -paramentityparsing is omitted. The -value "never" means, that only the given XML source is parsed and no external -entity (including the external subset) will be resolved and parsed. The value -"notstandalone" means, that all external entities will be resolved and parsed, -with the execption of documents, which explicitly states standalone="yes" in -their XML declaration.
+
The -paramentityparsing option controls + whether the parser tries to resolve the external entities + (including the external DTD subset) of the document + while building the DOM tree. + -paramentityparsing requires an argument, which + must be either "always", "never", or "notstandalone". + The value "always" means that the parser tries to + resolve (recursively) all external entities of the + XML source. This is the default in case + -paramentityparsing is omitted. The value + "never" means that only the given XML source is + parsed and no external entity (including the external + subset) will be resolved and parsed. The value + "notstandalone" means that all external entities will + be resolved and parsed, with the exception of + documents which explicitly state standalone="yes" in + their XML declaration.
-ignorexmlns
It is recommended, that you only use this option - together with the -html5 option, if ever. If - this option is given, no node within the created DOM - tree will be internally marked as placed into an XML - Namespace, even if there is a default namespace in - scope for un-prefixed elements or even if the element - has a defined namespace prefix. One consequence is of - this is, that XPath node expressions on such a DOM - tree doesn't work as expected. Prefixed element nodes - can't be selected and element nodes without prefix - will be seen by XPath expressions as if they haven't - any namespace (no matter if they in fact in a default + with the -html5 option. If this option is + given, no node within the created DOM tree will be + internally marked as placed into an XML Namespace, + even if there is a default namespace in scope for + un-prefixed elements or even if the element has a + defined namespace prefix. One consequence is that + XPath node expressions on such a DOM tree don't work + as expected. Prefixed element nodes can't be selected + and element nodes without prefix will be seen by XPath + expressions as if they are not in any namespace (no + matter if they in fact should be in a default namespace).
@@ -292,31 +318,36 @@
dom createDocumentNode ?objVar?
-
Creates a new, 'empty' DOM document object without any element +
Creates a new 'empty' DOM document object without any element node. objVar controls the memory handling as explained above.
dom setResultEncoding ?encodingName?
-
If encodingName is not given the current global -result encoding is returned. Otherwise the global result encoding is set to -encodingName. All character data, attribute values, etc. will -then be converted from UTF-8, which is delivered from the Expat XML parser, to -the given 8 bit encoding at XML/DOM parse time. Valid values for -encodingName are: utf-8, ascii, cp1250, cp1251, cp1252, cp1253, -cp1254, cp1255, cp1256, cp437, cp850, en, iso8859-1, iso8859-2, iso8859-3, -iso8859-4, iso8859-5, iso8859-6, iso8859-7, iso8859-8, iso8859-9, koi8-r.
+
This option is for backward compatibility with Tcl + 8.0. If tDOM is built with any newer Tcl version this option + does not have any effect. If encodingName is not given + the current global result encoding is returned. Otherwise + the global result encoding is set to encodingName. + All character data, attribute values etc. will then be + converted from UTF-8, which is delivered from the Expat XML + parser, to the given 8 bit encoding at XML/DOM parse time. + Valid values for encodingName are: utf-8, ascii, + cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, + cp437, cp850, en, iso8859-1, iso8859-2, iso8859-3, + iso8859-4, iso8859-5, iso8859-6, iso8859-7, iso8859-8, + iso8859-9, koi8-r.
dom createNodeCmd -?-returnNodeCmd? ?-tagName name? ?-jsonType jsonType (element|comment|text|cdata|pi)Node commandName +?-returnNodeCmd? ?-tagName name? ?-jsonType jsonType? ?-namespace URI? (element|comment|text|cdata|pi)Node commandName
This method creates Tcl commands, which in turn create tDOM nodes. Tcl commands created by this command are only avaliable inside a script given to the domNode methods appendFromScript or insertBeforeFromScript. If @@ -338,18 +369,27 @@ returns nothing. Each command creates always the same type of node. Which type of node is created by the command is determined by the first argument to the createNodeCmd. The syntax of the created command depends on the type of the node it creates.

-

If the first argument of the method is elementNode, the -created command will create an element node. Without the +

If the command type to create is elementNode, the created +command will create an element node, if called. Without the -tagName option the tag name of the created node is commandName without namespace qualifiers. If the -tagName option was given then the created command the created elements will have this tag name. If the -jsonType option was -given then the created node elements will have the given JSON type. -The syntax of the created command is:

+given then the created node elements will have the given JSON type. If +the -namespace option is given the created element node will be +XML namespaced and in the namespace given by the option. The element +name will be used literally as given either by the command name or the +-tagName option, if that was given. An appropriate XML +namespace declaration will be automatically added, to bind the prefix +(if the element name has one) or the default namespace (if the element +name doesn't have a prefix) to the namespace if such a binding isn't in +scope.

+ +

The syntax of the created command is:

 elementNodeCmd ?attributeName attributeValue ...? ?script?
 elementNodeCmd ?-attributeName attributeValue ...? ?script?
 elementNodeCmd name_value_list script
@@ -404,45 +444,45 @@
         
           
dom setStoreLineColumn ?boolean?
If switched on, the DOM nodes will contain line and column position information for the original XML document after parsing. The default -is, not to store line and column position information.
+is not to store line and column position information.
dom setNameCheck ?boolean?
If NameCheck is true, every method which expects an XML Name, a full qualified name or a processing instructing target will check, if the -given string is valid according to his production rule. For commands created +given string is valid according to its production rule. For commands created with the createNodeCmd method to be used in the context of appendFromScript the status of the flag at creation time decides. If NameCheck is true at creation time, the command will -check his arguments, otherwise not. The setNameCheck +check its arguments, otherwise not. The setNameCheck set this flag. It returns the current NameCheck flag state. The default state for NameCheck is true.
dom setTextCheck ?boolean?
If TextCheck is true, every command which expects XML Chars, a comment, a CDATA section value or a processing instructing value will check, -if the given string is valid according to his production rule. For commands +if the given string is valid according to its production rule. For commands created with the createNodeCmd method to be used in the context of appendFromScript the status of the flag at creation time decides. If TextCheck is true at creation time, the -command will check his arguments, otherwise not.The -setTextCheck method set this flag. It returns the current +command will check its arguments, otherwise not.The +setTextCheck method sets this flag. It returns the current TextCheck flag state. The default state for TextCheck is true.
dom setObjectCommands ?(automatic|token|command)?
-
Controls, if documents and nodes are created as tcl commands or +
Controls if documents and nodes are created as tcl commands or as token to be used with the domNode and domDoc commands. If the mode is 'automatic', then methods used at tcl commands will create tcl commands and methods used at doc or node tokes will create tokens. If the mode is 'command' then always tcl commands will be created. If @@ -452,59 +492,59 @@
dom isName name
-
Returns 1, if name is a valid XML Name according to +
Returns 1 if name is a valid XML Name according to production 5 of the XML - 1.0 recommendation. This means, that name is a valid + 1.0 recommendation. This means that name is a valid XML element or attribute name. Otherwise it returns 0.
dom isPIName name
-
Returns 1, if name is a valid XML processing instruction +
Returns 1 if name is a valid XML processing instruction target according to production 17 of the XML 1.0 recommendation. Otherwise it returns 0.
dom isNCName name
-
Returns 1, if name is a valid NCName according +
Returns 1 if name is a valid NCName according to production 4 of the of the Namespaces in XML recommendation. Otherwise it returns 0.
dom isQName name
-
Returns 1, if name is a valid QName according +
Returns 1 if name is a valid QName according to production 6 of the of the Namespaces in XML recommendation. Otherwise it returns 0.
dom isCharData string
-
Returns 1, if every character in string is +
Returns 1 if every character in string is a valid XML Char according to production 2 of the XML 1.0 recommendation. Otherwise it returns 0.
dom isBMPCharData string
-
Returns 1, if every character in string is +
Returns 1 if every character in string is a valid XML Char with a Unicode code point within the Basic Multilingual Plane (that means, that every character within the string is at most 3 bytes long). Otherwise it returns 0.
@@ -511,31 +551,31 @@
dom isComment string
-
Returns 1, if string is +
Returns 1 if string is a valid comment according to production 15 of the XML 1.0 recommendation. Otherwise it returns 0.
dom isCDATA string
-
Returns 1, if string is +
Returns 1 if string is valid according to production 20 of the XML 1.0 recommendation. Otherwise it returns 0.
dom isPIValue string
-
Returns 1, if string is +
Returns 1 if string is valid according to production 16 of the XML 1.0 recommendation. Otherwise it returns 0.
@@ -548,11 +588,11 @@
expatversion
Returns the version of the underlyling expat version as string, something like
- "exapt_2.1.0". This is. what the expat API
+ "exapt_2.1.0". This is what the expat API
function XML_ExpatVersion() returns.
expatmajorversion
Returns the major version of the underlyling
@@ -568,31 +608,31 @@
Returns the micro version of the underlyling expat version as integer.
dtd
-
Returns as boolean, if build with +
Returns as boolean if build with --enable-dtd.
ns
-
Returns as boolean, if build with +
Returns as boolean if build with --enable-ns.
unknown
-
Returns as boolean, if build with +
Returns as boolean if build with --enable-unknown.
tdomalloc
-
Returns as boolean, if build with +
Returns as boolean if build with --enable-tdomalloc.
lessns
-
Returns as boolean, if build with +
Returns as boolean if build with --enable-lessns.
TCL_UTF_MAX
Returns the TCL_UTF_MAX value of the tcl
@@ -606,13 +646,13 @@
-

KEYWORDS

+

KEYWORDS

XML, DOM, document, node, parsing

Index: doc/dom.n
==================================================================
--- doc/dom.n
+++ doc/dom.n
@@ -228,34 +228,34 @@
If \fI-simple\fR is specified, a simple but fast parser is used (conforms not fully to XML recommendation). That should double parsing and DOM generation speed. The encoding of the data is not transformed inside the parser. The simple parser does not respect any encoding information in the XML declaration. It skips over
-the internal DTD subset and ignores any information in it. Therefor it doesn't
+the internal DTD subset and ignores any information in it. Therefore it doesn't
include defaulted attribute values into the tree, even if the according attribute declaration is in the internal subset. It also doesn't expand internal or external entity references other than the predefined entities and character references.
.IP "\fB-html\fR"
If \fI-html\fR is specified, a fast HTML parser is used, which tries to even parse badly formed HTML into a DOM tree.
.IP "\fB-html5\fR"
-This option is only available, if tDOM was build
+This option is only available if tDOM was build
with --enable-html5. Try the \fIfeatureinfo\fR method
-if you need to know, if this feature is build in. If
+if you need to know if this feature is build in. If
\&\fI-html5\fR is specified, the gumbo lib html5 parser
-(https://github.com/google/gumbo-parser) is used, to
+(https://github.com/google/gumbo-parser) is used to
build the DOM tree. This is, as far as it goes, XML
-namespace aware. Since this probably isn't wanted by a
+namespace-aware. Since this probably isn't wanted by a
lot of users and adds only burden for no good in a lot of use cases \fI-html5\fR can be combined with
\&\fI-ignorexmlns\fR, in which case all nodes and attributes in the DOM tree are not in an XML namespace. All tag and attribute names in the DOM tree will be lower case, even for foreign elements not in the xhtml, svg or mathml namespace.
The DOM tree may
-include nodes, that the parser inserted, because they
+include nodes, that the parser inserted because they
are implied by the context (as , , etc.).
.IP "\fB-json\fR"
If \fI-json\fR is specified, the \fIdata\fR is expected to be a valid JSON string (according to RFC
@@ -281,40 +281,52 @@
investigated or modified with the full range of the doc and node methods. Please note that the element node names and the text node values within the tree may be outside of what the appropriate XML productions allow.
+.IP "\fB-jsonmaxnesting \fIinteger\fP\fR"
+This options only has effect if used together
+with the \fI-json\fR option. The current implementation uses recursive descent JSON parser. In order to avoid using excess stack space, any JSON input that has more than a certain levels of nesting is considered invalid. The default maximum nesting is 2000. The option -jsonmaxnesting allows the user to adjust that.
+.IP "\fB--\fR"
+The option \fI--\fR marks the end of options.
+While respected in general this option is only needed
+in case of parsing JSON data, which may start with a
+"-".
.IP "\fB-keepEmpties\fR"
If \fI-keepEmpties\fR is
-specified, text nodes, which contain only whitespaces, will be part of the
+specified then text nodes which contain only whitespaces will be part of the
resulting DOM tree. In default case (\fI-keepEmpties\fR not given) those empty text nodes are removed at parsing time.
.IP "\fB-channel \fI\fP\fR"
If \fI-channel \fR is specified, the input to be parsed is read from the specified channel. The encoding setting of the channel (via fconfigure -encoding) is respected, ie the data read from the
-channel are converted to UTF-8 according to the encoding settings, befor the
+channel are converted to UTF-8 according to the encoding settings before the
data is parsed.
.IP "\fB-baseurl \fI\fP\fR"
-If \fI-baseurl \fR is specified, the
-baseURI is used as the base URI of the document. External entities referenced
-in the document are resolved relative to this base URI. This base URI is also
-stored within the DOM tree.
+If \fI-baseurl \fR is specified,
+the baseURI is used as the base URI of the document.
+External entities references in the document are
+resolved relative to this base URI. This base URI is
+also stored within the DOM tree.
.IP "\fB-feedbackAfter \fI<#bytes>\fP\fR"
-If \fI-feedbackAfter <#bytes>\fR is specified, the
-tcl command given by \fI-feedbackcmd\fR is evaluated at the first
-element start within the document (or an external entity) after the
-start of the document or external entity or the last such call after
-#bytes.For backward compatibility, if no -feedbackcmd is given, but
-there is a tcl proc named ::dom::domParseFeedback then this proc is
-used as -feedbackcmd. If there isn't such a proc and -feedbackAfter is
-used, it is an error to not also use -feedbackcmd. If the called
-script raises error, then parsing will be aborted, the
-\&\fIdom parse\fR call returns error, with the script
-error msg as error msg. If the called script \fIreturn
--code break\fR, the parsing will abort and the \fIdom
-parse\fR call will return the empty string.
+If \fI-feedbackAfter <#bytes>\fR is
+specified, the tcl command given by
+\&\fI-feedbackcmd\fR is evaluated at the first element
+start within the document (or an external entity)
+after the start of the document or external entity or
+the last such call after #bytes. For backward
+compatibility if no -feedbackcmd is given but there is
+a tcl proc named ::dom::domParseFeedback this proc is
+used as -feedbackcmd. If there isn't such a proc and
+-feedbackAfter is used it is an error to not also use
+-feedbackcmd. If the called script raises error, then
+parsing will be aborted, the \fIdom parse\fR call
+returns error, with the script error msg as error msg.
+If the called script \fIreturn -code break\fR, the
+parsing will abort and the \fIdom parse\fR call will
+return the empty string.
.IP "\fB-feedbackcmd \fI