/*
 * Copyright (c) 2004 World Wide Web Consortium,
 *
 * (Massachusetts Institute of Technology, European Research Consortium for
 * Informatics and Mathematics, Keio University). All Rights Reserved. This
 * work is distributed under the W3C(r) Software License [1] in the hope that
 * it will be useful, but WITHOUT ANY WARRANTY; without even the implied
 * warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *
 * [1] http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231
 */

package org.w3c.dom.ls;

import org.w3c.dom.Document;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Node;
import org.w3c.dom.DOMException;

/**
 * An interface to an object that is able to build, or augment, a DOM tree
 * from various input sources.

 * LSParser provides an API for parsing XML and building the
 * corresponding DOM document structure. An LSParser instance
 * can be obtained by invoking the
 * DOMImplementationLS.createLSParser() method.
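The following is a minimal sketch of obtaining a synchronous LSParser and calling parse() and parseURI(). It assumes a JDK whose bundled DOM implementation can be found through DOMImplementationRegistry under the feature name "LS". LsParseDemo and its helper methods are hypothetical names, used only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class LsParseDemo {

    // Any registered implementation that supports the "LS" feature.
    static final DOMImplementationLS LS = lookup();

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    // A synchronous parser; a null schema type requests no particular schema language.
    static LSParser newParser() {
        return LS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    }

    // parse(LSInput): here the LSInput carries the whole document as a string.
    static Document parseString(String xml) {
        LSInput input = LS.createLSInput();
        input.setStringData(xml);
        return newParser().parse(input);
    }

    // parseURI(String): the parser fetches the resource itself.
    static Document parseFile(Path file) {
        return newParser().parseURI(file.toUri().toString());
    }

    public static void main(String[] args) throws IOException {
        Document fromString = parseString("<greeting lang='en'>hello</greeting>");
        System.out.println(fromString.getDocumentElement().getTextContent()); // hello

        Path tmp = Files.createTempFile("lsparser", ".xml");
        try {
            Files.write(tmp, "<note/>".getBytes(StandardCharsets.UTF_8));
            System.out.println(parseFile(tmp).getDocumentElement().getNodeName()); // note
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

A synchronous parser returns the finished Document directly, so no event handling is needed in this case.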

 * As specified in [DOM Level 3 Core], when a document is first made
 * available via the LSParser:

*

 * Asynchronous LSParser objects are expected to also
 * implement the events::EventTarget interface so that event
 * listeners can be registered on asynchronous LSParser
 * objects.

 * Events supported by asynchronous LSParser objects are:
 *
 * load
 *     The LSParser finishes loading the document. See also the
 *     definition of the LSLoadEvent interface.
 *
 * progress
 *     The LSParser signals progress as data is parsed. This
 *     specification does not attempt to define exactly when progress
 *     events should be dispatched; that is intentionally left
 *     implementation-dependent. Here is one example of how an
 *     implementation might dispatch progress events: once the parser
 *     starts receiving data, a progress event is dispatched to indicate
 *     that parsing has started. From then on, a progress event is
 *     dispatched for every 4096 bytes of data that is received and
 *     processed. This is only one example, though, and implementations
 *     can choose to dispatch progress events at any time while parsing,
 *     or not dispatch them at all. See also the definition of the
 *     LSProgressEvent interface.

 * Note: All events defined in this specification use the
 * namespace URI "http://www.w3.org/2002/DOMLS".
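The following is a hedged sketch of registering a "load" listener. Support for MODE_ASYNCHRONOUS is not assumed. createLSParser raises NOT_SUPPORTED_ERR when an implementation lacks it, and the JDK's bundled parser appears to be synchronous only. When an asynchronous parser is available, the sketch assumes it can be cast to org.w3c.dom.events.EventTarget. The listener is registered by plain event type because that Java EventTarget interface declares no namespace-aware registration method. AsyncLoadDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.events.EventTarget;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSLoadEvent;
import org.w3c.dom.ls.LSParser;

final class AsyncLoadDemo {

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    // Returns true if an asynchronous, event-capable parser was used. In that case the
    // document reaches the sink later, from the "load" listener (possibly on another
    // thread, so pass a thread-safe list and wait for it). Returns false after a
    // synchronous fallback that has already added the document to the sink.
    static boolean parse(DOMImplementationLS ls, String xml, List<Document> sink) {
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        LSParser parser = null;
        try {
            parser = ls.createLSParser(DOMImplementationLS.MODE_ASYNCHRONOUS, null);
        } catch (DOMException e) {
            if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                throw e;
            }
        }
        if (parser instanceof EventTarget) {
            // The "load" event is an LSLoadEvent; getNewDocument() is the finished tree.
            ((EventTarget) parser).addEventListener("load",
                    evt -> sink.add(((LSLoadEvent) evt).getNewDocument()), false);
            parser.parse(input); // returns null for an asynchronous parser
            return true;
        }
        // No asynchronous parser with event support: parse synchronously instead.
        LSParser sync = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        sink.add(sync.parse(input));
        return false;
    }

    public static void main(String[] args) {
        List<Document> sink = new ArrayList<>();
        boolean async = parse(lookup(), "<r/>", sink);
        System.out.println(async ? "asynchronous parse started"
                                 : "parsed synchronously: " + sink.size() + " document(s)");
    }
}
```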

 * While parsing an input source, errors are reported to the application
 * through the error handler (LSParser.domConfig's "error-handler"
 * parameter). This specification does not attempt to define all possible
 * errors that can occur while parsing XML, or any other markup, but some
 * common error cases are defined. The types (DOMError.type) of
 * errors and warnings defined by this specification are:

 *
 * "check-character-normalization-failure" [error]
 *     Raised if the parameter "check-character-normalization" is set to
 *     true and a string is encountered that fails normalization checking.
 *
 * "doctype-not-allowed" [fatal]
 *     Raised if the configuration parameter "disallow-doctype" is set to
 *     true and a doctype is encountered.
 *
 * "no-input-specified" [fatal]
 *     Raised when loading a document and no input is specified in the
 *     LSInput object.
 *
 * "pi-base-uri-not-preserved" [warning]
 *     Raised if a processing instruction is encountered in a location
 *     where the base URI of the processing instruction cannot be
 *     preserved. One example of a case where this warning will be raised
 *     is if the configuration parameter "entities" is set to false and the
 *     following XML file is parsed:
 *
 *         <!DOCTYPE root [ <!ENTITY e SYSTEM 'subdir/myentity.ent'> ]>
 *         <root> &e; </root>
 *
 *     And subdir/myentity.ent contains:
 *
 *         <one> <two/> </one> <?pi 3.14159?>
 *         <more/>
 *
 * "unbound-prefix-in-entity" [warning]
 *     An implementation-dependent warning that may be raised if the
 *     configuration parameter "namespaces" is set to true and an unbound
 *     namespace prefix is encountered in an entity's replacement text.
 *     Raising this warning is not enforced since some existing parsers may
 *     not recognize unbound namespace prefixes in the replacement text of
 *     entities.
 *
 * "unknown-character-denormalization" [fatal]
 *     Raised if the configuration parameter
 *     "ignore-unknown-character-denormalizations" is set to false and a
 *     character is encountered for which the processor cannot determine
 *     the normalization properties.
 *
 * "unsupported-encoding" [fatal]
 *     Raised if an unsupported encoding is encountered.
 *
 * "unsupported-media-type" [fatal]
 *     Raised if the configuration parameter "supported-media-types-only"
 *     is set to true and an unsupported media type is encountered.
 *

 * In addition to raising the defined errors and warnings, implementations
 * are expected to raise implementation-specific errors and warnings for
 * any other error and warning cases such as I/O errors (file not found,
 * permission denied, ...), XML well-formedness errors, and so on.
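The following is a minimal sketch of collecting reports through the "error-handler" parameter. It assumes the JDK's bundled implementation. The DOMError.type strings produced for well-formedness errors are implementation-specific, so the sketch only records them. ErrorReportDemo and collectReports are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class ErrorReportDemo {

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    // Parses xml and returns one line per error or warning the parser reported,
    // plus a final line if the parse ended with an LSException.
    static List<String> collectReports(String xml) {
        DOMImplementationLS ls = lookup();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        List<String> reports = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            reports.add(severity(error.getSeverity()) + " [" + error.getType() + "] "
                    + error.getMessage());
            return true; // ask to continue; a fatal error normally ends the parse anyway
        };
        parser.getDomConfig().setParameter("error-handler", handler);

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        try {
            parser.parse(input);
        } catch (LSException e) {
            // PARSE_ERR: the details were already delivered to the handler.
            reports.add("LSException code " + e.code);
        }
        return reports;
    }

    static String severity(short s) {
        switch (s) {
            case DOMError.SEVERITY_WARNING: return "warning";
            case DOMError.SEVERITY_ERROR:   return "error";
            default:                        return "fatal";
        }
    }

    public static void main(String[] args) {
        collectReports("<root><unclosed></root>").forEach(System.out::println);
    }
}
```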

 * See also the Document Object Model (DOM) Level 3 Load and Save Specification.
 */
public interface LSParser {
    /**
     * The DOMConfiguration object used when parsing an input
     * source. This DOMConfiguration is specific to the parse
     * operation. No parameter values from this DOMConfiguration
     * object are passed automatically to the DOMConfiguration
     * object on the Document that is created, or used, by the
     * parse operation. The DOM application is responsible for passing any
     * needed parameter values from this DOMConfiguration
     * object to the DOMConfiguration object referenced by the
     * Document object.
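Parameter values are not carried over automatically. An application that wants the Document's own configuration to match the parse therefore has to copy them itself. The Document's configuration governs later operations such as Document.normalizeDocument(). The following is a minimal sketch that assumes the JDK's bundled implementation. The choice of "comments" and "namespaces", and the names ConfigHandOffDemo and parseAndHandOver, are illustrative assumptions.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class ConfigHandOffDemo {

    // Parameters to hand over; an illustrative choice, not a list from the specification.
    static final String[] HANDED_OVER = {"comments", "namespaces"};

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    static Document parseAndHandOver(String xml, boolean keepComments) {
        DOMImplementationLS ls = lookup();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        DOMConfiguration parseConfig = parser.getDomConfig();
        parseConfig.setParameter("comments", keepComments); // affects this parse only

        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        Document doc = parser.parse(input);

        // Nothing is copied automatically, so copy the chosen values explicitly.
        DOMConfiguration docConfig = doc.getDomConfig();
        for (String name : HANDED_OVER) {
            Object value = parseConfig.getParameter(name);
            if (docConfig.canSetParameter(name, value)) {
                docConfig.setParameter(name, value);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = parseAndHandOver("<r><!--note--><a/></r>", false);
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 1
        System.out.println(doc.getDomConfig().getParameter("comments"));         // false
    }
}
```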
     *
     * In addition to the parameters recognized by the
     * DOMConfiguration interface defined in [DOM Level 3 Core],
     * the DOMConfiguration objects for LSParser
     * add or modify the following parameters:

     *
     * "charset-overrides-xml-encoding"
     *     true
     *         [optional] (default) If a higher-level protocol such as HTTP
     *         [IETF RFC 2616] provides an indication of the character
     *         encoding of the input stream being processed, that will
     *         override any encoding specified in the XML declaration or the
     *         Text declaration (see also section 4.3.3, "Character Encoding
     *         in Entities", in [XML 1.0]). Explicitly setting an encoding in
     *         the LSInput overrides any encoding from the protocol.
     *     false
     *         [required] The parser ignores any character set encoding
     *         information from higher-level protocols.
     *
     * "disallow-doctype"
     *     true
     *         [optional] Throw a fatal "doctype-not-allowed" error if a
     *         doctype node is found while parsing the document. This is
     *         useful when dealing with things like SOAP envelopes where
     *         doctype nodes are not allowed.
     *     false
     *         [required] (default) Allow doctype nodes in the document.
     *
     * "ignore-unknown-character-denormalizations"
     *     true
     *         [required] (default) If, while verifying full normalization
     *         when [XML 1.1] is supported, a processor encounters characters
     *         for which it cannot determine the normalization properties,
     *         then the processor will ignore any possible denormalizations
     *         caused by these characters. This parameter is ignored for
     *         [XML 1.0].
     *     false
     *         [optional] Report a fatal "unknown-character-denormalization"
     *         error if a character is encountered for which the processor
     *         cannot determine the normalization properties.
     *
     * "infoset"
     *     See the definition of DOMConfiguration for a description of
     *     this parameter. Unlike in [DOM Level 3 Core], this parameter
     *     will default to true for LSParser.
     *
     * "namespaces"
     *     true
     *         [required] (default) Perform the namespace processing as
     *         defined in [XML Namespaces] and [XML Namespaces 1.1].
     *     false
     *         [optional] Do not perform the namespace processing.
     *
     * "resource-resolver"
     *     [required] A reference to an LSResourceResolver object, or null.
     *     If the value of this parameter is not null when an external
     *     resource (such as an external XML entity or an XML schema
     *     location) is encountered, the implementation will request that the
     *     LSResourceResolver referenced in this parameter resolves
     *     the resource.
     *
     * "supported-media-types-only"
     *     true
     *         [optional] Check that the media type of the parsed resource is
     *         a supported media type. If an unsupported media type is
     *         encountered, a fatal error of type "unsupported-media-type"
     *         will be raised. The media types defined in [IETF RFC 3023]
     *         must always be accepted.
     *     false
     *         [required] (default) Accept any media type.
     *
     * "validate"
     *     See the definition of DOMConfiguration for a description of this
     *     parameter. Unlike in [DOM Level 3 Core], the processing of the
     *     internal subset is always accomplished, even if this parameter is
     *     set to false.
     *
     * "validate-if-schema"
     *     See the definition of DOMConfiguration for a description of this
     *     parameter. Unlike in [DOM Level 3 Core], the processing of the
     *     internal subset is always accomplished, even if this parameter is
     *     set to false.
     *
     * "well-formed"
     *     See the definition of DOMConfiguration for a description of this
     *     parameter. Unlike in [DOM Level 3 Core], this parameter cannot be
     *     set to false.
*
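The following sketch exercises three of the parameters above: "namespaces", "disallow-doctype" and "resource-resolver". It assumes the JDK's bundled implementation. Setting "disallow-doctype" to true is optional, so the sketch calls canSetParameter first. The resolver serves a hypothetical external entity named inline.ent from memory. ParserParamsDemo and its methods are hypothetical names.

```java
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSException;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

final class ParserParamsDemo {

    static final DOMImplementationLS LS = lookup();

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    static LSParser newParser() {
        return LS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
    }

    static Document parse(LSParser parser, String xml) {
        LSInput input = LS.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    // "namespaces" (default true): with false, elements carry no namespace URI.
    static String rootNamespace(boolean namespaces) {
        LSParser parser = newParser();
        parser.getDomConfig().setParameter("namespaces", namespaces);
        return parse(parser, "<p:r xmlns:p='urn:example'/>").getDocumentElement().getNamespaceURI();
    }

    // "disallow-doctype": returns null if true is unsupported, else whether the parse failed.
    static Boolean doctypeRejected(String xml) {
        LSParser parser = newParser();
        DOMConfiguration config = parser.getDomConfig();
        if (!config.canSetParameter("disallow-doctype", Boolean.TRUE)) {
            return null;
        }
        config.setParameter("disallow-doctype", Boolean.TRUE);
        try {
            parse(parser, xml);
            return Boolean.FALSE;
        } catch (LSException e) {
            return Boolean.TRUE; // fatal "doctype-not-allowed" error
        }
    }

    // "resource-resolver": supply the external entity 'inline.ent' from memory.
    static int resolvedItems() {
        LSResourceResolver resolver = (type, namespaceURI, publicId, systemId, baseURI) -> {
            if (systemId == null || !systemId.endsWith("inline.ent")) {
                return null; // null lets the implementation open the resource itself
            }
            LSInput in = LS.createLSInput();
            in.setSystemId(systemId);
            in.setStringData("<item/>");
            return in;
        };
        LSParser parser = newParser();
        parser.getDomConfig().setParameter("resource-resolver", resolver);
        Document doc = parse(parser, "<!DOCTYPE r [<!ENTITY e SYSTEM 'inline.ent'>]><r>&e;</r>");
        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) {
        System.out.println(rootNamespace(true));  // urn:example
        System.out.println(rootNamespace(false)); // null
        System.out.println(doctypeRejected("<!DOCTYPE r><r/>"));
        System.out.println(resolvedItems());
    }
}
```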
     */
    public DOMConfiguration getDomConfig();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
     *
     * The filter is invoked after the operations requested by the
     * DOMConfiguration parameters have been applied. For
     * example, if "validate" is set to true, the validation is done before
     * invoking the filter.
     */
    public LSParserFilter getFilter();

    /**
     * When a filter is provided, the implementation will call out to the
     * filter as it is constructing the DOM tree structure. The filter can
     * choose to remove elements from the document being constructed, or to
     * terminate the parsing early.
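The following is a minimal LSParserFilter sketch. It removes every element with a given name, together with its subtree, by returning FILTER_REJECT from acceptNode. getWhatToShow() restricts the callbacks to element nodes via NodeFilter.SHOW_ELEMENT. The sketch assumes the JDK's bundled implementation. FilterDemo, dropElements and parseFiltered are hypothetical names.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSParserFilter;
import org.w3c.dom.traversal.NodeFilter;

final class FilterDemo {

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    // Rejects (removes, subtree included) every element whose name equals tagName.
    static LSParserFilter dropElements(String tagName) {
        return new LSParserFilter() {
            public short startElement(Element element) {
                return FILTER_ACCEPT; // decide once the element is complete
            }

            public short acceptNode(Node node) {
                return tagName.equals(node.getNodeName()) ? FILTER_REJECT : FILTER_ACCEPT;
            }

            public int getWhatToShow() {
                return NodeFilter.SHOW_ELEMENT; // offer only elements to the filter
            }
        };
    }

    static Document parseFiltered(String xml, LSParserFilter filter) {
        DOMImplementationLS ls = lookup();
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.setFilter(filter);
        LSInput input = ls.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    public static void main(String[] args) {
        Document doc = parseFiltered("<r><keep/><secret><inner/></secret></r>",
                dropElements("secret"));
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 1
    }
}
```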
     *
     * The filter is invoked after the operations requested by the
     * DOMConfiguration parameters have been applied. For
     * example, if "validate" is set to true, the validation is done before
     * invoking the filter.
     */
    public void setFilter(LSParserFilter filter);

    /**
     * true if the LSParser is asynchronous,
     * false if it is synchronous.
     */
    public boolean getAsync();

    /**
     * true if the LSParser is currently busy
     * loading a document, otherwise false.
     */
    public boolean getBusy();

    /**
     * Parse an XML document from a resource identified by an
     * LSInput.
     * @param input The LSInput from which the source of the
     *   document is to be read.
     * @return If the LSParser is a synchronous
     *   LSParser, the newly created and populated
     *   Document is returned. If the LSParser is
     *   asynchronous, null is returned since the document
     *   object may not yet be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the LSParser.busy
     *   attribute is true.
     * @exception LSException
     *   PARSE_ERR: Raised if the LSParser was unable to load
     *   the XML document. DOM applications should attach a
     *   DOMErrorHandler using the parameter "error-handler" if they wish
     *   to get details on the error.
     */
    public Document parse(LSInput input)
                          throws DOMException, LSException;

    /**
     * Parse an XML document from a location identified by a URI reference
     * [IETF RFC 2396]. If the URI contains a fragment identifier (see
     * section 4.1 in [IETF RFC 2396]), the behavior is not defined by this
     * specification; future versions of this specification may define the
     * behavior.
     * @param uri The location of the XML document to be read.
     * @return If the LSParser is a synchronous
     *   LSParser, the newly created and populated
     *   Document is returned, or null if an error
     *   occurred. If the LSParser is asynchronous,
     *   null is returned since the document object may not yet
     *   be constructed when this method returns.
     * @exception DOMException
     *   INVALID_STATE_ERR: Raised if the LSParser.busy
     *   attribute is true.
     * @exception LSException
     *   PARSE_ERR: Raised if the LSParser was unable to load
     *   the XML document. DOM applications should attach a
     *   DOMErrorHandler using the parameter "error-handler" if they wish
     *   to get details on the error.
     */
    public Document parseURI(String uri)
                             throws DOMException, LSException;

    // ACTION_TYPES
    /**
     * Append the result of the parse operation as children of the context
     * node. For this action to work, the context node must be an
     * Element or a DocumentFragment.
     */
    public static final short ACTION_APPEND_AS_CHILDREN = 1;
    /**
     * Replace all the children of the context node with the result of the
     * parse operation. For this action to work, the context node must be an
     * Element, a Document, or a
     * DocumentFragment.
     */
    public static final short ACTION_REPLACE_CHILDREN   = 2;
    /**
     * Insert the result of the parse operation as the immediately preceding
     * sibling of the context node. For this action to work, the context
     * node's parent must be an Element or a
     * DocumentFragment.
     */
    public static final short ACTION_INSERT_BEFORE      = 3;
    /**
     * Insert the result of the parse operation as the immediately following
     * sibling of the context node. For this action to work, the context
     * node's parent must be an Element or a
     * DocumentFragment.
     */
    public static final short ACTION_INSERT_AFTER       = 4;
    /**
     * Replace the context node with the result of the parse operation. For
     * this action to work, the context node must have a parent, and the
     * parent must be an Element or a
     * DocumentFragment.
     */
    public static final short ACTION_REPLACE            = 5;

    /**
     * Parse an XML fragment from a resource identified by an
     * LSInput and insert the content into an existing document
     * at the position specified with the context and
     * action arguments.
     *
     * When parsing the input stream, the context node (or its parent,
     * depending on where the result will be inserted) is used for resolving
     * unbound namespace prefixes. The context node's ownerDocument node (or
     * the node itself if the node is of type DOCUMENT_NODE) is used to
     * resolve default attributes and entity references.
     *
     * As the new data is inserted into the document, at least one
     * mutation event is fired per new immediate child or sibling of the
     * context node.
     *
     * If the context node is a Document node and the action
     * is ACTION_REPLACE_CHILDREN, then the document that is
     * passed as the context node will be changed such that its
     * xmlEncoding, documentURI,
     * xmlVersion, inputEncoding,
     * xmlStandalone, and all other such attributes are set to
     * what they would be set to if the input source was parsed using
     * LSParser.parse().
     *
     * This method is always synchronous, even if the
     * LSParser is asynchronous (LSParser.async is
     * true).
     *
     * If an error occurs while parsing, the caller is notified through
     * the DOMErrorHandler instance associated with the "error-handler"
     * parameter of the DOMConfiguration.
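The following is a hedged sketch of appending a fragment with ACTION_APPEND_AS_CHILDREN. An implementation may raise NOT_SUPPORTED_ERR for this method, and the JDK's bundled parser appears to. In that case the sketch falls back to parsing the fragment inside a hypothetical wrapper element and importing the resulting nodes with Document.importNode. The fallback only approximates parseWithContext because it does not resolve namespace prefixes against the context node. FragmentDemo and appendFragment are hypothetical names.

```java
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;

final class FragmentDemo {

    static final DOMImplementationLS LS = lookup();

    static DOMImplementationLS lookup() {
        try {
            return (DOMImplementationLS) DOMImplementationRegistry.newInstance()
                    .getDOMImplementation("LS");
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no DOM Level 3 LS implementation", e);
        }
    }

    static Document parse(String xml) {
        LSInput input = LS.createLSInput();
        input.setStringData(xml);
        return LS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null).parse(input);
    }

    // Appends the parsed fragment to context; returns the first new top-level node.
    static Node appendFragment(Element context, String fragment) {
        LSParser parser = LS.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = LS.createLSInput();
        input.setStringData(fragment);
        try {
            return parser.parseWithContext(input, context, LSParser.ACTION_APPEND_AS_CHILDREN);
        } catch (DOMException e) {
            if (e.code != DOMException.NOT_SUPPORTED_ERR) {
                throw e;
            }
        }
        // Fallback (an approximation): parse inside a hypothetical wrapper element,
        // then import each top-level node. Prefixes are NOT resolved against context.
        Document scratch = parse("<wrapper>" + fragment + "</wrapper>");
        Document owner = context.getOwnerDocument();
        Node first = null;
        for (Node n = scratch.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            Node copy = context.appendChild(owner.importNode(n, true));
            if (first == null) {
                first = copy;
            }
        }
        return first;
    }

    public static void main(String[] args) {
        Document doc = parse("<root/>");
        Node first = appendFragment(doc.getDocumentElement(), "<a/>text<b/>");
        System.out.println(first.getNodeName());                                  // a
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 3
    }
}
```

Returning the first inserted node mirrors the documented return value when the fragment has more than one top-level node.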
     *
     * When calling parseWithContext, the values of the
     * following configuration parameters will be ignored and their default
     * values will always be used instead: "validate", "validate-if-schema",
     * and "element-content-whitespace". Other parameters will be treated
     * normally, and the parser is expected to call the LSParserFilter just
     * as if a whole document was parsed.
     * @param input The LSInput from which the source document
     *   is to be read. The source document must be an XML fragment, i.e.
     *   anything except a complete XML document (except in the case where
     *   the context node is of type DOCUMENT_NODE, and the action
     *   is ACTION_REPLACE_CHILDREN), a DOCTYPE (internal
     *   subset), entity declaration(s), notation declaration(s), or XML or
     *   text declaration(s).
     * @param contextArg The node that is used as the context for the data
     *   that is being parsed. This node must be a Document
     *   node, a DocumentFragment node, or a node of a type
     *   that is allowed as a child of an Element node, e.g. it
     *   cannot be an Attribute node.
     * @param action This parameter describes which action should be taken
     *   between the new set of nodes being inserted and the existing
     *   children of the context node. The set of possible actions is
     *   defined in ACTION_TYPES above.
     * @return Return the node that is the result of the parse operation. If
     *   the result is more than one top-level node, the first one is
     *   returned.
     * @exception DOMException
     *   HIERARCHY_REQUEST_ERR: Raised if the content cannot replace, be
     *   inserted before, after, or as a child of the context node (see also
     *   Node.insertBefore or Node.replaceChild in [DOM Level 3 Core]).
     *   NOT_SUPPORTED_ERR: Raised if the LSParser doesn't
     *   support this method, or if the context node is of type
     *   Document and the DOM implementation doesn't support
     *   the replacement of the DocumentType child or
     *   Element child.
     *   NO_MODIFICATION_ALLOWED_ERR: Raised if the context node is a
     *   read-only node and the content is being appended to its child list,
     *   or if the parent node of the context node is a read-only node and
     *   the content is being inserted in its child list.
     *   INVALID_STATE_ERR: Raised if the LSParser.busy
     *   attribute is true.
     * @exception LSException
     *   PARSE_ERR: Raised if the LSParser was unable to load
     *   the XML fragment. DOM applications should attach a
     *   DOMErrorHandler using the parameter "error-handler" if they wish
     *   to get details on the error.
     */
    public Node parseWithContext(LSInput input,
                                 Node contextArg,
                                 short action)
                                 throws DOMException, LSException;

    /**
     * Abort the loading of the document that is currently being loaded by
     * the LSParser. If the LSParser is currently
     * not busy, a call to this method does nothing.
     */
    public void abort();

}