Issue
I want to parse text that appears in an XML file but outside XML tags. In the attached example, I would like to parse only the text that is outside of the p tags, such as "FIELD OF THE TECHNOLOGY" and "DETAILED DESCRIPTION OF THE TECHNOLOGY".
An example of my XML file is:
<description>
FIELD OF THE TECHNOLOGY
<p>The present technology is directed ....</p>
<p>The present invention is.....</p>
<p>One promising approach has ...,</p>
DETAILED DESCRIPTION OF THE TECHNOLOGY
<p>The present tech provides, ....</p>
<p>A report by Kearse et al.,...</p>
</description>
Solution
Terminology
In your example, the description element has mixed content. You're looking to extract the text node children of the description element. Identifying the right terminology is the first step to searching for answers (and narrowing overly broad questions).
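To make the terminology concrete, here is a minimal sketch that walks the children of the root element with the JDK's built-in DOM parser and labels each one as a text node or an element node. The class name MixedContentChildren and the method describeChildren are made up for illustration, and whitespace-only text nodes (the line breaks between tags) are skipped on the assumption that they are not of interest.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
class MixedContentChildren {

    // Lists the immediate children of the root element, one per line,
    // labelling each as a text node or an element node.
    static String describeChildren(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();

        StringBuilder sb = new StringBuilder();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Assumption: whitespace-only text nodes are ignored.
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    sb.append("text: ").append(text).append('\n');
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("element: ").append(child.getNodeName()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<description>\n"
                + "FIELD OF THE TECHNOLOGY\n"
                + "<p>The present technology is directed ....</p>\n"
                + "DETAILED DESCRIPTION OF THE TECHNOLOGY\n"
                + "<p>The present tech provides, ....</p>\n"
                + "</description>";
        System.out.print(describeChildren(xml));
    }
}
```

The interleaving of "text:" and "element:" lines in the output is what "mixed content" means: the headings you want are siblings of the p elements, not children of them.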
Parsing XML
...with Java in general
- Best XML parser for Java
- Which is the best library for XML parsing in java
- How to retrieve element value of XML using Java?
- Is there an easier way to parse XML in Java?
...with mixed content:
...choosing parsing technology:
You can find many tutorials on choosing a parsing technology, but XPath is particularly well-suited for selecting parts of an XML document, and there are libraries available for most languages.
...via XPath, for example:
This XPath,
//description/text()
will select all immediate text node children of the description element. It will not include the p elements or descendants thereof, as requested.
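As one possible way to run this XPath from Java, the sketch below uses the javax.xml.xpath API that ships with the JDK. The class name MixedContentText and the method descriptionText are hypothetical names chosen for this example, and trimming and dropping whitespace-only text nodes is an assumption about what you want, since the XPath itself also matches the line breaks between the p elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
class MixedContentText {

    // Returns the trimmed, non-blank text node children of every
    // description element in the given XML string.
    static List<String> descriptionText(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                "//description/text()",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Assumption: whitespace-only text nodes are not wanted.
            String text = nodes.item(i).getNodeValue().trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<description>\n"
                + "FIELD OF THE TECHNOLOGY\n"
                + "<p>The present technology is directed ....</p>\n"
                + "<p>The present invention is.....</p>\n"
                + "DETAILED DESCRIPTION OF THE TECHNOLOGY\n"
                + "<p>The present tech provides, ....</p>\n"
                + "</description>";
        // Prints: [FIELD OF THE TECHNOLOGY, DETAILED DESCRIPTION OF THE TECHNOLOGY]
        System.out.println(descriptionText(xml));
    }
}
```

To read from a file instead of a string, the InputSource can be constructed from a file URI or an input stream; the XPath expression stays the same.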
Answered By - kjhughes
Answer Checked By - Cary Denson (JavaFixing Admin)