Re: How to parse XML document with default namespace with JDOM

Hi Michael,
 
Thank you very much for your thorough response, even though I don't understand most of it.
 
All I was trying to do was parse some simple HTML pages (http://forums.sun.com/thread.jspa?threadID=5343084&tstart=30), which had worked when coded in an unstructured manner. I got stuck on what appears to be an I/O issue and have turned to another XML conversion tool, TagSoup (requires Saxon 6.5.5).
I am reluctant to embark on something that appears very different from the simple JDOM XPath environment, and I do not see a need to use XSLT just yet. Nevertheless, your extra effort is very much appreciated.
 
Cheers,
 
Jack


From: Michael Kay <[email protected]>
To: Jack Bush <[email protected]>; [email protected]
Sent: Wednesday, 5 November, 2008 8:13:59 PM
Subject: RE: How to parse XML document with default namespace with JDOM XPath

The book you are quoting is very old information. Where it says that you can do something, it's probably right, but where it says that you can't, it may well be wrong.
 
Frankly, I forget exactly what's in Saxon 6.5.x because it's so long ago. I know JDOM was already supported back then, but I don't remember the details of the API. I do recall that, as Elliotte says in his book, the Saxon API for invoking XPath was pretty clumsy in those days (because it was designed primarily for internal use by the XSLT engine, not as a user-facing interface). I would use a more recent release.
 
But the code you gave us wasn't even trying to use Saxon, it was using the XPath engine within JDOM, using a JDOM API that I'm not very familiar with, and therefore I can't tell you why it isn't working.
 
My own preference for this kind of coding would be to use Saxon's s9api interface, documented at
 
http://www.saxonica.com/documentation/javadoc/net/sf/saxon/s9api/package-summary.html
 
There are sample applications in the saxon-resources download (from SourceForge). It includes an example of XPath with JDOM like this:
 
         public void run() throws SaxonApiException {
             // Build the JDOM document
             org.jdom.input.SAXBuilder jdomBuilder = new org.jdom.input.SAXBuilder();
             File file = new File("data/books.xml");
             org.jdom.Document doc;
             try {
                 doc = jdomBuilder.build(file);
             } catch (org.jdom.JDOMException e) {
                 throw new SaxonApiException(e);
             } catch (IOException e) {
                 throw new SaxonApiException(e);
             }
             Processor proc = new Processor(false);
             DocumentBuilder db = proc.newDocumentBuilder();
             XdmNode xdmDoc = db.wrap(doc);
             XPathCompiler xpath = proc.newXPathCompiler();
             XPathExecutable xx = xpath.compile("//ITEM/TITLE");
             XPathSelector selector = xx.load();
             selector.setContextItem(xdmDoc);
             for(XdmItem item : selector) {
                 XdmNode node = (XdmNode)item;
                 org.jdom.Element element = (org.jdom.Element)node.getExternalNode();
                 System.out.println(element.getValue());
             }
         }
 
(The method getExternalNode() was added in Saxon 9.1.0.2 and is not yet in the published Javadoc)
 
You would probably want to add a call
 
xpath.declareNamespace("prefix", "uri")
 
before the compile() call. 
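
The same ordering, binding a prefix to the namespace URI before evaluating a prefixed path, can be sketched without Saxon using the JDK's own javax.xml.xpath API. This is a stand-in for illustration only, not the s9api call above: the class name PrefixBindingSketch, the method select, and the inline sample document are all assumptions, and the prefix "ns" is arbitrary.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not part of Saxon or JDOM): JDK-only XPath with a bound prefix.
final class PrefixBindingSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Maps the arbitrary prefix "ns" to the XHTML namespace URI.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "ns" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            List<String> prefixes = XHTML_NS.equals(uri)
                    ? Collections.singletonList("ns")
                    : Collections.<String>emptyList();
            return prefixes.iterator();
        }
    };

    // Returns the trimmed text content of every node the expression selects.
    static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // JAXP parsers ignore namespaces unless asked
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(XHTML_CONTEXT); // declare the prefix first
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);

            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table><tr>"
                + "<td><a href=\"http://www.abc.com/areas\"> hollywood </a></td>"
                + "<td><a href=\"http://www.abc.com/areas\"> san jose </a></td>"
                + "</tr></table></body></html>";
        for (String text : select(xhtml, "//ns:td/ns:a")) {
            System.out.println(text);
        }
    }
}
```

The prefix used in the expression does not have to match anything in the document; only the namespace URI it is bound to matters.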
 
Michael Kay
http://www.saxonica.com/
 

From: Jack Bush [mailto:[email protected]]
Sent: 05 November 2008 03:48
To: Michael Kay; [email protected]
Subject: Re: How to parse XML document with default namespace with JDOM XPath

Hi Michael,
 
Thanks for responding to this question.
 
I have not had any luck with the [email protected] forum at all since subscribing to it a few months back.
 
In the meantime, can you confirm that it is not possible to use Saxon 6.5.x with JDOM, according to http://www.cafeconleche.org/books/xmljava/chapters/ch16s05.html? Or is it because you are not familiar with JDOM?
 
Could anyone point me to a more useful JDOM forum for assistance with this question?
 
Many thanks,
 
Jack


From: Michael Kay <[email protected]>
To: Jack Bush <[email protected]>; [email protected]
Sent: Wednesday, 5 November, 2008 12:39:48 AM
Subject: RE: How to parse XML document with default namespace with JDOM XPath

I see no Saxon code here. You are using the XPath engine that comes with JDOM. You might be better off asking on the JDOM list. I have to confess I'm surprised to see you declaring namespaces AFTER compiling the XPath expression, but I can't say I'm familiar with this API.
 
Michael Kay
http://www.saxonica.com/


From: Jack Bush [mailto:[email protected]]
Sent: 04 November 2008 13:02
To: [email protected]
Subject: How to parse XML document with default namespace with JDOM XPath

Hi All,

 

I am having difficulty parsing a namespaced HTML document using Saxon and the TagSoup parser. The relevant content of this document is as follows:

 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
……..
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title=" Hollywood , CA "> hollywood </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Jose , CA "> san jose </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Francisco , CA "> san francisco </a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title=" San Diego , CA "> San diego </a>
                    </td>
                </tr>
……….
</body>
</html>

 

Below are the relevant code snippets illustrating how I have attempted to retrieve the contents (the value of <a>):

 

             import java.io.*;
             import java.util.*;
             import org.jdom.*;
             import org.jdom.input.SAXBuilder;
             import org.jdom.xpath.*;
             import org.saxpath.*;
             import org.ccil.cowan.tagsoup.Parser;

( 1 )       FileReader frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       BufferedReader brInHtml = new BufferedReader(frInHtml);
( 3 ) //    SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )       java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )       Iterator iterator = list.iterator();
( 10 )      while (iterator.hasNext())
( 11 )      {
( 12 )            Object object = iterator.next();
( 13 ) //         if (object instanceof Element)
( 14 ) //               System.out.println(((Element)object).getTextNormalize());
( 15 )            if (object instanceof Content)
( 16 )                  System.out.println(((Content)object).getValue());
             }

….

 

This program works on the same document when the default namespace is removed, in which case the “ns” prefix in the XPath statements (lines 6-7) is not needed either. Moreover, I have successfully used “org.apache.xerces.parsers.SAXParser” in the past to retrieve the content of <a> from the same document without the default namespace.
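
The difference described here follows from XPath 1.0 rules: an unprefixed name in a path matches only elements in no namespace, so once <html> carries a default xmlns declaration, every descendant moves into that namespace and a plain //a stops matching. A minimal JDK-only sketch (using javax.xml.xpath rather than JDOM's XPath class; the class and method names are hypothetical) makes this visible, and also shows a namespace-agnostic local-name() test as one possible workaround:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: compares the same markup with and without a default namespace.
final class DefaultNamespaceEffect {

    // Namespace-aware parse of an in-memory document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Namespace URI of the root element, or null if it is in no namespace.
    static String rootNamespace(String xml) {
        return parse(xml).getDocumentElement().getNamespaceURI();
    }

    // Number of nodes selected by an XPath 1.0 expression with no prefixes bound.
    static int count(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, parse(xml), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String plain = "<html><body><a>san jose</a></body></html>";
        String namespaced = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><a>san jose</a></body></html>";

        System.out.println(rootNamespace(plain));       // no namespace
        System.out.println(rootNamespace(namespaced));  // the XHTML namespace URI
        System.out.println(count(plain, "//a"));
        System.out.println(count(namespaced, "//a"));
        System.out.println(count(namespaced, "//*[local-name()='a']"));
    }
}
```

The local-name() form ignores namespaces entirely, which is convenient for scraping but would also match an <a> element from any other vocabulary embedded in the page.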

 

I would like to achieve the following objectives if possible:

 

( i ) Exclude the DTD and namespace in order to simplify the parsing process. How could this be done?

( ii ) If this is not possible, how do I include the namespace in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?

( iii ) Would changing from “org.apache.xerces.parsers.SAXParser” to “org.ccil.cowan.tagsoup.Parser” make any difference as far as using XPath is concerned?

( iv ) Failing to exclude the DTD, how do I change the lookup of a PUBLIC DTD to a local SYSTEM one and include a local DTD for reference?
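
For objectives (i) and (iv), one common approach is a SAX EntityResolver, which intercepts the parser's request for the external DTD so that nothing is fetched from www.w3.org. The sketch below uses the JDK's DocumentBuilder rather than JDOM (JDOM's SAXBuilder also accepts a resolver through setEntityResolver). LocalDtdSketch, parseWithLocalDtds, and the one-line stand-in DTD are assumptions for illustration; a real setup would more likely point the InputSource at a local copy of xhtml1-transitional.dtd.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper: serves DTDs from memory instead of letting the parser go online.
final class LocalDtdSketch {
    static final String XHTML_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Transitional//EN";

    // localDtds maps a PUBLIC identifier to replacement DTD text held locally.
    static Document parseWithLocalDtds(String xml, Map<String, String> localDtds) {
        try {
            // Default factory settings: non-validating and not namespace-aware.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            EntityResolver resolver = (publicId, systemId) -> {
                String dtd = (publicId == null) ? null : localDtds.get(publicId);
                // Known PUBLIC id: use the local text. Anything else: an empty DTD,
                // which effectively excludes the DTD from the parse.
                return new InputSource(new StringReader(dtd != null ? dtd : ""));
            };
            builder.setEntityResolver(resolver);

            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC \"" + XHTML_PUBLIC_ID + "\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<body><p>san&nbsp;jose</p></body></html>";

        // Stand-in for a real local DTD: declares only the one entity the sample uses.
        Map<String, String> dtds =
                Collections.singletonMap(XHTML_PUBLIC_ID, "<!ENTITY nbsp \"&#160;\">");

        Document doc = parseWithLocalDtds(page, dtds);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Serving an empty DTD is the simpler option, but any named entity the page uses (such as &nbsp;) then becomes undeclared and the parse fails, which is why mapping the PUBLIC identifier to a local copy is usually the safer of the two.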

 

I am running JDK 1.6.0_06, NetBeans 6.1, JDOM 1.1, Saxon 6.5.5 and TagSoup 1.2 on the Windows XP platform.

 

Any assistance would be appreciated.

 

Thanks in advance,

 

Jack



