Thursday, December 07, 2006

Cube Platform: The Grid in Java

The Cube platform is middleware that distributes tasks among the members of a Cube network. That way, an application that needs a lot of computing power can obtain it from other machines. A task is nothing more than a Java class: the user writes it, compiles it, and sends it to Cube. The platform is in its first beta and is fully functional. Documentation is still sparse. The license is GPL.

Saturday, December 02, 2006

Online Java courses (www.javapassion.com)


JAXP (all about JAXP)

The Java API for XML Processing (JAXP) lets you validate, parse, and transform XML using several different APIs. JAXP provides both ease of use and vendor neutrality. This article, the first of a two-part series introducing JAXP, shows you how to take advantage of the API's parsing and validation features. Part 2 will cover XSL transformations using JAXP.




Java technology and XML are arguably the most important programming developments of the last five years. As a result, APIs for working with XML in the Java language have proliferated. The two most popular -- the Document Object Model (DOM) and the Simple API for XML (SAX) -- have generated a tremendous amount of interest, and JDOM and data-binding APIs have followed (see Resources). Understanding even one or two of these APIs thoroughly is quite a task; using all of them correctly makes you a guru. However, more and more Java developers are finding that they no longer need extensive knowledge of SAX and DOM -- thanks largely to Sun Microsystems' JAXP toolkit. The Java API for XML Processing (JAXP) makes XML manageable for even beginning Java programmers while still providing plenty of heft for advanced developers. That said, even advanced developers who use JAXP often have misconceptions about the very API they depend on.

This article assumes that you have some basic knowledge of SAX and DOM. If you're new to XML parsing, you might want to read up on SAX and DOM first through online sources or skim through my book (see Resources). You don't need to be fluent in callbacks or DOM Nodes, but you should at least understand that SAX and DOM are parsing APIs. It would also help to have a basic understanding of their differences. This article will make a lot more sense once you've picked up these basics.


JAXP: API or abstraction?


Strictly speaking, JAXP is an API, but it is more accurately called an abstraction layer. It doesn't provide a new means of parsing XML, nor does it add to SAX or DOM, or give new functionality to Java and XML handling. (If you're in disbelief at this point, you're reading the right article.) Instead, JAXP makes it easier to use DOM and SAX to deal with some difficult tasks. It also makes it possible to handle some vendor-specific tasks that you might encounter when using the DOM and SAX APIs, in a vendor-neutral way.















Going bigtime

In earlier versions of the Java platform, JAXP was a separate download from the core platform. With Java 5.0, JAXP has become a staple of the Java language. If you've got the latest version of the JDK (see Resources), then you've already got JAXP.


Without SAX, DOM, or another XML parsing API, you cannot parse XML. I have seen many requests for a comparison of SAX, DOM, JDOM, and dom4j to JAXP, but making such comparisons is impossible because the first four APIs serve a completely different purpose from JAXP. SAX, DOM, JDOM, and dom4j all parse XML. JAXP provides a means of getting to these parsers and the data that they expose, but doesn't offer a new way to parse an XML document. Understanding this distinction is critical if you're going to use JAXP correctly. It will also most likely put you miles ahead of many of your fellow XML developers.


If you're still dubious, make sure you have the JAXP distribution (see Going bigtime). Fire up a Web browser and load the JAXP API docs. Navigate to the parsing portion of the API, located in the javax.xml.parsers package. Surprisingly, you'll find only six classes. How hard can this API be? All of these classes sit on top of an existing parser. And two of them are just for error handling. JAXP is a lot simpler than people think. So why all the confusion?















Sitting on top of the world

Even JDOM and dom4j (see Resources), like JAXP, sit on top of other parsing APIs. Although both APIs provide a different model for accessing data from SAX or DOM, they use SAX internally (with some tricks and modifications) to get at the data they present to the user.


Sun's JAXP and Sun's parser


A lot of the parser/API confusion results from how Sun packages JAXP and the parser that JAXP uses by default. In earlier versions of JAXP, Sun included the JAXP API (with those six classes I just mentioned and a few more used for transformations) and a parser, called Crimson. Crimson was part of the com.sun.xml package. In newer versions of JAXP -- included in the JDK -- Sun has repackaged the Apache Xerces parser (see Resources). In both cases, though, the parser is part of the JAXP distribution, but not part of the JAXP API.


Think about it this way: JDOM ships with the Apache Xerces parser. That parser isn't part of JDOM, but is used by JDOM, so it's included to ensure that JDOM is usable out of the box. The same principle applies for JAXP, but it isn't as clearly publicized: JAXP comes with a parser so it can be used immediately. However, many people refer to the classes included in Sun's parser as part of the JAXP API itself. For example, a common question on newsgroups used to be, "How can I use the XMLDocument class that comes with JAXP? What is its purpose?" The answer is somewhat complicated.















What's in a (package) name?

When I first cracked open the source code to Java 1.5, I was surprised at what I saw -- or rather, at what I did not see. Instead of leaving Xerces in its normal package, org.apache.xerces, Sun relocated the Xerces classes to com.sun.org.apache.xerces.internal. (I find this a little disrespectful, but nobody asked me.) In any case, if you're looking for Xerces in the JDK, that's where it is.


First, the com.sun.xml.tree.XMLDocument class is not part of JAXP. It is part of Sun's Crimson parser, packaged in earlier versions of JAXP. So the question is misleading from the start. Second, a major purpose of JAXP is to provide vendor independence when dealing with parsers. With JAXP, you can use the same code with Sun's XML parser, Apache's Xerces XML parser, and Oracle's XML parser. Using a Sun-specific class, then, violates the point of using JAXP. Are you starting to see how this subject has gotten muddied? The parser and the API in the JAXP distribution have been lumped together, and some developers mistake classes and features from one as part of the other, and vice versa.


Now that you can see beyond all the confusion, you're ready to move on to some code and concepts.






























Starting with SAX


SAX is an event-driven methodology for processing XML. It consists of many callbacks. For example, the startElement() callback is invoked every time a SAX parser comes across an element's opening tag. The characters() callback is called for character data, and then endElement() is called for the element's end tag. Many more callbacks are present for document processing, errors, and other lexical structures. You get the idea. The SAX programmer implements one of the SAX interfaces that defines these callbacks. SAX also provides a class called DefaultHandler (in the org.xml.sax.helpers package) that implements all of these callbacks and provides default, empty implementations of all the callback methods. (You'll see that this is important in my discussion of DOM in the next section, Dealing with DOM.) The SAX developer needs only extend this class, then implement methods that require insertion of specific logic. So the key in SAX is to provide code for these various callbacks, then let a parser trigger each of them when appropriate. Here's the typical SAX routine:



  1. Create a SAXParser instance using a specific vendor's parser implementation.

  2. Register callback implementations (by using a class that extends DefaultHandler, for example).

  3. Start parsing and sit back as your callback implementations are fired off.
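The three steps above can be sketched in a few lines. This is only an illustration, not code from the article's download: the class name ElementCounter and the sample document are made up, and the parser is obtained through the JAXP factory rather than from a vendor class.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the name is not part of SAX or JAXP.
class ElementCounter extends DefaultHandler {
    private int elements;

    // Override only the callback of interest; DefaultHandler
    // supplies empty implementations of all the others.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        elements++;
    }

    // Runs the three steps on an in-memory document and
    // returns the number of opening tags the parser reported.
    static int count(String xml) {
        try {
            // 1. Create a SAXParser instance (here, through JAXP)
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // 2. Create the callback implementation to register
            ElementCounter handler = new ElementCounter();
            // 3. Parse; the parser fires the callbacks as it reads
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.elements;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("<order><item/><item/></order>"));
    }
}
```

The handler never asks the parser for anything; it simply waits to be called, which is the essence of the SAX model.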


JAXP's SAX component provides a simple means for doing all of this. Without JAXP, a SAX parser instance either must be instantiated directly from a vendor class (such as org.apache.xerces.parsers.SAXParser), or it must use a SAX helper class called XMLReaderFactory (also in the org.xml.sax.helpers package). The problem with the first methodology is obvious: It isn't vendor neutral. The problem with the second is that the factory requires, as an argument, the String name of the parser class to use (that Apache class, org.apache.xerces.parsers.SAXParser, again). You can change the parser by passing in a different parser class as a String. With this approach, if you change the parser name, you won't need to change any import statements, but you will still need to recompile the class. This is obviously not a best-case solution. It would be much easier to be able to change parsers without recompiling the class.


JAXP offers that better alternative: It lets you provide a parser as a Java system property. Of course, when you download a distribution from Sun, you get a JAXP implementation that uses Sun's version of Xerces. Changing the parser -- say, to Oracle's parser -- requires that you change a classpath setting, moving from one parser implementation to another, but it does not require code recompilation. And this is the magic -- the abstraction -- that JAXP is all about.















Sneaky SAX developers

I'm hedging a bit. With a little clever coding, you can make a SAX application pick up the parser class to use from a system property or a properties file. However, JAXP gives you this same behavior without any work at all, so many of you are better off going the JAXP route.


A look at the SAX parser factory


The JAXP SAXParserFactory class is the key to being able to change parser implementations easily. You must create a new instance of this class (which I'll look at in a moment). After the new instance is created, the factory provides a method for obtaining a SAX-capable parser. Behind the scenes, the JAXP implementation takes care of the vendor-dependent code, keeping your code happily unpolluted. This factory has some other nice features, as well.


In addition to the basic job of creating instances of SAX parsers, the factory lets you set configuration options. These options affect all parser instances obtained through the factory. The two most commonly used options available in JAXP 1.3 are to set namespace awareness with setNamespaceAware(boolean awareness), and to turn on DTD validation with setValidating(boolean validating). Remember that once these options are set, they affect all instances obtained from the factory after the method invocation.


Once you have set up the factory, invoking newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of the SAX class org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. (Remember the discussion about the XmlDocument class earlier in this article?) This class allows actual parsing behavior to be kicked off. Listing 1 shows how you can create, configure, and use a SAX factory:




Listing 1. Using the SAXParserFactory







import java.io.File;
import java.io.OutputStreamWriter;
import java.io.Writer;
// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;
// SAX
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSAXParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println("Usage: java TestSAXParsing [filename]");
                System.exit(1);
            }
            // Get SAX Parser Factory
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(args[0]), new MyHandler());
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not support " +
                               "the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining SAX Parser Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHandler extends DefaultHandler {
    // SAX callback implementations from ContentHandler, ErrorHandler, etc.
}



In Listing 1, you can see that two JAXP-specific problems can occur in using the factory: the inability to obtain or configure a SAX factory, and the inability to configure a SAX parser. The first of these problems, represented by a FactoryConfigurationError, usually occurs when the parser specified in a JAXP implementation or system property cannot be obtained. The second problem, represented by a ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to deal with and shouldn't pose any difficulty when using JAXP. In fact, you might want to write code that attempts to set several features and gracefully handles situations where a certain feature isn't available.
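As a rough sketch of that last idea, the helper below tries to set a feature on a factory and reports failure as a boolean instead of an exception. The class name FeatureProbe, the method name, and the second feature URI in main are hypothetical; setFeature() and the three exception types are part of JAXP and SAX.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper; not part of JAXP.
class FeatureProbe {
    // Tries to set a feature on the factory and returns false,
    // rather than failing, when the parser rejects it.
    static boolean trySetFeature(SAXParserFactory factory,
                                 String name, boolean value) {
        try {
            factory.setFeature(name, value);
            return true;
        } catch (ParserConfigurationException
                 | SAXNotRecognizedException
                 | SAXNotSupportedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // A standard SAX feature that most parsers recognize
        System.out.println(trySetFeature(factory,
            "http://xml.org/sax/features/namespaces", true));
        // A made-up feature name that no parser should recognize
        System.out.println(trySetFeature(factory,
            "http://example.com/features/no-such-feature", true));
    }
}
```

An application could call such a helper for each optional feature it would like and carry on with whatever subset the parser accepts.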


Once you get the factory, turn off namespace support, and turn on validation, you obtain a SAXParser instance and parsing begins. The SAX parser's parse() method takes the File to parse along with an instance of your custom handler class, which extends the SAX DefaultHandler helper class that I mentioned earlier. See the code distribution to view the implementation of this class with the complete Java listing (see Download). However, the SAXParser class contains much more than this single method.


Working with the SAX parser


Once you have an instance of the SAXParser class, you can do a lot more than just pass it a File to parse. Because of the way components in large applications communicate, it's not always safe to assume that the creator of an object instance is its user. One component might create the SAXParser instance, while another component (perhaps coded by another developer) might need to use that same instance. For this reason, JAXP provides methods to determine the parser's settings. For example, you can use isValidating() to determine if the parser will -- or will not -- perform validation, and isNamespaceAware() to see if the parser can process namespaces in an XML document. These methods can give you information about what the parser can do, but users with just a SAXParser instance -- and not the SAXParserFactory itself -- do not have the means to change these features. You must do this at the parser factory level.


You also have a variety of ways to request parsing of a document. Instead of just accepting a File and a SAX DefaultHandler instance, the SAXParser's parse() method can also accept a SAX InputSource, a Java InputStream, or a URL in String form, all with a DefaultHandler instance. So you can still parse documents wrapped in various forms.


Finally, you can obtain the underlying SAX parser (an instance of org.xml.sax.XMLReader) and use it directly through the SAXParser's getXMLReader() method. Once you get this underlying instance, the usual SAX methods are available. Listing 2 shows examples of the various uses of the SAXParser class, the core class in JAXP for SAX parsing:




Listing 2. Using the JAXP SAXParser class







// Get a SAX Parser instance
SAXParser saxParser = saxFactory.newSAXParser();
// Find out if validation is supported
boolean isValidating = saxParser.isValidating();
// Find out if namespaces are supported
boolean isNamespaceAware = saxParser.isNamespaceAware();

// Parse, in a variety of ways
// Use a file and a SAX DefaultHandler instance
saxParser.parse(new File(args[0]), myDefaultHandlerInstance);
// Use a SAX InputSource and a SAX DefaultHandler instance
saxParser.parse(mySaxInputSource, myDefaultHandlerInstance);
// Use an InputStream and a SAX DefaultHandler instance
saxParser.parse(myInputStream, myDefaultHandlerInstance);
// Use a URI and a SAX DefaultHandler instance
saxParser.parse("http://www.newInstance.com/xml/doc.xml",
                myDefaultHandlerInstance);

// Get the underlying (wrapped) SAX parser
org.xml.sax.XMLReader parser = saxParser.getXMLReader();
// Use the underlying parser
parser.setContentHandler(myContentHandlerInstance);
parser.setErrorHandler(myErrorHandlerInstance);
parser.parse(new org.xml.sax.InputSource(args[0]));



Up to this point, I've talked a lot about SAX, but I haven't unveiled anything remarkable or surprising. JAXP's added functionality is fairly minor, especially where SAX is involved. This minimal functionality makes your code more portable and lets other developers use it, either freely or commercially, with any SAX-compliant XML parser. That's it. There's nothing more to using SAX with JAXP. If you already know SAX, you're about 98 percent of the way there. You just need to learn two new classes and a couple of Java exceptions, and you're ready to roll. If you've never used SAX, it's easy enough to start now.






























Dealing with DOM


If you think you need to take a break to gear up for the challenge of DOM, you can skip the rest. Using DOM with JAXP is nearly identical to using SAX with JAXP; all you do is change two class names and a return type, and you are pretty much there. If you understand how SAX works and what DOM is, you won't have any problem.


The primary difference between DOM and SAX is the structure of the APIs themselves. SAX consists of an event-based set of callbacks, while DOM has an in-memory tree structure. With SAX, there's never a data structure to work on (unless the developer creates one manually). SAX, therefore, doesn't give you the ability to modify an XML document. DOM does provide this functionality. The org.w3c.dom.Document interface represents an XML document and is made up of DOM nodes that represent the elements, attributes, and other XML constructs. So JAXP doesn't need to fire SAX callbacks; it's responsible only for returning a DOM Document object from parsing.
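To make the contrast concrete, here is a small sketch of what an in-memory tree allows: parse a document, then change it. The class name DomEdit and the order/item vocabulary are invented for illustration; only the JAXP and DOM calls are real.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; the name is not part of JAXP or DOM.
class DomEdit {
    // Parses the XML, appends one <item> element to the root, and
    // returns how many <item> elements the tree then contains.
    static int addItem(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            // The tree lives in memory, so it can be modified in place
            Element item = doc.createElement("item");
            item.setAttribute("id", "new");
            root.appendChild(item);
            return root.getElementsByTagName("item").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(addItem("<order><item/></order>"));
    }
}
```

Doing the same with SAX would mean building your own data structure inside the callbacks first.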


A look at the DOM parser factory


With this basic understanding of DOM and the differences between DOM and SAX, you don't need to know much more. The code in Listing 3 looks remarkably similar to the SAX code in Listing 1. First, a DocumentBuilderFactory is obtained (in the same way that SAXParserFactory was in Listing 1). Then the factory is configured to handle validation and namespaces (in the same way that it was in SAX). Next, a DocumentBuilder instance, the analog to SAXParser, is retrieved from the factory (in the same way . . . you get the idea). Parsing can then occur, and the resultant DOM Document object is handed off to a method that prints the DOM tree:




Listing 3. Using the DocumentBuilderFactory







import java.io.File;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
// JAXP
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
// DOM
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestDOMParsing {
    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.err.println("Usage: java TestDOMParsing " +
                                   "[filename]");
                System.exit(1);
            }
            // Get Document Builder Factory
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
            // Turn on validation, and turn off namespaces
            factory.setValidating(true);
            factory.setNamespaceAware(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));
            // Print the document from the DOM tree and
            // feed it an initial indentation of nothing
            printNode(doc, "");
        } catch (ParserConfigurationException e) {
            System.out.println("The underlying parser does not " +
                               "support the requested features.");
        } catch (FactoryConfigurationError e) {
            System.out.println("Error occurred obtaining Document " +
                               "Builder Factory.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void printNode(Node node, String indent) {
        // print the DOM tree
    }
}



Two problems can arise with this code (as with SAX in JAXP): a FactoryConfigurationError and a ParserConfigurationException. The cause of each is the same as it is with SAX. Either a problem is present in the implementation classes (resulting in a FactoryConfigurationError), or the parser provided doesn't support the requested features (resulting in a ParserConfigurationException). The only difference between DOM and SAX in this respect is that with DOM you substitute DocumentBuilderFactory for SAXParserFactory, and DocumentBuilder for SAXParser. It's that simple. (You can view the complete code listing, which includes the method used to print out the DOM tree; see Download.)


Working with the DOM parser


Once you have a DOM factory, you can obtain a DocumentBuilder instance. The methods available to a DocumentBuilder instance are very similar to those available to its SAX counterpart. The major difference is that variations of the parse() method do not take an instance of the SAX DefaultHandler class. Instead they return a DOM Document instance representing the XML document that was parsed. The only other difference is that two methods are provided for SAX-like functionality:



  • setErrorHandler(), which takes a SAX ErrorHandler implementation to handle problems that might arise in parsing

  • setEntityResolver(), which takes a SAX EntityResolver implementation to handle entity resolution


Listing 4 shows examples of these methods in action:




Listing 4. Using the JAXP DocumentBuilder class







// Get a DocumentBuilder instance
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// Find out if validation is supported
boolean isValidating = builder.isValidating();
// Find out if namespaces are supported
boolean isNamespaceAware = builder.isNamespaceAware();
// Set a SAX ErrorHandler
builder.setErrorHandler(myErrorHandlerImpl);
// Set a SAX EntityResolver
builder.setEntityResolver(myEntityResolverImpl);

// Parse, in a variety of ways
Document doc;
// Use a file
doc = builder.parse(new File(args[0]));
// Use a SAX InputSource
doc = builder.parse(mySaxInputSource);
// Use an InputStream (no handler argument; parse() returns the Document)
doc = builder.parse(myInputStream);
// Use a URI
doc = builder.parse("http://www.newInstance.com/xml/doc.xml");
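For a slightly fuller picture of setErrorHandler(), here is a sketch of an ErrorHandler that collects problems instead of printing them. The class name CollectingErrorHandler and the sample input are assumptions made for this example; the ErrorHandler interface and its three methods are standard SAX.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class; the name is not part of SAX or JAXP.
class CollectingErrorHandler implements ErrorHandler {
    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        problems.add("warning: " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        problems.add("error: " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problems.add("fatal: " + e.getMessage());
        throw e; // parsing cannot continue after a fatal error
    }

    // Parses the XML with this handler registered on the builder
    // and returns every problem that was reported.
    static List<String> problemsIn(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Fatal errors were recorded by the handler before being rethrown
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Could not run the parser", e);
        }
        return handler.problems;
    }

    public static void main(String[] args) {
        // The <b> element is never closed, so the document is malformed
        System.out.println(problemsIn("<a><b></a>"));
    }
}
```

The same handler class could be registered on an XMLReader obtained from a SAXParser, because both sides of JAXP reuse the SAX ErrorHandler interface.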



If you're a little bored reading this section on DOM, you're not alone; I found it a little boring to write because applying what you've learned about SAX to DOM is so straightforward.






























Performing validation


In Java 5.0 (and JAXP 1.3), JAXP introduces a new way to validate documents. Instead of simply using the setValidating() method on a SAX or DOM factory, validation is broken out into several classes within the new javax.xml.validation package. I would need more space than I have in this article to detail all the nuances of validation -- including W3C XML Schema, DTDs, RELAX NG schemas, and other constraint models -- but if you already have some constraints, it's pretty easy to use the new validation model and ensure that your document matches up with them.















Redundancy isn't always good

One thing you should not do is combine setValidating(true) with the javax.xml.validation package. You'll get some nasty errors, and most of them are hard to track down. It's best to make a habit of never calling setValidating() -- which defaults to false -- and to use the new JAXP validation framework instead.


First, convert your constraint model -- presumably a file on disk somewhere -- into a format that JAXP can use. Load the file into a Source instance. (I'll cover Source in more detail in Part 2; for now, just know that it represents a document somewhere, on disk, as a DOM Document or just about anything else.) Then, create a SchemaFactory and load the schema using SchemaFactory.newSchema(Source), which returns a new Schema object. Finally, with this Schema object, create a new Validator object with Schema.newValidator(). Listing 5 should make everything I've just said much clearer:




Listing 5. Using the JAXP validation framework







DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File(args[0]));

// Handle validation
SchemaFactory constraintFactory =
    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source constraints = new StreamSource(new File(args[1]));
Schema schema = constraintFactory.newSchema(constraints);
Validator validator = schema.newValidator();

// Validate the DOM tree
try {
    validator.validate(new DOMSource(doc));
    System.out.println("Document validates fine.");
} catch (org.xml.sax.SAXException e) {
    System.out.println("Validation error: " + e.getMessage());
}



This is pretty straightforward once you get the hang of it. Type this code in yourself, or check out the full listing (see Download).
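If you'd rather experiment without any files on disk, the same steps can be run against in-memory strings. The following is only a sketch under a few assumptions: the class name SchemaCheck and the one-element schema are invented, and the document is validated from a StreamSource rather than from a DOM tree as in Listing 5.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical example class; the name is not part of JAXP.
class SchemaCheck {
    // A made-up schema: the document is a single <count> element
    // whose content must be an integer.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    // Returns true if the XML satisfies the schema above. A broken
    // schema also surfaces as a SAXException and so returns false.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema =
                factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, the first validation error is thrown
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("Could not read the input", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>many</count>"));
    }
}
```

A Schema object can be kept and reused to create as many Validator instances as needed, so the schema is compiled only once.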






























Changing the parser


It's easy to change out the parser that the JAXP factory classes use. Changing the parser actually means changing the parser factory, because all SAXParser and DocumentBuilder instances come from these factories. The factories determine which parser is loaded, so it's the factories that you must change. To change the implementation of the SAXParserFactory abstract class, set the Java system property javax.xml.parsers.SAXParserFactory to the name of the factory class you want. If this property isn't defined, then the default implementation (whatever parser your vendor specified) is returned. The same principle applies for the DocumentBuilderFactory implementation you use. In this case, the javax.xml.parsers.DocumentBuilderFactory system property is queried.
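The lookup can be observed from code. The sketch below is illustrative only: the class name ParserSwap and the factory class name com.example.NoSuchParserFactory are made up, and the property is set programmatically (and then restored) purely for demonstration.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example class; the name is not part of JAXP.
class ParserSwap {
    static final String PROPERTY = "javax.xml.parsers.SAXParserFactory";

    // Name of the factory implementation that JAXP currently selects
    static String currentFactory() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    // Points the system property at the named class and reports whether
    // JAXP could load it; the previous setting is restored afterward.
    static boolean canLoad(String factoryClassName) {
        String previous = System.getProperty(PROPERTY);
        System.setProperty(PROPERTY, factoryClassName);
        try {
            SAXParserFactory.newInstance();
            return true;
        } catch (FactoryConfigurationError e) {
            return false;
        } finally {
            if (previous == null) {
                System.clearProperty(PROPERTY);
            } else {
                System.setProperty(PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default factory: " + currentFactory());
        // A made-up class name, so the lookup is expected to fail
        System.out.println(canLoad("com.example.NoSuchParserFactory"));
    }
}
```

In practice the property is more often supplied on the command line with a -D option, so the application code never mentions a vendor class at all.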






























Summary


Having read this article, you've seen almost the entire scope of JAXP:



  • Provide hooks into SAX

  • Provide hooks into DOM

  • Allow the parser to easily be changed out


To understand JAXP's parsing and validation features, you'll wade through very little tricky material. The most difficult parts of putting JAXP to work are changing a system property, setting validation through a factory instead of a parser or builder, and getting clear on what JAXP isn't. JAXP provides a helpful pluggability layer over two popular Java and XML APIs. It makes your code vendor neutral and lets you change from parser to parser without ever recompiling your parsing code. So download JAXP and go to it! Part 2 will show you how JAXP can help you transform XML documents.





























Download

















Description                      Name                   Size   Download method
Sample code for All about JAXP   x-jaxp-all-about.zip   5 KB   FTP

JAXP (Java XML)

Introduction


After the first release of the W3C XML 1.0 recommendation in early 1998, XML started gaining huge popularity. Sun Microsystems, Inc. had just formalized the Java Community Process (JCP) at that time, and the first version of JAXP (JSR-05) was made public in early 2000, supported by industry majors such as (in alphabetical order) BEA Systems, Fujitsu Limited, Hewlett-Packard, IBM, Netscape Communications, Oracle, and Sun Microsystems, Inc.

JAXP 1.0, then called the Java API for XML Parsing, was a box office hit in the developer community because of the pluggability layer it provided; that is the essence of JAXP. Developers can write programs independent of the underlying XML processor by using the JAXP APIs, and can replace the underlying XML processor at will without changing a single line of application code.


So what exactly is JAXP? First of all, there has been some confusion in the past about the P in JAXP: Parsing or Processing? Because JAXP 1.0 supported only parsing, it was called the Java API for XML Parsing. But JAXP 1.1 (JSR-63) introduced XML transformation using XSLT. Unfortunately, the W3C XSLT specification does not define any APIs for transformation. Therefore, the JAXP 1.1 Expert Group (EG) introduced a set of APIs called the Transformation API for XML (TrAX), and since then JAXP has been called the Java API for XML Processing. JAXP has since evolved to support a lot more than parsing an XML document: validation against a schema while parsing, validation against a preparsed schema, evaluating XPath expressions, and so on.





So, JAXP is a lightweight API for processing XML documents that stays agnostic of the underlying XML processor, which is pluggable.


XML Parsing Using JAXP


JAXP supports object-based and event-based parsing. For object-based parsing, only W3C DOM is supported so far; future versions of JAXP might support JDOM as well, if the EG so decides. For event-based parsing, only SAX is supported. Another event-based approach, pull parsing, should have been made part of JAXP. However, a separate JSR (#173) was filed for pull parsing, also known as the Streaming API for XML (StAX), and not much can be done about that now.


Figure 1: Various mechanisms for parsing XML.


Simple API for XML (SAX) Parsing

SAX APIs were proposed by David Megginson (in early 1998) as an effort towards a standard API for event-based parsing of XML (read the genesis of SAX here). Even though SAX is not a W3C REC, it is surely the de facto industry standard for parsing XML documents.


SAX parsing is an event-based, push-parsing mechanism that generates events for the <opening> tags, </closing> tags, character data, and so on. A SAX parser parses an XML document in a streaming fashion (forward only) and reports the events, in the sequence encountered, to the registered content handler, org.xml.sax.ContentHandler (not to be confused with java.net.ContentHandler), and reports errors (if any) to the registered error handler, org.xml.sax.ErrorHandler.


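If you don't register an error handler, the parser still throws an exception for fatal errors such as malformed markup, but recoverable errors (validation errors, for example) and warnings go unreported, so you will never know that they occurred or what they were. Therefore, it is extremely important to always register a meaningful error handler when parsing an XML document with SAX.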


If the application needs to be informed of the parsing events (and process them), it must implement the org.xml.sax.ContentHandler interface and register it with the SAX parser. A typical sequence of events reported through the callbacks could be startDocument, startElement, characters, endElement, endDocument, in that order. startDocument is called only once, before any other event is reported. Similarly, endDocument is called only once, after the entire XML document has been parsed successfully. See the javadocs for more details.
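The callback order described above can be checked with a small tracing handler. This is a sketch only: the class name EventTrace and the sample document are made up, and consecutive characters events are collapsed into one because a parser is free to deliver a run of text in several chunks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the name is not part of SAX or JAXP.
class EventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    // Records an event, collapsing back-to-back "characters" entries
    private void record(String event) {
        int last = events.size() - 1;
        boolean repeatedText = event.equals("characters")
            && last >= 0 && events.get(last).equals("characters");
        if (!repeatedText) {
            events.add(event);
        }
    }

    @Override public void startDocument() { record("startDocument"); }
    @Override public void endDocument() { record("endDocument"); }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        record("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        record("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        record("characters");
    }

    // Parses the XML and returns the events in the order they arrived
    static List<String> trace(String xml) {
        EventTrace handler = new EventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        System.out.println(trace("<greeting>hi</greeting>"));
    }
}
```

For the one-element document in main, the trace is startDocument, startElement:greeting, characters, endElement:greeting, endDocument.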


Figure 2: SAX Parsing XML


Snippet to SAX parse an XML document using JAXP:


SAXParserFactory spfactory = SAXParserFactory.newInstance();
spfactory.setNamespaceAware(true);
SAXParser saxparser = spfactory.newSAXParser();
// write your handler for processing events and handling errors
DefaultHandler handler = new MyHandler();
// parse the XML and report events and errors (if any) to the handler
saxparser.parse(new File("data.xml"), handler);

Document Object Model (DOM) Parsing

DOM parsing is an object-based parsing mechanism that generates an XML object model: an inverted tree-like data structure that represents the XML document. Every element node in the object model represents a pair of <opening> and </closing> tags in the XML. A DOM parser reads the entire XML file and creates an in-memory data structure called a DOM. If the DOM parser is W3C compliant, the DOM it creates is a W3C DOM, which can be traversed or modified using the org.w3c.dom APIs.


Most of the DOM parsers also allow you to create an in-memory DOM structure from scratch, rather than just parsing an XML to a DOM.


Figure 3: DOM Parsing XML


Snippet to DOM parse an XML document using JAXP:


          DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
DocumentBuilder domparser = dbfactory.newDocumentBuilder();
//parse the XML and create the DOM
Document doc = domparser.parse(new File("data.xml"));
//to create a new DOM from scratch -
//Document doc = domparser.newDocument();
//once you have the Document handle, then you can use
//the org.w3c.dom.* APIs to traverse or modify the DOM...
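
To make the "traverse or modify" step concrete, here is a small sketch that parses XML from a string, appends a new element using the org.w3c.dom API, and counts the result. DomDemo and addEmployee are hypothetical names, and the employees/employee/name layout is only an assumed sample document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomDemo {
    // Parses the XML text, appends one <employee> to the root element,
    // and returns the number of <employee> elements afterwards.
    static int addEmployee(String xml, String name) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // modify the tree: build <employee><name>...</name></employee>
            Element employee = doc.createElement("employee");
            Element nameElement = doc.createElement("name");
            nameElement.setTextContent(name);
            employee.appendChild(nameElement);
            doc.getDocumentElement().appendChild(employee);

            // traverse the tree: look up all <employee> elements
            NodeList employees = doc.getElementsByTagName("employee");
            return employees.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM processing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee><name>e1</name></employee></employees>";
        System.out.println(addEmployee(xml, "e2"));
    }
}
```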

Parsing in Validating Mode


Validation Against DTD

DTD is a grammar for XML documents. Often people think that DTD is something alien because it has a different syntax than XML, but DTD is an integral part of W3C XML 1.0. If an XML instance document has a DOCTYPE declaration, then to turn on validation against DTD, while parsing XML, you need to set the validating feature to true using the setValidating method on the appropriate factory. For example:


        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setValidating(true);
OR
SAXParserFactory spfactory = SAXParserFactory.newInstance();
spfactory.setValidating(true);

Note that even when validation is turned off, if the XML instance has a DOCTYPE declaration that refers to an external DTD, the parser still tries to load that DTD. It does so to ensure that any entity references in the XML instance are expanded properly (the entity declarations live in the DTD); otherwise the document might turn out not to be well formed. The exception is when the standalone attribute in the XML declaration is set to "yes", in which case the external DTD is ignored completely. For example:


        <?xml version="1.1" encoding="UTF-8" standalone="yes"?>          

Validation Against W3C XMLSchema (WXS)

XMLSchema is another grammar for XML documents. It has become very popular because it uses XML syntax and because of the richness it provides for defining fine-grained validation constraints. If an XML instance document points to an XMLSchema using the "schemaLocation" or "noNamespaceSchemaLocation" hints, then to turn on validation against XMLSchema you need to do the following:



  1. Set the validating feature to true using the setValidating method on SAXParserFactory or DocumentBuilderFactory, as mentioned above.

  2. Set the property "http://java.sun.com/xml/jaxp/properties/schemaLanguage" with the corresponding value as "http://www.w3.org/2001/XMLSchema"


Note that in this case, even if a DOCTYPE exists in the XML instance, the instance is not validated against the DTD. As mentioned earlier, though, the DTD is still loaded so that any entity references can be expanded properly.


Since "schemaLocation" and "noNamespaceSchemaLocation" are just hints, the schemas can also be provided externally to override these hints, using the property "http://java.sun.com/xml/jaxp/properties/schemaSource". The acceptable value for this property must be one of the following:



  • java.lang.String that points to the URI of the schema

  • java.io.InputStream with the contents of the schema

  • org.xml.sax.InputSource

  • java.io.File

  • an array of java.lang.Object with the contents being one of the types defined above.


For example:


SAXParserFactory spfactory = SAXParserFactory.newInstance();
spfactory.setNamespaceAware(true);
//turn the validation on
spfactory.setValidating(true);
//create the parser; the properties below are set on the parser, not the factory
SAXParser saxparser = spfactory.newSAXParser();
//set the validation to be against WXS
saxparser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
    "http://www.w3.org/2001/XMLSchema");
//set the schema against which the validation is to be done
saxparser.setProperty("http://java.sun.com/xml/jaxp/properties/schemaSource",
    new File("myschema.xsd"));

XML Transformation Using the TrAX APIs in JAXP


W3C XSL-T defines transformation rules to transform a source tree into a result tree. A transformation expressed in XSL-T is called a stylesheet. To transform an XML document using JAXP, you need to create a Transformer using the stylesheet. Once a Transformer is created, it takes the XML input to be transformed as a JAXP Source, and returns the transformed result as a JAXP Result. There are three types of sources and results that JAXP provides: StreamSource, SAXSource, DOMSource and StreamResult, SAXResult, DOMResult, which can be used in any combination for transformation.


Figure 4: XML Transformation


For example, to generate SAX events from DOM:


        //parse the XML file to a W3C DOM
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
DocumentBuilder domparser = dbfactory.newDocumentBuilder();
Document doc = domparser.parse(new File("data.xml"));
//prepare the DOM source
Source xmlsource = new DOMSource(doc);
//create a content handler to handle the SAX events
ContentHandler handler = new MyHandler();
//prepare a SAX result using the content handler
Result result = new SAXResult(handler);
//create a transformer factory
TransformerFactory xfactory = TransformerFactory.newInstance();
//create a transformer
Transformer xformer = xfactory.newTransformer();
//transform to raise the SAX events from DOM
xformer.transform(xmlsource, result);

In the above example, we haven't used any XSL while creating the Transformer. This means the Transformer would merely pour the XML from the Source to the Result without any transformation. When you want to actually transform using a XSL, then you should create the Transformer using the XSL source as follows:


        //create the xsl source
Source xslsource = new StreamSource(new File("mystyle.xsl"));
//create the transformer using the xsl source
Transformer xformer = xfactory.newTransformer(xslsource);
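
As an end-to-end sketch of a stylesheet-driven transformation, the following uses StreamSource and StreamResult over in-memory strings. TransformDemo, the embedded stylesheet, and the employees/employee/name input layout are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TransformDemo {
    // Illustrative stylesheet: emits each employee name followed by ';' as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='employees/employee'>"
      + "<xsl:value-of select='name'/><xsl:text>;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Transforms the XML text with the stylesheet above and returns the text result.
    static String names(String xml) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names(
            "<employees><employee><name>e1</name></employee>"
          + "<employee><name>e2</name></employee></employees>"));
    }
}
```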

What's New in JAXP 1.3


In addition to the features carried over from previous versions (SAX parsing, DOM parsing, validation against DTD or XMLSchema while parsing, and transformation using XSL-T), JAXP 1.3 supports:



  1. XML 1.1 and Namespaces in XML 1.1

  2. XML Inclusions - XInclude 1.0

  3. Validation of an instance against a preparsed schema (XMLSchema and RELAX NG).

  4. Evaluating XPath expressions.

  5. XML/Java type mappings for those datatypes in XMLSchema 1.0, XPath 2.0 and XQuery 1.0 for which there were no XML/Java mappings earlier.


Using JAXP 1.3


XML 1.1 and XInclude 1.0


Major things supported in XML 1.1 are:



  1. forward compatibility for the ever-growing Unicode character set.

  2. addition of NEL (#x85) and the Unicode line separator character (#x2028) to the list of line-end characters.


Changes in XML 1.1 are not fully backward compatible with XML 1.0 and also break the well-formedness rules defined in XML 1.0. Therefore, a new specification, XML 1.1, was proposed rather than simply updating the existing XML 1.0 specification.


To use XML 1.1 and the Namespaces in XML 1.1 feature, you must set the version attribute in the XML declaration of your document to "1.1". For example:


        <?xml version="1.1" encoding="UTF-8" standalone="yes"?>          

XInclude allows an XML document to include other XML documents. For example:


        <myMainXMLDoc xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="fragment.xml"/>
  ...
</myMainXMLDoc>

To allow XML inclusions, you must set the XInclude feature on the appropriate factory as follows:


        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setXIncludeAware(true);

Validating a JAXP Source Against a Preparsed Schema


The javax.xml.validation package provides support for parsing a schema and validating XML instance documents against the preparsed schema. A JAXP DOMSource or SAXSource can be validated against a preparsed schema, and the preparsed schema can be cached for optimization if required. Note that in JAXP 1.3 a StreamSource is not supported as the instance to validate, and that the schema can be either a W3C XML Schema or an OASIS RELAX NG schema. For example:


          //parse an XML in non-validating mode and create a DOMSource
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
dbfactory.setXIncludeAware(true);
DocumentBuilder parser = dbfactory.newDocumentBuilder();
Document doc = parser.parse(new File("data.xml"));
DOMSource xmlsource = new DOMSource(doc);
//create a SchemaFactory for loading W3C XML Schemas
SchemaFactory wxsfactory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//set the errorhandler for handling errors in schema itself
wxsfactory.setErrorHandler(schemaErrorHandler);
//load a W3C XML Schema
Schema schema = wxsfactory.newSchema(new File("myschema.xsd"));
// create a validator from the loaded schema
Validator validator = schema.newValidator();
//set the errorhandler for handling validation errors
validator.setErrorHandler(validationErrorHandler);
//validate the XML instance
validator.validate(xmlsource);
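
The snippet above depends on external files and on two error handlers that are not shown. A self-contained sketch of the same flow follows, with the schema and instance held in strings. SchemaValidationDemo and the one-element schema are hypothetical; with no error handler set, the Validator throws a SAXException on the first validation error, which the sketch turns into a boolean.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class SchemaValidationDemo {
    // Illustrative schema: a single <count> element holding an xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    // Returns true if the XML text is valid against the schema above.
    static boolean isValid(String xml) {
        try {
            // preparse the schema; a Schema object can be cached and reused
            SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));

            // parse the instance in non-validating, namespace-aware mode
            DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
            dbfactory.setNamespaceAware(true);
            Document doc = dbfactory.newDocumentBuilder()
                                    .parse(new InputSource(new StringReader(xml)));

            Validator validator = schema.newValidator();
            try {
                validator.validate(new DOMSource(doc));
                return true;
            } catch (SAXException invalid) {
                return false; // the instance violates the schema
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not set up validation", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>many</count>"));
    }
}
```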

Evaluating XPath Expressions


The javax.xml.xpath package provides support for evaluating XPath expressions against a given XML document. XPath expressions can be compiled, for performance reasons, if they are to be reused.


By the way, the XPath APIs in JAXP are designed to be stateless, which means that every time you want to evaluate an XPath expression, you also need to pass in the XML document. Often, many XPath expressions are evaluated against a single XML document. In such a case, it would have been better if the XPath APIs in JAXP were made stateful by passing the XML document once. The underlying implementation would then have had a choice of storing the XML source in an optimized fashion (say, a DTM) for faster evaluation of XPath expressions.


The following example evaluates XPath expressions against this XML document (data.xml), which is shown first:


        <?xml version="1.0"?>
<employees>
<employee>
<name>e1</name>
</employee>
<employee>
<name>e2</name>
</employee>
</employees>

          //parse an XML to get a DOM to query
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
dbfactory.setNamespaceAware(true);
dbfactory.setXIncludeAware(true);
DocumentBuilder parser = dbfactory.newDocumentBuilder();
Document doc = parser.parse(new File("data.xml"));
//get an XPath processor
XPathFactory xpfactory = XPathFactory.newInstance();
XPath xpathprocessor = xpfactory.newXPath();
//set the namespace context for resolving the prefixes of QNames
//to namespace URIs, if the XPath expression uses QNames. The XPath
//expression needs QNames if the XML document uses namespaces.
//xpathprocessor.setNamespaceContext(NamespaceContext nsContext);
//create XPath expressions
String xpath1 = "/employees/employee";
XPathExpression employeesXPath = xpathprocessor.compile(xpath1);
String xpath2 = "/employees/employee[1]";
XPathExpression employeeXPath  = xpathprocessor.compile(xpath2);
String xpath3 = "/employees/employee[1]/name";
XPathExpression empnameXPath  = xpathprocessor.compile(xpath3);
//execute the XPath expressions
System.out.println("XPath1="+xpath1);
NodeList employees = (NodeList)employeesXPath.evaluate(doc,
XPathConstants.NODESET);
for (int i=0; i<employees.getLength(); i++) {
System.out.println(employees.item(i).getTextContent());
}
System.out.println("XPath2="+xpath2);
Node employee = (Node)employeeXPath.evaluate(doc, XPathConstants.NODE);
System.out.println(employee.getTextContent());
System.out.println("XPath3="+xpath3);
String empname = empnameXPath.evaluate(doc);
System.out.println(empname);
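
The example above leaves the namespace context commented out. For documents that do use namespaces, a sketch of a NamespaceContext might look like the following. The namespace URI urn:example:employees, the emp prefix, and the class name NamespaceXPathDemo are made up for illustration. Note that the prefix used in the XPath expression need not match the prefix used in the document; only the URIs must agree.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceXPathDemo {
    static final String EMP_NS = "urn:example:employees"; // made-up namespace URI

    // Maps the prefix "emp" (used only in the XPath expressions) to EMP_NS.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        public String getNamespaceURI(String prefix) {
            return "emp".equals(prefix) ? EMP_NS : XMLConstants.NULL_NS_URI;
        }
        public String getPrefix(String namespaceURI) {
            return EMP_NS.equals(namespaceURI) ? "emp" : null;
        }
        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            if (prefix == null) {
                return Collections.<String>emptyList().iterator();
            }
            return Collections.singletonList(prefix).iterator();
        }
    };

    // Returns the text of the first employee's name element.
    static String firstEmployeeName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate("/emp:employees/emp:employee[1]/emp:name", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<e:employees xmlns:e='" + EMP_NS + "'>"
                   + "<e:employee><e:name>e1</e:name></e:employee>"
                   + "</e:employees>";
        System.out.println(firstEmployeeName(xml));
    }
}
```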

XML/Java-type Mappings


Datatypes in XMLSchema 1.0 are quite exhaustive and popular, and are used by many other XML specifications as well, such as XPath, XQuery, and WSDL. Most of these datatypes map naturally to the primitive or wrapper datatypes in Java. The rest, such as dateTime and duration, can be mapped to the new Java types javax.xml.datatype.XMLGregorianCalendar, javax.xml.datatype.Duration, and javax.xml.namespace.QName. Thus, with the new datatypes defined in the javax.xml.datatype package, all the datatypes supported in XMLSchema 1.0, XPath 2.0 and XQuery 1.0 now have an equivalent datatype mapping in Java.
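
As a brief sketch of these mappings in use, the following parses an xs:dateTime and an xs:duration from their lexical forms and adds one to the other. DatatypeDemo and plus are hypothetical names; the factory methods and types come from javax.xml.datatype.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.Duration;
import javax.xml.datatype.XMLGregorianCalendar;

class DatatypeDemo {
    // Adds an xs:duration to an xs:dateTime and returns the lexical form of the result.
    static String plus(String dateTime, String duration) {
        try {
            DatatypeFactory factory = DatatypeFactory.newInstance();
            XMLGregorianCalendar when = factory.newXMLGregorianCalendar(dateTime);
            Duration howLong = factory.newDuration(duration);
            when.add(howLong); // mutates the calendar in place
            return when.toXMLFormat();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }

    public static void main(String[] args) {
        // one day and two hours after 10:00 on 7 December 2006
        System.out.println(plus("2006-12-07T10:00:00", "P1DT2H"));
    }
}
```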


But, the datatype support would have been much better from a usability perspective if the DatatypeFactory had methods to get a Java object for the given WXS datatype, which has methods to constrain the datatypes using facets, and validate a value against the datatype.


An example using Oracle's XDK:


          import oracle.xml.parser.schema.*;
. . .
//create a simpleType object
XSDSimpleType st = XSDSimpleType.getPrimitiveType(XSDSimpleType.iSTRING);
//set a constraining facet on the simpleType
st.setFacet(XSDSimpleType.LENGTH, "5");
//validate value
st.validateValue("hello");

Changing the Underlying Implementation


A JAXP implementation comes with a default parser, transformer, XPath engine, and schema validator but, as mentioned earlier, JAXP is a pluggable API, and we can plug in any JAXP-compliant processor to change the defaults. To do that, we must set the appropriate javax.xml.xxx.yyyFactory property to the fully qualified class name of the new yyyFactory. Then, when yyyFactory.newInstance() is invoked, JAXP uses the following ordered lookup procedure to determine the implementation class to load:



  1. Use the javax.xml.xxx.yyyFactory system property.

  2. Use the properties file "lib/jaxp.properties" in the JRE directory. The jaxp.properties file is read only once by the JAXP 1.3 implementation and its values are then cached for future use. If the file does not exist when the first attempt is made to read from it, no further attempts are made to check for its existence. It is not possible to change the value of any property in jaxp.properties after it has been read for the first time.

  3. Use the Services API (as detailed in the JAR specification), if available, to determine the classname. The Services API will look for the classname in the file META-INF/services/javax.xml.xxx.yyyFactory in jars available to the runtime.

  4. Use the platform default javax.xml.xxx.yyyFactory instance


where javax.xml.xxx.yyyFactory can be one of the following:

javax.xml.parsers.SAXParserFactory

javax.xml.parsers.DocumentBuilderFactory

javax.xml.transform.TransformerFactory

javax.xml.xpath.XPathFactory

javax.xml.validation.SchemaFactory:schemaLanguage (schemaLanguage is the parameter passed to the newInstance method of SchemaFactory)


For example, to plug in a JAXP-compliant SAX parser, say Apache's Xerces, you must set the property javax.xml.parsers.SAXParserFactory to org.apache.xerces.jaxp.SAXParserFactoryImpl, using any of the first three mechanisms listed above (the fourth is the fallback to the platform default). One of the ways is shown below:


java -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl MyApplicationProgram
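
To check which implementation the lookup procedure actually selected, one option is simply to print the class of each factory that newInstance() returns. The sketch below does only that; WhichJaxpImpl is a hypothetical class name, and the output depends on the runtime and on any of the overrides described above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

class WhichJaxpImpl {
    // Each method reports the concrete class chosen by the JAXP lookup procedure.
    static String saxFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    static String domFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    static String transformerFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // null here means the system property (step 1 of the lookup) is not set
        System.out.println("system property: "
            + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("SAX:  " + saxFactoryClass());
        System.out.println("DOM:  " + domFactoryClass());
        System.out.println("TrAX: " + transformerFactoryClass());
    }
}
```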

 

J2SE 5.0 in a Nutshell

Java 2 Platform Standard Edition (J2SE) 5.0 ("Tiger") is the next major revision to the Java platform and language; it is currently slated to contain 15 component JSRs with nearly 100 other significant updates developed through the Java Community Process (JCP).


NOTE: The external version number of this release is 5.0 and its internal version number is 1.5.0, as described at J2SE Naming and Versioning.

With so many exciting changes in this release, you may be wondering where you should start. As in previous releases, the comprehensive list of all changes is available in the Release notes guide. This article, from the J2SE team, will take you through the major changes so that you have a grasp of what J2SE 5.0 has to offer, before diving into the API docs.


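The J2SE 5.0 release is focused along certain key themes:

  • Ease of Development

  • Scalability and Performance

  • Monitoring and Manageability

  • Desktop Client

A small number of features are just as important but did not fit neatly into these themes; they are covered at the end, under Miscellaneous Features.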




Ease of Development


You may have already seen reports about some of the new Java Language changes that comprise the Ease of Development theme. The changes include generic types, metadata, autoboxing, an enhanced for loop, enumerated types, static import, C style formatted input/output, variable arguments, concurrency utilities, and simpler RMI interface generation.


JSR-201 contains four of these language changes: the enhanced for loop, enumerated types, static import, and autoboxing. JSR-175 specifies the new metadata functionality, while JSR-14 details generic types.


The new default language specification implemented by the javac compiler is version 5.0, also known as 1.5, so you do not need to supply the option -source 1.5 (as required in beta1).


Metadata


The metadata feature in J2SE 5.0 provides the ability to associate additional data alongside Java classes, interfaces, methods, and fields. This additional data, or annotation, can be read by the javac compiler or other tools, and depending on configuration can also be stored in the class file and can be discovered at runtime using the Java reflection API.


One of the primary reasons for adding metadata to the Java platform is to enable development and runtime tools to have a common infrastructure and so reduce the effort required for programming and deployment. A tool could use the metadata information to generate additional source code, or to provide additional information when debugging.


In beta2 we are pleased to announce the availability of an annotation processing tool called apt. Apt includes a set of new reflective APIs and supporting infrastructure to process program annotations. The apt reflective APIs provide a build-time, source-based, read-only view of program structure which cleanly models the Java programming language's type system. First, apt runs annotation processors that can produce new source code and other files. Next, apt can cause compilation of both original and generated source files, easing development. For more information on apt refer to the apt guide.


The following example code defines a debug annotation and reads it back at runtime through reflection. You could additionally create an AnnotationProcessorFactory for apt to generate code or documentation whenever it finds the debug annotation.









import java.lang.annotation.*;
import java.lang.reflect.*;

@Retention(java.lang.annotation.RetentionPolicy.RUNTIME) @interface debug {
    boolean devbuild() default false;
    int counter();
}

public class MetaTest {
    final boolean production=true;

    @debug(devbuild=production,counter=1) public void testMethod() {
    }

    public static void main(String[] args) {
        MetaTest mt = new MetaTest();
        try {
            Annotation[] a = mt.getClass().getMethod("testMethod").getAnnotations();
            for (int i=0; i<a.length ; i++) {
                System.out.println("a["+i+"]="+a[i]+" ");
            }
        } catch(NoSuchMethodException e) {
            System.out.println(e);
        }
    }
}





With a metadata processing tool, many repetitive coding steps could be reduced to a concise metadata tag. For example, the remote interface required when accessing a JAX-RPC service implementation could be implemented as follows:


Before









public interface PingIF extends Remote {
public void ping() throws RemoteException;
}

public class Ping implements PingIF {
public void ping() {
}
}





After









public class Ping {
public @remote void ping() {
}
}





Generic Types


Generic types have been widely anticipated by the Java Community and are now part of J2SE 5.0. One of the first places to see generic types in action is the Collections API. The Collections API provides common functionality like LinkedLists, ArrayLists and HashMaps that can be used by more than one Java type. The next example uses the 1.4.2 libraries and the default javac compile mode.









ArrayList list = new ArrayList();
list.add(0, new Integer(42));
int total = ((Integer)list.get(0)).intValue();





The cast to Integer on the last line is an example of the typecasting issues that generic types aim to prevent. The issue is that the 1.4.2 Collection API uses the Object class to store the Collection objects, which means that it cannot pick up type mismatches at compile time. The first notification of a problem is a ClassCastException at runtime.


The same example with the generified Collections library is written as follows:









ArrayList<Integer> list =  new ArrayList<Integer>();
list.add(0, new Integer(42));
int total = list.get(0).intValue();





The user of a generified API simply has to declare the type used, at compile time, using the <> notation. No casts are needed, and in this example trying to add a String object to an Integer-typed collection would be caught at compile time.


Generic types therefore enable an API designer to provide common functionality that can be used with multiple data types and which also can be checked for type safety at compile time.


Designing your own generic APIs is a little more complex than simply using them. To get started, look at the java.util.Collection source and also the API guide.
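
As a first taste of designing a generic API, here is a minimal sketch of a generic class and a generic method with a bounded type parameter. Pair, GenericsDemo, and max are hypothetical names written for this illustration; they are not part of the Collections API.

```java
import java.util.Arrays;
import java.util.List;

// A generic class with two independent type parameters.
class Pair<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A getFirst() { return first; }
    B getSecond() { return second; }

    // The return type swaps the parameters, so the compiler tracks both types.
    Pair<B, A> swap() {
        return new Pair<B, A>(second, first);
    }
}

class GenericsDemo {
    // A generic method: T must be comparable to itself (or to a supertype).
    static <T extends Comparable<? super T>> T max(List<? extends T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty list");
        }
        T best = items.get(0);
        for (T item : items) {
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Pair<String, Integer> p = new Pair<String, Integer>("answer", 42);
        Pair<Integer, String> q = p.swap();
        int n = q.getFirst(); // no cast needed
        System.out.println(n + " " + max(Arrays.asList(3, 42, 7)));
    }
}
```

The bound on T is what lets max call compareTo without a cast; passing a list of a non-comparable type is rejected at compile time.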


Autoboxing and Auto-Unboxing of Primitive Types


Converting between primitive types, like int, boolean, and their equivalent Object-based counterparts like Integer and Boolean, can require unnecessary amounts of extra coding, especially if the conversion is only needed for a method call to the Collections API, for example.


The autoboxing and auto-unboxing of Java primitives produces code that is more concise and easier to follow. In the next example an int is being stored and then retrieved from an ArrayList. The 5.0 version leaves the conversion required to transition to an Integer and back to the compiler.


Before









ArrayList<Integer> list = new ArrayList<Integer>();
list.add(0, new Integer(42));
int total = (list.get(0)).intValue();





After









ArrayList<Integer> list = new ArrayList<Integer>();
list.add(0, 42);
int total = list.get(0);





Enhanced for Loop


The Iterator interface is used heavily by the Collections API. It provides the mechanism to navigate sequentially through a Collection. The new enhanced for loop can replace the iterator when simply traversing a Collection, as follows. The compiler generates the necessary looping code, and with generic types no additional casting is required.


Before









ArrayList<Integer> list = new ArrayList<Integer>();
for (Iterator i = list.iterator(); i.hasNext();) {
Integer value=(Integer)i.next();
}






After









ArrayList<Integer> list = new ArrayList<Integer>();  
for (Integer i : list) { ... }





Enumerated Types


Enumerated types provide a type-safe alternative to static final constants. If you have previously used the identifier enum in your own application, you will need to adjust the source when compiling with javac -source 1.5 (or its synonym -source 5), because enum is now a keyword.








public enum StopLight { red, amber, green };
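
The sketch below extends the StopLight idea slightly to show what an enum offers beyond int constants: it can carry methods, be used in a switch, and be looked up by name. The next and advice methods and the EnumDemo class are hypothetical additions for illustration.

```java
enum StopLight {
    red, amber, green;

    // Returns the next constant in declaration order, wrapping around at the end.
    StopLight next() {
        StopLight[] all = values();
        return all[(ordinal() + 1) % all.length];
    }
}

class EnumDemo {
    // Enum constants can be used directly as case labels.
    static String advice(StopLight light) {
        switch (light) {
            case red:   return "stop";
            case amber: return "wait";
            case green: return "go";
            default:    throw new AssertionError(light);
        }
    }

    public static void main(String[] args) {
        for (StopLight light : StopLight.values()) {
            System.out.println(light + " -> " + advice(light) + ", then " + light.next());
        }
        // valueOf converts a name back to the constant
        System.out.println(StopLight.valueOf("amber"));
    }
}
```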




Static Import


The static import feature, implemented as "import static", enables you to refer to static constants from a class without needing to inherit from it. Instead of BorderLayout.CENTER each time we add a component, we can simply refer to CENTER.









import static java.awt.BorderLayout.*;

getContentPane().add(new JPanel(), CENTER);





Formatted Output


Developers now have the option of using printf-type functionality to generate formatted output. This will help migrate legacy C applications, as the same text layout can be preserved with little or no change.


Most of the common C printf formatters are available, and in addition some Java classes like Date and BigInteger also have formatting rules. See the java.util.Formatter class for more information. Although the standard UNIX newline '\n' character is accepted, for cross-platform support of newlines the Java %n is recommended.









System.out.printf("name count%n");
System.out.printf("%s %5d%n", user,total);
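
The snippet above assumes that user and total are defined elsewhere. A self-contained sketch follows, using String.format, which accepts the same format strings as printf. FormatDemo and its methods are hypothetical names; Locale.US is passed explicitly so that the grouping separator does not vary with the default locale.

```java
import java.math.BigInteger;
import java.util.Locale;

class FormatDemo {
    // Left-justifies the name in 6 columns and right-justifies the count in 5.
    static String row(String user, int total) {
        return String.format(Locale.US, "%-6s|%5d", user, total);
    }

    // The ',' flag inserts locale-specific grouping separators; it works for BigInteger too.
    static String grouped(BigInteger value) {
        return String.format(Locale.US, "%,d", value);
    }

    public static void main(String[] args) {
        System.out.printf("%s%n", row("duke", 42));
        System.out.printf("%s%n", grouped(new BigInteger("1234567")));
    }
}
```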





Formatted Input


The Scanner API provides basic input functionality for reading data from the system console or any data stream. The following example reads a String from standard input and expects a following int value.


Scanner methods like next and nextInt will block if no data is available. If you need to process more complex input, the java.util.Scanner class also offers regular-expression pattern matching, for example through its findInLine method, built on the java.util.regex package.









Scanner s= new Scanner(System.in);
String param= s.next();
int value=s.nextInt();
s.close();
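
Because a Scanner can also read from a String, the same calls are easy to try without a console. The sketch below shows token-based reading with a check before nextInt, and a pattern-based lookup using findInLine and match. ScannerDemo, describe, and domainOf are hypothetical names for illustration.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

class ScannerDemo {
    // Reads a word followed by an int; reports "?" if the second token is not an int.
    static String describe(String input) {
        Scanner s = new Scanner(input);
        try {
            String param = s.next();
            if (!s.hasNextInt()) {
                return param + "=?";
            }
            int value = s.nextInt();
            return param + "=" + value;
        } finally {
            s.close();
        }
    }

    // Finds the first user@host pattern in the line and returns the host part, or null.
    static String domainOf(String line) {
        Scanner s = new Scanner(line);
        try {
            String found = s.findInLine("(\\w+)@([\\w.]+)");
            if (found == null) {
                return null;
            }
            MatchResult match = s.match(); // groups of the last successful match
            return match.group(2);
        } finally {
            s.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("count 42"));
        System.out.println(domainOf("mail duke@example.org now"));
    }
}
```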





Varargs


The varargs functionality allows a variable number of arguments to be passed to a method. It requires only the simple ... notation in the declaration of the method that accepts the argument list, and it is used to implement the flexible number of arguments required for printf.









void argtest(Object ... args) {
    //inside the method, args is an ordinary Object[] array
    for (int i=0;i <args.length; i++) {
        System.out.println(args[i]);
    }
}

argtest("test", "data");
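
A slightly fuller sketch shows that a varargs method can be called with zero or more arguments, and that a varargs parameter can follow ordinary parameters as long as it comes last. VarargsDemo, sum, and join are hypothetical names.

```java
class VarargsDemo {
    // Accepts any number of ints, including none.
    static int sum(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // The varargs parameter must be the last one in the parameter list.
    static String join(String separator, Object... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]); // primitives passed here are autoboxed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sum());
        System.out.println(sum(1, 2, 3));
        System.out.println(join("-", "a", 1, true));
    }
}
```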





Concurrency Utilities


The concurrency utility library, led by Doug Lea in JSR-166, is a special release of the popular concurrency package into the J2SE 5.0 platform. It provides powerful, high-level thread constructs, including executors, which are a thread task framework, thread safe queues, Timers, locks (including atomic ones), and other synchronization primitives.


One such lock is the well known semaphore. A semaphore can be used in the same way that wait is used now, to restrict access to a block of code. Semaphores are more flexible and can also allow a number of concurrent threads access, as well as allow you to test a lock before acquiring it. The following example uses just one semaphore, also known as a binary semaphore. See the java.util.concurrent package for more information.









final  private Semaphore s= new Semaphore(1, true);

s.acquireUninterruptibly(); //blocks until a permit is free; s.acquire() is the interruptible version, s.tryAcquire() the non-blocking one

try {
balance=balance+10; //protected value
} finally {
s.release(); //return semaphore token
}
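
The fragment above omits the surrounding class. The following sketch places the same binary semaphore in a small account class and exercises it from several threads. SemaphoreDemo and its methods are hypothetical, and tryDeposit illustrates testing the lock before acquiring it.

```java
import java.util.concurrent.Semaphore;

class SemaphoreDemo {
    private final Semaphore s = new Semaphore(1, true); // one permit, fair ordering
    private int balance = 0;                            // protected value

    void deposit(int amount) {
        s.acquireUninterruptibly();
        try {
            balance = balance + amount;
        } finally {
            s.release(); // return the permit even if the update throws
        }
    }

    // Non-blocking attempt: gives up immediately if the permit is taken.
    boolean tryDeposit(int amount) {
        if (!s.tryAcquire()) {
            return false;
        }
        try {
            balance = balance + amount;
            return true;
        } finally {
            s.release();
        }
    }

    int balance() {
        s.acquireUninterruptibly();
        try {
            return balance;
        } finally {
            s.release();
        }
    }

    // Starts the given number of threads, each depositing 10 repeatedly.
    static int run(int threads, final int depositsPerThread) {
        final SemaphoreDemo account = new SemaphoreDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < depositsPerThread; j++) {
                        account.deposit(10);
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return account.balance();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```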






rmic -- The RMI Compiler


You no longer need to use rmic, the RMI compiler tool, to generate most remote interface stubs. The introduction of dynamic proxies means that the information normally provided by the stubs can be discovered at runtime. See the RMI release notes for more information.



Scalability and Performance


The 5.0 release promises improvements in scalability and performance, with a new emphasis on startup time and memory footprint, to make it easier to deploy applications running at top speed.


One of the more significant updates is the introduction of class data sharing in the HotSpot JVM. This technology not only shares read-only data between multiple running JVMs, but also improves startup time, as core JVM classes are pre-packed.


Performance ergonomics are a new feature in J2SE 5.0. This means that if you have been using specialized JVM runtime options in previous releases, it may be worth re-validating your performance with no or minimal options.



Monitoring and Manageability


Monitoring and Manageability is a key component of RAS (Reliability, Availability, Serviceability) in the Java platform.


The J2SE 5.0 release provides comprehensive monitoring and management support: instrumentation to observe the Java virtual machine, Java Management Extensions (JMX) framework, and remote access protocols. All this is ready to be used out-of-the-box. (See the Management and Monitoring release notes for more details.)


The JVM Monitoring & Management API specifies a comprehensive set of instrumentation of JVM internals to allow a running JVM to be monitored. This information is accessed through JMX (JSR-003) MBeans and can be accessed locally within the Java address space, remotely using the JMX remote interface (JSR-160), or through industry-standard SNMP tools.


One of the most useful features is a low memory detector. JMX MBeans can notify registered listeners when a memory usage threshold is crossed. See javax.management and java.lang.management for details.
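
A sketch of how the low memory detector might be armed is shown below. It registers a listener on the MemoryMXBean and sets a usage threshold on each heap pool that supports one. LowMemoryWatch, arm, and the choice of a fraction of the pool maximum are assumptions made for illustration; the fraction is expected to lie between 0 and 1.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

class LowMemoryWatch {
    // Sets a usage threshold at the given fraction of each heap pool's maximum
    // and returns the number of pools on which a threshold was set.
    static int arm(double fraction) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // the platform MemoryMXBean also implements NotificationEmitter
        NotificationEmitter emitter = (NotificationEmitter) memory;
        emitter.addNotificationListener(new NotificationListener() {
            public void handleNotification(Notification n, Object handback) {
                if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType())) {
                    System.err.println("Low memory: " + n.getMessage());
                }
            }
        }, null, null);

        int armed = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax(); // -1 when the maximum is undefined
            if (pool.getType() == MemoryType.HEAP
                    && pool.isUsageThresholdSupported() && max > 0) {
                pool.setUsageThreshold((long) (max * fraction));
                armed++;
            }
        }
        return armed;
    }

    public static void main(String[] args) {
        System.out.println("thresholds set on " + arm(0.8) + " heap pool(s)");
    }
}
```

Which pools support a usage threshold depends on the JVM and the garbage collector in use, so the count printed will vary.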


J2SE 5.0 provides an easy way to enable out-of-the-box remote management of JVM and an application (see Out-of-the-Box for details). For example, to start an application to be monitorable by jconsole in the same local machine, use the following system property:









java -Dcom.sun.management.jmxremote -jar Java2Demo.jar





and to monitor it remotely through JMX without authentication:









java -Dcom.sun.management.jmxremote.port=5001 
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false -jar Java2Demo.jar





For an idea of how easy the new API is to use, the following reports the detailed usage of the memory heaps in the HotSpot JVM.









import java.lang.management.*;
import java.util.*;

public class MemTest {
public static void main(String args[]) {
List<MemoryPoolMXBean> pools = ManagementFactory.getMemoryPoolMXBeans();
for (MemoryPoolMXBean p: pools) {
System.out.println("Memory type="+p.getType()+" Memory usage="+p.getUsage());
}
}
}





New JVM Profiling API (JSR-163)


The release also contains a more powerful native profiling API called JVMTI. This API has been specified through JSR-163 and was motivated by the need for an improved profiling interface. However, JVMTI is intended to cover the full range of native in-process tools access, which in addition to profiling, includes monitoring, debugging and a potentially wide variety of other code analysis tools.


The implementation includes a mechanism for bytecode instrumentation, Java Programming Language Instrumentation Services (JPLIS). This enables analysis tools to add additional profiling only where it is needed. The advantage of this technique is that it allows more focused analysis and limits the interference of the profiling tools on the running JVM. The instrumentation can even be dynamically generated at runtime, as well as at class loading time, and pre-processed as class files.


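The following example creates an instrumentation hook that can load a modified version of the class file from disk. To run this test, start the JRE with java -javaagent:myBCI BCITest. (In the final 5.0 release, the -javaagent option takes the path of a JAR file whose manifest names the agent class in a Premain-Class attribute and, for redefineClasses to be permitted, sets Can-Redefine-Classes to true.)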









//File myBCI.java
import java.lang.instrument.Instrumentation;

public class myBCI {
private static Instrumentation instCopy;

public static void premain(String options, Instrumentation inst) {
instCopy = inst;
}
public static Instrumentation getInstrumentation() {
return instCopy;
}
}

//File BCITest.java

import java.nio.*;
import java.io.*;
import java.nio.channels.*;
import java.lang.instrument.*;

public class BCITest {
    public static void main (String[] args) {
        try {
            OriginalClass mc = new OriginalClass();
            mc.message();

            //read the recompiled class file from the "modified" directory
            FileChannel fc=new FileInputStream(
                new File("modified"+File.separator+"OriginalClass.class")).getChannel();
            ByteBuffer buf = fc.map(FileChannel.MapMode.READ_ONLY, 0, (int)fc.size());
            byte[] classBuffer = new byte[buf.capacity()];
            buf.get(classBuffer, 0, classBuffer.length);
            //swap the new bytecode in for the already loaded class
            myBCI.getInstrumentation().redefineClasses(
                new ClassDefinition[] {
                    new ClassDefinition(mc.getClass(), classBuffer)});
            mc.message();
        } catch (Exception e) {
            //report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}

//OriginalClass.java
//Compile in current directory
//Copy source to modified directory,change message and recompile

public class OriginalClass {
public void message() {
System.out.println("OriginalClass");
}
}





Improved Diagnostic Ability


Generating stack traces has been awkward when no console window is available. Two new APIs, Thread.getStackTrace and Thread.getAllStackTraces, provide this information programmatically.









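StackTraceElement e[]=Thread.currentThread().getStackTrace();
for (int i=0; i <e.length; i++) {
    System.out.println(e[i]);
}
//getAllStackTraces returns a map from each live thread to its stack trace
for (java.util.Map.Entry<Thread, StackTraceElement[]> t : Thread.getAllStackTraces().entrySet()) {
    System.out.println("\n"+t.getKey());
    for (StackTraceElement frame : t.getValue()) {
        System.out.println("    "+frame);
    }
}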





The HotSpot JVM includes a fatal error handler that can run a user-supplied script or program if the JVM aborts. A debug tool can also connect to a hung JVM or core file using the HotSpot JVM serviceability agent connector.









-XX:OnError="command"  


-XX:OnError="pmap %p"
-XX:OnError="gdb %p"

optional %p used as process id






Desktop Client


The Java Desktop client remains a key component of the Java platform and as such has been the focus of many improvements in J2SE 5.0.


This Beta release contains some of the early improvements in startup time and memory footprint. Not only is the release faster, but the Swing toolkit enjoys a fresh new theme called Ocean. And by building on the updates in J2SE 1.4.2, there are further improvements in the GTK skinnable Look and Feel and the Windows XP Look and Feel.



















[Screenshots: Windows XP and Linux/Red Hat]




Linux and Solaris users (and, new in Beta 2, Windows users) who have the latest OpenGL drivers and a supported graphics card can get native hardware acceleration from Java 2D by setting the following runtime property:









java -Dsun.java2d.opengl=true -jar Java2D.jar





The Linux release also has the fast X11 toolkit, called XAWT, enabled by default. If you need to compare against the Motif version, you can use the following system property:









java -Dawt.toolkit=sun.awt.motif.MToolkit -jar Notepad.jar





(The X11 toolkit class is sun.awt.X11.XToolkit.)


The X11 Toolkit also uses the XDnD protocol so you can drag-and-drop simple components between Java and other applications like StarOffice or Mozilla.



Miscellaneous Features


Core XML Support


J2SE 5.0 introduces several revisions to the core XML platform, including XML 1.1 with Namespaces, XML Schema, SAX 2.0.2, DOM Level 3 support, and XSLT with a fast XSLTC compiler.
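
As a small illustration of the DOM Level 3 support, the sketch below parses an XML string with the standard JAXP DocumentBuilderFactory and reads the text of the root element with Node.getTextContent, a method introduced in DOM Level 3. The class name Dom3Sketch, the method rootText, and the sample XML are assumptions made for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class Dom3Sketch {
    // Parses an XML string and returns the concatenated text of the root
    // element and its descendants. (Dom3Sketch and rootText are
    // illustrative names, not part of any library.)
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() is a DOM Level 3 addition to org.w3c.dom.Node
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText("<coffee><name>Colombian</name></coffee>"));
    }
}
```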


In addition to the core XML support, future versions of the Java Web Services Developer Pack will deliver the latest web services standards: JAX-RPC and SAAJ (WSDL/SOAP), JAXB, XML Encryption and Digital Signature, and JAXR for registries.


Supplementary Character Support


32-bit supplementary character support has been carefully added to the platform as part of the transition to Unicode 4.0 support. A supplementary character is encoded as a special pair of UTF-16 values, called a surrogate pair, that together represent a single character, or code point. A surrogate pair is a combination of a high UTF-16 value and a following low UTF-16 value. The high and low values come from a reserved range of UTF-16 values.


In general, when using a String or sequence of characters, the core API libraries will transparently handle the new supplementary characters for you. However, because the Java "char" remains at 16 bits, the very few methods that took a char as an argument now have complementary methods that accept an int value, which can represent the new larger values. The Character class in particular has additional methods that examine the current character and the following character in order to retrieve the supplementary code point value, as shown below:









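String u = "\uD840\uDC08"; // one supplementary character, U+20008
System.out.println(u + "+ " + u.length()); // length() counts chars, so it reports 2
System.out.println(Character.isHighSurrogate(u.charAt(0))); // true
System.out.println((int) u.charAt(1)); // 56328 (0xDC08, the low surrogate)
System.out.println(u.codePointAt(0)); // 131080 (0x20008, the full code point)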





See the Unicode section in Character for more details.
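
Because length() and charAt() still work in 16-bit units, code that needs to count or visit whole characters has to step by code point. The sketch below shows one way to do that with String.codePointCount, String.codePointAt, and Character.charCount, all added in J2SE 5.0; the class name CodePoints and its two helper methods are hypothetical names used only for this illustration.

```java
public class CodePoints {
    // Counts Unicode code points, so a surrogate pair counts as one character.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Lists each code point as U+hex, advancing by one or two chars as needed.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("U+").append(Integer.toHexString(cp).toUpperCase());
            i += Character.charCount(cp); // 2 for a supplementary character
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Character.toChars turns an int code point into its UTF-16 chars
        String s = "a" + new String(Character.toChars(0x20008)) + "b";
        System.out.println(s.length());         // 4 chars
        System.out.println(countCodePoints(s)); // 3 code points
        System.out.println(describe(s));        // U+61 U+20008 U+62
    }
}
```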


JDBC RowSets


There are five new JDBC RowSet class implementations in this release. Two of the most valuable ones are CachedRowSet and WebRowSet. RowSet objects, unlike ResultSet objects, can operate without always being connected to a database or other data source. Without the expensive overhead of maintaining a connection to a data source, they are much more lightweight than a ResultSet object. The CachedRowSet contains an in-memory collection of rows retrieved from the database that can, if needed, be synchronized at a later point in time. The WebRowSet implementation, in addition, can write and read the RowSet in XML format.


The following code fragment shows how easy it is to create and use a WebRowSet object.









// WebRowSetImpl is Sun's reference implementation (com.sun.rowset.WebRowSetImpl)
Class.forName("org.postgresql.Driver");
WebRowSetImpl wrs = new WebRowSetImpl();
wrs.setCommand("SELECT COF_NAME,TOTAL FROM COFFEES");

wrs.setUsername("postgres");
wrs.setPassword("");
wrs.setUrl("jdbc:postgresql:test");
wrs.execute(); // executes the command and populates the rowset with all coffees

wrs.absolute(1); // moves the cursor to the first row of wrs
wrs.updateInt(2, 10); // resets the TOTAL field to 10
wrs.updateRow(); // finishes edits to this row
wrs.acceptChanges(); // writes the new total to the data source
wrs.writeXml(System.out); // also exports the rowset in XML format
wrs.close();





References


New Language Features for Ease of Development in the Java 2 Platform Standard Edition 5.0: http://java.sun.com/features/2003/05/bloch_qa.html


Tiger Component JSRs


003 Java Management Extensions (JMX) Specification http://jcp.org/en/jsr/detail?id=3


013 Decimal Arithmetic Enhancement http://jcp.org/en/jsr/detail?id=13


014 Add Generic Types To The Java Programming Language http://jcp.org/en/jsr/detail?id=14


028 Java SASL Specification http://jcp.org/en/jsr/detail?id=28


114 JDBC Rowset Implementations http://jcp.org/en/jsr/detail?id=114


133 Java Memory Model and Thread Specification Revision http://jcp.org/en/jsr/detail?id=133


160 Java Management Extensions (JMX) Remote API 1.0 http://jcp.org/en/jsr/detail?id=160


163 Java Platform Profiling Architecture http://jcp.org/en/jsr/detail?id=163


166 Concurrency Utilities http://jcp.org/en/jsr/detail?id=166


174 Monitoring and Management Specification for the Java Virtual Machine http://jcp.org/en/jsr/detail?id=174


175 A Metadata Facility for the Java Programming Language http://jcp.org/en/jsr/detail?id=175


200 Network Transfer Format for Java Archives http://jcp.org/en/jsr/detail?id=200


201 Extending the Java Programming Language with Enumerations, Autoboxing, Enhanced for Loops and Static Import http://jcp.org/en/jsr/detail?id=201


204 Unicode Supplementary Character Support http://jcp.org/en/jsr/detail?id=204


206 Java API for XML Processing (JAXP) 1.3 http://jcp.org/en/jsr/detail?id=206