Friday, June 17, 2005

Lucky Typo

Our project uses Cruise Control for continuous integration, and of late, the builds have been taking way to long to complete. The main culprit are the Struts Test Case tests. One single test was taking over 12 seconds to run, and we have many of them. Something was wrong, but no one could figure out what it was. It was the usual problem; we were so busy, there was no time to look at it. And, of course, the unreasonably long build times were increasing our work load, making it less likely that anyone had the time to find the roor cause.

A couple of days back, while struggling to release a build, I finally decided that something serious had to be done. I had once before noted that the Ant junittask now (that is from 1.6.2) has a parameter called forkmode that can be set to run all unit tests in a single JVM. We had briefly tried that before; noted that it didn't work with our build script and given up. The problem was that the reorted message wasn't very helpful; it reports ClassNotFoundError on a properties file.

That properties file turns out to contain the list of tests to run. It did exist, though I had to create a specially hacked Ant build to discover that (the standard Ant build deletes the temporary file to fast to exmine it). Maybe the file was corrupt? I re-constructed the command line that Ant issues and discovered, in dismay, that the line length exceed the Windows 2000 limit 2048 characters. Our classpath was grotesquely long; with junk and duplicates. I managed to prune it down, only to find out that some items were actually missing. Strangely, the pre-forkmode builds did work, but how, I'm not sure.

And finally, a much hacked classpath contained everything needed, but nothing more. The tests ran blisteringly fast; about 50ms per test -- admittedly in a totally different environment from our actual build. I just needed to plug it back into the build scripts. My classpath was hard-coded; the build script has a nice set of property definitions that define library folders and the like. It took a while, but I was confident that this was going to work. It was worth the effort

I ran the tests and stared in dismay at the timings; better, for sure, but still about 5 secondsper test. What happened. I switched back to my hardcoded classpath; very fast, even in the build environment. Now I am really confused. Ant resolves the definitions up front, so there couldn't possibly be a per-test overhead. Yet that is what appeared to be happening. As I went home, my mind was struggling to make sense of the situation.

That evening, I tried to rerun the situation on my laptop. While setting up the classpath, I noticed that the log4j jar file was in the classpath, but its path was mis-typed. We use lag4j, but indirectly via the apache commons logging library. The commons logging library cleverly uses log4j, if present, but falls back to java logging if it doesn't find log4j. Not strictly needed, in other words. And then I test with and without log4j and reach the moment of truth. By turing off logging (tidily now, using the log4j.properties file) my time per test went down by a factor of 10. Not bad. Tweaking the build to use forkmode could reduce it by almost the same amount again. Not quite down the my standalone and highly artificial environment, but not bad.

A final analsis brought all the pieces together. StrutsTestCase mocks up a servlet container in which it runs our Action classes. It is faithful enough to the real servlet container that it uses the struts-config.xml file to map a path into an Action. Each test re-initialises the mock container; and in doing so re-reads the .xml file. Our log4j.properties file is kept in the same directory (WEB-INF) as the struts-config.xml file, so was visible to the the test setup. Apache have been liberal with debug logging in the libraries -- specifically the commons-digester library, which is used to parse the struts-config.xml file. Our default logging level is still DEBUG (we're still a few months away from going live), so a lot of output. All this chewed up an enormous amount of CPU cycles and disk space. Odd that we didn't notice it, but then our build process generates a monsterous amount of output, so it just got lost the volume.

We may not use the forkmode tweak for the build machine as there are all sorts of complications; it requires Ant 1.6.2 or newer, which again requires Xalan 2, which has compatibility problems with the jdk 1.4.1 we are stuck with. I won't get off track about java wierdness again. Right now everyone on the project is ecstatic that builds are fast again. Sometimes a typo can be the source of good fortune.
Category: Software, Programming, Java

Wednesday, June 15, 2005

Java is wierd

It all began when I wrote a simple class with overloaded methods:


public abstract Object apply (Object obj);

public Object apply (Collection collection)
{
for (Iterator i = collection.iterator (); i.hasNext(); )
{
apply (i.next());
}
return this;
}

Note that the first version is abstract (as was the class, of course). The second method, which takes a Collection and applies the first method on each item. The intent is that you extend the class and provide a implemetation of the abstract method. Invoking the apply (Collection) method will invoke your apply (Object) over the collection. Seemed simple enough to me. My unit tests green barred, so I was ready to plug the code into the project.

I get a compilation error. Huh? I run my unit tests again. Green bar. Now I'm confused. I isolate the offending code into a sandbox environment. and repeated the above. Unit tests green bar; code fails to compile. Now wait a minute. To run a unit test, the code has to compile, correct? Stop and think.

So, here's the situation. When I wrote the code in Eclipse, it gave no errors. Ecplise compiles in the background; so my code does compile. When I build the project I run an Ant build script (started from Eclipse, usually). That build script reports the following:

[javac] C:\John\Closures\src\functor\ClosureTest.java:138: reference to apply is ambiguous, both method apply java.util.Collection) in functor.Closure and method apply(java.lang.Object) in match
[javac] new BufferedClosure ()

So Eclipse thinks my code is good; javac disagrees. That's just dandy, right?

As the practical person I am, I did the quick fix; rename apply (Collection) to applyAll (Collection). Project build works and we are back on track. I have no peace, however, until I understand what is going on.

The answer turns up after some googling. It's a bug; Bug Id 4761586. Furthermore, it's fixed in java 1.5, and interstingly enough, the fix was quietly back ported to 1.4.2. There is a discussion of the issues in the JDJ article, Overloaded Method Matching. (Published Oct 2003 -- how did I miss it?).

The current project is locked down to 1.4.1 — where it is definitely not fixed. Another thing is that Bug Id 4761586 (and it's resolution) refers to "the class in which an overloaded method is declared (the 'declaring class')". Now my two methods were declared in the same class; even if one was first defined in the extending class. As far as I can figure it out, that's a compiler bug, even according to wording of the old spec. which, as the above mentioned article declares, "was simply a mistake in the original specification" with "no logical explanation".

Java is wierd.

Category: Software, Programming, Java