Lucky Typo
Our project uses Cruise Control for continuous integration, and of late, the builds have been taking way to long to complete. The main culprit are the Struts Test Case tests. One single test was taking over 12 seconds to run, and we have many of them. Something was wrong, but no one could figure out what it was. It was the usual problem; we were so busy, there was no time to look at it. And, of course, the unreasonably long build times were increasing our work load, making it less likely that anyone had the time to find the roor cause.
A couple of days back, while struggling to release a build, I finally decided that something serious had to be done. I had once before noted that the Ant junittask now (that is from 1.6.2) has a parameter called forkmode that can be set to run all unit tests in a single JVM. We had briefly tried that before; noted that it didn't work with our build script and given up. The problem was that the reorted message wasn't very helpful; it reports ClassNotFoundError on a properties file.
That properties file turns out to contain the list of tests to run. It did exist, though I had to create a specially hacked Ant build to discover that (the standard Ant build deletes the temporary file to fast to exmine it). Maybe the file was corrupt? I re-constructed the command line that Ant issues and discovered, in dismay, that the line length exceed the Windows 2000 limit 2048 characters. Our classpath was grotesquely long; with junk and duplicates. I managed to prune it down, only to find out that some items were actually missing. Strangely, the pre-forkmode builds did work, but how, I'm not sure.
And finally, a much hacked classpath contained everything needed, but nothing more. The tests ran blisteringly fast; about 50ms per test -- admittedly in a totally different environment from our actual build. I just needed to plug it back into the build scripts. My classpath was hard-coded; the build script has a nice set of property definitions that define library folders and the like. It took a while, but I was confident that this was going to work. It was worth the effort
I ran the tests and stared in dismay at the timings; better, for sure, but still about 5 secondsper test. What happened. I switched back to my hardcoded classpath; very fast, even in the build environment. Now I am really confused. Ant resolves the definitions up front, so there couldn't possibly be a per-test overhead. Yet that is what appeared to be happening. As I went home, my mind was struggling to make sense of the situation.
That evening, I tried to rerun the situation on my laptop. While setting up the classpath, I noticed that the log4j jar file was in the classpath, but its path was mis-typed. We use lag4j, but indirectly via the apache commons logging library. The commons logging library cleverly uses log4j, if present, but falls back to java logging if it doesn't find log4j. Not strictly needed, in other words. And then I test with and without log4j and reach the moment of truth. By turing off logging (tidily now, using the log4j.properties file) my time per test went down by a factor of 10. Not bad. Tweaking the build to use forkmode could reduce it by almost the same amount again. Not quite down the my standalone and highly artificial environment, but not bad.
A final analsis brought all the pieces together. StrutsTestCase mocks up a servlet container in which it runs our Action classes. It is faithful enough to the real servlet container that it uses the struts-config.xml file to map a path into an Action. Each test re-initialises the mock container; and in doing so re-reads the .xml file. Our log4j.properties file is kept in the same directory (WEB-INF) as the struts-config.xml file, so was visible to the the test setup. Apache have been liberal with debug logging in the libraries -- specifically the commons-digester library, which is used to parse the struts-config.xml file. Our default logging level is still DEBUG (we're still a few months away from going live), so a lot of output. All this chewed up an enormous amount of CPU cycles and disk space. Odd that we didn't notice it, but then our build process generates a monsterous amount of output, so it just got lost the volume.
We may not use the forkmode tweak for the build machine as there are all sorts of complications; it requires Ant 1.6.2 or newer, which again requires Xalan 2, which has compatibility problems with the jdk 1.4.1 we are stuck with. I won't get off track about java wierdness again. Right now everyone on the project is ecstatic that builds are fast again. Sometimes a typo can be the source of good fortune.
Category: Software, Programming, Java
A couple of days back, while struggling to release a build, I finally decided that something serious had to be done. I had once before noted that the Ant junittask now (that is from 1.6.2) has a parameter called forkmode that can be set to run all unit tests in a single JVM. We had briefly tried that before; noted that it didn't work with our build script and given up. The problem was that the reorted message wasn't very helpful; it reports ClassNotFoundError on a properties file.
That properties file turns out to contain the list of tests to run. It did exist, though I had to create a specially hacked Ant build to discover that (the standard Ant build deletes the temporary file to fast to exmine it). Maybe the file was corrupt? I re-constructed the command line that Ant issues and discovered, in dismay, that the line length exceed the Windows 2000 limit 2048 characters. Our classpath was grotesquely long; with junk and duplicates. I managed to prune it down, only to find out that some items were actually missing. Strangely, the pre-forkmode builds did work, but how, I'm not sure.
And finally, a much hacked classpath contained everything needed, but nothing more. The tests ran blisteringly fast; about 50ms per test -- admittedly in a totally different environment from our actual build. I just needed to plug it back into the build scripts. My classpath was hard-coded; the build script has a nice set of property definitions that define library folders and the like. It took a while, but I was confident that this was going to work. It was worth the effort
I ran the tests and stared in dismay at the timings; better, for sure, but still about 5 secondsper test. What happened. I switched back to my hardcoded classpath; very fast, even in the build environment. Now I am really confused. Ant resolves the definitions up front, so there couldn't possibly be a per-test overhead. Yet that is what appeared to be happening. As I went home, my mind was struggling to make sense of the situation.
That evening, I tried to rerun the situation on my laptop. While setting up the classpath, I noticed that the log4j jar file was in the classpath, but its path was mis-typed. We use lag4j, but indirectly via the apache commons logging library. The commons logging library cleverly uses log4j, if present, but falls back to java logging if it doesn't find log4j. Not strictly needed, in other words. And then I test with and without log4j and reach the moment of truth. By turing off logging (tidily now, using the log4j.properties file) my time per test went down by a factor of 10. Not bad. Tweaking the build to use forkmode could reduce it by almost the same amount again. Not quite down the my standalone and highly artificial environment, but not bad.
A final analsis brought all the pieces together. StrutsTestCase mocks up a servlet container in which it runs our Action classes. It is faithful enough to the real servlet container that it uses the struts-config.xml file to map a path into an Action. Each test re-initialises the mock container; and in doing so re-reads the .xml file. Our log4j.properties file is kept in the same directory (WEB-INF) as the struts-config.xml file, so was visible to the the test setup. Apache have been liberal with debug logging in the libraries -- specifically the commons-digester library, which is used to parse the struts-config.xml file. Our default logging level is still DEBUG (we're still a few months away from going live), so a lot of output. All this chewed up an enormous amount of CPU cycles and disk space. Odd that we didn't notice it, but then our build process generates a monsterous amount of output, so it just got lost the volume.
We may not use the forkmode tweak for the build machine as there are all sorts of complications; it requires Ant 1.6.2 or newer, which again requires Xalan 2, which has compatibility problems with the jdk 1.4.1 we are stuck with. I won't get off track about java wierdness again. Right now everyone on the project is ecstatic that builds are fast again. Sometimes a typo can be the source of good fortune.
Category: Software, Programming, Java
0 Comments:
Post a Comment
<< Home