I came across a really interesting bug that was plaguing me for about a day and a half and I finally tracked it down today, with JIT assistance from the superhuman @brianm.
I have a fairly simple app that uses Netty for its underlying event handling. Just a Web service, no great shakes. But, there was this weird bug that I found during load testing whereby it would just stop all network activity after a few minutes for no discernable reason. No more connections could be established, existing connections did not finish, etc. At first, I thought it was maybe a bug in Tsung, my load testing tool, but that wasn’t the case.
I upped the logging and found this strange error that was happening right around the time of the network shutdown:
I googled around for this error and found only a few references to it, mostly having to do with a partially-uploaded JAR. My app is packaged in a fat JAR, so I found this unlikely, but I reuploaded it and verified the upload with the MD5 hash and still the error occurred and the network shut down.
I was pretty much out of other options at this point, since there were no other anomalies and YourKit was unable to find any deadlocks or such. I took a stab and sent the gist above to @brianm to see if he had seen it before. He immediately noted that the call to java.security.AccessController.doPrivileged was bad news. He then suggested I try the “bazooka-squirrel approach”: if you can’t hit the squirrel with your .22 because the tree’s a mess, take out the whole tree with a bazooka ;-) In this case, the “tree” was the logging attempt in the backtrace.
However, I wasn’t using java.util.logging in the app. I was using logback. I checked out the source code of NioServerSocketPipelineSink and found that it was Netty using java.util.logging. Oh, really? At this point, java.util.logging was not configured. I configured it with a simple ConsoleLogger and all of a sudden *BLAM*… problem solved.
Netty is using java.util.logging which, if left unconfigured, was causing it to reach out into the fat JAR for the logging configuration which wasn’t there. When it did so, it caused an InternalError due to some kind of JVM security issue which blew up the server socket. Netty catches that and shuts down the boss thread. *sigh*
tl;dr You must configure java.util.logging to go somewhere or if there’s an exception in your boss thread it will decapitate Netty rather effectively.