Every couple of months, I get pulled into a bug fix that takes weeks to resolve. I just finished one this morning. I had been working on it almost nonstop for about three weeks, but the bug itself has lurked in our product in various forms for several years.
In the end, the culprit was the same as in a lot of these bugs: the Sun Java Runtime blows. It's always burning me on these hard bugs. The oddest behaviors often boil down to an issue in the Java runtime.
Today's was no different. As it turns out, some third-party code that we include in MeetingPlace uses the Sun Java Runtime. Over time we noticed that any machine running this code started running fast: its clock would drift into the future within minutes of being synchronized via NTP. I could sync the clock and then compare the PC against my atomic wall clock. Within minutes, they were off by a second or two. Come back in an hour and they were off by about a minute. What's the deal with that? A lot of back and forth with the vendor, plenty of research, and more googling than I cared to do finally produced an answer: on Windows, the Sun Java Runtime can make your system clock run fast if the stars and moon align. And in our app, they do.
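If you'd rather watch the drift from inside Java than against a wall clock, a rough sketch like the one below compares how far the system clock (`System.currentTimeMillis()`) advances against the monotonic `System.nanoTime()` over the same interval. This assumes `nanoTime()` is backed by a counter that isn't affected by the same skew, which is plausible on Windows but not guaranteed; an NTP correction during the run will also show up as "drift". The class and method names are hypothetical.

```java
public class ClockDriftCheck {

    // How far the wall clock advanced beyond the monotonic clock over the same
    // interval, in milliseconds. Positive means the wall clock ran fast.
    static long driftMillis(long wallStartMs, long wallEndMs, long monoStartNs, long monoEndNs) {
        long wallElapsedMs = wallEndMs - wallStartMs;
        long monoElapsedMs = (monoEndNs - monoStartNs) / 1_000_000L;
        return wallElapsedMs - monoElapsedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // Number of one-minute samples to take (default 5).
        long minutes = args.length > 0 ? Long.parseLong(args[0]) : 5;
        long wallStart = System.currentTimeMillis();
        long monoStart = System.nanoTime();
        for (long i = 1; i <= minutes; i++) {
            Thread.sleep(60_000L);
            long drift = driftMillis(wallStart, System.currentTimeMillis(), monoStart, System.nanoTime());
            System.out.printf("after %d min: wall clock drift %+d ms%n", i, drift);
        }
    }
}
```

On a healthy machine the reported drift should stay near zero; a steadily growing positive number matches the "clock running fast" symptom.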
We eventually found a half-decent bug report on this, along with a good blog post. Once applied, the magic -XX:+ForceTimeHighResolution JVM flag got rid of the clock skew instantly. The fix was simple in the end, but it took a long time to get there.
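Since a JVM flag is easy to lose somewhere in a launcher script or service wrapper, a small startup check like the sketch below can confirm it actually reached the runtime. It uses `ManagementFactory.getRuntimeMXBean().getInputArguments()`, which reports the JVM options the process was started with. The class and method names are hypothetical, and the "last occurrence wins" handling is an assumption about how repeated `-XX` options are resolved. Depending on the JVM version, the flag itself may be ignored or unsupported, so check the documentation for your runtime.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class TimerFlagCheck {

    static final String ON = "-XX:+ForceTimeHighResolution";
    static final String OFF = "-XX:-ForceTimeHighResolution";

    // Returns true if the flag is enabled in the given JVM argument list,
    // assuming a later occurrence overrides an earlier one.
    static boolean isForceTimeHighResolutionEnabled(List<String> jvmArgs) {
        boolean enabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals(ON)) {
                enabled = true;
            } else if (arg.equals(OFF)) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (isForceTimeHighResolutionEnabled(jvmArgs)) {
            System.out.println(ON + " is set");
        } else {
            System.err.println("WARNING: " + ON + " not found in JVM arguments: " + jvmArgs);
        }
    }
}
```

Run it as `java -XX:+ForceTimeHighResolution TimerFlagCheck` to see the flag reported; without the option it prints the warning instead.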