arei.net - java
Why Dependency Management Pisses Me Off

Yes, it’s true. Dependency Management Pisses Me Off. Jason van Zyl over at Sonatype needs to be kicked in the groin… repeatedly. (Sorry, I don’t really know Jason and it’s not nice to say such things, but I wanted to really hammer home the point. Jason, I apologize to your testicles.) Seriously, when did we become so amazingly lazy that saving a JAR file into our SVN repositories became a big deal?

Now, I don’t want you to just think I’m some raving lunatic out there on his soap box shouting into the wind despite the accuracy of this picture; I want to at least pretend that you are not going to scream “TL;DR” and actually read the damn posting, so here is why Dependency Management pisses me off. Feel free to reply back to me on Twitter (@areinet) and I will engage you in some spirited debate. And I promise I won’t hurt your testicles in the process.

1). Dependency Management adds unnecessary complexity. Are we not just talking about saving some files into our SVN repository after all? Why is that so hard? And who on earth thinks that writing and changing a pom.xml file is actually easier than this? Also, there are people who would say that you shouldn’t be committing built objects into SVN (or whatever) and that we shouldn’t waste disk space. To these people I say this: Disk space is cheap. Seriously, the going rate for a HD is around 9gb per $1 USD. I assure you, no matter how many JAR files your project needs this is incredibly cheap.

2). Dependency Management puts someone else in your build loop. When I build a project, I want to rely on as few people as possible to fuck things up. Yet Dependency Management injects a completely incalculable third party into your system, and that’s just for one dependency. Sure, that dependency is always there, but with external dependencies, you are practically begging for your build to break because John Bozo three countries away from you removed six bytes of code from that one project you were relying on. Now, of course, you shouldn’t be using LATEST in your dependencies, but I really don’t want to rely on the fact that our build people are smart enough to realize this. If I just committed the version I wanted to the repository, none of these problems happen.

3). Licenses Change. When your build person goes out there and changes a version number, do you think they actually read a license file? Let’s assume for the sake of reality that people are incredibly lazy… now, do you want to take the risk that so and so actually read the license? And that the license didn’t change? Seriously, it would take me about six seconds to change the license on some sub-project you use and then commit it. And suddenly the sub-project owns all of your IP simply because you used them. Now, the legality of that is a debate for other scholars than I, but it could certainly cause a mess. The only thing stopping a sub-project from doing this is the hassle of suing your ass into oblivion… And we all know that people are getting more and more litigious every day.

4). The Internet is, by its nature, unreliable. Do you really want to rely on the fact that the internet providers upstream from you are not going to screw something up right when your absolutely-must-deliver build has to get run? Seriously, the more people that have input into a process, the more likely that process is to get derailed. I do not want to think that my ability to deliver is dependent on whether or not Anonymous is going to cause a world-wide outage in protest over SOPA (which sucks by the way). Sure, you can run a mirror of x repository and spend your time maintaining that as well, but wouldn’t you rather spend time, oh I don’t know, outside? With a girl? Playing WOW? Doing anything else?

So the point here is this and this is the TL;DR for you lazy people as well… Dependency management adds both complexity and unpredictability to your systems and this is not a good thing. A Build process is about Rigor, and Dependency Management is antithetical to rigor. By using a Dependency Management solution you are willingly signing up for problems and extra work. Who wants that? When given the choice between that and just storing the files I need into my repository, I will choose the latter every time.

Now, I do think some dependency systems are way better than others. The node.js NPM system is amazingly clean, but it’s still begging for the problems I outline above. So, maybe not that awesome. It is easy to use though, wish Maven were half that easy.

So, that’s it. That’s why every time my coworker comes in and raves about how awesome Maven is I just point at his crotch and start laughing. I mean, really, Maven? Awesome? You got to be an idiot to think that. (My apologies to my coworkers.)

Developer Drift

Lately I’ve been the subject of what I call developer drift.

Developer Drift is the process by which an unchallenged developer slowly moves from one project to the next. The project may be in house or external or something completely fabricated by the developer’s mind, but it basically means a developer is less interested in the current project than the project over the horizon. It’s the “Grass is always greener” truism made concrete in software engineering.

For me, this has taken the form of the fact that our customer wants really boring user interfaces which I can crank out like they are nothing. Problem is, I almost never crank them out, because they are meaningless and I never feel challenged/creative by them. (For me challenged=creative.) So I take forever to implement them and tend to make a lot of excuses on why this is taking so long. I feel justified in why it takes so long in the fact that when I am challenged, I really do crank out the code at an extraordinary rate which borders on the obscene when compared to average developers. I’m very prolific when I want to be.

The same thing was true when I was back in college studying English Literature. I could crank out a paper that I found interesting in no time flat, but assign me something that was pedantic and I’d sooner rip my own teeth out with a spoon (“because it will hurt more”).

So the real question I’m trying to find an answer to is “How do you deal with developer drift?” How do you stop people from losing interest when they are bored because the project has become boring? A project I used to work on is suffering from this very problem… they want to keep the team together, but the more boring stuff they do, the less likely they are to be able to keep the team together. Is there a way for this project to challenge its developers at the same time as doing boring things? Would contests or “feats of skill” help keep things from getting stale? Or should they just accept this as the life cycle of the developer, assume that people are going to drift away, and prepare for the next generation?

You tell me… how does your project deal with developer drift?

Maven Vitriol

I’m an ANT guy. I’ve been using ANT since 1999. It’s amazingly powerful, amazingly useful and very flexible… and I’ve never once written my own ANT task. Just using the ANT tasks available or out there in the community I have been able to build hundreds of software projects.

Now, of course, everyone says, “Go Maven”. So I went Maven… and it sucked.

See, I have very strong feelings about Frameworks. (Not to confuse you here, but Maven, like ANT, is a Build Framework.) The problem with 99% of all Frameworks out there is that they force you into a specific way of doing something. “But that’s the whole point,” you scream at me. And I agree, that’s the point… until the moment you need to do something else.

Now, I use frameworks all the time in my software development, we all do. There’s one listed over on the right side in my links section that I actually endorse. So am I not hypocritical for deriding frameworks in one breath while using them in another? Of course I am, but here’s where I’ll caveat it… I use frameworks that provide the least limitation on me doing new things. Maven, as an example, is a rigid framework. Doing something new with it is a difficult challenge. ANT on the other hand, while limiting, isn’t nearly as limited.

So here’s AREI’s Framework Measurement Testing System. First, evaluate a framework for the project you are working on, say building a Java Swing application. Next, evaluate the framework for another, completely different project you might work on in the future, say building a Website. How does the framework stack up for both things? Finally, consider what’s going to happen when you need to take the framework beyond its scope into new places. How accepting of that path is it?

Here’s some examples:

I had an ANT script that would append together a bunch of .JS files and then minify the entire set. Worked like a champ. The project I was on decided let’s give Maven a try instead of ANT. So I had to come up with a way to do the same thing in Maven. Two weeks of work later I ended up just calling ANT from inside of Maven.

Anyway, this post was really meant as just a simple link to an awesome blog posting that unleashes some much needed fury on Maven. But I got a little carried away in the intro.

So in summation, Maven sucks. Pick a framework that you can change. Damn the man! Save Empire!

Google Wave

Google announced yesterday a new offering called Google Wave. There is an excellent article at TechCrunch that gives a complete overview. All I can say is: THIS IS HUGE PEOPLE. The concept of Google Wave, the underlying idea, is exactly the right step that the web and the internet need to take. It’s a combination of Email, Instant Messaging, Twitter, Transparency, Collaboration, Wiki, Media sharing, and so much more. Oh, and it’s an Open API and Open Source to boot.

If you’ve ever talked tech with me in the last three years and we’ve had the discussion about what is next in technology, then we’ve had a very similar discussion about the concepts behind Google Wave.  I’m not claiming Google has once again stolen my idea, but what I will claim is that there clearly is a need for this type of product that I saw and google saw and others saw as well.

Two things in particular I want to call your attention to that make Google Wave a huge idea:

1). Adhoc groups… What adhoc grouping really does for you is to let people create internet groups as easily as they create real world groups. Think of it this way… when you walk into your work place break room and two other people walk in and the three of you start talking, you have created a group. It’s fast in the real world, why can it not be like that in the internet world? For most of the internet when you want to form a group and start working together there’s a fairly large investment in creating the meta systems the group needs: setting up a mailing list, setting up forums, setting up a media store, setting up a user database, etc etc. It’s a pain in the ass really, and it makes setting up a group fairly non-trivial. To some degree, Yahoo Groups and Google Groups automate a lot of that, but then you still have to go through the effort of getting people to join etc.

What Google Wave does is to make group creation simple and almost instantaneous. Basically, you create a group by dragging a bunch of contacts together and off you go. You can immediately begin discussion, expand the group, whatever. Additionally, if you need other tools for that group like a map or a document or some other thing, you can add a widget or a robot to participate in that group and boom, you have more functionality to fill the group’s need.

2) The second feature that is huge for me is transparency.  Transparency is the notion behind Twitter in that you broadcast snippets of your activities out to the world and everyone can see what you are doing.  This is the beginning of transparency though.  To take it further you need to automate transparency such that as you do things, they automatically are published.  Of course, this brings up privacy concerns, but I think there are simple ways to solve this.

I know that Wave has some level of transparency, but it’s still too early to tell how much. I suspect that even if this is an opt in version of transparency, like Twitter, that soon there will be robots and widgets (the extensions to Wave) that will automate a lot of this functionality.

The point is, though, that transparency can be a huge social tool… but more importantly it could be a huge business tool. And the company that sees this is going to get a huge edge over the company that does not.

Anyways, that’s my thoughts on Google Wave. You should definitely check it out.

Minification

There has been a great deal of discussion lately about the value of minification, yet very little concrete value. To that end I have decided to set down my thoughts from these discussions and share them such that everyone can be on the same page. Below I will outline what minification is and what it gains you. This is followed by discussion of minification versus HTTP Compression. Next, I will look at the concept of minification as obfuscation and the usefulness of obfuscating in general. Finally, I will share my recommendation on the subject of Minification and which tools I would recommend.

MINIFICATION

To start, let's define what minification is: Minify or Minification is the process of taking some source file and compressing it to a smaller size by removing whitespace and comments. A number of other small rules are sometimes applied as part of the minify process, but these are less significant. For example, Minify will shorten variable names to two or three characters and thus gain some space there.

Consider this simple function which is not minified:

for (var i = 0; i < 100; i++)
{
    var randomnumber = Math.floor(Math.random() * i);
    document.write("A random number between 0 and " + i +
                   " is " + randomnumber);
}

When minification is applied this function becomes thus:

for(var i=0;i<100;i++){var randomnumber=Math.floor(Math.random()*i);document.write("A random number between 0 and "+i+" is "+randomnumber);}

(Please note that any line breaks you see in the minified code above are due to word wrapping in your mail reader. Minified code is generally always a single line.)

In essence, minification reduces overall code size, and thus reduces the client's download time. The downside of minification is that by removing whitespace and comments and renaming some variables, the code becomes unreadable, extremely hard to debug, and may introduce unforeseen and hard to find bugs.

When we are talking about Web applications both Javascript (JS) and CSS files can be minified.

In a small test case I put two files through the minification process to see what the net gain would be. The Javascript file is of small size with few comments. The CSS is a large CSS file with few comments.

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Minified Bytes                           10413                 31178
Net Gain                                 42%                   20%

As can be seen, minification is much more useful to Javascript, where whitespace is used much more. CSS tends to have less whitespace and thus its gains are lesser. Also, gains will be much larger if your code is heavily commented. Since comments are removed, minification gains are directly related to the amount of comments.
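For anyone who wants to check the Net Gain row, the arithmetic is just the fraction of bytes saved. The sketch below is a minimal version of that calculation; the function name `netGain` is mine, and rounding to a whole percent is an assumption about how the table was produced.

```javascript
// Percentage of bytes saved, relative to the original size.
// Rounding to a whole percent is an assumption made for this sketch.
function netGain(originalBytes, reducedBytes) {
  return Math.round((1 - reducedBytes / originalBytes) * 100);
}

// Byte counts taken from the minification table above.
const jsGain = netGain(17954, 10413);
const cssGain = netGain(39020, 31178);

console.log("JS net gain: " + jsGain + "%");
console.log("CSS net gain: " + cssGain + "%");
```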

So, a 42% gain seems like a lot when you are talking about really slow network connections. Yet the question is, are those gains really worth the sacrifice in the ability to debug your code? And if one does not want to sacrifice the ability to debug and read the code, what can be done instead of minification?

HTTP COMPRESSION

HTTP Compression is a standard part of the HTTP 1.1 protocol that is in use in almost all major browsers and web servers on the market today. In essence, whenever an HTTP request is made by a browser, if that browser supports HTTP Compression a special header (Accept-Encoding) is added to the outgoing request that lists the different compression algorithms that the browser can uncompress. When a Web Server sees this header, it checks to see if the requested file can be compressed and if the server has one of the compression algorithms that the browser has said it can use. If all this is true, the server will run the compression algorithm over the response before it is sent back to the browser. The browser receives the compressed response and uncompresses it before making it available.
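To make that exchange a bit more concrete, here is a rough sketch of the server side of the negotiation, written for Node.js using its built-in `zlib` module. The function name `compressResponse` and the shape of the returned object are made up for illustration, and the parsing is deliberately simplified: it ignores quality values such as `gzip;q=0`, which a real server would honor.

```javascript
const zlib = require("zlib");

// Decide how to encode a response body based on what the browser offered.
// Simplification (assumption): quality values like ";q=0.5" are ignored.
function compressResponse(acceptEncoding, body) {
  const raw = Buffer.from(body, "utf8");
  const offered = (acceptEncoding || "")
    .split(",")
    .map((token) => token.split(";")[0].trim().toLowerCase());

  if (offered.includes("gzip")) {
    return {
      headers: { "Content-Encoding": "gzip", "Vary": "Accept-Encoding" },
      body: zlib.gzipSync(raw)
    };
  }
  if (offered.includes("deflate")) {
    return {
      headers: { "Content-Encoding": "deflate", "Vary": "Accept-Encoding" },
      body: zlib.deflateSync(raw)
    };
  }
  // The browser offered nothing we support, so send the bytes untouched.
  return { headers: {}, body: raw };
}

const response = compressResponse("gzip, deflate", "body { margin: 0; }");
console.log(response.headers["Content-Encoding"]);
```

In practice you would not write this yourself; the web server does it for you once compression is switched on. The point is only that the decision hinges on a single request header.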

HTTP Compression has many advantages and few disadvantages. Foremost, HTTP Compression is an entirely automated process that is handled by the server and requires only a minor configuration change on most Web Servers to enable. The Browser requires no special functionality to use compression. The disadvantage of compression is that there is a minor performance hit to both the server side and the client side. The server side must spend time compressing the response. The client side must spend time uncompressing the response. Yet, in most cases the file sizes involved make the time spent in compression minimal. If file sizes were significantly larger, compression might begin to cause performance problems.

Let us consider our test files again. This time let us look at our net gain from using HTTP compression only.

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Compressed Bytes                         3906                  6708
Net Gain                                 78%                   82%

Right away it's apparent that HTTP compression offers significant gains. When compared to our previous results from minification, HTTP Compression is the clear favorite. The reason for these significant gains is that HTTP Compression works across all the bytes of the source files, whereas Minification largely leaves the meat of the CSS and JS files alone and works mostly on the whitespace of those files.

Given the gains of HTTP Compression the next logical step is to ask, what if I did both minification and HTTP Compression. The result is even greater improvements, but not as much as one might imagine. Consider our test files again:

FILE SIZE                                JS                    CSS             
Bytes                                    17954                 39020
Compressed & Minified                    2514                  5907
Net Gain                                 86%                   85%

Overall when employing both techniques, the gain of minification is not as significant. For our JS file there is only an 8% gain over just plain HTTP Compression. There is even less gain when looking at the CSS file. However, please remember that both the JS and CSS files in our test are very light on comments. More comments will increase the gains somewhat, but less than one might think.
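If you want to run the same comparison on your own files, something along these lines would do it under Node.js. This is a sketch, not the procedure used for the tables above: `measure` is a name I made up, the `minify` argument is whatever minifier you care to plug in, and `naiveMinify` is a crude whitespace-collapsing stand-in that is only good for size estimates (it would happily mangle string literals in real code).

```javascript
const zlib = require("zlib");

// Report byte counts for a source string in four forms:
// raw, minified, gzipped, and minified-then-gzipped.
// `minify` is supplied by the caller; nothing here depends on a specific tool.
function measure(source, minify) {
  const rawBuffer = Buffer.from(source, "utf8");
  const minBuffer = Buffer.from(minify(source), "utf8");
  return {
    raw: rawBuffer.length,
    minified: minBuffer.length,
    gzipped: zlib.gzipSync(rawBuffer).length,
    both: zlib.gzipSync(minBuffer).length
  };
}

// Stand-in minifier (assumption): collapses runs of whitespace to one space.
// Fine for estimating sizes, not safe for producing code you intend to ship.
function naiveMinify(source) {
  return source.replace(/\s+/g, " ");
}

const sample = "function add(a, b) {\n    return a + b;\n}\n".repeat(50);
console.log(measure(sample, naiveMinify));
```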

Given these results, the question to be asking is does the sacrifice of being able to debug and read my Javascript and CSS outweigh my need to gain an additional 8% over just using HTTP Compression.

So maybe Minification is not the way to go after all, but what about Minification as a means of protecting my code from people who want to steal my hard work?

OBFUSCATION

In the world of interpretive computer languages such as Java, C#, Javascript, PHP, Groovy, etc there has long been the question of how one can prevent someone from reading or stealing their code. The more openly accessible the language, the easier it is to read and copy it.

Obfuscation is the process of applying a series of heuristics to code that will make it harder to read and more confusing to understand. The intent is that by applying these rules, the code becomes such a confusing mess that nobody would bother to read the code.

Java Obfuscation is a fairly complex process. There are hundreds of heuristics that are applied to the incoming source code. The resulting end code is a nightmare of crazy class, method and member names that make scanning the code painful.

Obfuscation for Javascript is much less powerful. Because Javascript is meant to be a widely open language, the ability to obfuscate the code is minimal. Variables, objects and methods in Javascript have no scope privacy and thus can be accessed by anyone and are often designed that way. This means that their names cannot be obfuscated because the names have relevance to the structure. You cannot change the name of a given method, because in many cases there is no way to tell who has to call that method or where in the code that might happen.
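A small illustration of why renaming is risky: in the sketch below the method name is assembled at runtime, so nothing in the source text says "onSave is called here". The `handlers` object and both dispatch functions are invented for the example; the second half simulates what would happen if a tool had renamed the methods anyway.

```javascript
// A hypothetical object whose method names are part of its structure.
const handlers = {
  onSave: function () { return "saved"; },
  onLoad: function () { return "loaded"; }
};

// The method name is built from a string at runtime, so a tool reading
// the source cannot tell that "onSave" is ever looked up.
function dispatch(eventName) {
  return handlers["on" + eventName]();
}

// Simulated result of an obfuscator renaming onSave/onLoad to a/b.
const renamedHandlers = { a: handlers.onSave, b: handlers.onLoad };

function dispatchRenamed(eventName) {
  const method = renamedHandlers["on" + eventName];
  // The lookup still uses the old name, so nothing is found.
  return typeof method === "function" ? method() : undefined;
}

console.log(dispatch("Save"));
console.log(dispatchRenamed("Save"));
```

The first call finds its method; the second comes back empty, because the string-built name no longer matches anything on the object.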

Yet the biggest problem of all with Obfuscation is that it is almost completely worthless. Obfuscation will only stop the most basic of people from copying your code. Extracting obfuscated code is fairly simple in most interpreted languages. With many of today's sophisticated IDEs reformatting and stepping through code is fairly simple. Additionally, many IDEs support variable name matching such that when an obfuscated variable name is selected all of the matching uses are highlighted. All of this means taking obfuscated Javascript back to readable code is far easier than one might suspect.

Another big strike against obfuscation is that Javascript can be employed in very diverse and powerful ways. Closures, functions as objects, objects as associative arrays, and the like, can all lead up to some fairly complex Javascript. Often times this kind of programming can confuse an obfuscator that is not designed to handle the diversity of the language. Even when writing this email I managed to break one obfuscator with just the sample for loop code from above.

Obfuscation, at its best, serves to help keep honest people honest. It is not going to stop someone whose intent is to copy your code, it's not even going to put up a decent struggle.

Minification, while not technically an obfuscator, does provide some obfuscation-like processes. The removal of whitespace and the collapsing of localized variables all make the code harder to read. Yet, not that much is really changed otherwise. Consider our sample code block that we looked at earlier. With minification obfuscation, it is still fairly readable:

for(var L=0;L<100;L++){var dS0=Math.floor(Math.random()*L);document.write("A random number between 0 and "+L+" is "+dS0);}

Given this code and about 2 minutes on the internet or a few search and replace calls and you can turn it back into fairly readable code as shown here:

for (var L = 0; L < 100; L++) {
    var dS0 = Math.floor(Math.random() * L);
    document.write("A random number between 0 and " + L + " is " + dS0);
}

It's not a perfect match of our original code, but it is fairly close and takes little real work to achieve.

Ultimately, Obfuscation can have some utility in giving you a very basic first line of defense, but the ability to debug and read your code is reduced and in some instances obfuscation can add to code size, albeit a very small amount. These limitations must be carefully considered before undertaking obfuscation. What are you really trying to do when you obfuscate your code?

Unfortunately, there is no way to really protect your Javascript code on the internet. HTML, Javascript, and CSS are all designed to be plain text readable formats, and this means that absolutely nothing stands between the site code and the end user. The only real protection you have from people reading and copying your code is in how much they worry about your legal recourse.

CONCLUSIONS

So where does this leave us with regards to Minification?

Well, we showed that from a size compression stand point, minification has some small value, but not nearly as much value as just turning on HTTP Compression. From an obfuscation standpoint we outlined the fundamental weakness of obfuscation in general. Given these factors my recommendation is for careful consideration of whether or not minification is actually needed in the particular case you are making.

I would ask myself these questions:

1). How important is the ability to read and debug my code?

2). How much time and resources do I wish to invest in tracking down bugs in minified code?

3). How relevant is the commenting in my code to the ability to debug and read my code?

4). With Obfuscation, whom am I trying to protect my code from and how competent are they?

Ideally, I think the best application is to do no Obfuscation, HTTP Compression, and a reduced form of Minification in which only comments are removed from the code. This provides you with high compression and still maintains the ability to read and debug your code. In my opinion obfuscation really is not worth the effort to employ it for the benefit you receive from it.
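To show what I mean by comment-only minification, here is a rough sketch of a comment stripper. It walks the source one character at a time so that `//` inside a quoted string (a URL, say) is left alone. The name `stripComments` is mine, and the sketch makes some loud assumptions: it knows nothing about regular expression literals or template strings, and it swaps each block comment for a single space so neighboring tokens do not run together. Treat it as an illustration of the idea, not a replacement for a real tool.

```javascript
// Remove // and /* */ comments while leaving quoted strings untouched.
// Assumptions: no regex literals or template strings in the input.
function stripComments(source) {
  let out = "";
  let i = 0;
  const n = source.length;

  while (i < n) {
    const ch = source[i];
    const next = source[i + 1];

    if (ch === '"' || ch === "'") {
      // Copy a string literal verbatim, honoring backslash escapes.
      const quote = ch;
      out += ch;
      i++;
      while (i < n) {
        const c = source[i];
        out += c;
        i++;
        if (c === "\\" && i < n) {
          out += source[i];
          i++;
        } else if (c === quote) {
          break;
        }
      }
    } else if (ch === "/" && next === "/") {
      // Line comment: skip up to, but not including, the newline.
      while (i < n && source[i] !== "\n") {
        i++;
      }
    } else if (ch === "/" && next === "*") {
      // Block comment: skip through the closing */ and leave one space.
      const end = source.indexOf("*/", i + 2);
      i = end === -1 ? n : end + 2;
      out += " ";
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

console.log(stripComments('var a = "http://x"; // note\nvar b = 1;/* c */var d;'));
```

Whitespace and line breaks survive, so the output still lines up with the original source when you are debugging.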

CONFIGURATION NOTES

HTTP Compression can be turned on in either Tomcat or Apache Web Server with just a little bit of effort. Tomcat requires a mere 3 lines of code to get HTTP Compression working. Apache HTTP Server requires a few more lines, but it is still relatively easy. For more information about configuring Tomcat see http://viralpatel.net/blogs/2008/11/enable-gzip-compression-in-tomcat.html. For more information about configuring Apache Web Server see http://httpd.apache.org/docs/2.0/mod/mod_deflate.html.

TOOLS NOTES

Finally, I will plug the YUI Compressor. There are a lot of Minification programs out there, but YUI Compressor seems to allow for the most configurability. Also, I liked YUI Compressor's ability to minify both CSS and JS files within one simple program. You can find out more about YUI Compressor at http://developer.yahoo.com/yui/compressor/

Free is the new Black

So, being a Java guy I’m fairly religious to Sun as a company.  I invested in their stock a while back (for good or bad) and thus I always kind of keep an eye on the company.  Recently I came across a series of video blogs by Jonathan Schwartz, the CEO of Sun.  The one I want to share with you is his second video blog.  The blog talks about the Sun view of the technological market place, and how giving brands away for free (like Java and MySQL and others) drives adoption, which in turn can drive revenue. Now, I’m no corporate financial officer, just a code monkey, but I found the blog well thought out and I thought I would share it with you.

My own company is currently working on a new project with the goal of productizing it.  We’re still in the early development of this product and haven’t had the discussion about how to sell it to consumers yet.  However, as I go about building the User Interface for this site I cannot help but keep thinking to myself (even prior to reading Jonathan’s blog) that the value of the site can only be realized by driving users to the site (or as Schwartz put it: building adoption).  In order to achieve this, the site would need to be largely free to the masses and make its revenue either through some ancillary stream or by charging only for some specific “higher role” usage.  As I designed the front end and go about coding it, I keep telling myself that the site is meant to be used by millions of people freely and must scale accordingly.

Like I said, I’m not the business guy, just a code monkey, but I really can see this notion of giving something away and finding revenue beyond just usage. I like to think of it as the Field of Dreams model… In the movie, Ray, the main character, is told to “Build it and they will come”, which is much like the philosophy Schwartz is espousing. The existence of the Field of Dreams brings the people, and there is surely revenue to be made after the fact: selling popcorn or something; the method is not as important as getting the people, driving adoption. If you have the people, selling them something they want beyond what you are offering should be easy.

Anyway, if you’d like to read/watch the video blog I am talking about, you can find it here: http://blogs.sun.com/jonathan/date/20090306

If you’d like to read/watch the entire video blog series (there are four of them), you can  find it here: http://blogs.sun.com/jonathan/date/20090302

If you’d like to follow Jonathan Schwartz’s blog, you can find it here: http://blogs.sun.com/jonathan/