Not into reading text? Click here for the code.
Like a lot of people at Small Improvements I’m fascinated by functional programming. After coming back from our company trip in San Francisco I had trouble beating jet lag due to spending the evenings reading about monad transformers, I’m not kidding, it actually kept me awake.
For a while I’ve been thinking about cleaning up a little in our codebase, mainly the backend which is written in Java. I have known for ages that Haskell is really good with abstract syntax trees (ASTs) and was playing with the thought of creating a Haskell tool that would help me with this. However, to not completely violate the “do not reinvent the wheel” rule I first had a quick look at what’s already out there.
Finding An Existing Tool or Building My Own
Most of the developers at work use IDEA (for editing Java) which has built in tools for finding unused code and do all different kinds of code analysis. I tried using it for finding unused code a couple of times with different settings but didn’t manage to get acceptable results. The number of false positives was way too high for it to be useful, in addition to this it was incredibly slow. I also tried Findbugs without satisfying results.
I’m sure it’s possible to configure some existing software, but rather than spending more time finding a COTS-tool I figured I might just code it myself. I was thinking that if it’s specific to our project it shouldn’t be so hard. I quickly realized regular expressions wouldn’t be enough or would be very tricky to use and limit my flexibility. This left me with the choice of writing a custom parser or building a proper AST and work with that.
I have bad experience of working with ASTs in Java, but Haskell is another story, traversing a tree is a piece of cake. I had a quick look at Hackage and noticed that someone already has written a parser for Java in Haskell, so it was settled, I was starting Small Improvements’ first, albeit small, Haskell project. Finally I got to use Haskell at work!
My Solution For Finding Unused Code
It is actually quite simple to find unused Java code. Let’s have a look at my solution. In essence I’m reading all the .java-files in a folder, building an AST using language-java and then traversing the AST to collect information that can later be used to decide if a file is used or not.
The main information I’m looking for is whether any other file imports a file. However, since Java does not require an import statement if the dependency is within the same package I also look for other things such as method calls. After this I’m using the information to actually find unused files.
To find unused files I’m building a graph. Nodes are files and an edge means that a file is used by another file. So the challenge here is to actually add an edge every time a file is used. An obvious thing to do is to add an edge for every import statement.
To improve the result further I’m adding edges for references within a package, eg. used classes or methods within the package. However, this is not enough since Spring MVC has a powerful dependency injection system. It supports injecting dependencies and still only relying on interfaces. You can get all classes of a type (interface) injected or one specific instance but still only depending on its interface.
When harvesting the AST I also collected autowired classes and superclasses. Using this I filtered out files that are autowired, either directly or via an interface. The result is not 100% perfect, but with a small blacklist of classes and some other trivial filtering I managed to make it good enough for it to be very useful. Everything I get from the AST is modeled using the following data structure:
data Result = Result { fileName :: String , imports :: [String] , references :: [String] , topLevelAnnotations :: [String] , methodAnnotations :: [String] , implements :: [String] , autowired :: [Autowiring] } deriving (Show)
Have a look at the code and try it on your own Spring MVC project. Feel free to comment here if you need help or have suggestions of improvements. Let’s now compare coding Haskell with Java / JavaScript that we normally do at Small Improvements.
Reflection of Development With Haskell
I’m a big fan of Haskell and have been for ages. One of the first things I noticed is the wonderful support you get from the compiler. When the compiler blesses your code it is very likely to just work. Once you have established that your code works, that it behaves correctly, then it is really difficult to accidentally change its behavior when refactoring. You might break it, as in making it not compile, but once it compiles again it is very likely to behave like before.
Composition is just beautiful. It strongly promotes breaking your program into trivial pieces and then glueing them together. Types are excellent documentation, the type signature together with the function name often makes it easy to guess exactly what the function does. It’s easy to write relatively clean code in Haskell. I think that the pureness and composition of small functions almost automatically makes it happen.
Actually, in Haskell it is a bit difficult to write functions that are hundreds of lines of code doing many different things. In Java or JavaScript that is what many people begin doing, and something they only unlearn as they become more skilled. I think that it is possible to produce nice code in all languages, but Haskell does help you quite a lot to keep your code nice, not to mention hlint. Haskell does not guarantee that you produce good code though, let’s look at some of my learnings from this project.
Learnings From This Project
One thing I learned is that type aliases are very useful, you should use them whenever it makes your code more readable. Comments are in general not needed if the type signature and function name is good.
Naming your code increases readability, for example extracting out small pieces of code to the where clause of a function or simply making them top-level functions in the module. Putting too many functions that are relatively complex in the where clause is a bad idea, because you lose the explicit type signature (you should always specify it for top-level functions) which makes it difficult to directly understand when they can be used and how they can be combined. A small example of a nice usage of the where clause is:
transformToEdges :: Result -> Node transformToEdges r = (r, fileName r, outgoingEdges) where outgoingEdges = references r ++ imports r ++ implements r
Note the increased readability in the top level expression. The where-clause is used to hide the messy details of what outgoing edges are behind a simple name. By using where it is often possible to make the top level expression very easy to read.
Curried functions are just awesome, they make it possible to compose almost any function. A good way to design them is to think of functions as being configured and getting what they operate on as the final argument.
Lazy evaluation is powerful, I still need to practice how to leverage it fully, but it is important to be aware of it. For example in my case I ran into problem when reading all files lazily. This caused my program to have too many open file handles. It was easily solved though, by hacking a bit to force the complete file to be read directly:
readFileStrict :: FilePath -> IO String readFileStrict path = do file <- readFile path _ <- evaluate $ length file return file
Recursion further promotes clean code (small functions) and is quite easy to work with when you think of it in terms of base-case and induction/normal case. An interesting thing is that a lot of principles and ideas can be transferred to other languages.
Transferable Knowledge
One example of a transferable idea is solving problems through composition of many small functions, this can be used in JavaScript (eg. using Lodash-fp or Ramda) quite easily. Composition promotes having many small functions solving simple subproblems, and does often result in cleaner code.
It doesn’t end here, Hindley-Milner type signatures might be worth to use in JavaScript as well, even if they aren’t used for more than documentation. Without them all the functions you end up with can be quite difficult to read.
Currying is easy to use in JavaScript (eg. with Lodash-fp or Ramda). I think I would go as far as to say that composition is not especially useful without curried functions.
It is important to be aware of differences between Haskell and other languages though. For example lazy evaluation is a quite unique feature of Haskell, another feature is tail call optimization, which means that you can use recursion without constantly worrying about your stack blowing up. I think there are a lot of other transferable learnings, but they are a bit deeper and you simply have to code Haskell to learn them. If you don’t want to walk the path via Haskell, for JavaScript you might find Professor Frisby’s Mostly Adequate Guide to Functional Programming useful.
Final Words
I would like to encourage every programmer to experiment with different languages and concepts. It is easy to just use what is immediately required for your daily job. But you miss out on a lot of ideas from other languages and risk getting caught in a small bubble, hindering you from developing as a developer.
At Small Improvements we get to spend around 20% of our time doing other things such as fixing pet peeves and working on side-projects (for example this one). In addition to this we have hackathons and ship-it weeks. I would recommend every company to introduce these kind of events, because I don’t think I’m the only developer who would agree with that programming is way more fun when you keep learning new things and growing as a developer.
To be a good developer you need to keep learning and don’t be afraid of not being instantly awesome when picking up something new. Keep exploring the beautiful world of coding!