Looking Back at GOTO 2016

By Peter Crona and Michael Ruhwedel


First of all, it was an amazing conference as always. None of us presented this year, but look for us in the future. Many of us at Small Improvements tend to go to more specialized conferences, such as React Europe, DockerCon or JSUnconf. GOTO is more of a general software engineering conference, focusing on issues such as architecture, security and new trends in the field. It doesn't go as deep as the specialized conferences, but it serves well as an overview of, and introduction to, interesting topics. Some of the most interesting and popular topics were, as expected, microservices, data science, security and ethics. Let's start with microservices.

Microservices are the Future

Something interesting about the future is that it is always already in the present, just hiding a bit in the corners at first. A clear message from Mary Poppendieck was that microservices are the future. Whether we want it or not, we need to learn about them, and we will eventually use them.

Susanne Kaiser from Just Software talked about their ongoing journey from a monolith to microservices. She warned us against doing too much at once, but concluded that going from a monolith to microservices was worth it in the end, and stressed that the effort required should not be underestimated. Later on, Ilya Dmitrichenko walked us through Socks Shop, a demo application that shows what an application built with microservices can look like, and demonstrated how such an application is deployed.

I urge you to read up on microservices if you haven't. It is truly fascinating how convenient the configuration is nowadays, and if you've been around for a while, you will find it interesting to compare with how we did things in the good old days. Have a look at this configuration, for example; lovely, isn't it? Let's move on to another topic in which I have a very strong interest, namely data science.

Seeing into the Future

It is truly fascinating how quickly data science has become popular and advanced. One of the first talks I went to was "Applied data science and engineering for local weather forecasts" by Nikhil Podduturi from Meteogroup. He took us through how they started using machine learning, running everything on their own laptops, and then moved into the cloud. He showed us a bit of their architecture, which processes more than a terabyte of data daily. I enjoyed his talk very much and had a chat with him afterwards, in which he pointed out that, when getting started with data science, it is sensible to begin with the basics, learning (or revisiting) the mathematics, before moving on to hot techniques such as deep learning. This makes it easier to develop an intuition for which technique to use when, and for how to find the best parameters. He recommended Python, since it has a very mature ecosystem for machine learning.

Robert Kubis from Google tutored us in TensorFlow by working through the Hello World of machine learning: classifying handwritten digits. He pushed the success rate of a neural network up to an impressive 98% while touching on the basics of the Python API. It was a very interesting, hands-on talk that showed how to use TensorFlow and gave an introduction to deep learning.

How to find insights without using machine learning was the topic of the talk by Michael Hunger from Neo Technology. He demonstrated how data can be modelled and queried as a graph, focusing on how Neo4j was used by journalists to analyze the Panama Papers.

Even your code repository is a data source that can be mined. This concept was presented by Dr. Elmar Juergens. By coloring new additions of code and the coverage of functional tests, he clearly demonstrated that the development and test departments at one of his clients had a serious communication problem: there was little overlap between what was tested and what was newly implemented.

The last two data science talks focused a bit more on possibilities, philosophy and ethics. "Deep Stupidity: What Neural Networks Can and Cannot do …" by Prof. J. Mark Bishop discussed whether we can build general intelligence or not. "Consequences of an Insightful Algorithm" by Carina C. Zona focused on the importance of thinking through the ethical aspects of developing and using algorithms. We are giving a lot of power to algorithms; they tend to reinforce prejudices and do not necessarily care about what is right, yet they are used to make decisions that affect people's lives. Let's now have a look at the security talks.

A Secure Internet

When you pick up a new concept, such as microservices, it is important to read up on security; it is easy to introduce vulnerabilities when you are new to a technology. Phil Winder talked about how to make your microservices secure. His talk was very practical and showed common mistakes people make, such as running containers as root and not setting up a sensible network policy. Dr. Jutta Steiner introduced us to blockchain technology. She pointed out how we can use techniques from safety-critical systems development, such as N-version programming, to implement it securely and minimize the risk of bugs. Unfortunately the talk did not go into the implementation details of blockchain technology itself, but she made it clear that the technology can be used for much more than just a currency such as Bitcoin. Finally, let's have a look at the ethics-focused talks.

Ethics in Technology

The great thing about GOTO is that it doesn't just cover the latest technology topics, but also how to better get along with your fellow human beings.

Jamie Dobson encouraged us to think beyond capitalism in his inspiring "Postcapitalism" talk. It's possible that the power of 3D printing, at both small and large scale, can bring capital back and onshore work in developed countries again.

Beginning with a short meditation, Jeffery Hackert built a compelling argument for giving our full presence. With full awareness of ourselves and our workplace come better-informed observations, decisions and implementations. After all, if you're ever involved in a trolley problem, it would be really unfortunate if you were focused on your cellphone and not the lever.

If you've been exhausted by office politics, Kate Gray and Chris Young can help you. Their great talk "How to Win Hearts and Minds" is about how the finesse of real-world politics was used to push a blocked IT project to success.

Talks ranging from microservices to ethics show the great variety GOTO offers.

Something for Everyone

Let's end with some words about the conference itself. GOTO has five different tracks, and the mix is very good, covering important and trending topics such as architecture (in particular microservices), security, data science and much more. In addition, you will find plenty of interesting people there to share ideas and pain points with. My only disappointment was that there was not a single talk about functional programming. But hey, you can't fit everything into one conference.

Using Haskell to Find Unused Spring MVC Code


Not into reading text? Click here for the code.

Like a lot of people at Small Improvements, I'm fascinated by functional programming. After coming back from our company trip to San Francisco I had trouble beating jet lag because I spent the evenings reading about monad transformers. I'm not kidding, it actually kept me awake.

For a while I've been thinking about cleaning up our codebase a little, mainly the backend, which is written in Java. I have known for ages that Haskell is really good with abstract syntax trees (ASTs) and was toying with the idea of writing a Haskell tool that would help me with this. However, so as not to completely violate the "do not reinvent the wheel" rule, I first had a quick look at what's already out there.

Finding An Existing Tool or Building My Own

Most of the developers at work use IDEA (for editing Java), which has built-in tools for finding unused code and doing all kinds of code analysis. I tried using it to find unused code a couple of times with different settings, but didn't manage to get acceptable results: the number of false positives was way too high for it to be useful, and on top of that it was incredibly slow. I also tried FindBugs, without satisfactory results.

I'm sure it's possible to configure some existing software to do the job, but rather than spending more time looking for a COTS tool I figured I might as well code it myself; if it's specific to our project, it shouldn't be that hard. I quickly realized regular expressions wouldn't be enough, or at least would be very tricky to use and would limit my flexibility. That left me with the choice of writing a custom parser or building a proper AST and working with that.

I have had bad experiences working with ASTs in Java, but Haskell is another story: traversing a tree is a piece of cake. I had a quick look at Hackage and noticed that someone had already written a Java parser in Haskell, so it was settled: I was starting Small Improvements' first, albeit small, Haskell project. Finally I got to use Haskell at work!

My Solution For Finding Unused Code

It is actually quite simple to find unused Java code. Let's have a look at my solution. In essence I'm reading all the .java files in a folder, building an AST for each using language-java, and then traversing the ASTs to collect information that can later be used to decide whether a file is used or not.
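
The sketch below is not the exact code from the repository, just an illustration of that first step; it assumes the language-java parser entry point (parser compilationUnit), and module or function names may differ slightly between versions of the library:

import Language.Java.Parser (compilationUnit, parser)
import Language.Java.Syntax (CompilationUnit)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath (takeExtension, (</>))
import Data.Either (rights)

-- Recursively collect every .java file below a directory.
javaFiles :: FilePath -> IO [FilePath]
javaFiles dir = do
  entries <- listDirectory dir
  fmap concat (mapM visit (map (dir </>) entries))
  where
    visit path = do
      isDir <- doesDirectoryExist path
      if isDir
        then javaFiles path
        else return [path | takeExtension path == ".java"]

-- Parse each file into an AST, dropping files that fail to parse.
parseAll :: FilePath -> IO [(FilePath, CompilationUnit)]
parseAll dir = do
  files <- javaFiles dir
  rights <$> mapM parseOne files
  where
    parseOne path = do
      src <- readFile path   -- lazy; see the readFileStrict note further down
      return $ case parser compilationUnit src of
        Left err  -> Left (path, err)
        Right ast -> Right (path, ast)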

The main information I'm looking for is whether any other file imports a given file. However, since Java does not require an import statement when the dependency is within the same package, I also look for other things, such as method calls. After this I use the collected information to actually find the unused files.

To find unused files I'm building a graph. Nodes are files, and an edge means that one file is used by another. So the challenge is to add an edge every time a file is used. An obvious thing to do is to add an edge for every import statement.
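
Here is a hedged sketch of that graph step using Data.Graph from the containers package (the real tool may build and query the graph differently): each file becomes a node carrying the names of the files it uses, and any file with no incoming edge, apart from known entry points, becomes a candidate for deletion. The Result record and the transformToEdges helper it relies on are shown further down:

import Data.Array ((!))
import Data.Graph (graphFromEdges, indegree, vertices)

-- (node payload, this file's name, names of the files it uses) --
-- the same triple shape that transformToEdges produces later in this post.
type Node = (Result, String, [String])

unusedFiles :: [Node] -> [String]
unusedFiles nodes =
    [ name | v <- vertices graph
           , indeg ! v == 0                 -- no other file points here
           , let (_, name, _) = fromVertex v ]
  where
    (graph, fromVertex, _toVertex) = graphFromEdges nodes
    indeg = indegree graph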

To improve the result further I'm adding edges for references within a package, e.g. classes or methods used within the same package. However, this is not enough, since Spring MVC has a powerful dependency injection system. It supports injecting dependencies while relying only on interfaces: you can have all classes implementing an interface injected, or one specific instance, while still depending only on the interface.

When harvesting the AST I also collected autowired classes and superclasses. Using this I filtered out files that are autowired, either directly or via an interface. The result is not 100% perfect, but with a small blacklist of classes and some other trivial filtering I managed to make it good enough to be very useful. Everything I get from the AST is modeled using the following data structure:

data Result = Result { fileName :: String
                     , imports :: [String]
                     , references :: [String]
                     , topLevelAnnotations :: [String]
                     , methodAnnotations :: [String]
                     , implements :: [String]
                     , autowired :: [Autowiring]
                     } deriving (Show)
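
The Autowiring type isn't shown above, so the following is only a rough illustration of the injection-aware filtering: it assumes a hypothetical accessor autowiredType :: Autowiring -> String that names the injected class or interface, and it glosses over the mapping between file names and class names:

import qualified Data.Set as Set

-- Illustration only: `autowiredType` is an assumed accessor, not necessarily
-- how the real Autowiring type is structured.
usedViaInjection :: [Result] -> Result -> Bool
usedViaInjection allResults r =
    fileName r `Set.member` injected
      || any (`Set.member` injected) (implements r)
  where
    injected = Set.fromList (map autowiredType (concatMap autowired allResults))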

Have a look at the code and try it on your own Spring MVC project. Feel free to comment here if you need help or have suggestions for improvements. Now let's compare coding in Haskell with the Java and JavaScript we normally write at Small Improvements.

Reflections on Development With Haskell

I'm a big fan of Haskell and have been for ages. One of the first things I noticed is the wonderful support you get from the compiler. When the compiler blesses your code, it is very likely to just work. And once you have established that your code behaves correctly, it is really difficult to accidentally change that behavior when refactoring. You might break it, in the sense of making it not compile, but once it compiles again it is very likely to behave as before.

Composition is just beautiful. It strongly encourages breaking your program into trivial pieces and then gluing them together. Types are excellent documentation: the type signature together with the function name often makes it easy to guess exactly what a function does. It's easy to write relatively clean code in Haskell; I think the purity and the composition of small functions almost automatically make it happen.

Actually, in Haskell it is a bit difficult to write functions that are hundreds of lines long and do many different things. In Java or JavaScript that is how many people begin, and something they only unlearn as they become more skilled. I think it is possible to produce nice code in any language, but Haskell helps you quite a lot to keep your code nice, not to mention hlint. Haskell does not guarantee good code, though, so let's look at some of my learnings from this project.

Learnings From This Project

One thing I learned is that type aliases are very useful; use them whenever they make your code more readable. Comments are generally not needed if the type signature and function name are good.
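
As a small illustration (not code from the tool itself), a couple of aliases can turn an anonymous String soup into a signature that documents itself:

type ClassName = String
type Interface = String

-- The signature alone now says what is being checked.
implementsAny :: [Interface] -> Result -> Bool
implementsAny interfaces r = any (`elem` interfaces) (implements r)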

Naming your code increases readability, for example by extracting small pieces of code into the where clause of a function, or simply making them top-level functions in the module. Putting too many relatively complex functions in the where clause is a bad idea, though, because you lose the explicit type signature (which you should always specify for top-level functions), and that makes it harder to see at a glance when they can be used and how they can be combined. A small example of a nice use of the where clause is:

transformToEdges :: Result -> Node
transformToEdges r = (r, fileName r, outgoingEdges)
  where outgoingEdges = references r ++ imports r ++ implements r

Note the increased readability of the top-level expression. The where clause hides the messy details of what the outgoing edges are behind a simple name. By using where it is often possible to make the top-level expression very easy to read.

Curried functions are just awesome; they make it possible to compose almost any function. A good way to design them is to think of functions as being configured first and receiving what they operate on as the final argument.
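
A tiny illustration of that "configure first, data last" style (again an assumed helper, not code from the tool): partially applying the configuration arguments yields exactly the one-argument functions that compose and map nicely:

import Data.List (isPrefixOf)

-- Configuration first, the value being inspected last.
startsWithAny :: [String] -> String -> Bool
startsWithAny prefixes name = any (`isPrefixOf` name) prefixes

-- "Configured" once, then usable wherever a String -> Bool is expected,
-- e.g. filter (not . isGenerated) fileNames
isGenerated :: String -> Bool
isGenerated = startsWithAny ["Generated", "Legacy"]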

Lazy evaluation is powerful. I still need to practice how to leverage it fully, but it is important to be aware of it. For example, I ran into a problem when reading all the files lazily: my program ended up with too many open file handles. It was easily solved, though, by hacking a bit to force each file to be read completely right away:

import Control.Exception (evaluate)

readFileStrict :: FilePath -> IO String
readFileStrict path = do
  file <- readFile path
  _ <- evaluate $ length file  -- forcing the length reads the whole file now
  return file

Recursion further promotes clean code (small functions) and is quite easy to work with when you think of it in terms of a base case and an inductive (normal) case, as in the small sketch below. An interesting thing is that a lot of these principles and ideas can be transferred to other languages.
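
A minimal sketch of that base-case/inductive-case split (an assumed helper for illustration, not code from the tool):

import System.FilePath (takeExtension)

-- Base case: an empty list of paths yields nothing.
-- Inductive case: decide about the head, then recurse on the tail.
keepJavaFiles :: [FilePath] -> [FilePath]
keepJavaFiles []          = []
keepJavaFiles (path:rest)
  | takeExtension path == ".java" = path : keepJavaFiles rest
  | otherwise                     = keepJavaFiles rest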

Transferable Knowledge

One example of a transferable idea is solving problems through the composition of many small functions; this can be done quite easily in JavaScript (e.g. using Lodash-fp or Ramda). Composition encourages many small functions solving simple subproblems, and often results in cleaner code.

It doesn't end there: Hindley-Milner type signatures might be worth using in JavaScript as well, even if they serve only as documentation. Without them, all the small functions you end up with can be quite difficult to read.

Currying is easy to use in JavaScript too (e.g. with Lodash-fp or Ramda). I would go as far as to say that composition is not especially useful without curried functions.

It is important to be aware of the differences between Haskell and other languages, though. For example, lazy evaluation is a fairly unique feature of Haskell; another is tail call optimization, which means you can use recursion without constantly worrying about blowing up the stack. I think there are a lot of other transferable learnings, but they are a bit deeper and you simply have to code Haskell to learn them. If you don't want to walk the path via Haskell, for JavaScript you might find Professor Frisby's Mostly Adequate Guide to Functional Programming useful.

Final Words

I would like to encourage every programmer to experiment with different languages and concepts. It is easy to use only what is immediately required for your daily job, but then you miss out on a lot of ideas from other languages and risk getting caught in a small bubble, which hinders you from growing as a developer.

At Small Improvements we get to spend around 20% of our time on other things, such as fixing pet peeves and working on side projects (for example this one). In addition we have hackathons and ship-it weeks. I would recommend every company introduce these kinds of events, because I don't think I'm the only developer who finds that programming is way more fun when you keep learning new things and growing.

To be a good developer you need to keep learning, and you shouldn't be afraid of not being instantly awesome when picking up something new. Keep exploring the beautiful world of coding!

Creating The Right Knobs

I recently participated in Softwareskills' Liar's Dice competition, and since people have expressed interest in hearing how I managed to win, I decided to summarize the process and the results.


As a prize I got 500 SEK to spend at Teknikmagasinet (a Swedish store), a USB stick and this nice piece of paper 🙂

Liar’s Dice and the Competition

Liar's dice is a game where each player starts with six dice. One player begins by announcing a bet. A bet consists of a number and a face, for instance four fives. The next player can then either challenge the bet or raise it. You can read about the details at https://en.wikipedia.org/wiki/Liar%27s_dice. Softwareskills ran a competition to write the best AI for the game. I noticed that it is not clear to everyone what an AI is in this context, or how to get started writing one, so let's discuss that briefly.

Simplified View of an AI

An AI (strictly speaking an intelligent agent, but I'll continue to call it an AI) in its simplest form can be seen as a program that, given a state, responds with a valid action. A good AI must, in addition to producing valid actions, produce the best valid action for a given state. What "best" means is the real challenge, and figuring it out typically requires mathematical analysis and creativity. In some cases you might be able to calculate the best action, but that is rarely easy given that you don't know how your opponents behave. In Liar's dice you don't know your opponents' dice, and they may behave differently depending on what dice they have. Let's have a look at how I approached this challenge.

The Development of My AI

First off, I'm not developing AIs on a day-to-day basis. I had some experience from before, but not much. I started by drawing the "Simplified View of an AI" on a piece of paper, which made it easier to break the problem down into its main components:

  1. Generate all valid actions given a state
  2. Put a score on all valid actions
  3. Pick the valid action with the highest score
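
Sketched in Haskell purely for illustration (the competition bot was not necessarily written or structured this way), with the state, action and scoring types stubbed out, those three steps boil down to:

import Data.List (maximumBy)
import Data.Ord (comparing)

-- Placeholder types; a real state would carry dice, bets, players and so on.
data GameState = GameState
data Action    = Challenge | Raise Int Int deriving (Show)  -- count and face

validActions :: GameState -> [Action]    -- step 1 (stubbed here)
validActions _ = [Challenge, Raise 4 5]

score :: GameState -> Action -> Double   -- step 2 (stubbed here)
score _ _ = 0

bestAction :: GameState -> Action        -- step 3
bestAction state = maximumBy (comparing (score state)) (validActions state)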

Let’s look at each problem separately, starting with how to generate valid actions.

Action Generation

To generate actions you need a function that, given the current state and the rules of the game, can produce all valid actions. This is pretty boring work that I wanted to get through as quickly as possible so I could move on to what would actually earn me a good place on the leaderboard. Softwareskills provided a model with a track of 27 positions. A die on a position corresponds to a specific bet with the face of the die (some positions require the bet to be a star, where star = six). A bet can then either be challenged or raised, and a raise can be either a "higher" face on the same position or any face at a later position. So to generate valid actions I separated the problem into the following sub-problems:

  1. Find the position of the current bet on the “track”
  2. Given the position and face:
    1. Get all bets with a higher face on the same position
    2. For all subsequent positions, generate bets with all possible faces
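
A heavily simplified sketch of step 2, assuming a bet can be encoded as a track position plus a face and ignoring, among other things, which of the 27 positions are star-only:

type Position = Int   -- 1..27 on the track
type Face     = Int   -- 1..6, with 6 treated as the star

-- All raises of a given bet: a higher face on the same position,
-- or any face on any later position.
raises :: (Position, Face) -> [(Position, Face)]
raises (pos, face) =
     [ (pos, f) | f <- [face + 1 .. 6] ]
  ++ [ (p, f)   | p <- [pos + 1 .. 27], f <- [1 .. 6] ]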

This allowed me to generate all possible actions quite quickly. Now, to value the possible bets and give each one a score, I needed to revisit some basic statistics.

Statistics

When I first saw the competition I thought it would just be a matter of calculating the optimal action using statistics. I asked myself whether I was more likely to win if I challenged the previous bet or if I raised it, and if I raised, by how much. Based on these questions I formulated new ones which I could "solve" directly, namely:

  1. Given my known dice, and the number of unknown dice in game, what is the probability of the previous bet to be true?
  2. Given my known dice, and the number of unknown dice in game, what is the probability of my raised bet to be true?

Simple, was my first thought before I actually got started. It was easy to come up with a solution, and it even worked decently. The only problem was that it was wrong, and to reach the top of the leaderboard the number of mistakes you can afford is pretty low. It was not easy to figure out whether the statistics were calculated properly or not. When I was stuck at a score far from the best, I challenged my code by doing static analysis, which in this case meant verifying the mathematics. By reasoning through it, I realized it was wrong. This led me to reach for my old book from school (Introduction to Probability and Statistics: Principles and Applications for Engineering and the Computing Sciences by J. Susan Milton and Jesse Arnold) as well as a couple of lectures at Khan Academy (https://www.khanacademy.org/math/probability). After a couple of hours studying combinations, permutations and the good old "number of favorable outcomes / number of total outcomes", and experimenting with simple examples, I managed to get it right. I jumped to the top, but not to first place. Something was still missing.
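
For the curious, the kind of calculation involved looks roughly like the sketch below (not necessarily the exact formulas my bot ended up with): the chance that a bet holds is the chance that at least need of the unknown hidden dice match it, where need is the bet's count minus the matching dice in your own hand and p is the per-die match probability (for example 1/3 for an ordinary face if stars are wild, 1/6 for a star bet).

-- Binomial tail: P(at least `need` of `unknown` hidden dice match),
-- given a per-die match probability p.
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

probAtLeast :: Int -> Int -> Double -> Double
probAtLeast need unknown p =
    sum [ fromIntegral (choose n i) * p ^^ i * (1 - p) ^^ (n - i)
        | i <- [max 0 (fromIntegral need) .. n] ]
  where
    n = fromIntegral unknown :: Integer

-- e.g. probAtLeast 3 10 (1/3) is the chance that at least three of the
-- ten hidden dice match the bet.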

Getting Desperate

I wanted to go for a clean solution based on simple mathematics. But analysing the games made it clear that my AI was often put into bad situations where no action was particularly good, and that it challenged bets too often. It was not really playing the game; it was still just following the rules, only a bit more intelligently. So I started adding a lot of knobs: weighted averages, factors and terms of various kinds. I created a massive number of mathematical functions whose graphs looked approximately the way I wanted, many of them linear (y = kx + m) along with some higher-degree polynomials. I experimented with taking into account everything I could think of, such as whether I held a larger or smaller share of the dice in the game than the previous or next player, whether the game had just started, the previous bets of other players, and many other things. These factors could influence the score of an action, but I didn't know by how much they should do so. So I created knobs, factors that I could adjust to make the impact higher or lower, and then tried to find the optimal values for the knobs.
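
As a purely illustrative example of what such a knob looks like in code (the field names and the curve below are invented; the figure caption underneath hints at the real trust function): each tunable factor gets a name, and the scoring code reads it instead of a hard-coded constant, so it can be adjusted between runs.

-- Illustrative only: names and structure are made up for this sketch.
data Knobs = Knobs
  { trustSlope :: Double   -- the k in y = k*x + m
  , trustBase  :: Double   -- the m
  , riskWeight :: Double   -- how hard to punish statistically shaky raises
  }

-- How much to trust the n-th bet of the round, clamped to [0, 1].
trustFactor :: Knobs -> Int -> Double
trustFactor knobs n =
    max 0 (min 1 (trustSlope knobs * fromIntegral n + trustBase knobs))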

(Figure) Function used for deciding how much to trust opponents' bets, e.g. the second bet is trusted 100%. Note that this function was combined with other functions to calculate the final trust factor.

I had basically created a big optimization problem which I could only try to solve empirically, and I did this in iterations: I created a lot of knobs, realized I could no longer grasp what I had actually done, removed them all and tried to create a new set of knobs. I repeated this a couple of times until I finally overtook the throne, meaning I conquered first place on the leaderboard. Even if I hadn't won, I would still consider this kind of competition very valuable. To explain why, I will briefly mention some things I learned from participating.

Lessons Learned

When I started the competition I didn't even know there was a prize; that was just a nice bonus. I entered for the challenge, sure that I would learn something. In hindsight I think my biggest gain was refreshing my statistics skills, and it's quite fascinating how easy that was despite not having worked with statistics for a long time. I also learnt that it is important to measure results carefully when you have knobs, or parameters, that you can adjust. At the risk of using too big words, I would say it taught me to be a slightly better scientist by clearly showing the necessity of careful measurements when searching empirically for optimal parameters. From a software engineering point of view it made it even clearer that correctness of the core is very important: just because something seems correct doesn't mean it is free of bugs. Proper testing of the basic units (functions, in this case) is very valuable. And just because your solution is based on mathematics doesn't mean it is simply right or wrong; it might still be slightly wrong.

Furthermore, I learnt the value of keeping the code in at least a decent state at all times, so that the effort of experimenting never becomes a barrier. And finally, I learnt the value of killing your darlings over and over. Even if you are in second place, when you're stuck, start over and keep only the generic core that lets you build working solutions quickly. Don't stick to your good solution just because you feel you have invested so much time in it. Challenge it! You managed to come up with it once, and probably learnt from it, so next time you might come up with something even better.