Looking back at the FullStack Fest

A couple of months ago I came across a conference called FullStack Fest. Skimming through the agenda, I was immediately intrigued and thought “I’ve got to check this out”. The coolest bit? The conference was taking place in the beautiful city of Barcelona.

September finally came around, and just as the Berlin air was getting chilly and bringing signs of the impending winter, I was flying off to the warmth of Spain. I got there a bit early and spent a nice Sunday roaming around on the streets, admiring the architecture and the history. The next day began the Backend bit of the FullStack Fest. It was interesting to step into the intricate world of the architecture of buildings one day, and after admiring it, to step into the equally intricate world of Software Architecture the next.

Sun-baked and refreshed, I went to the conference, all set with a notebook and a bag of goodies from the organisers. One must collect stickers for the laptop after all.

The backend days of the conference were loosely divided into “DevOps” and “Architecture”, with the overall topic being “Problems of Today, Wonders from the Future”. To describe the theme of the conference in a single phrase, I would say “Distributed Systems”.

Day 1: DevOps

The first talk was by Karissa McKelvey (@okdistribute). She talked about a project that lets people share their scientific work without consumers having to pay for it. A common problem in research is getting access to the required data, journals, publications etc., because bureaucracy, censorship and corporate licenses get in the way of open-sourcing knowledge. Karissa and her team have worked on something called the Dat Project. It creates a distributed network of many data hosts (mostly universities), through which you can upload your files and download any file via a little identifier. You can access Karissa’s presentation from the conference using this identifier (dat://353c5107716987682f7b9092e594b567cfd0357f66730603e17b9866f1a892d8) once you install the dat tool on your machine. Though this is still vulnerable to being used as an illegal file hosting service, it’s a good step towards making data and knowledge more reachable and transparent.

Following up on this was an interesting introduction to Ethereum as a way to enter ‘contracts’ without trusting a third party such as a notary. This is done by distributing trust amongst many participants. As Luca Marchesini (@xbill82) said in his talk:

“The machine is everywhere… The machine is nowhere.”

With the beautiful underlying power of the Nakamoto consensus protocol that powers the blockchain and the added flexibility of Turing complete capabilities, allowing you to express the intent of your contract and its fulfilment in terms of an actual computer program, you can have the word of truth floating around in the world, verifiable and undeniable.

With the buzzwords “microservices” and “serverless” going around, one would of course expect a talk on these topics. Marcia Villalba (@mavi888uy) gave a great talk on what “serverless” really means… and no, it does not mean there is no server (of course). The idea of a serverless application is to utilise the cloud and write self-contained functions to do simple tasks. Some highlights from the talk worth remembering are:

  • Functions in the cloud are triggered by events and do not hold state.
  • Pay as you go, scale automatically.
  • Create a proof of concept and then optimise your solution to take advantage of the cloud.
  • Automate: your CI pipeline and your testing.
  • Reduce latency by being at the edge of the cloud.
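
To make the first highlight a little more concrete, here is a minimal Python sketch of a stateless, event-triggered function. The `handle_event` name and the shape of the event are my own assumptions for illustration and are not tied to any particular cloud provider’s API.

```python
import json


def handle_event(event: dict) -> dict:
    """Hypothetical handler: invoked once per event, keeps no state between calls."""
    # Everything the function needs arrives in the event itself.
    name = event.get("name", "world")
    body = json.dumps({"message": f"Hello, {name}!"})
    # The return value is the only output; nothing is stored on the instance.
    return {"status": 200, "body": body}


if __name__ == "__main__":
    print(handle_event({"name": "FullStack Fest"}))
```

Because the function depends only on its input, the platform is free to run as many copies as it needs, which is what makes automatic scaling possible.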

Next we stepped into the world of cyber security with Dr. Jessica Barker (@drjessicabarker), who talked about tackling vulnerabilities, specifically those introduced by negligence on the part of an end user. She talked about educating users on security instead of treating them as the weakest link in the chain and ‘dumbing things down’. She made her case in light of the Pygmalion Effect, according to which higher expectations lead to better performance. A common problem when building human friendly security guidelines is that the user is treated as a dumb entity and that leads to the user acting like a dumb entity.

Frank Lyaruu (@lyaruu) then came in with an anecdote about how he wanted a Swiss Army knife that did everything when he was a child, and ended up with an utterly useless one. It was quite easy to see the analogy here… we have all faced feature bloat, we’ve all wanted a framework to do everything and then been frustrated with the abstractions that make customisations a nightmare. Frank introduced the concept of ‘fullstack databases’. The key idea? Identify your use case and use the right database for it. While SQL databases may work for one scenario, GraphQL would be much better in another. The takeaway:

“Your thinking influences your choice of tools and your choice of tools influences your thinking.”

A representative from Booking.com, Sahil Dua (@sahildua2305), then told us how Booking.com handles their deep learning models in production. The problem they need to solve is that each data scientist needs an independent environment for training. They package the training script in a container, and a container runs on each server that is needed. The containers are managed by Kubernetes. This was a lesson in how to manage many containers and independent environments with very high performance needs.

As Software Engineers, we know one thing for sure, and that is that things will, at some point, fail.

“There are two kinds of systems: those which have failed and those which will.”

Aishraj Dahal (@aishraj) walked us through chaos management. Some useful principles that he talked about were to:

  • Automate what you can and have a framework for dealing with incidents.
  • Define what “minor” and “major” incidents mean.
  • Define failures in terms of business metrics, for example, the amount of revenue lost per hour of downtime.
  • Single Responsibility Principle: one person should be responsible for one task in an incident. If everyone is combing through the git history looking for the last stable commit, it’s redundant work.
  • Never hesitate to escalate.
  • You need an incident commander: the person who orchestrates the efforts to get back on track.
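
As a rough illustration of defining severity through a business metric, here is a small Python sketch. The threshold, the `Incident` fields and the function names are hypothetical and were not part of the talk.

```python
from dataclasses import dataclass

# Hypothetical threshold: revenue lost per hour above which an incident is "major".
MAJOR_LOSS_PER_HOUR = 1000.0


@dataclass
class Incident:
    title: str
    revenue_lost_per_hour: float  # business metric, in your currency


def classify(incident: Incident) -> str:
    """Return "major" or "minor" based on the business impact."""
    if incident.revenue_lost_per_hour >= MAJOR_LOSS_PER_HOUR:
        return "major"
    return "minor"


def should_escalate(incident: Incident) -> bool:
    """Major incidents are escalated to the incident commander straight away."""
    return classify(incident) == "major"
```

Writing the definition down as code (or as a runbook) means nobody has to debate severity in the middle of an outage.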

Day 2: Architecture

The second day of the FullStack Fest began with an exciting talk by John Graham-Cumming (@jgrahamc) on the Internet of Things as a vector for DDoS attacks. He showed how vulnerable IoT devices are, with simple lapses like having telnet open on port 23. Attackers exploit these devices to send A LOT of small HTTP requests to a server, each demanding a large response targeted towards a victim. As an employee of Cloudflare he could shed some light on how network patterns are used to distinguish legitimate requests from malicious ones. Some ways to protect yourself against DDoS attacks are to install rate limiting, block every entry point that you do not need and use DDoS protection tools from a vendor such as Cloudflare.
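
Rate limiting was only mentioned in passing, so here is a minimal sketch of one common way to do it, a token bucket, in Python. This is my own illustration and not what Cloudflare runs; the class name and parameters are assumptions.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill according to the time elapsed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice you would keep one bucket per client (for example per IP address) and reject or delay requests whenever `allow()` returns `False`.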

One of my favourite talks from Day 2 was James Burns’ (@1mentat) introduction to chaos engineering and distributed tracing. He began by defining a practical distributed system as one that is observable and resilient. Observability comes with tracing, whereas resilience can be tested through chaos engineering, i.e. intentionally causing a system to fail as a “drill” and having the engineers on board try to fix it without knowing the cause of the problem or even what the problem is. If you have many such drills, the team will be well prepared to tackle real chaos when it hits.
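
To give a feel for what such a drill can look like in code, here is a small fault-injection sketch in Python. The `chaotic` wrapper and the `fetch_price` stand-in are hypothetical names of mine; real chaos tooling works at the infrastructure level and is far more elaborate.

```python
import random


class InjectedFault(Exception):
    """Raised on purpose to simulate a failing dependency."""


def chaotic(func, failure_rate: float, rng=random.random):
    """Wrap `func` so that roughly `failure_rate` of the calls fail on purpose."""
    def wrapper(*args, **kwargs):
        if rng() < failure_rate:
            raise InjectedFault(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_price(item: str) -> int:
    # Stand-in for a call to a real downstream service.
    return {"ticket": 100}.get(item, 0)
```

Wrapping a dependency like this in a test environment shows quickly whether callers retry, fall back or simply fall over.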

Chris Ford (@ctford) took the stage and talked about a hipster programming language called Idris which can be used to specify distributed protocols. In Ford’s words, his 10th rule of microservices is:

“Any sufficiently complicated microservice architecture contains an ad-hoc, informally-specified, bug-ridden, slow implementation of a distributed protocol.”

A distributed protocol’s specification can be tricky to get right. With a language like Idris, where the compiler checks the types, functions are values and even types are values, a protocol can be specified much more strictly. The chances of runtime bugs are reduced because the compiler is smart enough to catch protocol violations. A protocol can be thought of as a finite state machine, and that is how it is specified in Idris. Be forewarned though, this is still ongoing research and definitely not production ready!
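
The “protocol as a finite state machine” idea can be sketched in Python as well. The states and messages here are a made-up request/response protocol, and the important difference is that Python only detects a violation at runtime, whereas the point of the Idris approach is that the compiler rejects it before the program ever runs.

```python
# Hypothetical protocol: allowed (state, message) pairs and the state they lead to.
TRANSITIONS = {
    ("idle", "connect"): "connected",
    ("connected", "request"): "waiting",
    ("waiting", "response"): "connected",
    ("connected", "close"): "idle",
}


class ProtocolViolation(Exception):
    """Raised when a message is not allowed in the current state."""


def run(messages, state="idle"):
    """Replay messages through the state machine and return the final state."""
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ProtocolViolation(f"{message!r} is not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

For example, `run(["connect", "request", "response", "close"])` ends in `"idle"`, while `run(["request"])` raises a `ProtocolViolation` because nothing is connected yet.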

We then dove into philosophy, the nature of order and structure-preserving transformations with Jerome Scheuring (@aethyrics). He talked about identifying the core of the application and then building transformations around it. The key is that the structure of your application remains the same when more layers are added onto it. He hinted at functors as a tool for achieving such transformations of architecture.

After some lightning talks and a tutorial on ‘hacking’ into systems that only exist for a few milliseconds (lambdas which are only alive for the scope of a simple execution) and then on how to defend such systems, the backend bit of the conference came to a close.

The conference was a pretty cool look into research topics meeting the software industry and creating some innovative solutions to existing problems. Though I haven’t listed all the talks here, you can check them out on YouTube: https://www.youtube.com/watch?v=_V4tGx85hUA&t=536s.

I left Barcelona feeling that I had gazed into the future of technology and seen the wheels set in motion for many advancements to come in the next few years. Though the conference could have been even better with a few more topics tied explicitly to everyday software development, I feel that I walked out a more knowledgeable person than before.


Broadening one’s horizons beyond the scope of one’s job description is not only intellectually stimulating but also makes for a more content and productive mind. Small Improvements, by sponsoring this trip (and many others for their employees’ learning and development), is creating a happier and smarter workplace. I am yet again in Berlin, at my desk, ready to tackle more challenges and apply the numerous things I gleaned from the FullStack Fest. Looking forward to the next conference!

Running our App Engine Application in the Flexible Environment (Java 8)

It’s no secret that we at Small Improvements love to use cutting edge technologies for our application. On the client side, there’s no limit, which is why we’re rapidly transitioning to React. In the backend, we’re pushing the limits too, but we’re currently bound by what the App Engine has to offer. The main grievance for us is that we’re still using Java 7.

There are hints that Google will bring Java 8 to the App Engine, but during our recent Ship-It week, we decided to take matters into our own hands and run Small Improvements on a Java 8 Flexible Runtime, aka Flexible Environment or Managed VM, the name changes frequently ;).


If you’ve never heard of the Flexible Runtime, it’s basically a Docker container that will run your App Engine application. To get started quickly, you can use Google’s Java 8 / Jetty 9.3 Compat Runtime container without touching (or even seeing) any Dockerfile.

While Google provides a couple of Hello World examples, these won’t help you much when your app won’t start and you can’t figure out why.

If you’re like us and prefer to use the Cloud SDK to deploy over Maven, please read on and I’ll show you how we managed to get our app running.

Caveat: It’ll work, but it’s definitely not quite production ready. We wouldn’t recommend it for your main app, but if you have a non-mission-critical service, you could give it a shot.

Bye bye XML! Hello YAML!

XML was quite nifty when it was introduced 20 years ago. But YAML is so much easier on the eyes.

Lucky for us, the Flexible Runtimes are configured by YAML files. You can generate them from your exploded App Engine project using appcfg.sh which is included in the Java SDK:

appengine-java-sdk/bin/appcfg.sh stage \
   your-exploded-app stage-directory

Have a look at the generated YAML files: Cron, Dispatch, Dos, Index and Queue. They should all be deployable and contain the exact same configuration as their XML counterparts.

Getting app.yaml into production requires some additional steps…

App.yaml and its gotchas

Static files … or not?

Our generated app.yaml was a bit crude and yours might be too. For us, the static files and their expiration settings were very verbose:

- url: (/resources/.*\.jpg)
  static_files: __static__\1
  upload: __NOT_USED__
  require_matching_file: True
  login: optional
  secure: always
  expiration: 5d
- url: (/remote_api/.*\.jpg)
  static_files: __static__\1
  upload: __NOT_USED__
  require_matching_file: True
  login: optional
  secure: optional
  expiration: 21d
- url: (/api/tasks/.*\.jpg)
  static_files: __static__\1
  upload: __NOT_USED__
  require_matching_file: True
  login: admin
  secure: optional
  expiration: 21d
...

You’ll notice a lot of duplication. In our case there was so much that the deploy failed, since there is a hard limit on the number of entries :D. But no worries, the handler syntax supports regular expressions.

So for example, you can configure the serving/caching of your fonts and images in a single handler:

- url: (/.*\.(ttf|eot|svg|woff|gif|jpg|png|ico))
  static_files: __static__\1
  upload: __NOT_USED__
  require_matching_file: True
  secure: always
  expiration: 21d
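
Before deploying, it can be worth checking which paths such a combined pattern actually covers. Here is a quick sanity check in Python, assuming the handler has to match the whole request path. Python’s `re` module may differ from App Engine’s matcher in edge cases, and the `is_static` helper is just a name I made up.

```python
import re

# The handler pattern from the app.yaml snippet above.
HANDLER = re.compile(r"(/.*\.(ttf|eot|svg|woff|gif|jpg|png|ico))")


def is_static(path: str) -> bool:
    """True if the whole path matches the static-file handler pattern."""
    return HANDLER.fullmatch(path) is not None


if __name__ == "__main__":
    for path in ["/resources/logo.png", "/fonts/icons.woff", "/api/tasks/run"]:
        print(path, is_static(path))
```

Note that the single handler also applies one `secure` and one `expiration` setting to every matching file, so paths that need different settings still need their own entries.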

It’s unclear if Google will support serving static files automagically in the Flexible Environment. Currently, they suggest that you upload the files to a cloud storage bucket and serve them from there.

Our hope is that this is only an intermediate step. Who knows, they are not so forthcoming with their roadmap 😉

What we’ve gathered by monitoring the logs of our deployed app: currently, Flexible Environment deployments ignore the static_files handlers. So whatever you write in the handlers, your application will still serve the files itself.

Security Constraints

If you have security constraints for your Servlets/Resources, you’ve expressed them so far in web.xml:

  <security-constraint>
      <web-resource-collection>
          <url-pattern>/api/tasks/*</url-pattern>
      </web-resource-collection>

      <auth-constraint>
          <role-name>admin</role-name>
      </auth-constraint>
  </security-constraint>

This won’t work in a Flexible Runtime. For us, it closed all the responses of the server unexpectedly. You can safely remove the constraints from this file and express them in app.yaml.

Here’s how the example from above looks in the app.yaml:

- url: /api/tasks/.*
  script: unused
  login: admin

Selecting the Runtime

The last missing piece is to actually configure your app to run in a Flexible Environment:

vm: true
runtime: java
runtime_config:
  jdk: openjdk8
  server: jetty9
threadsafe: True
resources:
  cpu: 4

Line 1 is the big switch that will let your app run in the Flexible Environment.
Line 2 will upgrade you to Java 8.
Lines 3-5 are optional, just in case you’d like to try different Java/Jetty combinations.
Lines 7-8 are specifying how powerful your compute engine machine is – and how expensive.

Check out Google’s documentation to learn what other settings you can play with.

Cleaning up

Remove the XML configurations

Now that all your YAML files are ready, take off the training wheels and delete the following XML configurations:

  • cron.xml
  • queue.xml
  • datastore-indexes.xml
  • dos.xml

Bonus: (almost) get rid of appengine-web.xml

Whatever you’ve got in appengine-web.xml, you can now configure in app.yaml. Here are the only settings you’ll need to keep in there:

<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <vm>true</vm>
    <threadsafe>true</threadsafe>
    <sessions-enabled>true</sessions-enabled>
</appengine-web-app>

Test Run

The Cloud SDK brings its own App Engine development server dev_appserver.py. You can use it to test your upgraded application in a Flexible Environment locally:

# install the dev_appserver.py
gcloud components update app-engine-python
dev_appserver.py stage-directory/app.yaml

If everything worked as expected, you’ll be able to access the development server.

Deployment (Fingers Crossed)

For our deployment, we chose not to use the Maven plugin from Google’s examples (who would, after getting rid of so many XML files 😉 ).

You can elegantly use gcloud from the Cloud SDK to deploy:

cd stage-directory
gcloud \
  app deploy \
    --no-promote \
    --version=any-version \
    --project=your-project \
    app.yaml \
    cron.yaml \
    dispatch.yaml \
    dos.yaml \
    index.yaml \
    queue.yaml

Congratulations, you’ve upgraded your application to Java 8 and a modern Jetty!

So I can use Flexible Runtimes, or what?

We encountered a lot of errors before the deployment worked.

Sometimes the cloud build timed out. Or the generated app.yaml file broke the gcloud deploy. (Google support helped us patch the Python executable: Big thanks!)

The main problem is that the deployment of our application, composed of two modules, takes 15 to 18 minutes in the Flexible Environment. To put this in perspective: the regular re-build and deploy of our application is well below 10 minutes.

Also from the development perspective, we’re not ready to forgo the convenience of firing up a development server in IntelliJ IDEA. The development server from the Cloud SDK is cool, but it would need some more tweaking to develop locally without a lot of restarts (read: too many 😉 ).

Conclusion

All in all, it was a fun and interesting project for us. It’s good to see that our application can run on the latest stable Java version.

The Flexible Environment is still in beta and NOT production ready. It’s NOT covered by any SLA.

We decided to let a less important microservice run in the Flexible Environment. It doesn’t require many redeploys and has been happily serving for two months. So far its only quirk, a forgivable one, is that it logs to standard error instead of the request log.

Nevertheless, don’t be discouraged. If our instructions worked for you, you’ll be ready when Google finally ships the Flexible Runtime … we know we are 🙂

Running TeamCity on Kubernetes

Definition

Kubernetes – κυβερνήτης • (kyvernítis) m

  1. governor (leader of a region or state)
  2. (nautical) captain, skipper
  3. pilot (of an aircraft)

Motivation

We recently moved to a new office and discovered that one of our bare metal Continuous Integration build agents didn’t survive the move. Since other developers were already unhappy with the fact that the CI could only be reached from within the office, we took that as an incentive to give building on the Google Cloud Platform a shot.

We had already experimented with CoreOS for our internal logging system. Although we liked it, we were quite curious how a cluster setup with Kubernetes (K8S if you like) would perform.

So in this post, we’ll show you how to set things up, what pitfalls to avoid, and how we managed to create a robust and easy to set up solution.


Hackathon3: iCal Integration

We conducted our 3rd SI Hackathon from Feb 19th to Feb 20th. This is one of the hackathon results.

Background

My previous Hackathon project was way too ambitious, so I decided to pick something easier this time, which would also help reduce our support load a bit. A frequent pain point is that client admins set up a review timeline and start the process, but then forget about the timeline and get taken by surprise when an important date passes, like the “editing lockdown” date or the “360 feedback gets released” date.

There are several ways to improve visibility of scheduled dates and events, and we’re approaching this from multiple angles already, working on a visual roadmap for the dashboard and reminder notes within the application too.

But if an SI admin forgets to look into SI altogether, then those won’t help.
