ServerlessConf Paris: The present and future of Serverless observability, by Yan Cui
The leading global conference devoted to the serverless trend, ServerlessConf is a community conference that aims to share experiences about developing applications for so-called serverless architectures. ServerlessConf will take place in Paris on 14 and 15 February. Today we present one of the inspiring speakers who will join ServerlessConf, Yan Cui, who will talk about Serverless monitoring and observability.
Yan is an experienced engineer who has worked with AWS for nearly 10 years. He has been an architect and lead developer in a variety of industries, ranging from investment banking and e-commerce to mobile gaming. In the last 2 years he has worked extensively with AWS Lambda in production, and he has been very active in sharing his experiences and the lessons he has learnt. Some of his work has even made its way into the Well-Architected whitepaper published by AWS.
Yan is a polyglot in both spoken and programming languages: he is fluent in both English and Mandarin, and counts C#, F#, Scala, Node.js and Erlang amongst the programming languages that he has worked with professionally. Although he enjoys learning different programming languages and paradigms, he still holds F# as his undisputed favourite.
Yan is a regular speaker at user groups and conferences internationally, and he is also the author of AWS Lambda in Motion and a co-author of F# Deep Dives. In his spare time he keeps an active blog at The Burning Monk where he shares his thoughts on topics such as AWS, serverless, functional programming and chaos engineering.
What is your experience with Serverless?
I joined Space Ape Games a year ago, where I am building the backend system for real-time multiplayer games, and I use Serverless for parts of that system. I don’t use Serverless as much as I did in my previous job, where I was working as an architect. There, we took a monolithic backend system and migrated it to AWS Lambda. Along the way we learned a lot, not just about simple Lambda functions but also about the challenges you run into as you deal with scale in terms of both complexity and traffic. You have to address new, complex operational concerns, because at that point you are building microservices: you have to trace invocations, monitor the flow of user requests, follow API calls that trigger background processing… in short, understand how data moves through the system. So there was a lot of work on monitoring, logging, tracing, and collecting and forwarding IDs, the kind of things a lot of people are running into now.
Many companies have tried Serverless on one or two projects; as they try to build a whole architecture on top of this new paradigm, they will run into these problems. So the talk I will be giving at ServerlessConf focuses on the observability side of things: what new problems show up with serverless, specifically around the lack of control over the underlying operating system; what happens to the many practices we have perfected with microservices; what tools we use; how we collect information; and how we send it so that it doesn’t impact the critical path. All of those things are slightly different, not necessarily harder than before. There are tools right now for tracing, and they are OK, but they don’t cover many of the cases you can run into (like concurrent execution limits).
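To make "collecting and forwarding IDs" concrete, here is a minimal Python sketch. It assumes the IDs in question are correlation IDs carried in HTTP headers of an API Gateway-style event; the `x-correlation-` prefix and all function names are hypothetical, not something Yan describes in this interview.

```python
import json
import uuid

# Hypothetical header prefix; any convention works as long as every function uses it.
CORRELATION_PREFIX = "x-correlation-"


def extract_correlation_ids(event):
    """Collect correlation IDs from the incoming event, creating one if none exist."""
    headers = event.get("headers") or {}
    ids = {
        name.lower(): value
        for name, value in headers.items()
        if name.lower().startswith(CORRELATION_PREFIX)
    }
    # If this is the first function in the chain, start a new trace.
    ids.setdefault(CORRELATION_PREFIX + "id", str(uuid.uuid4()))
    return ids


def log(message, correlation_ids, **fields):
    """Print a JSON log line that carries the correlation IDs, and return it."""
    line = json.dumps({"message": message, **correlation_ids, **fields}, sort_keys=True)
    print(line)
    return line


def outgoing_headers(correlation_ids):
    """Headers to attach to downstream calls so the next function continues the trace."""
    return dict(correlation_ids)
```

With every log line and every downstream call carrying the same ID, the logs of separate functions can be joined afterwards to reconstruct the path of a single user request.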
What did Serverless bring to you? Cost savings? Agility?
For me, agility is probably the biggest win. In terms of cost, it’s great: it’s cheaper when you run at a relatively low scale, but beyond a certain scale it becomes cheaper to run servers 24/7. That depends on the use case, though. Even if Lambda is more expensive, you get to do things faster, so overall you still win.
How did it change the way you work?
It does change the way you run your engineering team: engineers can be much more productive, and you can focus on business problems rather than operational details. I have worked with so many different technologies in the past, Docker, Kubernetes and now Lambda, and that shift in focus is the biggest change for me. We are focused on what we should be building. It also changes team composition. Before, you had a lot of operations specialists writing Terraform scripts, managing CloudFormation templates and managing infrastructure. With Lambda, a lot of that is done for you by the AWS platform. It means that at startups you can delay the point where you need those specialised ops skills. It also means the developer needs to have the mindset of an ops person. And you can spend more time prototyping and understanding your business requirements and what your users actually want. You can test much more quickly: instead of spending 3 months, you can get something out in two weeks and measure the reaction of your users before deciding to go to full scale.
Does it give more power to developers?
For a developer, Serverless takes care of a lot of the heavy lifting. People have been talking about event-driven architecture for a long time, but the barrier to entry was high: frameworks, tools, message processing… all the different pieces you needed for this architecture. With Lambda you get everything out of the box. All you need is a Kinesis stream and a Lambda function… all these pieces are like Lego blocks that you put together, and you just have to write a minimal amount of code to make them work together, without having to worry about deployment or scaling. All those problems just go away. As a developer, you have access to a lot more than before.
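As an illustration of how little code such a Lego block needs, here is a minimal sketch of a Python Lambda function triggered by a Kinesis stream. Kinesis delivers each payload base64-encoded under `record["kinesis"]["data"]`; the JSON payload format and the `handle_message` helper are assumptions made for the example.

```python
import base64
import json


def handle_message(message):
    """Hypothetical business logic; replace with whatever the event should trigger."""
    return message.get("type", "unknown")


def handler(event, context):
    """Lambda entry point for a Kinesis trigger: decode each record and process it."""
    processed = []
    for record in event["Records"]:
        # The record payload arrives base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)  # assumption: producers publish JSON
        processed.append(handle_message(message))
    return {"processed": len(processed)}
```

Polling the stream, batching records and scaling the function are handled by the platform, so the handler contains only the decoding step and the business logic.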
What about monitoring tools? Are the current tools sufficient for what we need to do?
CloudWatch, IOpipe, some other vendors… they do a good job, but one common problem is that they require you to send metrics to their system during your function’s invocation, which means your customer has to wait longer while your system is sending metrics. There are also some tools that allow you to send data asynchronously. IOpipe is also working on a profiling feature, which lets you profile your code and see a flame graph to help you work out where the performance issues are. We will have more and more complex systems built on Lambda, so we will need tools focused on tracing and on understanding the entire execution flow.
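One way to keep metrics off the critical path is to write them as structured log lines during the invocation and let a separate process parse and forward them afterwards. The sketch below assumes that approach; the `METRIC` marker, the line format and the function names are hypothetical and do not correspond to any particular vendor's API.

```python
import json
import time

# Hypothetical marker that lets a log processor tell metric lines from ordinary logs.
METRIC_MARKER = "METRIC"


def record_metric(name, value, unit="Count"):
    """Emit a metric as a log line instead of calling a metrics service synchronously."""
    body = json.dumps(
        {"name": name, "value": value, "unit": unit, "timestamp": int(time.time() * 1000)}
    )
    line = f"{METRIC_MARKER} {body}"
    print(line)  # printing is cheap, so the caller is not kept waiting on a network call
    return line


def parse_metric(line):
    """Run out of band by a log processor: return the metric dict, or None for other lines."""
    marker, _, body = line.partition(" ")
    if marker != METRIC_MARKER or not body:
        return None
    return json.loads(body)
```

The function that serves the user only pays for a `print`, while the forwarding to the monitoring system happens later, in a process whose latency no customer is waiting on.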