At RisingStack, we are highly interested in building scalable and resilient software architectures. We know that a lot of our readers share our enthusiasm, and that they want to learn more about the subject too.
To expand our blogging & training initiatives, we decided to launch a new series called Top of the Stack which focuses on architecture design, development trends & best practices for creating scalable applications.
In the first episode of Top of the Stack, we interviewed Patrick Kua, the CTO of N26, a successful banking startup from Germany.
In the second episode, we interviewed Angel Cereijo & Roberto Ansuini from Fintonic!
During our roughly 30-minute conversation, we discussed a wide range of topics, including the reasons behind going with Node.js, the tests they run to ensure quality, the process of migrating to Kubernetes, and the way issues are handled in their architecture.
The conversation is available in a written format; there is no audio this time. Read on for the transcript!
Fintonic Interview Transcript
Welcome, everybody, to the second episode of the Top of the Stack Podcast, where we talk about the services and infrastructures that developers build. I am Csaba Balogh, your host, sitting with our co-host, Tamas Kadlecsik, CEO of RisingStack.
Today we are going to talk about the architecture of Fintonic, a successful Spanish startup. Fintonic is a personal finance management app, which helps users by sending them overviews and alerts about their expenses.
Fintonic is currently available in Spain and Chile and has more than 450,000 users at this point. Our guests today are Angel Cereijo, Node.js Engineering Lead, and Roberto Ansuini, Chief Software Architect at Fintonic.
It’s a pleasure to have you here, Angel and Roberto! Can you guys please tell us more about how you became a part of Fintonic and how you started out there?
Roberto: Yes, well, this is Roberto. I started at Fintonic in October 2011 as the Development Director during the early stages of the project. We developed the base architecture for the PFM (Personal Finance Management) system, which is the core of our platform. So we had our provider, and we wanted to test what we could do with the information we obtained using the framework of our provider.
The first stages of the project were mainly the development of the aggregation and classification of financial data. Given this, we presented summarized information on our users’ expenses and developed an alerting system around it. We started with a very small team: in the first few weeks it was two people, me and my tech lead, and then we had two more people, one back-end developer and one front-end developer. We started with only the web application, and later on, we added the iOS and the Android applications.
RisingStack: What are the main languages you use for developing the project?
Roberto: When Fintonic was launched, we started mainly with Java and the Spring framework, and later on, we started adding more features and developing the loan service where we gave users the possibility to quote a loan, a consumer loan. To do so, we partnered with a fintech named Wanna (it’s a consumer loan fintech) to integrate their products into our platform. During this time we developed the first iteration of what we called the Fintonic Integration API (finia) developed in Node.js by my teammate Angel Cereijo.
RisingStack: What made you decide to use Node.js instead of Java?
Roberto: We chose Node.js for this part of our platform because of the nature of the Integration API. It proxied calls to our partners and added some business logic on top. The deadline was very tight, and Node.js allowed us to have a running MVP in a very short timeframe.
RisingStack: So basically, right now you exclusively use Node.js on the backend, right?
Roberto: We are using Node.js mainly as the core technology for what we call our Marketplace of financial products (loans, insurances, etc.).
RisingStack: Then, any other logic or infrastructural parts like payments or so are implemented in Java right now, right?
Roberto: Yes, Java is totally for the PFM (Personal Finance Management System), that is, the aggregation service, the alerting, and the core functionality of Fintonic. What we are building around the core application of Fintonic is the so-called marketplace of Fintonic. This marketplace is for every product, let’s say, loans, insurances, or credit cards, debit accounts, etc. Everything that we’ll include in here is probably going to be in Node.js.
RisingStack: I see. Do you have any shared infrastructural code between your services?
Roberto: We have some parts in Java, yes. We have main libraries for this. And we also have an automation infrastructure with Chef, and we’re doing some Ansible now where we automate the configuration of the infrastructure.
Angel: We have Sinopia as a private npm repository, and we have a lot of custom packages. Some of them are just a layer on top of another package, and the rest are code shared between the projects. We have around twenty-something custom modules.
RisingStack: About databases: What database do you operate with?
Angel: For Node.js we use MongoDB. Fintonic has been using MongoDB since it began. And for us in the Node.js part, it fits quite well. Sometimes we use Mongoose, and other times we just write the queries ourselves, something like that.
RisingStack: Do you use managed MongoDB or do you host it yourself?
Roberto: We have a self-hosted MongoDB cluster, but we are evaluating the enterprise edition or maybe Atlas or some other cluster. So far we have maintained our own clusters on Amazon.
RisingStack: Have you had any difficulties when maintaining your cluster?
Roberto: Oh, we have learned a lot over the years, we had our pitfalls. MongoDB has improved a lot since we started using it. So far it’s been kind to us, except for little issues, but it’s okay.
RisingStack: Can you tell us what kind of communication protocols you use between your services?
Roberto: It’s mainly HTTP REST. We tried Apache Thrift, but now we are mainly on HTTP REST.
RisingStack: I see. And what were the problems with it (Thrift)?
Roberto: Ah, because on the Java side we want to start using more of the features that Netflix OSS brings, which comes with Spring Cloud, and those are more suitable for HTTP REST. We also have a lot of services with big latencies, and these kinds of high-latency services are not a good fit for Thrift.
RisingStack: Do you use maybe messaging queues between your services, or only HTTP?
Roberto: We also have RabbitMQ, using the AMQP protocol, to communicate between services. It’s mostly for load balancing, for controlling the throughput of services and scaling workers. We have a lot of use cases right now with RabbitMQ.
RisingStack: When we built Trace at RisingStack, we quite often saw problems with it when it had to handle a lot of messages or maybe even store a lot of messages. So when workers ran fast enough to process the messages, and it had to write to disk, it quite often went down altogether. Have you met any problems like that or any other?
Roberto: No, no.
RisingStack: At RisingStack, our guys take testing code very seriously and deploy only after running tests multiple times, so it would be great if you could share with us how you handle testing and what kind of tests you have in place right now.
Angel: Okay. In the Node.js part we have, I think, 90% of our code covered. We unit test our code. We run the tests on the local machine and then we push to GitLab, where we run all the tests and also check the code style against some rules we have. So we take it very seriously. Right now we use Mocha and Chai for testing. On the front end, we have very high coverage, around 90%, I’d say.
RisingStack: Do you have any other kind of tests, like integration tests in-between, or smoke tests?
Angel: We use some mocked servers to test contracts, but we also have Staging environments where we test all of the services in an end-to-end manner.
RisingStack: I am not sure I understand it correctly. When you say that you test everything together, we are talking about end-to-end tests here, right?
Roberto: Yes. We have several stages.
The first one is the unit test stage, where we have the coverage we were talking about before. Then we have some tests that perform some kind of integration with other APIs. They are automated in our GitLab environment. And then, on the front-end side, as most of our applications are used from the Android and iOS applications, the tests are covered there. So they have some interface tests, which they use to test against our pre-production development environments.
And for frameworks, well, we don’t use that much end-to-end testing. It’s mostly manual testing right now, and we want to start doing some mobile testing maybe with some tools like the device Swarm or something like that, but it’s not yet done.
RisingStack: Let’s assume you have, say, 2 services that depend on each other. So you want to test the integration between them – the service boundary. But the downstream service also depends on another one, and so forth and so forth. So, how do you handle these cases? Do you make sure that only the 2 services in question are tested, and you mock the downstream dependencies somehow? Or do you run integration tests on full dependency trees?
Angel: We are not very mature yet.
When we have to call another service, we mock the dependency, because it’s quite tricky to start several services and run tests on them. I think we have to study more and consider how we can implement other kinds of tests.
Roberto: On the Java side we are doing some WireMocks and some mock testing, but we have to mature in that.
Angel: On the Node.js side we have a library dependency, I think it’s called Nock. (This library is used to mock calls to services to make sure we are respecting the contracts between services.) We call some endpoints with some parameters and some headers, and we can specify the response we want to get (body, HTTP code, headers).
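To make the idea of mocking a downstream dependency concrete, here is a minimal sketch of the pattern Angel describes: declare the request you expect (method, path, headers) and the canned response (HTTP code, headers, body). It is a hand-rolled stand-in written for illustration only. It is not Nock's API and not Fintonic's code, and every name in it (`MockService`, `expectCall`, the `/loans/42` endpoint) is an assumption.

```typescript
// Hand-rolled stand-in for an HTTP mocking library; NOT Nock's API.
// All names below (MockService, expectCall, loanService) are hypothetical.

interface MockRequest {
  method: string;
  path: string;
  headers: { [name: string]: string };
}

interface MockResponse {
  status: number;
  headers: { [name: string]: string };
  body: unknown;
}

class MockService {
  private expectations: { request: MockRequest; response: MockResponse }[] = [];

  // Register the canned response for one expected request.
  expectCall(request: MockRequest, response: MockResponse): void {
    this.expectations.push({ request, response });
  }

  // Return the canned response when method, path and every expected header
  // match; throw otherwise so that a contract violation fails the test.
  call(request: MockRequest): MockResponse {
    const matches = this.expectations.filter((e) =>
      e.request.method === request.method &&
      e.request.path === request.path &&
      Object.keys(e.request.headers).every(
        (name) => request.headers[name] === e.request.headers[name]
      )
    );
    if (matches.length === 0) {
      throw new Error(`Unexpected call: ${request.method} ${request.path}`);
    }
    return matches[0].response;
  }
}

// Usage: the downstream loan service is replaced by the mock.
const loanService = new MockService();
loanService.expectCall(
  { method: "GET", path: "/loans/42", headers: { accept: "application/json" } },
  { status: 200, headers: { "content-type": "application/json" }, body: { id: 42 } }
);
```

The code under test would then be pointed at `loanService.call(...)` instead of the real HTTP client, so the test runs without starting the other service.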
RisingStack: Do you use any specific CI tools?
Roberto: Yes, we started with Jenkins, but by the end of 2017 we had migrated our pipelines to GitLab CI. It’s very cool, and we are happy with it. Right now we are doing CI and CD. We are building and deploying our containers in the staging environment, and we are releasing them to a container registry so we can use them locally or in any environment. That one is working quite well, we are very happy with it.
RisingStack: Can you tell us where your application is deployed?
Roberto: Right now we are using AWS. We are in Spain and also we’re in Chile. We have 2 environments for this. The one in Spain is based on EC2 instances, where we have our applications deployed with Docker. So we have our autoscaling groups, and we have load balancers and stuff. In Chile, we are testing out what we want to be our new platform which is on Docker and Kubernetes. So we just finished that project by the end of 2017. And we’re monitoring it, so we can bring it to Spain, which is a much larger platform.
RisingStack: Can you tell us a little bit about how easy or difficult it was to set up Kubernetes on AWS?
Roberto: Actually, it was quite easy. We’re using Kops. It was quite straightforward. They did a great job with developing this script. We had to figure it out, do some learning, figure out the network protocol, how to control the ingresses… It was tricky at the beginning, but once you did it a couple of times, it’s easy.
RisingStack: I see. Maybe it would be interesting to our listeners – how much time did it approximately take to set up Kubernetes?
Roberto: We deployed the project in production by the end of August, then we started doing the GitLab CI migration in September. Then, we did a lot of work adapting our projects so they fit in a Docker environment. Then, by the end of November and the start of December we started doing the Kubernetes part. Within 1 month we had it all up and running in production.
RisingStack: Can you tell us how many developers were needed for that?
Roberto: We have 3 people in the DevOps team, and for the rest, we had the development team making some adaptations, like health checks, configurations, etc.
RisingStack: Did you face any scaling problems in your architecture? Or in the previous one?
Roberto: With the previous one, the problem was mainly versioning the deployments: how to control which version was deployed and which wasn’t. Right now we are trying to fix this problem with the container registry and by controlling the versioning of the deployments in Kubernetes. That’s how we are trying to solve all those issues.
RisingStack: What do you base the versioning of your containers on?
Roberto: We are doing several kinds. We are versioning by tagging the containers. We are also doing some versioning with some infrastructure files with Ansible. And in Kubernetes you can do some tricks to do versioning with the deployments – so, if you have different names for the deployment, you can roll out the version of the services. Each deployment has a config map associated with it and an image associated with it so you can have a deployment and a new version at any specific time. So that’s helping a lot also.
RisingStack: I see – and what is the tag of your containers?
Roberto: We are just starting off with it. When we promote the container to production we have a production tag for it, and then we do some versioning with timestamps. We are trying to do something based on the commits, so we can track the commit to the container to do versioning of the deployment.
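The tagging scheme Roberto outlines (a timestamp tag, a tag derived from the commit, and a `production` tag on promotion) could be sketched roughly as follows. This is an illustration under assumptions, not Fintonic's pipeline: the registry name, the tag layout, the 8-character short hash and the function names are all made up for the example.

```typescript
// Hypothetical helper for deriving container image tags.
// The tag formats and names here are assumptions, not Fintonic's conventions.

interface BuildInfo {
  image: string;     // repository, e.g. "registry.example.com/finia" (made up)
  commitSha: string; // full Git commit hash of the build
  builtAt: Date;     // build time
}

// Left-pad a number to two digits.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Format a date as a sortable UTC timestamp: YYYYMMDD-HHMMSS.
function timestampTag(d: Date): string {
  return (
    String(d.getUTCFullYear()) + pad2(d.getUTCMonth() + 1) + pad2(d.getUTCDate()) +
    "-" + pad2(d.getUTCHours()) + pad2(d.getUTCMinutes()) + pad2(d.getUTCSeconds())
  );
}

// Every build gets a timestamp tag and a commit tag; a promoted build
// additionally gets the moving "production" tag.
function imageTags(build: BuildInfo, promoteToProduction: boolean): string[] {
  const shortSha = build.commitSha.slice(0, 8);
  const tags = [
    `${build.image}:${timestampTag(build.builtAt)}`,
    `${build.image}:${shortSha}`,
  ];
  if (promoteToProduction) {
    tags.push(`${build.image}:production`);
  }
  return tags;
}
```

Keeping the commit-based tag alongside the moving `production` tag is what makes it possible to trace a running container back to a specific commit.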
RisingStack: What infrastructural elements or deployment strategies do you use to ensure the reliability of your product?
Roberto: Well, that’s mainly what we are doing right now. We are really trying to mature and to have as much information as possible about each specific version, so that we know exactly what is deployed and what configuration we had at deployment time. That has given us a lot of peace of mind. So we can roll back and we can see what is running.
RisingStack: Do you automate the rollbacks, or do you do them by hand if there is an error?
Roberto: It’s manual to a certain point, since it’s done on-demand. But the process is very automated. All we have to do is configure our scripts to redeploy a given version of a container on our ASG (for the Spanish platform) or a deployment on our Kubernetes cluster (for the rest).
RisingStack: How do you prevent errors from cascading between your services – in case the barriers fail, and the error starts cascading? What kind of tools do you use to locate the root cause – like logs, metrics, and distributed tracing for example?
Roberto: We use ELK to monitor application behavior and CloudWatch to monitor infrastructure. (Recently, after our conversation, we started using Datadog, and it looks promising.) What we basically do is monitor the latency of the services, and we have some processes that perform a kind of global check of the system. The alerting system consists of some automated scripts that simulate a typical workflow of data in our system. If a service fails in any part of the chain, the workflow doesn’t complete, and an alarm is triggered so we can fix it.
When a service goes down, our system handles the errors and still shows you the information that is available. So when a service goes down, it is not affecting the whole system, only that part of the system. So if you take it to the app, it’s maybe only one section of the app that is not loading; it’s very isolated. Basically, the microservices approach is helping us here. Also, the use of RabbitMQ and asynchronous messages with queues helps us to have our system restored without losing any of the processing.
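The alerting approach described in this answer, a script that walks through a typical workflow and raises an alarm as soon as one step in the chain fails, can be sketched as below. It is only an illustration: the step names, the `raiseAlarm` callback and the report shape are assumptions, and the steps are synchronous for brevity, whereas real checks would call services over the network.

```typescript
// Hypothetical synthetic workflow check; names and shapes are assumptions.

interface StepResult {
  ok: boolean;
  reason?: string; // why the step failed, when ok is false
}

interface WorkflowStep {
  name: string;
  run: () => StepResult;
}

interface CheckReport {
  healthy: boolean;
  failedStep?: string;
  reason?: string;
}

// Run the steps in order. Stop at the first failure (or thrown error),
// raise an alarm naming the broken step, and report it.
function runSyntheticCheck(
  steps: WorkflowStep[],
  raiseAlarm: (message: string) => void
): CheckReport {
  for (const step of steps) {
    let result: StepResult;
    try {
      result = step.run();
    } catch (err) {
      result = { ok: false, reason: String(err) };
    }
    if (!result.ok) {
      raiseAlarm(`Workflow check failed at "${step.name}": ${result.reason}`);
      return { healthy: false, failedStep: step.name, reason: result.reason };
    }
  }
  return { healthy: true };
}
```

Because the check stops at the first failing step, the alarm message already points at the part of the chain that needs attention.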
RisingStack: Did I understand correctly, that you said you have alerts for when a message goes into the system but doesn’t come out where you expect it?
Roberto: There are some checks automated like that. The example I’ve mentioned before covers this.
RisingStack: How do you track these messages?
Roberto: We have some daemons that are connected to a Rabbit queue, and they are just checking if the messages are coming through. So if the messages are coming through, we assume that the system is performing right.
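A daemon of this kind boils down to remembering when the last message was seen and flagging the queue once it has been silent for too long. The sketch below shows only that logic, under assumptions: the class name, the silence threshold and the explicit timestamps are made up for illustration, and the actual RabbitMQ consumer that would call `recordMessage` is left out.

```typescript
// Hypothetical queue watchdog; the RabbitMQ consumer wiring is omitted.
// Timestamps are passed in explicitly (milliseconds) to keep the logic testable.

class QueueWatchdog {
  private lastSeenMs: number;

  constructor(private readonly maxSilenceMs: number, startedAtMs: number) {
    this.lastSeenMs = startedAtMs;
  }

  // Call this from the message handler every time a message arrives.
  recordMessage(nowMs: number): void {
    this.lastSeenMs = nowMs;
  }

  // True when no message has been seen for longer than the allowed silence.
  isStalled(nowMs: number): boolean {
    return nowMs - this.lastSeenMs > this.maxSilenceMs;
  }
}
```

A periodic job would call `isStalled(Date.now())` and trigger an alert when it returns `true`; as long as messages keep flowing, the system is assumed to be healthy.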
RisingStack: And how do you monitor your infrastructure? What are the main metrics to monitor on your platform right now?
Roberto: It’s mainly CPU, memory, and network; it’s not so much. Also, the message rates and queued messages in RabbitMQ help us to monitor the health of the system. We are looking to integrate Datadog, but it’s a project that we want to do this quarter.
RisingStack: Have you considered using a distributed tracing platform before?
Roberto: Yes we have seen a couple of frameworks, but we haven’t done anything on that.
RisingStack: We talked a lot about your past and current architecture, so we would like to know if there are any new technologies you are excited about and you are looking forward to trying out in 2018?
Roberto: The Kubernetes project is exciting for us because we started using it at the end of 2017. We want to mature it a lot; we want to do so much more automation, maybe some operations that we can integrate with Slack, with some bots, so we can automate the deployment. Especially, what we want to do is to create environments on demand, so we can have several testing environments on demand to simplify the developers’ work. That’s probably going to be one of the technologically interesting projects that will come. Also alerts and the delivery: doing some more automation so we can track much better which web commits go to which deployment.
Thank you very much guys for being here and telling us all these exciting things. Thank you very much, Roberto, thank you very much, Angel, for being here.