Platform#

Tinka’s platform is built heavily around microservices, with most services running as Docker containers. Sites are either deployed as containers or use Cloudflare Pages.

Learn more about Cloudflare Pages

Runtime#

An overview of components and layers that make up our runtime.

AWS#

Infrastructure is deployed in AWS, where we operate many accounts, each separated by function or role. Low-level AWS CloudFormation is managed in datacenter-provisioning, which is synced with an AWS CodeCommit repository managed mostly by our service provider.

The larger parts are managed through Terraform in aws-provisioning, where we apply a generic GitHub workflows-based provisioning model driven by PR reviews. Several smaller projects come with their own repository, such as s3-provisioning or ecr-provisioning, which also make use of GitHub workflows and PR requirements.

Consul#

At the heart of our API platform sits HashiCorp’s Consul, a distributed key/value store and service catalog. It is accessible over HTTP and used to store service configuration and endpoint locations.
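As a minimal sketch of how that looks in practice, the snippet below reads a configuration key and lists the registered instances of a service through Consul’s HTTP API. The agent address, key path and service name are placeholders for illustration, not actual values from our platform.

```python
import base64
import requests

CONSUL = "http://localhost:8500"  # assumes a local Consul agent; adjust per environment

# Read a configuration value from the key/value store.
resp = requests.get(f"{CONSUL}/v1/kv/config/example-service/settings")
resp.raise_for_status()
# KV values come back base64-encoded.
value = base64.b64decode(resp.json()[0]["Value"]).decode()
print("config:", value)

# List the registered instances (address + port) of a service from the catalog.
resp = requests.get(f"{CONSUL}/v1/catalog/service/example-service")
resp.raise_for_status()
for instance in resp.json():
    print(instance["ServiceAddress"], instance["ServicePort"])
```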

Mesos#

The container orchestration engine. Mesos serves as the workload cluster, operating in a master+agents model and using ZooKeeper to manage leader elections. The leader uses the agents to schedule and execute processes, which we use to run Docker containers. Agents are bare AWS EC2 nodes, and upon joining the cluster they offer their resources to host workloads.

Marathon#

The application scheduler. Deployments of services go through Marathon, which sends the required configuration to Mesos, which in turn spawns the requested number of instances. A service or website is referred to as an application, with a predefined number of required instances. These instances are individual containers, each running on some agent in the Mesos cluster and exposed by its IP address and port number.

This information is dynamically picked up by a registrator process running on every agent, which updates the Consul service catalog. The catalog is in turn used to (re-)configure the proxy process on each agent, so that it knows where every endpoint of any given application is available.
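To make the application concept described above concrete, the sketch below posts a minimal application definition to Marathon’s REST API. The app id, image and resource numbers are illustrative assumptions, not one of our real services.

```python
import requests

MARATHON = "http://localhost:8080"  # assumes a reachable Marathon endpoint

# A minimal application definition: Marathon asks Mesos to keep
# `instances` copies of this container running somewhere in the cluster.
app = {
    "id": "/example-service",          # hypothetical application id
    "instances": 2,
    "cpus": 0.25,
    "mem": 256,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/example-service:1.0.0", "network": "BRIDGE"},
    },
}

resp = requests.post(f"{MARATHON}/v2/apps", json=app)
resp.raise_for_status()
print(resp.status_code, resp.json().get("id"))
```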

Prometheus#

The centralized metrics store. Prometheus uses the Consul catalog to find instances of services and scrapes their /metrics endpoints roughly every minute. These metrics are stored on disk and are available through the web interface or in Grafana.
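For a service to show up there, it only needs to expose a /metrics endpoint. A minimal sketch using the Python prometheus_client library (assuming that is the client in use, and with a made-up metric name and port) could look like this:

```python
import time

from prometheus_client import Counter, start_http_server

# A hypothetical counter; Prometheus reads its current value on every scrape.
REQUESTS_TOTAL = Counter("example_requests_total", "Total requests handled")

if __name__ == "__main__":
    # Expose /metrics on port 8000 so the Prometheus scraper can reach it
    # (the port is an arbitrary choice for this sketch).
    start_http_server(8000)
    while True:
        REQUESTS_TOTAL.inc()
        time.sleep(1)
```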

Alerting is handled here as well, through the prometheus-alertmanager. Alerts are configured in alert-provisioning, where they are collected and published to Consul using Jenkins. Any change automatically causes Prometheus and the Alertmanager to reload their configuration, which is written to disk by a local consul-template process.

Kafka#

Kafka is used to store and process events, which are written by producers and read by consumers. Data is organized in topics.
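A minimal sketch of that producer/consumer flow, using the kafka-python library (the broker address, topic name and library choice are assumptions for illustration):

```python
from kafka import KafkaConsumer, KafkaProducer

BROKERS = ["localhost:9092"]  # placeholder broker address
TOPIC = "example-events"      # placeholder topic name

# Producer: write an event to the topic.
producer = KafkaProducer(bootstrap_servers=BROKERS)
producer.send(TOPIC, b'{"event": "something-happened"}')
producer.flush()

# Consumer: read events back from the topic, starting at the oldest offset.
consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKERS, auto_offset_reset="earliest")
for message in consumer:
    print(message.offset, message.value)
```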

Elasticsearch#

An AWS-managed document database used to store and search logs.
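As an illustration, logs can be queried with the elasticsearch Python client; the endpoint, index pattern and field names below are placeholders.

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint; the real cluster is the AWS-managed one.
es = Elasticsearch("https://logging.example.internal:443")

# Find the most recent log lines mentioning "error" in a hypothetical index pattern.
result = es.search(
    index="logs-*",
    query={"match": {"message": "error"}},
    size=5,
    sort=[{"@timestamp": {"order": "desc"}}],
)
for hit in result["hits"]["hits"]:
    print(hit["_source"].get("message"))
```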

Buildtime#

Code sits in GitHub and we generally apply a Pull Request-driven work process, where changes are made on branches from which they can be tested and reviewed.

Branches#

Repositories typically use a workflow to build containers, which are shipped to a private AWS Elastic Container Registry (ECR) repository. When a Pull Request (PR) is open, a comment is placed on it indicating that the push to ECR has been performed. The artifact can then be deployed to an environment using Jenkins.
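To double-check that an image actually landed in ECR before deploying it, something like the following boto3 sketch can be used; the repository name and tag are hypothetical placeholders.

```python
import boto3

ecr = boto3.client("ecr")  # assumes AWS credentials for the right account

# Look up a specific image tag in a placeholder repository.
response = ecr.describe_images(
    repositoryName="example-service",
    imageIds=[{"imageTag": "pr-123"}],
)
for image in response["imageDetails"]:
    print(image["imageTags"], image["imagePushedAt"])
```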

Deployments#

Jenkins uses blaze-cli to write service configuration into Consul and to update the application configuration in Marathon, after which a blue-green-style deployment is performed. Marathon launches a new set of containers, assesses their health and, once the new instances are healthy, removes the old ones.
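Under the hood, updating the application configuration boils down to a PUT against Marathon’s API, which triggers that deployment. A conceptual sketch follows; the app id and image tag are placeholders, and in practice blaze-cli drives this for us.

```python
import requests

MARATHON = "http://localhost:8080"  # assumes a reachable Marathon endpoint

# Point the existing application at a new image; Marathon starts replacement
# containers, waits for them to become healthy, then removes the old ones.
update = {
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/example-service:1.1.0"},
    },
}

resp = requests.put(f"{MARATHON}/v2/apps/example-service", json=update)
resp.raise_for_status()
print(resp.json().get("deploymentId"))
```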

When deploying to a development environment, integration tests are automatically launched through the blaze-marathon-event-dispatcher-service and their results will be available in #qa-reporting.

Learn more about testing

When a deployment is started, a message is posted to either #dev-deployments or #prod-deployments in Slack by the blaze-deployment-logging-service.

Releases#

The release-please workflow manages the release process by watching commits that land on the master or main branch. Any commit adhering to the Conventional Commits style is picked up and automatically collected into a release branch that uses semver-style versioning. There it once again receives automated testing and a new entry is added to the CHANGELOG.md.

The release branch is visible in a PR and, upon merging, results in a tagged artifact plus a corresponding git tag. After merging a release, any new commit landing in master triggers the creation of a new release branch and a subsequent PR. The version is incremented based on the commits found, as per Conventional Commits: for example, a fix: commit bumps the patch version, a feat: commit the minor version, and a breaking change the major version.