
CI/CD, or how I learned to stop worrying and love DevOps


DevOps, DevOps, DevOps! (ArtemisDiana / Getty Images)

One of the most important things to happen in the evolution of development over the past several years is the widespread adoption of continuous integration/continuous deployment, or CI/CD. (Sometimes the “CD” stands for “continuous delivery,” depending on who you’re talking to.)

It’s a concept that jettisons a lot of older ideas about how systems should be managed and instead gives you a way to update code and integrate changes as live rolling deployments while ensuring that the new code is tested and slots in smoothly with stuff that’s already running. A properly architected CI/CD pipeline means you can get code changes into production faster and with fewer errors. But what does that look like in practice?

It looks like Ars Technica, because we’ve adopted a CI/CD workflow to take full advantage of the flexibility afforded us by serverless cloud hosting. Welcome to part three of our four-part series on how we host Ars—here, we’re going to swing away from the “ops” side of “DevOps” and peer more closely at the “dev” part instead. Join us for a look behind the curtain at how Ars uses CI/CD in both our deployed applications and our infrastructure management!

Version control is not optional

For the benefit of folks who only do the “ops” part of DevOps, let’s get a working definition going for “version control,” as the term underpins our entire approach toward maintaining code. When we say “version control” in this context, we’re talking about a method by which we’re able to track changes made to our production codebase—that is, the repository of files that makes Ars function.

Ensuring that the production codebase is subject to some form of version control is a lot like turning on “track changes” in Word: the version control system keeps a record of every change made to every file, along with a correlated list of who made the change and when it happened. Version control is a critical component of most large-scale IT projects, and in some cases, it’s even a requirement.

But version control is a hard problem to solve, and many of the solutions that are common now—including and especially Git—are still relatively young. Not that long ago, there was a time I don’t remember with fondness—a time in which you edited code in a text editor and then FTP’d it to a production server. This was a low and filthy era, rife with lost changes, production crashes, and ad-hoc backups with names like oops-1997-05-21.tar.gz. To be sure, even back in those primitive days, there were bearded wizards who spoke of inscrutable technologies like CVS (Concurrent Versions System, not the pharmacy), but such things were conspicuously absent from most folks’ experience within the burgeoning universe of web development.

While many of us probably recall Subversion (and with many thoughts and prayers to those of you still dealing with SVN), it wasn’t until Git that version control gained massive popularity. Why? In part because tools like Git and GitHub made it dead simple to create and maintain repositories—just run git init, and you’re up and running. Most of the world’s developers now maintain code repositories on GitHub, and Ars is no exception.
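If you’ve never done it, standing up a new repository really is about that simple. Here’s a minimal sketch—the remote URL is a placeholder, not one of our real repos:

git init                       # create a fresh local repository
git add . && git commit -m "Initial commit"
git branch -M main             # name the default branch "main"
git remote add origin git@github.com:example-org/example-repo.git
git push -u origin main        # publish it to GitHub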

How do changes flow from GitHub into deployed applications?

Well, we start by firing up our favorite FTP client, Transmit. I kid, I kid—but Transmit was (and apparently still is) an awesome app. For real, now: We start by working on a particular branch in one of our repositories. Remember from our previous installments that Ars is composed of four main applications, each running in its own container inside AWS ECS tasks:

  • Arx: Our local Docker Compose development setup and Nginx server container
  • Acta: The main WordPress application
  • Civis: Our discussion forum software
  • Taberna: Our e-commerce and subscription system

Each of these applications has its own repository on GitHub. When large changes are made, a new branch is created. Eventually, that new feature branch will be merged into a staging branch via a pull request. After testing, staging will be merged into the main branch with another pull request.

Branches, merges, and pull requests, oh my!

In version control terms, a “branch” is simply a named deviation from the main code repository. For example, if we decided to replace the Ars logo with the letter “X”—wait, that’s too relevant, let’s say the letter “Y”—we would start by checking out a new branch along the lines of this: git checkout -b feature-l33t-new-ars-logo.

That command creates a clean copy of the main repository—one we can start messing with in our local development environment without having to worry about stepping on anything in production. Once we’ve completed an exhaustive find-and-replace to change all the places where the code says ars-technica-logo.png to y-technica-logo.png, we’re ready to “commit” the change to the branch. Committing does not make your changes live in production—it simply records a snapshot of your work on the branch, along with a message describing it. In this case, our commit might be something like git commit -m "Sweet new logo for Ars - Y is the future".

Next up, we need to actually get the change from our local development environment up into the GitHub repository. In Git terms, we need to “push” our changes. This is accomplished with, perhaps unsurprisingly, the git push command. After that, the code exists in the GitHub repo, and the next step—if we feel good about what we’ve coded—is to test it in a staging environment. Doing this means we need to get our changes from the feature-l33t-new-ars-logo branch and put them somewhere—like our staging branch. There are many ways to integrate code between branches, but we like the formality of doing things with pull requests, which let us merge our new code into an existing branch and also leave us with good documentation around every change. A pull request could consist of one or hundreds of commits that took place during the development of the feature branch.
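Put together, the whole flow from branch to pull request looks roughly like this. (The gh command at the end assumes you have GitHub’s CLI installed; the web UI works just as well.)

git checkout -b feature-l33t-new-ars-logo     # new branch for the logo swap
# ...edit files, replacing ars-technica-logo.png with y-technica-logo.png...
git add .
git commit -m "Sweet new logo for Ars - Y is the future"
git push -u origin feature-l33t-new-ars-logo  # publish the branch to GitHub
gh pr create --base staging --title "Y logo" --body "Swap the logo for a Y"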

Git will automatically flag any “conflicts” when you merge branches. A conflict usually arises when someone has committed newer changes to the same files affected by the new feature branch. Working through large conflicts can be a nightmare, which is why it’s important to work as granularly as possible when creating new features—only check out what you need, when you need it, and be mindful of where others are working. Ideally, you introduce as little new and disruptive code as possible—and, fortunately, our new Y logo fits the bill.

Perfection. Nobody tell Aurich.
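When a conflict does crop up, one low-drama way to handle it is to merge the target branch into your feature branch locally and sort things out there before the pull request goes anywhere. This is a sketch, not a prescription:

git checkout feature-l33t-new-ars-logo
git fetch origin
git merge origin/staging      # Git pauses and marks any conflicting files
git status                    # lists the files containing <<<<<<< conflict markers
# ...resolve the markers in an editor, then conclude the merge...
git add .
git commit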

When code is added to key branches we have set up, like staging or main, a series of automated events is kicked off. These all start with linting and testing. Linting—a term that refers to a static check of one’s code and configuration files to make sure there aren’t any typos or other issues—ensures that submitted code conforms to our internal styles and practices. After linting, testing verifies that the major features and functions of the applications continue to produce the desired results, so that new code doesn’t introduce unanticipated problems elsewhere in the software.

On each push, GitHub fires up a build system that runs the linting and testing commands you specify, using a simple YAML configuration file and GitHub Actions. It’s worth noting that these tests could also run inside AWS CodeBuild, but GitHub’s build system is very fast and automatically integrated with the repository user interface, so we can see at a glance what horrors we’ve introduced with our latest pushes.
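The workflow file itself is just YAML, but the heart of any lint/test job is a handful of commands that either pass or fail. The specific tools and paths below are illustrative stand-ins rather than our actual configuration—they just show the shape of such a job for a PHP-plus-JavaScript app:

composer install --no-interaction            # install PHP dependencies
./vendor/bin/phpcs --standard=PSR12 src/     # lint PHP against a coding standard (assumed tool)
./vendor/bin/phpunit                         # run the PHP test suite (assumed tool)
npm ci                                       # install frontend dependencies
npx eslint assets/js                         # lint the JavaScript (assumed tool and path)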

Oops! Well, dang, I guess we’re not getting a new logo today.

Once our tests have passed, GitHub uses a webhook to notify AWS that we’ve got some code to deploy to production, and this is where the real fun begins.

AWS CodePipeline and CodeBuild

CodePipeline is an AWS tool that allows you to take code from a repository and pass it around to different services within AWS (hence the “pipeline” part!) with an eye toward helping out with building and deployment tasks. Additionally, at least in our case, changes to some key GitHub repositories and branches will also kick off a CodeBuild process.

CodeBuild is a service that fires up an instance of a preconfigured build environment, inside of which code will be compiled to your specs. There are a number of managed build images to choose from. The image and build instructions are read from a buildspec.yml file in each application’s root directory. (Note that when we say “root directory” here, we mean that the buildspec.yml file is stored in the root of the GitHub repo.)

This build file tells CodeBuild what runtimes we need (e.g., “Please provide PHP 8.1 and Node.js 18”) and breaks down into a series of build phases that are easy to follow if you’ve seen YAML before. These buildspec.yml files can become extremely complex and even include other build files, but our setup for the main WordPress application is fairly straightforward (a rough shell sketch of these steps follows the list):

  1. Log in to our AWS ECR (Elastic Container Registry)
  2. Use Composer to install any defined PHP packages we need
  3. Use npm to install and build any frontend packages, minify our JavaScript, etc.
  4. Delete a lot of unnecessary files (we don’t need that .gitignore or README in production, do we?)
  5. Run the docker build command, which will create our image per a project-specific Dockerfile
  6. And finally, push this new image to the container repository, where it can be later pulled down by ECS
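Here’s a rough shell approximation of those six steps. The account ID, region, and image name are placeholders, and the real buildspec.yml expresses this as YAML phases rather than a flat script:

aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
composer install --no-dev --no-interaction    # PHP packages
npm ci && npm run build                       # install and build frontend assets
rm -rf .gitignore README.md tests/            # trim files we don't ship
docker build -t acta:latest .                 # build the image from the project Dockerfile
docker tag acta:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/acta:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/acta:latest   # hand the image to ECR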

Whew! That sounds like a lot (and it is!), but it’s mostly a “design once and rarely edit” setup. There’s more involved with tagging and storing Docker images in ECR, but this gives you the general overview. (You can think of ECR as our own isolated version of Docker Hub.) Once CodeBuild is done, it returns its results back to CodePipeline, which continues its execution.

So now we’ve committed code, tested it, built our final Docker image with all our application files copied over, and pushed that image to ECR, where it can be accessed by a simple docker pull command. What’s next? From our first installment, we know our newly built Docker images need to eventually end up living on one of our Fargate ECS clusters as containers in a “task,” but how, exactly?

This is how we do it. (Mostly.)

Enter AWS CodeDeploy and blue/green deployments

If nothing else has piqued your interest about this approach to CI/CD, I think blue/green deployment may be the “Excelsior!” moment. It certainly was for me. Imagine: Even after running successful tests and builds, you eventually get to a point where you must flip a switch, finally saying goodbye to your stable production environment and puckering all orifices while you wait for the new deployment to come online and take over production work.

This was the way things were for us a few years ago—production changes were made by deploying code using a makeshift Debian repository (which was kind of a genius approach—hat tip to Ars developer emeritus Lee Aylward). Even with Fargate managing the tasks, a lot can go wrong. An incorrect environment variable, something wrong with the underlying provisioning service, an unintended change that passed all the tests but still breaks the site—there are a million potential problems.

Beholding the ghosts of failed deployments past. Anything could go wrong—“…an undigested bit of beef, a blot of mustard, a crumb of cheese, a fragment of underdone potato…”

Blue/green is one of multiple strategies provided out of the box by CodeDeploy, and this is where it saves the day. With a blue/green deployment, a completely new target group containing your updated container(s) is spun up in parallel with your existing production application, with an identical number of tasks but accessible through an alternate port on the application load balancer. Thus, you can peruse the new stuff on the alternate port to your heart’s content until you’re completely satisfied that it’s ready for public traffic. From there, it’s a simple flip of the switch to swap public traffic over to the new target group. This is one part of our CI/CD process that we do manually, for obvious reasons.
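That manual flip can happen in the console or—purely as an illustration—from the AWS CLI. The application name, deployment group, and deployment ID below are all made up:

aws deploy list-deployments --application-name acta --deployment-group-name acta-prod
aws deploy get-deployment --deployment-id d-EXAMPLE123      # check on the green tasks waiting on the test port
aws deploy continue-deployment --deployment-id d-EXAMPLE123 \
  --deployment-wait-type READY_WAIT                         # reroute public traffic to the green target group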

For folks who require a car analogy to properly grok a tech concept (I see you raising your hand back there, Lee!), think of blue/green deployments as being a bit like the DevOps take on a dual-clutch transmission—the transition between gears goes a lot faster in a DCT because the next gear change is already being handled by the other clutch while you’re still accelerating, just as we get our replacement environment built and brought to hot standby while the current environment runs. When it’s time to shift—gears or production containers!—the changeover happens much more quickly because all the work to accomplish the transition, along with the validation that it will be successful, has already happened. You just fail over between clutches in the DCT and between “blue” and “green” environments for Ars.

There are competing deployment strategies, to be sure—such as slowly shifting traffic to your new setup over time. Personally, I love blue/green for high-volume websites because your commitment is zero until you’re happy with how everything is running. Even then, you have the option to roll your deployment back to the previous version, which—as long as you haven’t terminated its tasks—is still running in parallel. Failing back, if one needs to do it, takes only seconds. The entire process greatly reduces anxiety associated with deploying large changes, which any developer will be able to relate to.
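The fail-back path is similarly small if you go the CLI route: assuming the old (“blue”) task set hasn’t been terminated yet, stopping the deployment with rollback enabled sends traffic back where it came from. (Again, the ID here is hypothetical.)

aws deploy stop-deployment --deployment-id d-EXAMPLE123 --auto-rollback-enabled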

While we’re talking anxiety, nothing has caused more insomnia among developers than the state of one’s infrastructure. (“It’s 10 pm—do you know where your servers are?”) We’ll discuss next how we’ve managed to reduce stress on that front.

IaC (“infrastructure as code”)

“Infrastructure as code” is an idea that has been around for a long time in one form or another. Think about all the wild things we have to do to get a single web server up and running from scratch—maybe, for annoying legacy reasons, you need to edit /etc/hosts to include local machines in a cluster. You have to use apt or yum or whatever to install all the right software, then add specific configurations to Apache or Nginx.
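If that sounds abstract, this is the kind of hand-run toil we’re talking about—an illustrative (and deliberately simplified) sketch of bringing up one web server by hand, with made-up hostnames and paths:

echo "10.0.1.12  db-primary" >> /etc/hosts                  # legacy cluster bookkeeping
apt-get update && apt-get install -y nginx php8.1-fpm       # install the stack
cp ./conf/ars.conf /etc/nginx/sites-available/ars.conf      # drop in a site config (hypothetical file)
ln -s /etc/nginx/sites-available/ars.conf /etc/nginx/sites-enabled/ars.conf
systemctl reload nginx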

If you stand up enough servers, you’ll likely end up with your own runbook (mental or physical), but the steps—and all the variations you need for edge cases—can be a lot to keep track of. The problem gets worse if you’re standing up, say, fifty web servers instead of just one. And what if you also needed to initialize remote databases and configure routers, cache servers, and search appliances? And what if you needed to do the same thing again in an isolated testing environment, too?

This is why the concept of “infrastructure as code” exists. The idea is to take all the crazy things we do to get infrastructure up and running and reduce them to abstracted, readable code that can be used again and again. And, if you’re feeling particularly jaunty, you keep that abstracted, readable code under version control so you can see exactly how it changes over time—and so you can roll back to a previous revision when something goes wrong.

Of course, as you might anticipate, changing our approach from “infrastructure as infrastructure” to “infrastructure as code” also requires changing our toolset. I’ve done plenty of work in the past with configuration management tools, including Chef (now Progress Chef, which doesn’t quite have the same ring to it), which ran Ars Technica’s infrastructure for many years. While the AWS console is quite lovely to look at, and I do spend a great deal of time staring at it, I would not want to use it alone to configure a complex infrastructure. And even if a single environment might be manageable that way, that flies out the window when you add two, three, or a hundred more. That’s where IaC becomes a necessity: it allows you to create infrastructure in a repeatable way, in the cloud or otherwise.

Terraform

The tool we use at Ars to manage all our infrastructure is called Terraform. Anyone doing web development from the mid-2000s on has undoubtedly spent time with an excellent piece of software called Vagrant, which made dealing with different virtual machine providers like VirtualBox or Parallels a much simpler prospect. Simply run vagrant up with your configuration file, and voilà—no need to struggle with VirtualBox’s weird GUI. Vagrant was (and still is) a product from HashiCorp, so named for its founder Mitchell Hashimoto. And perhaps unsurprisingly, HashiCorp is also the creator of Terraform.

Terraform is a tool that takes a series of simple, descriptive configuration files, written in a language called HCL (HashiCorp Configuration Language), and turns them into instructions for erecting infrastructure on many different cloud providers (Alibaba Cloud, anyone?). One of the key features of Terraform is that this instruction set is idempotent, which means—practically speaking—you can execute it against an infrastructure and expect it to make changes only if you’ve really altered something, no matter how many times you run it. That’s a relief when small alterations can wreak havoc.

When you run the command terraform plan, you’ll be told precisely which resources will be created, destroyed, or modified in place before any commands are issued to AWS. This is incredibly useful when your cloud environment contains hundreds of resources—and indeed, the Ars infrastructure has 295 managed AWS resources in it, despite the simplicity of the overview charts we’ve shared. Keeping track of those components and their interconnectedness using only the AWS console can be challenging, and it gets downright unmanageable if you’ve got a hundred or more environments, as many DevOps pros do.
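In day-to-day use, that workflow is only a few commands with the Terraform CLI. (As we’ll get to in a moment, we mostly let a managed service run these steps for us, so treat this as an illustrative local sketch.)

terraform init                  # download providers and configure state
terraform validate              # catch syntax and reference errors in the HCL
terraform plan -out=tfplan      # show exactly what would be created, changed, or destroyed
terraform apply tfplan          # apply only the plan that was just reviewed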

The repository that stores our infrastructure definitions.

There are multiple ways to work with Terraform, and while we do use the CLI tool to validate code, we primarily use a managed application called Terraform Cloud to handle actual deployments. Our setup for managing infrastructure looks much like the application CI/CD process described above: once again, it starts from a GitHub repository that stores all the configuration files that describe our infrastructure. From the VPC subnets to the scaling parameters on the serverless products we use, every detail is stored in a versioned repository.

Depending on the GitHub repository and branch we’ve pushed changes to, a webhook from GitHub triggers a planning process in Terraform Cloud. Terraform will automatically generate a set of proposed changes, which can then be applied to your cloud environment automatically or manually. As with production code deployments, we rely on a manual switch here because even with all the safeguards in the world, it’s possible to destroy your entire infrastructure with the click of a button. (I’m not going to say anyone named Lee has ever done this, but I’m not going to say anyone named Lee has ever not done this, either. It’s why we have backups, right?)

Some of our projects in Terraform.

If Terraform HCL and template files aren’t your jam, there are many other tools to accomplish the same thing. Amazon has multiple options of its own, like CloudFormation and the CDK, and there’s even a visual designer for CloudFormation if you prefer that approach. There’s another up-and-coming tool in Pulumi that I’ve really enjoyed working with. Like Terraform, Pulumi is a multi-cloud tool, but it allows you to write your infrastructure in a number of programming languages. There are lots of options out there!

Wrap up

Planning, documentation, and version-controlled code—when you come down to it, that’s what sits underneath the Ars Technica front page. All the ECS tasks and Lambdas and serverless Aurora databases in the world won’t do a thing to help you if you can’t direct them properly, and we’ve put a lot of time and effort into a setup that is both resilient and flexible—and one that can change and grow with us. Tying together CI/CD and IaC and implementing them in an API-driven cloud environment is a bit like having a magic wand that you can point at the sky and cause fully realized system designs to materialize directly from the cosmic aether—it’s a powerful operating methodology enabled by powerful tools.

“AWS offers customers a comprehensive set of CI/CD services to support their people and processes,” said AWS senior solutions architecture lead William Torrealba during an architecture discussion for this piece. “This allows our customers to easily and quickly implement CI/CD processes in a very cost-effective way without complex hardware management. But it is also flexible enough to extend the functionality using third-party partners and tools.”

We’re small potatoes in the grand scheme of things, too. Sites like Netflix can see our monthly traffic volume in a single day (or less!), and similar CI/CD practices serve them at that scale just as well as they do for us. I can’t imagine going back to a time before these tools and processes were in place—at this point, it would be like abandoning fire. We’ve adapted our entire business around workflows enabled by these tools, and we wouldn’t be Ars Technica without them. (We would be remiss in not giving credit to former lead developer Steven Klein for helping to pull much of the Ars IaC project together. We miss you, Steve!)

Coming up next

Part four, dear readers, will be our final installment. We have a grab bag of topics to cover, including how we do DNS, a bit more about our content delivery network, and a short discussion on architectural decisions—there are lower-cost 64-bit ARM offerings available on AWS, and although moving parts of one’s infrastructure from x86-64 to ARM isn’t necessarily easy, we’ve done some preliminary investigation, and it’s not exactly difficult, either. What things lurk in our 64-bit future? Tune in next Wednesday for our series finale and see!


