BlueAlly Blog

Network Validation with pyATS

TL;DR: pyATS is an automation testing framework that includes a parsing library called Genie. With more than 1,500 parsers available, Genie can parse device output from multiple vendors and platforms, including Cisco, Juniper, and F5 BIG-IP. Combined with pyATS, Genie gives you a complete test suite that can provide confidence that your network is healthy.

Have you ever been asked by your manager, “Can we confirm feature ‘X’ is configured and working across every device in our network?” This may be a simple feature such as SNMPv3, or something more complex like a specific routing design. Validating your network’s operation at any time can be challenging. However, we can ease that pain with an automation testing framework, such as pyATS. 

Cisco pyATS is an automation testing framework that Cisco built and used internally to test and validate features in its NOS platforms. It began as an internal project in 2014 and was made available to the public in late 2017. The core pyATS framework is still closed-source. However, the companion parsing library, Genie (you may also hear it called the pyATS library), has been open-sourced, and public contributions are encouraged. This post will focus on the pyATS framework.

pyATS OVERVIEW 

The pyATS framework is expansive and can seem daunting with the number of features it provides. The goal of this blog post is to stay at the “10,000 ft view”. There may be further blog posts that dive deeper into the individual features. 

PyATS is a test automation framework, designed to create and run consistent tests against your infrastructure. The companion parsing library I mentioned earlier, Genie, is designed to be used with pyATS. Together, these two libraries create a complete test suite, with a testing framework and vendor-agnostic data parsers. The Genie library could fill a whole separate blog post. The one takeaway is that it is a powerful data parsing library that couples very well with pyATS while remaining a separate library.

pyATS COMPONENTS 

Now that you know that pyATS is used for building and running tests against your infrastructure, let us dive into some of the components that make up the library. 

TESTBED 

The testbed is your device inventory file. Other automation frameworks, such as Nornir and Ansible, have a similar concept. A testbed file is a YAML file that describes the devices you are running tests against. Some of the important components include the device hostname, IP address, OS type (used to dictate which Genie parsers to use), and a set of credentials to connect to the device. There are plenty more data points you can include in the testbed, such as how to connect to the device (CLI, YANG, REST). You can even describe the testing environment’s topology by defining device interfaces and how the devices connect to one another by defining links within your testbed file. If you are interested in learning more about the different data points, check out the links in the references at the end of the post. 
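To make the shape of a testbed more concrete, here is a minimal sketch that represents one as a Python dictionary following the general layout of a pyATS testbed YAML file (devices, OS, credentials, connections). The device names, addresses, and credentials are placeholders, and the `missing_fields` helper is a hypothetical check written for this post, not part of pyATS. In pyATS itself, a YAML testbed file is typically loaded with `pyats.topology.loader.load`; the sketch does not call it.

```python
# Hypothetical inventory following the general layout of a pyATS testbed
# YAML file. Hostnames, IPs, and credentials are placeholders.
TESTBED = {
    "devices": {
        "edge-rtr-1": {
            "os": "iosxe",  # OS type, used to pick the matching Genie parsers
            "type": "router",
            "credentials": {
                "default": {"username": "admin", "password": "changeme"},
            },
            "connections": {
                "cli": {"protocol": "ssh", "ip": "192.0.2.10"},
            },
        },
        "edge-rtr-2": {
            "os": "iosxe",
            "type": "router",
            # Credentials intentionally left out to show the check below.
            "connections": {
                "cli": {"protocol": "ssh", "ip": "192.0.2.11"},
            },
        },
    },
}

# Data points this post calls out as important (an assumption of this
# sketch, not the schema validation that pyATS performs).
REQUIRED_KEYS = ("os", "credentials", "connections")


def missing_fields(testbed):
    """Return {device_name: [missing keys]} for incomplete device entries."""
    problems = {}
    for name, device in testbed.get("devices", {}).items():
        missing = [key for key in REQUIRED_KEYS if key not in device]
        if missing:
            problems[name] = missing
    return problems


print(missing_fields(TESTBED))
```

A check like this can catch an incomplete inventory entry before any device connection is attempted.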

TESTCASE 

A testcase is a collection of smaller tests, aiming to validate a specific feature or functionality. For example, you may write a testcase to validate that BGP is up and operational. This testcase may include smaller tests that validate BGP neighbor relationships, BGP routes are present in the routing table, etc. The individual test results roll up to the testcase result. If you would like to learn more about testcases and other sections that make up a testscript, check out the links in the references at the end of the post. 
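The roll-up idea can be illustrated with a few lines of plain Python. This is a simplified sketch that assumes a "worst result wins" rule over four result names; pyATS defines its own roll-up rules with more result types, so treat the `SEVERITY` ordering and the `roll_up` function as illustrative assumptions rather than the framework's implementation.

```python
# Assumed severity ordering for this sketch only. pyATS has additional
# result types and its own roll-up rules.
SEVERITY = {"skipped": 0, "passed": 1, "failed": 2, "errored": 3}


def roll_up(results):
    """Return the most severe result among a testcase's individual tests."""
    if not results:
        raise ValueError("a testcase needs at least one test result")
    return max(results, key=lambda result: SEVERITY[result])


# Hypothetical results for the smaller tests inside a BGP testcase.
bgp_tests = {
    "neighbors_established": "passed",
    "routes_in_routing_table": "failed",
}

print(roll_up(bgp_tests.values()))
```

With one failed test in the collection, the testcase as a whole is reported as failed, which is the behavior you want when a single broken check should flag the entire feature.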

TESTSCRIPT 

A testscript is a Python file used to structure test sections. Each testscript has its own reporting and logging. Testscripts are meant to be extensible so that you can add testcases in the future. A testscript can be executed as a standalone script, with results printed to STDOUT, or as part of a job. Standalone execution is popular for rapid development, but a testscript should be run as part of a job once it is ready for production use.

JOB 

Jobs in pyATS allow you to run multiple testscripts. Within a job, each testscript is executed as a task. Each task aggregates its logs and results into a single log file and reporter object. The logging and reporting mechanisms within a job could be a separate post. For now, just know that a task’s logs and results are aggregated when it is run within a job. After a job is run, an archive is created. An archive is a zipped folder containing results files (XML/JSON formats), log files, and some additional runtime information. These archives can be useful for further results analysis.

There are many more components that make up pyATS, but these are some of the important pieces. For more information about the other components, I highly recommend checking out the pyATS documentation (link in the references). 

USE CASES 

To get your creativity flowing, let’s take a look at a few use cases that would be great fits for utilizing pyATS. 

  1. Certifying a new network OS version
  2. Validating operational state of the network before/after a change 
  3. Running intrusive tests to ensure network resilience 

This list is not exhaustive and only used for demonstration. Let us take a quick look at each one. 

CERTIFYING A NETWORK OS VERSION 

One of the worst things that can happen when you are upgrading devices on your network is running into a bug. This bug may be obvious or rooted deep in the OS and only triggered when a specific feature is configured. Regardless, management and other stakeholders do not care that a bug was triggered. They want to know why it was not caught before rolling out OS upgrades to production devices. PyATS can provide a level of certainty that a new OS version works with the specific hardware and software features you have configured in your network. The pyATS testing framework can configure the features you care about, validate each feature’s functionality, and clean up after testing has been completed. It is an automated approach that can quickly become a de facto process before a new OS is rolled out to production. 

VALIDATING OPERATIONAL STATE OF THE NETWORK BEFORE/AFTER A CHANGE 

Validating changes on the network has been an issue as old as time. It is a part of every engineer’s change plan but can sometimes be forgotten when it comes to documentation. PyATS provides the framework to confirm the operational state of your network and has built-in reporting functionality for you to quickly figure out what validation checks have passed or failed. PyATS also provides extensive logging that captures device logs, so you will be able to provide all the proper documentation that shows you confirmed the change was successful. 
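A before/after check boils down to comparing two snapshots of parsed operational state. The sketch below shows that comparison with the standard library only. The snapshot dictionaries are hypothetical and do not follow any real parser schema, and `flatten` and `diff_state` are helper names made up for this example. Genie ships its own diff utility for this purpose; this version only illustrates the idea.

```python
MISSING = "<absent>"  # marker for a path that exists in only one snapshot


def flatten(state, prefix=()):
    """Flatten nested dicts into {(key, subkey, ...): value} pairs."""
    flat = {}
    for key, value in state.items():
        path = prefix + (str(key),)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_state(before, after):
    """Return {path: (before_value, after_value)} for every changed path."""
    old, new = flatten(before), flatten(after)
    return {
        path: (old.get(path, MISSING), new.get(path, MISSING))
        for path in sorted(old.keys() | new.keys())
        if old.get(path, MISSING) != new.get(path, MISSING)
    }


# Hypothetical snapshots taken before and after a change window.
before = {
    "interfaces": {
        "Gi0/0": {"oper_status": "up"},
        "Gi0/1": {"oper_status": "up"},
    }
}
after = {
    "interfaces": {
        "Gi0/0": {"oper_status": "up"},
        "Gi0/1": {"oper_status": "down"},
    }
}

for path, (old_value, new_value) in diff_state(before, after).items():
    print(" > ".join(path), ":", old_value, "->", new_value)
```

An empty diff means the operational state matches the pre-change snapshot. Any entry in the diff is something to explain in the change record.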

RUNNING INTRUSIVE TESTS TO ENSURE NETWORK RESILIENCE 

I would consider this to be a more advanced use case, and one that should not be attempted until you have buy-in from higher levels of management. Once you are comfortable with running read-only tests against your network, then you can begin introducing a little bit of “chaos.” The proper name for this type of testing is “chaos engineering.” Netflix popularized this practice through a tool they built called Chaos Monkey (https://netflix.github.io/chaosmonkey/). The idea is that random configuration is pushed to your production environment to ensure the infrastructure is resilient to failures. This random configuration may include shutting down BGP on a core router or rebooting a few devices. Whatever the chaos may be, the idea is to purposely cause failures within the infrastructure. You may be asking yourself, “Why should I perform such a cruel act against myself and my team?” Well, the intention is for you to gain exposure to the faults within your network (and fix them!) before a catastrophic failure occurs.

Disclaimer: I’ve never experienced chaos engineering in a production environment. I included this use case to help showcase what is possible once you have gained confidence in your infrastructure, using automation. 

SAMPLE CODE 

In December 2022, I held an internal tech talk about writing pyATS testscripts. In the demo, I built a pyATS testscript that contains a testcase for testing BGP. The BGP testcase contains the following tests: check for established BGP neighbors, shut down BGP by shutting the WAN interface, check the routing table for received BGP routes (should be none), reactivate BGP, and finally, check the routing table again for received BGP routes (should see BGP routes). The purpose of this demo was to show how we can check BGP functionality using ‘show’ commands, while changing the test environment. 

Here is a link to the code repository: https://github.com/dannywade/20221215-pyATS-Testscripts 

Feel free to open a GitHub issue to ask questions or provide feedback. 
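For readers who want a feel for the logic before opening the repository, the sequence of checks in the demo can be outlined in plain Python. This is not the code from the repository and it does not use pyATS. The dictionaries stand in for parsed ‘show’ command output at each phase, and their layout, the sample addresses, and the function names are all assumptions made for this sketch.

```python
# Hypothetical parsed data for each phase of the demo. These dictionaries
# are illustrative only and do not follow the actual Genie parser schemas.
NEIGHBORS = {
    "203.0.113.1": {"session_state": "Established"},
    "203.0.113.5": {"session_state": "Established"},
}
ROUTES_WAN_UP = {
    "10.10.0.0/16": {"source_protocol": "bgp"},
    "192.0.2.0/24": {"source_protocol": "connected"},
}
ROUTES_WAN_DOWN = {
    "192.0.2.0/24": {"source_protocol": "connected"},
}


def neighbors_not_established(neighbors):
    """Return the neighbor addresses whose session is not Established."""
    return sorted(
        address
        for address, data in neighbors.items()
        if data.get("session_state", "").lower() != "established"
    )


def bgp_prefixes(routes):
    """Return the prefixes in the routing table that were learned via BGP."""
    return sorted(
        prefix
        for prefix, data in routes.items()
        if data.get("source_protocol") == "bgp"
    )


def run_checks():
    """Return (check name, passed) pairs in the order the demo runs them."""
    return [
        ("BGP neighbors established",
         neighbors_not_established(NEIGHBORS) == []),
        ("no BGP routes while the WAN interface is shut",
         bgp_prefixes(ROUTES_WAN_DOWN) == []),
        ("BGP routes present after the interface is re-enabled",
         bgp_prefixes(ROUTES_WAN_UP) != []),
    ]


for name, passed in run_checks():
    print(f"{'PASS' if passed else 'FAIL'}: {name}")
```

In a real testscript, each of these checks would be its own test within the BGP testcase, with the interface shut and re-enable steps running between them.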

WRAPPING UP 

We went over a lot in this blog post, and I definitely didn’t cover all the features of pyATS. The purpose of this post was to touch on some of the high-level concepts within the pyATS framework, and hopefully get you thinking about how you can introduce automated network testing into your environment. If you are interested in automated network testing and not sure where to start, please feel free to contact us and we can get you started! 

REFERENCES/BACKGROUND READING 

 
