Validations

This section explores several areas that can be used to improve the quality of programmatic changes. These areas are well established concepts in the software development world, and apply to Infastructure as Code in a similar fashion. It is worth to note that many of these areas are addressed in commercial offers from Cisco such as Nexus Dashboard Insights. Features such as pre-change validation and delta analysis can be used to verify business intent and whether the configuration meets your requirements. That being said, a plethora of open-source tools are available that can provide users with a set of tools to develop their own set of validations.

Linting, semantic and syntactical validation

To improve the quality of your code it is recommended to verify that the meaning of user input is both valid and meaningful. This means ensuring that all user input is relevant, accurate, and appropriate for the intended purpose.

Linting is a process of analyzing code for programmatic and stylistic errors. It helps to identify potential issues such as syntax errors, missing semicolons, excessive whitespace, and formatting inconsistencies. Linting can help make sure that code follows a standard style and complies with best practices, making it easier to read and debug. By automating certain parts of the process, linting can help developers save time and write better quality code.

There are many ways to lint code, from online tools such as yamllint.com, that lets users copy & paste code, or the more commonly used CLI-driven tools. A common linter for yaml is the yamllint python package, which can be installed through:

pip install yamllint

yamllint is also packaged for all major operating systems, see installation examples (dnf, apt-get...) in the documentation.

Running yamllint . within a directory will lint all *.yaml files, and show any errors:

yamllint .                                                         %
./tenant_PROD.nac.yaml
  24:43     error    trailing spaces  (trailing-spaces)
  25:14     error    syntax error: expected <block end>, but found '<block sequence start>' (syntax)
  26:15     error    wrong indentation: expected 15 but found 14  (indentation)

Note that if you run this from a directory that has the NAC terraform modules downloaded (for example after a terraform init), yamllint will lint all files, recursively. Meaning this will also lint any yaml files in the .terraform directory. You can exclude files by creating a new configuration file .yamllint:

---
yaml-files:
  - '*.yaml'
  - '*.yml'
  - '.yamllint'

ignore: |
  .terraform

When using a git repository it is advised to include these steps in a pre-commit hook. This is a client side action that will run each time that you commit a change. To do that you can create a .pre-commit-config.yaml file. Note that it is possible to write your own scripts but one of the advantages of pre-commit is that you can leverage a large ecosystem of hooks made by other people. Many other examples for pre-commit can be found at https://pre-commit.com/hooks.html

Create .pre-commit-config.yaml with the following content:

repos:
-   repo: https://github.com/adrienverge/yamllint.git
    rev: v1.28.0
    hooks:
      - id: yamllint

To make use of this configuration, you must first install and initialize pre-commit:

> pip install pre-commit
> pre-commit install
pre-commit installed at .git/hooks/pre-commit

Note that the pre-commit install must be run from the root of the repository.

And make sure to add .pre-commit-config.yaml with git add:

git add .pre-commit-config.yaml

Next time you run git commit, the hook will initiate yamllint:

> git commit -m "updating configuration"
yamllint.................................................................Failed
- hook id: yamllint
- exit code: 1

data/tenant_PROD.nac.yaml
  29:43     error    trailing spaces  (trailing-spaces)
  30:14     error    syntax error: expected <block end>, but found '<block sequence start>'
  31:15     error    wrong indentation: expected 15 but found 14  (indentation)

The downside of pre-commit hooks is that they run exclusively on your system. If a contributor to your project does not have the same hooks installed, they may commit code that violates your pre-commit hooks. If you use GitHub you can integrate pre-commit hooks in your CI workflow. At the moment of writing, this is only available for GitHub. Visit https://pre-commit.com for more information. For GitLab CI users it is possible to run a job to check whether the pre-commit hooks were properly applied. Below is an example of adding a linting stage to your gitlab-ci.yml.

yamllint:
  stage: linting
  image: registry.gitlab.com/pipeline-components/yamllint:latest
  script:
    - yamllint data/

Semantic validation is the process of checking the meaning behind data to ensure accuracy and correctness. This can involve validating data against a set of rules, making sure it conforms to certain expectations. Syntactic validation is the process of validating data against a set of predetermined rules. This ensures that information entered into a system meets the requirements for it to be accepted and processed correctly. Syntactic validation can involve using regular expressions to check for specific patterns, or comparison operators to check if values meet certain criteria.

The open-source nac-validate python tool may be used to perform syntactic and semantic validation of YAML files. Syntactic validation is done by basic YAML syntax validation (e.g., indentation) and by providing a Yamale schema and validating all YAML files against that schema. Semantic validation is done by providing a set of rules (implemented in Python) which are then validated against the YAML data. Every rule is implemented as a Python class and should be placed in a .py file located in the --rules path.

Each .py file must have a single class named Rule. This class must have the following attributes: id, description and severity. It must implement a classmethod() named match that has a single function argument data which is the data read from all YAML files. It should return a list of strings, one for each rule violation with a descriptive message.

Python 3.10+ is required to install nac-validate. It can be installed using pip:

pip install nac-validate

It may also be integrated via a pre-commit hook with the following .pre-commit-config.yaml, assuming the default values are used (.schema.yaml, .rules/).

repos:
  - repo: https://github.com/netascode/nac-validate
    rev: v1.0.0
    hooks:
      - id: nac-validate

In case the schema or validation rules are located somewhere else, the required CLI arguments can be added like this:

repos:
  - repo: https://github.com/netascode/nac-validate
    rev: v1.0.0
    hooks:
      - id: nac-validate
        args:
          - '--non-strict'
          - '-s'
          - 'my_schema.yaml'
          - '-r'
          - 'rules/'

An example .schema.yaml can be found here.

When validating your *.yaml code against the above .schema.yaml, you can perform checks. For example, if you change the Bridge Domain setting subnets: list(include('ten_bridge_domain_subnets'), required=False) in the default .schema.yaml to subnets: list(include('ten_bridge_domain_subnets'), required=True), this will make sure that a subnet is always added to a Bridge Domain. If the subnet is omitted, the validation will fail:

> nac-validate --non-strict data/ 
ERROR - Syntax error 'data/tenant_PROD.nac.yaml': apic.tenants.0.bridge_domains.1.subnets: Required field missing
ERROR - Syntax error 'data/tenant_PROD.nac.yaml': apic.tenants.0.bridge_domains.2.subnets: Required field missing

Note that the --non-strict flag is added above, which allows unexpected elements in the .yaml files. In other words, this means that it is not required to have a matching check in .schema.yaml for each resource defined in the .yaml files.

More complex logic can also be applied to run semantic validation. Below is an example subnet_overlap.py to avoid creating overlapping subnets:

import ipaddress

class Rule:
    id = "100"
    description = "Verify VRF subnet overlap"
    severity = "HIGH"

    @classmethod
    def match(cls, data):
        results = []
        try:
            for tenant in data["apic"]["tenants"]:
                for vrf in tenant["vrfs"]:

                    # get a list of all bridge domain subnets of a vrf
                    subnets = []
                    for bd in tenant["bridge_domains"]:
                        if bd["vrf"] == vrf["name"]:
                            for subnet in bd.get("subnets", []):
                                subnets.append(
                                    ipaddress.ip_network(subnet["ip"], strict=False)
                                )

                    # check subnet overlap with every other subnet
                    for idx, subnet in enumerate(subnets):
                        if idx + 1 >= len(subnets):
                            break
                        for other_subnet in subnets[idx + 1 :]:
                            if subnet.overlaps(other_subnet):
                                results.append(
                                    "apic.tenants.bridge_domains.subnets.ip - {}.{}.{}".format(
                                        tenant["name"], vrf["name"], subnet
                                    )
                                )

        except KeyError:
            pass
        return results

The following error is returned when overlapping subnets have been specified in any of the *.yaml files in the data/ folder:

> nac-validate --non-strict data/
ERROR - Semantic error, rule 100: Verify VRF subnet overlap (['apic.tenants.bridge_domains.subnets.ip - prod.prod.prod-vrf.10.1.201.0/24'])

For more example rules, click here.

Pre-change Validation using NDI

Users of Nexus Dashboard Insights (NDI) can leverage this solution for automated pre-change validation. Hundreds of different checks have been codified and can be used to validate your changes on-demand. Compared to the --rules argument in nac-validate, there is no need to write Python code that contain your checks. NDI will notify you of any anomalies imposed by your new configuration. Besides looking at potential configuration errors, NDI provides a framework that enables users to write their own configuration and compliance rules. Each time a pre-change validation is run, that set of requirements is evaluated. Consider the following examples:

In this example, Endpoint Group (EPG) web and db must be able to communicate:

In this example all bridge domains need to be configured with at least one private subnet:

Compliance requirements in NDI drastically reduce the time required to write tests, in order to meet business requirements when driving automated changes.

Using the commandline tool Nexus-PCV, you can automate Pre-Change Validations in NDI. Nexus-PCV can either work with provided JSON file(s) or a terraform plan output from a Network-as-Code project. A planned change can be validated before applying it to a production environment by running a terraform plan operation first and then providing the output to nexus-pcv to trigger a pre-change validation.

The tool can easily be integrated with CI/CD workflows. Arguments can either be provided via command line or environment variables. The tool will exit with a non-zero exit code in case of an error or non-suppressed events being discovered during the pre-change analysis. The --output-summary and --output-url arguments can be used to write a summary and/or a link (URL) to a file, which can then be embedded or parsed into notifications (e.g., Webex).

Python 3.10+ is required to install nexus-pcv. Nexus-pcv can be installed using pip:

pip install nexus-pcv

To create a plan output that can be used by nexus-pcv in order to create a PCV in NDI, the following can be run:

> terraform plan -out=plan.tfplan
> terraform show -json plan.tfplan > plan.json
> nexus-pcv --hostname-ip 10.0.0.1 --username admin --password Cisco123 --group <yoursitegroup> --site <yourfabric> --name pcv123 --nac-tf-plan plan.json --output-summary output-summary.txt --output-url output-url.txt

This will trigger a new PCV in NDI:

After a few minutes you can evaluate the results:

Alternatively you can refer to the output-summary.txt to see if any anomalies have been found based on your intended configuration.

Testing (post-change validation)

The ACI fabric uses a policy model to combine data into a health score. Health scores can be aggregated for a variety of areas such as for the system, infrastructure, tenants, applications, or services. Every configured object is also retrievable via the APIC REST API, which can be used to compare this against the intended configuration. This makes verifying that changes were succesful very simple. This is a much more efficient way than parsing the output of SSH commands. You might want to make this an automated step to validate your changes have been succesful. To do that you can make use of nac-test tool. nac-test is a CLI tool that renders and executes Robot Framework tests using Jinja templating. Robot Framework is a generic test automation framework for acceptance testing. Combining Robot's language agnostic syntax with the flexibility of Jinja templating allows dynamically rendering a set of test suites from the desired infrastructure state expressed in YAML syntax.

All data from the YAML files (--data option) will first be combined into a single data structure which is then provided as input to the templating process. Each template in the --templates path will then be rendered and written to the --output path. If the --templates path has subfolders, the folder structure will be retained when rendering the templates.

Python 3.10+ is required to install nac-test. nac-test can be installed with pip:

pip install nac-test

Example output of running nac-test from the cli:

> nac-test --data ./data --output ./tests/results/aci --templates ./tests/   23s
Robot Framework remote server at 127.0.0.1:8270 started.
Storing .pabotsuitenames file
2023-09-08 12:26:36.921185 [PID:73054] [0] [ID:2] EXECUTING Aci.Templates.Ntp
2023-09-08 12:26:36.922841 [PID:73055] [1] [ID:1] EXECUTING Aci.Templates.Nodes
2023-09-08 12:26:36.923017 [PID:73057] [2] [ID:0] EXECUTING Aci.Templates.Bgp Rr
2023-09-08 12:26:39.902193 [PID:73054] [0] [ID:2] PASSED Aci.Templates.Ntp in 2.9 seconds
2023-09-08 12:26:39.916829 [PID:73057] [2] [ID:0] PASSED Aci.Templates.Bgp Rr in 2.9 seconds
2023-09-08 12:26:40.310553 [PID:73055] [1] [ID:1] PASSED Aci.Templates.Nodes in 3.3 seconds
22 tests, 22 passed, 0 failed, 0 skipped.
===================================================
Output:  /path/tests/results/aci/output.xml
XUnit:   /path/tests/results/aci/xunit.xml
Log:     /path/tests/results/aci/log.html
Report:  /path/tests/results/aci/report.html
Stopping PabotLib process
Robot Framework remote server at 127.0.0.1:8270 stopped.
PabotLib process stopped
Total testing: 9.10 seconds
Elapsed time:  3.99 seconds

The example template folder used for testing can be found here. This repository contains multiple .robot files that serve as an example to start writing your own tests. Examples are being provided for retrieving health-scores, configuration and functional tests (for example to verify that the NTP configuration resulted into nodes being synced with the specified NTP server.)

Consider the following fabric registration test:

*** Settings ***
Documentation   Verify Fabric Nodes
Suite Setup     Login APIC
Default Tags    apic   day1   config   node_policies
Resource        ./apic_common.resource

*** Test Cases ***
# Verify node fabric registration
{% for node in apic.node_policies.nodes | default([]) %}

{% if node.role != 'apic' %}
Verify fabric registration for Node-{{ node.id }}
    ${r}=   GET On Session   apic   /api/mo/uni/controller/nodeidentpol/nodep-{{ node.serial_number }}.json
    Should Be Equal Value Json String   ${r.json()}   $..fabricNodeIdentP.attributes.nodeId   {{ node.id }}
    Should Be Equal Value Json String   ${r.json()}   $..fabricNodeIdentP.attributes.podId   {{ node.pod }}

{% endif %}
{% endfor %}

With the following example node_policies.yaml:

---
apic:
  node_policies:
    inb_endpoint_group: inb
    oob_endpoint_group: default

    nodes:
      - id: 101
        pod: 1
        role: leaf
        serial_number: FDOAAAA9JB
        name: leaf-101
        oob_address: 10.61.124.141/24
        oob_gateway: 10.61.124.1
        update_group: MG1
        fabric_policy_group: all-leafs
        access_policy_group: all-leafs

      - id: 102
        pod: 1
        role: leaf
        serial_number: FDAAAAA9V8
        name: leaf-102
        oob_address: 10.61.124.152/24
        oob_gateway: 10.61.124.1
        update_group: MG2
        fabric_policy_group: all-leafs
        access_policy_group: all-leafs

Running nac-test will start by merging all inventory files together, so that all nodes within the input folder become accessible in apic.node_policies.nodes. The robot test will then loop the tests for each node with the for loop contained in {% for node in apic.node_policies.nodes | default([]) %}. Because this test is only applicable to nodes that are not APIC, these are filtered out using {% if node.role != 'apic' %}.

Robot makes use of a construct called keywords. Think of a keyword as a single step. Just as a test is conceptually made up of many steps, a robot test is made up of many keywords. Several keywords may be leveraged by different tests and have been made available in apic.common.resource so that the logic to handle authentication against APIC is defined only once. This logic can then be leveraged by each test. The Suite Setup calls the Login APIC keyword which creates an authenticated session against APIC.

Verify fabric registration for Node-{{ node.id }} will run for each instance of node.id found in the inventory. Leveraging the authenticated session against APIC, the GET On Session keyword can be used to GET a particular object and its attributes. For this particular node test, objects are created for each node with their unique serial number. This information is retrieved from node.serial_number. The reply from APIC is then stored in variable ${r} which can be used to evaluate the configuration. The Should Be Equal Value Json String custom keyword is used to compare attributes found in the payload of the reply against the provided inventory values. In this case, the nodeId and podId are being compared against the values of node.id and node.pod. The tests will only succeed if these values match.

Running the test will result in a report.html in the specified output folder. The results of each test can be observed in detail:

The same logic can be used to validate other types of configuration. Robot has an extensive library of pre-defined keywords, such as the ability to SSH into a device, ping and much more.

If you are not quite sure which objects to leverage for testing, you can save configuration from the APIC UI to understand which objects are of interest. Navigating to https://your-apic-url/model-doc will provide documentation for each object and their attributes. Alternatively this is also available publicly, at https://developer.cisco.com/site/apic-mim-ref-api/.

This method of inventory driven testing can also be used for brownfield environments. This can be useful for any existing configuration that is not deployed using Terraform with the Network-as-Code module for ACI. As long as the inventory follows the structure provided by the data-model, it can be used as input for testing.

Gitlab provides a useful, native integration with these tests. The XUnit.xml output can be used to view the results directly in Gitlab. See the following url for more documentation.

Linting, semantic and syntactical validation​

Pre-change Validation using NDI​

Testing (post-change validation)​

Linting, semantic and syntactical validation

Pre-change Validation using NDI

Testing (post-change validation)