Design a Cisco SD-WAN solution

2.2.a Controller Deployment Models (not on blueprint)

General information on “SDW Controller Deployment Models”:

There are three possible deployment models for the controllers:
- Cisco Cloud: vBond, vManage and vSmart are hosted in the Cisco Cloud
- MSP Cloud: vBond, vManage and vSmart are hosted in the MSP Cloud
- Private Cloud: vBond, vManage and vSmart are hosted in the Private Cloud/are deployed on-premise
The customer can choose whichever model best fits the company's security policies and rules
Hosting the controllers in the Cisco Cloud speeds up deployments (especially in conjunction with ZTP) and incurs no additional cost
Cisco recommends deploying all controllers in the cloud
Officially supported clouds as of now (March 2020) are AWS and Microsoft Azure

2.2.a Infrastructure Setup (not on blueprint)

General information on “SDW Infrastructure Setup”:

On-Premise environment deployment steps:
1. Set up and configure vManage
2. Set up and configure vBond and vSmart
3. Add controllers (vBond, vSmart) to vManage
4. Add WAN Edge devices
Recommended SD-WAN deployment steps:
1. Design the site/fabric
2. Design the topology (type of network to be formed, routes to be advertised, …)
3. Configure the policies (QoS, application-aware routing, traffic engineering, …)
4. Activate the policies
5. Configure the WAN Edge devices

2.2.a Initial Configuration Settings (not on blueprint)

General information on “SDW Initial Configuration settings”:

SD-WAN depends on signed certificates to work
Each component within the SD-WAN fabric needs 3 settings:
- System-IP: Unique identifier of an SD-WAN component. 32-bit dotted-decimal notation (like the router-id in OSPF). Logically a VPN 0 loopback interface (referred to as “system”).
- Organization Name: Must match on all components. If the name contains spaces, it must be enclosed in double quotes ("").
- Site-ID: Configured on every WAN Edge router. Identifies different locations. Devices with the same site-id are assumed to be at the same location.
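The three settings can be sanity-checked before they are typed into the CLI. The following is a minimal Python sketch; the class `BasicSettings` and its helpers are hypothetical names chosen for illustration and are not part of any Cisco tooling. No value range is enforced for the site-id because these notes do not state one.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class BasicSettings:
    """Hypothetical holder for the three mandatory settings (not a Cisco API)."""
    system_ip: str
    organization_name: str
    site_id: int

    def validate(self) -> None:
        # The system-ip is a 32-bit identifier written in dotted-decimal form,
        # so it must parse like an IPv4 address (raises ValueError otherwise).
        ipaddress.IPv4Address(self.system_ip)
        if not self.organization_name.strip():
            raise ValueError("organization name must not be empty")

    def cli_organization_name(self) -> str:
        # Names containing spaces have to be wrapped in double quotes.
        name = self.organization_name
        return f'"{name}"' if " " in name else name


def same_location(a: BasicSettings, b: BasicSettings) -> bool:
    # Matching site-ids mean the fabric treats both devices as one site.
    return a.site_id == b.site_id
```

Calling `validate()` on a setting such as `BasicSettings("10.255.0", "My Org", 100)` raises a `ValueError`, because the system-ip is not a complete dotted-decimal value.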
vManage initial configuration:
1. Configure System (host-name, system-ip, site-id, organization, vBond address, NTP server)
2. Configure VPN 0 (set default route, set DNS, assign interface, assign IP address, “no shut” interface)
3. Configure VPN 512 (set default route, assign interface, assign IP address, “no shut” interface)
4. Commit CLI changes!!!
5. Configure Org. name and vBond in the GUI (Administration > Settings)
vBond initial configuration:
1. Configure System (host-name, system-ip, site-id, organization, vBond address w/ local keyword, NTP server)
2. Configure VPN 0 (set default route, assign IP address to interface, VPN 0 is pre-configured for WAN, Cisco recommends disabling the tunnel-interface configuration while integrating the controller because it blocks NETCONF and SCP by default, both of which are needed for the initial exchanges between vBond and vManage)
3. Configure VPN 512 (set default route, assign interface, assign IP address, “no shut” interface)
4. Commit CLI changes!!!
vSmart initial configuration:
1. Configure System (host-name, system-ip, site-id, organization, vBond address, NTP server)
2. Configure VPN 0 (set default route, set DNS, assign interface, assign IP address, “no shut” interface)
3. Configure VPN 512 (set default route, assign interface, assign IP address, “no shut” interface)
4. Commit CLI changes!!!
After initial configuration, show control local-properties should be used for validation
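The system and VPN 0 steps shared by all three controllers can be sketched as a small Python template renderer. This is only an illustration: the function name `render_bootstrap` and all sample values are hypothetical, the emitted keywords reflect the Viptela-style CLI as commonly documented and should be verified against the software release in use, and DNS, NTP and VPN 512 are left out for brevity. The rendered text still has to be entered in configuration mode and committed.

```python
def render_bootstrap(host_name, system_ip, site_id, org_name, vbond_addr,
                     vpn0_interface, vpn0_address, vpn0_gateway,
                     is_vbond=False):
    """Return the system and VPN 0 part of an initial configuration as text."""
    # On the vBond itself, the vBond address carries the "local" keyword.
    vbond_line = f" vbond {vbond_addr}" + (" local" if is_vbond else "")
    lines = [
        "system",
        f" host-name {host_name}",
        f" system-ip {system_ip}",
        f" site-id {site_id}",
        # Always quoted here so names with spaces are handled correctly.
        f' organization-name "{org_name}"',
        vbond_line,
        "!",
        "vpn 0",
        # Interface names must be written out in full, never abbreviated.
        f" interface {vpn0_interface}",
        f"  ip address {vpn0_address}",
        "  no shutdown",
        " !",
        f" ip route 0.0.0.0/0 {vpn0_gateway}",
        "!",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_bootstrap("vBond1", "10.255.0.2", 1000, "My Org",
                           "192.0.2.10", "ge0/0", "192.0.2.10/24",
                           "192.0.2.1", is_vbond=True))
```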
Adding Controllers to vManage:
1. Add vBond and vSmart via GUI to vManage using their VPN 0 IP addresses
2. Create certificates for all controllers
3. Enable tunnel-interface command under all VPN 0 interfaces of all controllers:
  - vBond only: Set encapsulation to IPsec
4. On the vManage each controller should now be in up state on the main dashboard
5. Import WAN Edge license file
After adding controllers, show control connections should be used for validation
WAN Edge initial configuration:
1. Configure System (host-name, system-ip, site-id, organization, vBond address, NTP server)
2. Configure VPN 0 (set default route, set DNS, assign interface, assign IP address, “no shut” interface, enable tunnel-interface)
3. Verify connection to the controllers via ping
4. Install Root CA
5. Grab an unused UUID/OTP from the vManage GUI and activate the WAN Edge router manually via CLI with request vedge-cloud activate chassis-number <UUID> token <OTP>
Unlike in IOS, all configuration must be committed at the end to save it!
Important: All interface names must be written exactly as they appear in the system. Abbreviations are not allowed; otherwise the configuration commit fails with an error message!

2.2.a Additional Lab Tools (not on blueprint)

General information for “SDW Additional Lab Tools”:

WAN Emulator: WANem
Realistic Traffic Generator: TRex

2.2.a i Orchestration plane (vBond, NAT)

General information on “SDW Orchestration plane (vBond, NAT)”:

Orchestrates the connectivity between the management (vManage), control (vSmart) and data plane (v/cEdge)
Holds a permanent connection to the vControllers (vManage, vSmart) and a temporary one to the WAN Edge devices (vEdge/cEdge) for initial bring up
Serves as a first point of contact/authentication for vSmart and WAN Edge devices
Requires a public IP address and/or FQDN (could sit behind 1:1 NAT)
All other components (vManage, vSmart, WAN Edges) need to know the IP address and/or FQDN of vBond
Authorizes all control connections based on a white-list of “trusted devices” which consists of the serial numbers of the WAN Edge devices and can be pushed to vBond/vSmart via vManage
When an encrypted tunnel is successfully established between vBond and the WAN Edge device, the IP addresses of the vManage and vSmart nodes along with their serial numbers will be pushed out
vBond will also notify all vManage and vSmart nodes that a new WAN Edge device has joined the fabric and that they should expect an incoming connection from its public IP address
If the vSmart and/or vManage controllers are behind NAT, vBond will facilitate NAT traversal
vBond can be used as an on-premise ZTP server, but this requires an additional vBond that acts only in that role
For this to work, ztp.viptela.com must be DNS-hijacked internally
An “air gap” solution (vBond w/o internet access) is possible, if needed
Clustering for high availability is possible
Multi-tenancy is supported (multiple customers using a single vBond instance)
A single vBond instance can handle up to 1500 connections
Cisco recommends 2 NICs for the vBond (1x VPN0, 1x VPN512)
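The authorization flow described in this section (white-list check, hand-out of controller information, notification of the controllers) can be modelled in a few lines of Python. This is a toy sketch: `ToyVBond` and its fields are hypothetical names, and certificates, DTLS and NAT traversal are ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVBond:
    """Toy model of the vBond authorization step (illustrative only)."""
    trusted_serials: set                 # white-list pushed from vManage
    controllers: dict                    # name -> (ip address, serial number)
    notifications: list = field(default_factory=list)

    def onboard(self, edge_serial: str, edge_public_ip: str):
        # Unknown serial numbers are rejected before anything is shared.
        if edge_serial not in self.trusted_serials:
            return None
        # Tell every vManage/vSmart to expect a connection from this address.
        for name in self.controllers:
            self.notifications.append((name, edge_public_ip))
        # Hand the controller addresses and serial numbers to the WAN Edge.
        return dict(self.controllers)


if __name__ == "__main__":
    vbond = ToyVBond(
        trusted_serials={"EDGE-0001"},
        controllers={"vmanage": ("192.0.2.20", "VM-1"),
                     "vsmart1": ("192.0.2.30", "VS-1")},
    )
    print(vbond.onboard("EDGE-0001", "203.0.113.5"))  # controller details
    print(vbond.onboard("EDGE-9999", "203.0.113.6"))  # None: not white-listed
```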

2.2.a ii Management plane (vManage)

General information on “SDW Management Plane (vManage)”:

The web-based Network Management System for the whole SD-WAN solution
Holds a permanent connection to the vBond orchestrator, the vSmarts and the WAN Edge devices
A single pane of glass for all day 0/1/2 operations
Used to do all provisioning/design/configuration/changes of the SD-WAN fabric and its devices
Provides real-time and historical (logged) information about what is or was going on within the SD-WAN fabric (= telemetry data)
Access control is role-based with a privilege system (comparable to Cisco Prime, Cisco ISE)
Supports NETCONF (to controllers/WAN Edge) and REST API (from external) for programmability/automation
Also supports CLI, SYSLOG and a read-only SNMP
Clustering for high availability is possible with each instance being active (never standby!) and requires 3 nodes
Georedundancy is achieved using database replication between sites
Multi-tenancy is supported (multiple customers using a single vManage instance), which must be set up initially and can’t be changed later
A single vManage instance can handle up to 2000 devices
vManage must always run the same or a higher software version than all controllers (vBond, vSmart) and WAN Edge devices (vEdge, cEdge), or else onboarding of these devices will fail
Cisco recommends 3 NICs for the vManage (1x VPN0, 1x VPN512, 1x Cluster Message Bus)
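As an illustration of the REST API mentioned above, the following Python sketch only builds the HTTP requests for a login and a device inventory query without sending them. The paths `/j_security_check` and `/dataservice/device` and the `X-XSRF-TOKEN` header are taken from the vManage REST API as commonly documented and should be verified against the API reference of the release in use; the host name, credentials and function names are hypothetical.

```python
import urllib.parse
import urllib.request


def build_login_request(host: str, username: str, password: str):
    """Build (but do not send) the form-based login request."""
    body = urllib.parse.urlencode(
        {"j_username": username, "j_password": password}
    ).encode()
    return urllib.request.Request(
        f"https://{host}/j_security_check",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_device_list_request(host: str, session_cookie: str, token=None):
    """Build (but do not send) a request for the device inventory."""
    headers = {"Cookie": session_cookie}
    if token:
        # Assumption: newer releases expect an anti-CSRF token header.
        headers["X-XSRF-TOKEN"] = token
    return urllib.request.Request(
        f"https://{host}/dataservice/device",
        headers=headers,
        method="GET",
    )


if __name__ == "__main__":
    req = build_login_request("vmanage.example.com", "admin", "secret")
    print(req.get_method(), req.full_url)
```

The requests can be sent with `urllib.request.urlopen` once certificate handling for the lab or production vManage has been decided; that part is deliberately left out here.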

2.2.a iii Control plane (vSmart, OMP)

General information on “SDW Control Plane (vSmart, OMP)”:

The control plane of the whole SD-WAN fabric
Holds a permanent connection to the vBond orchestrator, vManage and WAN Edge devices
Used to enforce centralized policies such as QoS, application-aware routing, traffic engineering, …
Establishes peerings with all WAN Edge devices, distributes connectivity and security policies
Distributes routes based on the principle of a route-reflector (comparable to the BGP route-reflector)
If multiple vSmarts exist, each controller has an identical view of the SD-WAN fabric
This is done by establishing a full mesh of DTLS control connections as well as a full mesh of OMP sessions between the vSmarts themselves
Reduces the complexity of the entire network since the control plane is separated from the data plane
A single SD-WAN domain can have up to 20 vSmart controllers, and each WAN Edge device connects to two of them by default (configurable)
A single vSmart instance can handle up to 5400 connections
Cisco recommends 2 NICs for the vSmart (1x VPN0, 1x VPN512)
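The scale figures above allow a rough sizing estimate. The Python sketch below assumes that every WAN transport of a WAN Edge holds one control connection to each vSmart the device uses, that connections spread evenly across vSmarts, and that connections from vManage and vBond are negligible. These are simplifying assumptions for illustration, not a Cisco sizing rule, and the function names are hypothetical.

```python
import math

VSMART_MAX_CONNECTIONS = 5400   # per-vSmart figure from these notes
VSMART_MAX_PER_DOMAIN = 20      # per-domain figure from these notes


def control_connections(edges, transports_per_edge, vsmarts_per_edge=2):
    """Total WAN Edge to vSmart control connections in the fabric."""
    return edges * transports_per_edge * vsmarts_per_edge


def vsmarts_needed(edges, transports_per_edge, vsmarts_per_edge=2):
    """Smallest vSmart count that stays within the per-controller limit."""
    total = control_connections(edges, transports_per_edge, vsmarts_per_edge)
    needed = max(vsmarts_per_edge, math.ceil(total / VSMART_MAX_CONNECTIONS))
    if needed > VSMART_MAX_PER_DOMAIN:
        raise ValueError("fabric exceeds the capacity of a single domain")
    return needed


def vsmart_mesh_sessions(vsmarts):
    """Sessions in the full mesh between the vSmarts themselves."""
    return vsmarts * (vsmarts - 1) // 2


if __name__ == "__main__":
    n = vsmarts_needed(edges=5000, transports_per_edge=2)
    print(n, vsmart_mesh_sessions(n))
```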

2.2.a iv Data plane (vEdge/cEdge)

General information on “SDW Data Plane (vEdge/cEdge)”:

WAN Edge routers can be deployed either physically or virtually:
- vEdge = Viptela-branded physical hardware appliance (also available as virtual appliance)
- cEdge = Cisco SD-WAN image for various supported routers (eg. Catalyst 8000V, ISR1k/4k, …)
- Cisco recommends 3 NICs for the WAN Edge (1x VPN0, 1x VPNx aka Service Side, 1x VPN512)
General WAN Edge router behavior:
- Implements data plane policies and application-aware routing
Connection between WAN Edge and vSmart:
- Secure connections to vSmarts are used for the control plane (= TLS/DTLS tunnels)
- OMP runs within the TLS/DTLS tunnels from the WAN Edge to vSmarts
- Multiple control connections to multiple vSmarts can be established (redundancy)
- WAN Edge routers can connect to up to 3 vSmarts at the same time
- Each available WAN transport creates a single control plane connection to each vSmart controller
- Connections between WAN Edge devices and vSmart controllers can be preferred/limited (eg. WAN Edges in USA prefer vSmarts in USA, vSmarts in Europe as secondary and vSmarts in Asia as third option or not at all)
Connection between WAN Edge and vManage:
- Secure connections to vManage are used for the configuration pushes and other management purposes
- Telemetry data (performance statistics) is exported to vManage automatically
Connections between WAN Edge routers themselves:
- Secure connections between WAN Edge routers themselves are used for the data plane
- By default, IPsec tunnels are used for WAN Edge to WAN Edge connection (GRE is possible)
- A separate IPsec tunnel is built for each TLOC
- IPsec tunnels are built w/o IKE since the encryption keys get exchanged between WAN Edge routers and vSmarts
- vSmarts don’t store the IPsec encryption keys but rather pass key primers
- Encryption keys are automatically negotiated by using Diffie Hellman between the WAN Edge routers
- Failure of one/several/all vSmarts won’t affect the local routing
- As long as connectivity to at least one vSmart is established, there’s no impact between the WAN Edge routers themselves
- If all vSmarts fail, the connection between the WAN Edge routers will continue operating on a known good state for a configurable amount of time (no configuration changes allowed/possible in this time period)
- The data plane won’t stop until the next key rotation
- By default, a full-mesh is established between all WAN Edge routers
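Because every TLOC gets its own IPsec tunnel and the default topology is a full mesh, the tunnel count grows quickly with the number of sites. The Python sketch below estimates it under the simplifying assumptions that each site has one WAN Edge router and that every local TLOC builds a tunnel to every remote TLOC (no restrictions between transports). The function name is hypothetical.

```python
from itertools import combinations


def full_mesh_tunnels(tlocs_per_site: dict) -> int:
    """Count IPsec tunnels when every TLOC pairs with every remote TLOC."""
    # For each pair of sites, every local TLOC meets every remote TLOC.
    return sum(a * b for a, b in combinations(tlocs_per_site.values(), 2))


if __name__ == "__main__":
    # Two dual-homed sites and one single-homed site.
    print(full_mesh_tunnels({"hq": 2, "branch1": 2, "branch2": 1}))
```

For the three sites in the usage example this gives 2*2 + 2*1 + 2*1 = 8 tunnels, and since BFD runs inside each tunnel, the same number of BFD sessions.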
BFD between WAN Edge routers:
- BFD runs within the IPsec tunnel between all WAN Edge routers and can’t be disabled
- Since each TLOC has its own IPsec tunnel, each TLOC also has its own BFD session
- BFD timers should be kept in the range of seconds, not milliseconds
- The default BFD timers are 1 second (hello)/7 seconds (dead)
- Cisco doesn’t recommend using the absolute minimum BFD timers (50ms and multiplier of 3) to prevent traffic flip-flop
- BFD is not only used for neighbor-down detection but also for measuring path liveness and quality (up/down, loss/latency/jitter, IPsec tunnel MTU)
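The relationship between the BFD timers and the quality figures can be illustrated with a short Python sketch. It assumes that the dead time equals the hello interval times the multiplier, and it computes loss, latency and jitter in a simple textbook way from a list of probe round-trip times. How the product actually derives these values is not covered in these notes, so treat the formulas and function names as illustrative assumptions.

```python
from statistics import mean


def detection_time_ms(hello_ms: int, multiplier: int) -> int:
    """Time until a tunnel is declared down (assumed: hello x multiplier)."""
    return hello_ms * multiplier


def path_quality(rtts_ms: list) -> dict:
    """Summarize probes; None marks a probe that got no reply."""
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency = mean(received) if received else None
    # Jitter here is the mean absolute difference of consecutive replies.
    if len(received) > 1:
        jitter = mean(abs(b - a) for a, b in zip(received, received[1:]))
    else:
        jitter = 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency, "jitter_ms": jitter}


if __name__ == "__main__":
    print(detection_time_ms(1000, 7))                # default timers
    print(path_quality([10.0, None, 20.0, 30.0]))
```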
WAN Edge router transport:
- If a WAN transport link fails it will automatically trigger TLOC updates
- If there are multiple WAN transports (eg. MPLS, Internet, …), then active-active load-sharing will be used by default (can be modified)
- Weighted active-active load-sharing is possible (amount configurable per device)
- Active-standby load-sharing is possible (either for everything or per-application)
- Application-aware routing (SLA compliant) is possible (eg. use MPLS for VoIP and Public-Internet for everything else)
- Optimal application experience is maintained by proactively running PMTU discovery over all SD-WAN tunnels and by interoperating with the host PMTUD process
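SLA-compliant path selection, as used by application-aware routing, can be sketched as follows. The classes, thresholds and the fallback order (preferred and compliant tunnels first, then any compliant tunnel, then all tunnels) are assumptions made for this Python illustration and do not reproduce the exact product logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sla:
    max_loss_pct: float
    max_latency_ms: float
    max_jitter_ms: float


@dataclass(frozen=True)
class Tunnel:
    color: str          # transport label, e.g. "mpls" or "public-internet"
    loss_pct: float
    latency_ms: float
    jitter_ms: float


def meets_sla(tunnel: Tunnel, sla: Sla) -> bool:
    return (tunnel.loss_pct <= sla.max_loss_pct
            and tunnel.latency_ms <= sla.max_latency_ms
            and tunnel.jitter_ms <= sla.max_jitter_ms)


def select_tunnels(tunnels, sla: Sla, preferred_color=None):
    """Return the tunnels an application class may use (assumed policy)."""
    compliant = [t for t in tunnels if meets_sla(t, sla)]
    preferred = [t for t in compliant if t.color == preferred_color]
    # Fall back step by step so that traffic is never left without a path.
    return preferred or compliant or list(tunnels)


if __name__ == "__main__":
    voice = Sla(max_loss_pct=1.0, max_latency_ms=150.0, max_jitter_ms=30.0)
    paths = [Tunnel("mpls", 0.1, 40.0, 5.0),
             Tunnel("public-internet", 2.5, 90.0, 20.0)]
    print(select_tunnels(paths, voice, preferred_color="mpls"))
```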
Segmentation:
- Native end-to-end segmentation support is built in
- The term “VPN” used within SD-WAN is equal to classic Cisco VRFs
- VPN 0 (Transport VPN) = Front-door VPN (WAN transport links)
- VPN n (Service VPN) = Can be used for segmentation where n is an arbitrary number
- VPN n routes will be advertised via the OMP
- VPN 512 (Management VPN) = Out-of-band management VPN
- Interfaces and sub-interfaces (dot1q tags) are mapped to VPNs
- 62 VPNs are supported and some devices will support up to 300 VPNs in the future
- Segmentation is done using MPLS labels (MPLS label within the IPsec header) (RFC4023)
- The assigned MPLS Labels are advertised to vSmarts and from there pushed to all WAN Edge routers
- Policies are VPN-aware and can be applied globally or on a per-VPN basis
- Route leaking (like in MPLS) is possible and configured/enforced as centralized policy on the vSmarts
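The VPN numbering scheme above can be expressed as a small Python helper that classifies a VPN ID and lists the service VPNs found in an interface mapping. The function names and the sample interface names are hypothetical, and no upper bound is checked because the supported VPN count depends on the device.

```python
def vpn_role(vpn_id: int) -> str:
    """Classify a VPN ID according to the numbering used in these notes."""
    if vpn_id < 0:
        raise ValueError("VPN IDs are non-negative")
    if vpn_id == 0:
        return "transport"
    if vpn_id == 512:
        return "management"
    return "service"


def service_vpns(interface_to_vpn: dict) -> list:
    """Service VPNs present on a device, i.e. those carried via OMP."""
    return sorted({v for v in interface_to_vpn.values()
                   if vpn_role(v) == "service"})


if __name__ == "__main__":
    mapping = {"ge0/0": 0, "ge0/1.10": 10, "ge0/1.20": 20,
               "ge0/2": 10, "eth0": 512}
    print(service_vpns(mapping))
```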
WAN Edge devices behind NAT:
- When having multiple WAN Edge devices behind NAT, a static port offset should be configured so that NAT can properly identify each device (default base port is 12346)
- When not configuring a static port offset, WAN Edge devices will cycle through 5 different base ports to find an unused one (12346, 12366, 12386, 12406, 12426)
- Supported NAT modes:
  - Full-Cone NAT: Classic 1-to-1 NAT, meaning 1 internal IP address = 1 external IP address. Bidirectional communication is allowed without any prerequisites.
  - (Address-)Restricted Cone NAT: Requests from the same internal IP/port are mapped to the same external IP/port. The internal device must send an IP packet to the external address first before the external device can send something to the internal device (external source IP must match on incoming packets).
  - Port-restricted Cone NAT: Requests from the same internal IP/port are mapped to the same external IP/port. The internal device must send an IP packet to the external address first before the external device can send something to the internal device (external source IP and source port must match on incoming packets).
  - Symmetric NAT: Requests from the same internal IP/port are mapped to a unique external IP/port for each destination. Port-restricted Cone NAT rules apply (external source IP and source Port must match on incoming packets).
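The difference between the four NAT modes comes down to two questions: which inbound packets are let through, and whether the external port changes per destination. The following Python toy model illustrates both. The class `ToyNat` and the example port numbers are hypothetical and only model a single inside host.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Toy inbound filter for the four NAT modes (illustrative only)."""
    mode: str  # "full-cone", "restricted", "port-restricted" or "symmetric"
    contacted: set = field(default_factory=set)   # (ip, port) pairs sent to
    ports: dict = field(default_factory=dict)     # mapping key -> external port

    def send(self, dst_ip: str, dst_port: int) -> int:
        """Record an outbound packet and return the external source port used."""
        self.contacted.add((dst_ip, dst_port))
        # Cone NATs reuse one mapping; symmetric NAT maps per destination.
        key = (dst_ip, dst_port) if self.mode == "symmetric" else "any"
        if key not in self.ports:
            self.ports[key] = 40000 + len(self.ports)  # arbitrary example ports
        return self.ports[key]

    def accepts(self, src_ip: str, src_port: int) -> bool:
        """Would an inbound packet from src_ip:src_port reach the inside host?"""
        if self.mode == "full-cone":
            return True
        if self.mode == "restricted":
            return any(ip == src_ip for ip, _ in self.contacted)
        # Port-restricted and symmetric: source IP and port must both match.
        return (src_ip, src_port) in self.contacted


if __name__ == "__main__":
    nat = ToyNat("symmetric")
    print(nat.send("198.51.100.1", 12346), nat.send("198.51.100.2", 12346))
```

The usage example shows why symmetric NAT is the hardest case: the external port learned through one destination (for example vBond) is not the port a different peer would see.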
Important: Symmetric NAT is only possible on one side of a WAN tunnel (only one vEdge/cEdge can be behind symmetric NAT) and a BFD tunnel cannot be established between symmetric NAT and anything other than Full-Cone NAT!
Important: In order for Symmetric NAT to work, vManage and vSmart control connections must be configured to use TLS instead of the default DTLS!
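The base-port behavior for WAN Edge devices behind NAT can be reproduced with simple arithmetic: the five base ports listed above are spaced 20 apart, starting at 12346. The Python sketch below assumes that a configured port offset is simply added to the default base port; the function names are hypothetical.

```python
BASE_PORT = 12346   # default base port from these notes
PORT_STEP = 20      # spacing between the base ports listed in these notes


def hop_ports(count: int = 5) -> list:
    """Base ports a WAN Edge cycles through without a static offset."""
    return [BASE_PORT + PORT_STEP * i for i in range(count)]


def port_with_offset(offset: int) -> int:
    """Port used with a static offset (assumed: base port plus offset)."""
    if offset < 0:
        raise ValueError("offset must not be negative")
    return BASE_PORT + offset


if __name__ == "__main__":
    print(hop_ports())
    # Three devices behind the same NAT, each with its own offset.
    print([port_with_offset(n) for n in (1, 2, 3)])
```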