The Software-Defined Network (SDN) feature in Proxmox VE enables the creation of virtual zones and networks (VNets). This functionality simplifies advanced networking configurations and multitenancy setup.
The Proxmox VE SDN allows for separation and fine-grained control of virtual guest networks, using flexible, software-controlled configurations.
Separation is managed through zones, virtual networks (VNets), and subnets. A zone is its own virtually separated network area. A VNet is a virtual network that belongs to a zone. A subnet is an IP range inside a VNet.
Depending on the type of the zone, the network behaves differently and offers specific features, advantages, and limitations.
Use cases for SDN range from an isolated private network on each individual node to complex overlay networks across multiple PVE clusters on different locations.
After configuring an VNet in the cluster-wide datacenter SDN administration interface, it is available as a common Linux bridge, locally on each node, to be assigned to VMs and Containers.
The Proxmox VE SDN stack has been available as an experimental feature since 2019 and has been continuously improved and tested by many developers and users. With its integration into the web interface in Proxmox VE 6.2, a significant milestone towards broader integration was achieved. During the Proxmox VE 7 release cycle, numerous improvements and features were added. Based on user feedback, it became apparent that the fundamental design choices and their implementation were quite sound and stable. Consequently, labeling it as ‘experimental’ did not do justice to the state of the SDN stack. For Proxmox VE 8, a decision was made to lay the groundwork for full integration of the SDN feature by elevating the management of networks and interfaces to a core component in the Proxmox VE access control stack. In Proxmox VE 8.1, two major milestones were achieved: firstly, DHCP integration was added to the IP address management (IPAM) feature, and secondly, the SDN integration is now installed by default.
The current support status for the various layers of our SDN installation is as follows:
Core SDN, which includes VNet management and its integration with the Proxmox VE stack, is fully supported.
IPAM, including DHCP management for virtual guests, is in tech preview.
Complex routing via FRRouting and controller integration are in tech preview.
Since Proxmox VE 8.1 the core Software-Defined Network (SDN) packages are installed by default.
If you upgrade from an older version, you need to install the libpve-network-perl package on every node:
apt update apt install libpve-network-perl
|
|
Proxmox VE version 7.0 and above have the ifupdown2 package installed by default. If you originally installed your system with an older version, you need to explicitly install the ifupdown2 package. |
After installation, you need to ensure that the following line is present at the end of the /etc/network/interfaces configuration file on all nodes, so that the SDN configuration gets included and activated.
source /etc/network/interfaces.d/*
The DHCP integration into the built-in PVE IP Address Management stack currently uses dnsmasq for giving out DHCP leases. This is currently opt-in.
To use that feature you need to install the dnsmasq package on every node:
apt update apt install dnsmasq # disable default instance systemctl disable --now dnsmasq
The Proxmox VE SDN stack uses the FRRouting project for advanced setups. This is currently opt-in.
To use the SDN routing integration you need to install the frr-pythontools package on all nodes:
apt update apt install frr-pythontools
Then enable the frr service on all nodes:
systemctl enable frr.service
Configuration is done at the web UI at datacenter level, separated into the following sections:
The Options category allows adding and managing additional services to be used in your SDN setup.
Controllers: For controlling layer 3 routing in complex setups
DHCP: Define a DHCP server for a zone that automatically allocates IPs for guests in the IPAM and leases them to the guests via DHCP.
IPAM: Enables external for IP address management for guests
DNS: Define a DNS server integration for registering virtual guests' hostname and IP addresses
The Proxmox VE Software-Defined Network implementation uses standard Linux networking as much as possible. The reason for this is that modern Linux networking provides almost all needs for a feature full SDN implementation and avoids adding external dependencies and reduces the overall amount of components that can break.
The Proxmox VE SDN configurations are located in /etc/pve/sdn, which is shared with all other cluster nodes through the Proxmox VE configuration file system. Those configurations get translated to the respective configuration formats of the tools that manage the underlying network stack (for example ifupdown2 or frr).
New changes are not immediately applied but recorded as pending first. You can then apply a set of different changes all at once in the main SDN overview panel on the web interface. This system allows to roll-out various changes as single atomic one.
The SDN tracks the rolled-out state through the .running-config and .version files located in /etc/pve/sdn.
A zone defines a virtually separated network. Zones are restricted to specific nodes and assigned permissions, in order to restrict users to a certain zone and its contained VNets.
Different technologies can be used for separation:
Simple: Isolated Bridge. A simple layer 3 routing bridge (NAT)
VLAN: Virtual LANs are the classic method of subdividing a LAN
QinQ: Stacked VLAN (formally known as IEEE 802.1ad)
VXLAN: Layer 2 VXLAN network via a UDP tunnel
EVPN (BGP EVPN): VXLAN with BGP to establish Layer 3 routing
The following options are available for all zone types:
The nodes which the zone and associated VNets should be deployed on.
Use an IP Address Management (IPAM) tool to manage IPs in the zone. Optional, defaults to pve.
DNS API server. Optional.
Reverse DNS API server. Optional.
DNS domain name. Used to register hostnames, such as <hostname>.<domain>. The DNS zone must already exist on the DNS server. Optional.
This is the simplest plugin. It will create an isolated VNet bridge. This bridge is not linked to a physical interface, and VM traffic is only local on each the node. It can be used in NAT or routed setups.
The VLAN plugin uses an existing local Linux or OVS bridge to connect to the node’s physical interface. It uses VLAN tagging defined in the VNet to isolate the network segments. This allows connectivity of VMs between different nodes.
VLAN zone configuration options:
The local bridge or OVS switch, already configured on each node that allows node-to-node connection.
QinQ also known as VLAN stacking, that uses multiple layers of VLAN tags for isolation. The QinQ zone defines the outer VLAN tag (the Service VLAN) whereas the inner VLAN tag is defined by the VNet.
|
|
Your physical network switches must support stacked VLANs for this configuration. |
QinQ zone configuration options:
A local, VLAN-aware bridge that is already configured on each local node
The main VLAN tag of this zone
Allows you to choose between an 802.1q (default) or 802.1ad service VLAN type.
Due to the double stacking of tags, you need 4 more bytes for QinQ VLANs. For example, you must reduce the MTU to 1496 if you physical interface MTU is 1500.
The VXLAN plugin establishes a tunnel (overlay) on top of an existing network (underlay). This encapsulates layer 2 Ethernet frames within layer 4 UDP datagrams using the default destination port 4789.
You have to configure the underlay network yourself to enable UDP connectivity between all peers.
You can, for example, create a VXLAN overlay network on top of public internet, appearing to the VMs as if they share the same local Layer 2 network.
|
|
VXLAN on its own does does not provide any encryption. When joining multiple sites via VXLAN, make sure to establish a secure connection between the site, for example by using a site-to-site VPN. |
VXLAN zone configuration options:
A list of IP addresses of each node in the VXLAN zone. This can be external nodes reachable at this IP address. All nodes in the cluster need to be mentioned here.
Instead of manually defining all the peers, use a Fabric for automatically generating the peer list.
Because VXLAN encapsulation uses 50 bytes, the MTU needs to be 50 bytes lower than the outgoing physical interface.
The EVPN zone creates a routable Layer 3 network, capable of spanning across multiple clusters. This is achieved by establishing a VPN and utilizing BGP as the routing protocol.
The VNet of EVPN can have an anycast IP address and/or MAC address. The bridge IP is the same on each node, meaning a virtual guest can use this address as gateway.
Routing can work across VNets from different zones through a VRF (Virtual Routing and Forwarding) interface.
EVPN zone configuration options:
A VXLAN-ID used for dedicated routing interconnect between VNets. It must be different than the VXLAN-ID of the VNets.
The primary EVPN-controller to use for this zone. The primary controller is used for auto-deriving the VTEP IPs, based on the peer definition. For more information see the controller section.
Additional EVPN controllers to use for this zone. Controllers listed here will announce the L2VPN EVPN routes for this zone.
Anycast MAC address that gets assigned to all VNets in this zone. Will be auto-generated if not defined.
Nodes that shall be configured as exit gateways from the EVPN network, through the real network. The configured nodes will announce a default route in the EVPN network. Optional.
If you use multiple exit nodes, force traffic through this primary exit node, instead of load-balancing on all nodes. Optional but necessary if you want to use SNAT or if your upstream router doesn’t support ECMP.
This is a special option if you need to reach a VM/CT service from an exit node. (By default, the exit nodes only allow forwarding traffic between real network and EVPN network). Optional.
Announce the full subnet in the EVPN network. If you have silent VMs/CTs (for example, if you have multiple IPs and the anycast gateway doesn’t see traffic from these IPs, the IP addresses won’t be able to be reached inside the EVPN network). Optional.
Don’t suppress ARP or ND (Neighbor Discovery) packets. This is required if you use floating IPs in your VMs (IP and MAC addresses are being moved between systems). Optional.
Allows you to import a list of external EVPN route targets. Used for cross-DC or different EVPN network interconnects. Optional.
Because VXLAN encapsulation uses 50 bytes, the MTU needs to be 50 bytes less than the maximal MTU of the outgoing physical interface. Optional, defaults to 1450.
After creating a virtual network (VNet) through the SDN GUI, a local network interface with the same name is available on each node. To connect a guest to the VNet, assign the interface to the guest and set the IP address accordingly.
Depending on the zone, these options have different meanings and are explained in the respective zone section in this document.
|
|
In the current state, some options may have no effect or won’t work in certain zones. |
VNet configuration options:
An up to 8 character ID to identify a VNet
More descriptive identifier. Assigned as an alias on the interface. Optional
The associated zone for this VNet
The unique VLAN or VXLAN ID
Enables vlan-aware option on the interface, enabling configuration in the guest.
Sets the isolated flag for all guest ports of this interface, but not for the interface itself. This means guests can only send traffic to non-isolated bridge-ports, which is the bridge itself. In order for this setting to take effect, you need to restart the affected guest.
|
|
Port isolation is local to each host. Use the VNET Firewall to further isolate traffic in the VNET across nodes. For example, DROP by default and only allow traffic from the IP subnet to the gateway and vice versa. |
A subnet define a specific IP range, described by the CIDR network address. Each VNet, can have one or more subnets.
A subnet can be used to:
Restrict the IP addresses you can define on a specific VNet
Assign routes/gateways on a VNet in layer 3 zones
Enable SNAT on a VNet in layer 3 zones
Auto assign IPs on virtual guests (VM or CT) through IPAM plugins
DNS registration through DNS plugins
If an IPAM server is associated with the subnet zone, the subnet prefix will be automatically registered in the IPAM.
Subnet configuration options:
A CIDR network address, for example 10.0.0.0/8
The IP address of the network’s default gateway. On layer 3 zones (Simple/EVPN plugins), it will be deployed on the VNet.
Enable Source NAT which allows VMs from inside a VNet to connect to the outside network by forwarding the packets to the nodes outgoing interface. On EVPN zones, forwarding is done on EVPN gateway-nodes. Optional.
Add a prefix to the domain registration, like <hostname>.prefix.<domain> Optional.
Some zones implement a separated control and data plane that require an external controller to manage the VNet’s control plane.
Currently, only the EVPN zone requires an external controller.
The EVPN, zone requires an external controller to manage the control plane. The EVPN controller plugin configures the Free Range Routing (frr) router.
To enable the EVPN controller, you need to enable FRR on every node, see install FRRouting.
EVPN controller configuration options:
A unique BGP ASN number. It’s highly recommended to use a private ASN number (64512 – 65534, 4200000000 – 4294967294), as otherwise you could end up breaking global routing by mistake. Every EVPN controller configured on a node must have the same ASN configured.
A Fabric that contains all the nodes part of the EVPN zone. Will be used as the underlay network.
An IP list of all nodes that are part of the EVPN zone. (could also be external nodes or route reflector servers)
A list of nodes where this controller should be active. This can be used to configure EVPN peering sessions on specific nodes only.
EVPN controllers use the peer group name VTEP by default. When defining multiple EVPN controllers, each additional controller needs to specify a custom peer group name to be used in the FRR configuration.
Allows configuring the type of BGP session manually. The auto mode preserves the old behavior of the SDN stack: EVPN controllers were iBGP, unless there was a BGP controller configured. In that case the EVPN controller re-used the BGP session from the BGP controller. With this option, the type of BGP session can be overridden manually - allowing for explicitly configuring the type of BGP session that should be established with the EVPN controller.
Increase the number of hops to reach peers, in case they are not directly connected or they use loopback. (only for BGP Mode external).
Every router can be member of exactly one Autonomous System, so any iBGP sessions on a given node (irregardless of address family) need to have the same ASN configured. Any configuration that has two iBGP sessions with different ASNs is rejected and an error thrown, since it is by definition not possible to have two iBGP sessions with different ASNs on the same BGP instance, as one instance can only have one local ASN.
The local ASN of the FRR instance is auto-derived based on the settings in the SDN controllers:
If only a BGP or EVPN controller exists (but not both), then the respective ASN from the controller is used.
For backwards-compatibility reasons, the SDN stack checks if there are any EVPN controllers in auto mode. If a BGP controller exists, then the ASN from the BGP controller will be used in any case.
Otherwise, if an EVPN controller is configured, then the ASN from the EVPN controller is used - otherwise the ASN from the BGP controller.
The SDN stack utilizes the local-as directive together with the no-prepend replace-as option in the FRR configuration for handling multiple ASN numbers in the configuration. This is the case when the BGP and EVPN controller have different ASNs configured and auto mode is disabled.
The VTEP IPs for the VXLAN devices of a zone will be auto-derived based on the (primary) controller defined in the EVPN zone:
If a BGP or IS-IS controller with a loopback interface is used, then the IP from the loopback interface will be used as VTEP IP.
If the peer list of the controller contains an IP that is locally configured on the node, then this IP will be used as VTEP IP.
Otherwise, the EVPN controller will check the route in the default routing table for the first peer IP and use the source IP address specified in the routing table as VTEP IP. This can cause issues if the route gets announced dynamically e.g. via a fabric. In that case it is necessary for the kernel routing table to contain the required entry when applying the configuration for the EVPN controller.
The BGP controller is not used directly by a zone. You can use it to configure FRR to manage BGP peers.
For BGP-EVPN, it can be used to define a different ASN by node, so doing EBGP. It can also be used to export EVPN routes to an external BGP peer.
|
|
By default, for a simple full mesh EVPN, you don’t need to define a BGP controller. |
BGP controller configuration options:
The node of this BGP controller
A unique BGP ASN number. It’s highly recommended to use a private ASN number in the range (64512 - 65534) or (4200000000 - 4294967294), as otherwise you could break global routing by mistake.
A list of peer IP addresses you want to communicate with using the underlying BGP network.
If your peer’s remote-AS is different, this enables EBGP.
Use a loopback or dummy interface as the source of the EVPN network (for multipath).
Increase the number of hops to reach peers, in case they are not directly connected or they use loopback.
Allow ECMP if your peers have different ASN.
The ISIS controller is not used directly by a zone. You can use it to configure FRR to export EVPN routes to an ISIS domain.
ISIS controller configuration options:
The node of this ISIS controller.
A unique ISIS domain.
A Unique ISIS network address that identifies this node.
A list of physical interface(s) used by ISIS.
Use a loopback or dummy interface as the source of the EVPN network (for multipath).
Fabrics in Proxmox VE SDN provide automated routing between nodes in a cluster. They simplify the configuration of underlay networks between nodes to form the foundation for SDN deployments.
They automatically configure routing protocols on your physical network interfaces to establish connectivity between nodes in the cluster. This creates a resilient, auto-configuring network fabric that adapts to changes in network topology. These fabrics can be used as a full-mesh network for Ceph or in the EVPN controller and VXLAN zone.
The FRR implementations of OpenFabric and OSPF are used, so first ensure that the frr and frr-pythontools packages are installed:
apt update apt install frr frr-pythontools
To view the configuration of an SDN fabric users need SDN.Audit or SDN.Allocate permissions. To create or modify a fabric configuration, users need SDN.Allocate permissions. To view the configuration of a node, users need the Sys.Audit or Sys.Modify permissions. When adding or updating nodes within a fabric, additional Sys.Modify permission for the specific node is required, since this operation involves writing to the node’s /etc/network/interfaces file.
To create a Fabric, head over to Datacenter→SDN→Fabrics and click "Add Fabric". After selecting the preferred protocol, the fabric is created. With the "+" button you can select the nodes which you want to add to the fabric, you also have to select the interfaces used to communicate with the other nodes.
You can specify a CIDR network range (e.g., 192.0.2.0/24) as a loopback prefix for the fabric. When configured, the system will automatically verify that all router-IDs are contained within this prefix. This ensures consistency in your addressing scheme and helps prevent addressing conflicts or errors.
Each node in a fabric needs a unique router-ID, which is an IPv4 address in dotted decimal notation (e.g., 192.0.2.1). In OpenFabric this can also be an IPv6 address in the typical hexadecimal representation separated by colons (e.g., 2001:db8::1428:57ab). A dummy interface with the router-ID as address will automatically be created and will act as a loopback interface for the fabric (it’s also passive by default).
For every fabric, an access-list and a route-map are automatically created. These configure the router to rewrite the source address of outgoing packets. When you communicate with another node (for example, by pinging it), this ensures that traffic originates from the local dummy interface’s IP address rather than from the physical interface. This provides consistent routing behavior and proper source address selection throughout the fabric.
IPv6 is currently only usable on OpenFabric fabrics. These IPv6 Fabrics need global IPv6 forwarding enabled on all nodes contained in the fabric. Without IPv6 forwarding, non-full-mesh fabrics won’t work because the transit nodes don’t forward packets to the outer nodes. Currently there isn’t an easy way to enable IPv6 forwarding per-interface like with IPv4, so it has to be enabled globally. This can be accomplished by appending this line:
post-up sysctl -w net.ipv6.conf.all.forwarding=1
to a fabric interface in the /etc/network/interfaces file. This will enable IPv6 forwarding globally once that interface comes up. Note that this affects how your interfaces handle automatic IPv6 setup (SLAAC), Neighbour Advertisements, Router Solicitations, and Router Advertisements. More details here: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt under net.ipv6.conf.all.forwarding.
OpenFabric is a routing protocol specifically designed for data center fabrics. It’s based on IS-IS and optimized for the spine-leaf topology common in data centers.
This is the name of the OpenFabric fabric and can be at most 8 characters long.
IPv4 CIDR network range (e.g., 192.0.2.0/24) used to verify that all router-IDs in the fabric are contained within this prefix.
IPv6 CIDR network range (e.g., 2001:db8::/64) used to verify that all router-IDs in the fabric are contained within this prefix.
|
|
For IPv6 fabrics to work, global forwarding needs to be enabled on all nodes. Check Notes on IPv6 for how to do it and additional info. |
Controls how frequently (in seconds) hello packets are sent to discover and maintain connections with neighboring nodes. Lower values detect failures faster but increase network traffic. This option is global on the fabric, meaning every interface on every node in this fabric will inherit this hello-interval property. The default value is 3 seconds.
Sets how frequently (in seconds) the node synchronizes its routing database with neighbors. Lower values keep the network topology information more quickly in sync but increase network traffic. This option is global on the fabric, meaning every interface on every node in this fabric will inherit this property. The default value is 10 seconds.
Select the node which will be added to the fabric. Only nodes that currently are in the cluster will be shown.
A unique IPv4 address used to generate the OpenFabric Network Entity Title (NET). Each node in the same fabric must have a different Router-ID, while a single node must use the same NET address across all fabrics (If this is not given Proxmox VE will automatically choose one and ensure that the configuration is valid).
A unique IPv6 address used to generate the OpenFabric Network Entity Title (NET). Each node in the same fabric must have a different Router-ID, while a single node must use the same NET address across all fabrics. If a IPv4 and IPv6 address is configured, the IPv4 one will be used to derive the NET.
|
|
When using IPv6 addresses, the last 3 segments are used to generate the NET. Ensure these segments differ between nodes. |
Specify the interfaces used to establish peering connections with other OpenFabric nodes. Preferably select interfaces without pre-assigned IP addresses, then configure addresses in the IPv4/IPv6 column if needed. A dummy "loopback" interface with the router-id is automatically created.
The following optional parameters can be configured per interface when enabling the additional columns:
A IPv4 that should get automatically configured on this interface. Must include the netmask (e.g. /31)
A IPv6 that should get automatically configured on this interface. Must include the netmask (e.g. /127).
Defines how many missed hello packets constitute a failed connection. Higher values make the connection more resilient to packet loss but slow down failure detection. The default value is 10.
|
|
When you remove an interface with an entry in /etc/network/interfaces that has manual set, then the IP will not get removed on applying the SDN configuration. |
OSPF (Open Shortest Path First) is a widely-used link-state routing protocol that efficiently calculates the shortest path for routing traffic through IP networks.
IPv4 CIDR network range (e.g., 192.0.2.0/24) used to verify that all router-IDs in the fabric are contained within this prefix.
This specifies the OSPF area identifier, which can be either a 32-bit signed integer or an IP address. Areas are a way to organize and structure OSPF networks hierarchically, with Area 0 (or 0.0.0.0) serving as the backbone area.
Optional list of source protocols whose routes are imported into OSPF. Each entry picks a source (bgp, connected, kernel or static) and may additionally reference a route map to filter or transform routes before they are advertised. Useful for example to leak BGP fabric routes into OSPF, or to advertise locally connected networks without configuring them per interface.
Select the node which will be added to the fabric. Only nodes that are currently in the cluster will be shown.
A unique Router-ID used to identify this router within the OSPF network. Each node in the same fabric must have a different Router-ID.
Specify the interfaces used to establish peering connections with other OSPF nodes. Preferably select interfaces without pre-assigned IP addresses, then configure addresses in the IPv4 column if needed. A dummy "loopback" interface with the router-id is automatically created.
The following optional parameter can be configured per interface:
A IPv4 that should get automatically configured on this interface. Must include the netmask (e.g. /31)
|
|
When you remove an interface with an entry in /etc/network/interfaces that has manual set, then the IP will not get removed on applying the SDN configuration. |
|
|
The dummy interface will automatically be configured as passive. Every interface which doesn’t have an ip-address configured will be treated as a point-to-point link. |
WireGuard can be used for establishing a VPN between Proxmox VE nodes and / or external nodes. It does not provide dynamic routing by itself, but can be used in conjunction with dynamic routing protocols operating on layer 3 and above (OSPF, BGP) to provide a dynamically routed, encrypted transport, for example to carry EVPN or VXLAN traffic.
|
|
In order to use WireGuard, the package wireguard-tools needs to be installed. |
Configuration options:
This is the name of the WireGuard fabric and can be at most 8 characters long.
If this is set, then WireGuard will send an empty authenticated packet every N seconds to each configured peer. This can help keeping connections alive when using stateful firewalls or NAT.
There are two types of nodes: internal and external. Internal nodes are Proxmox VE nodes, external nodes everything else. They are essentially reusable peer definitions that can be used across the whole cluster.
This is the IP or hostname that other Proxmox VE nodes should use for connecting to this Proxmox VE node. This is used together with the endpoint port when configuring this Proxmox VE node as a peer.
This is the UDP port that other Proxmox VE nodes should use for connecting to this Proxmox VE node. If unset, it defaults to 51820.
A comma-separated list of CIDRs (for example 198.51.100.0/24, or 0.0.0.0/0 to tunnel all IPv4 traffic). When selecting this node as a peer on other nodes, the list is used as the AllowedIPs setting in the WireGuard peer configuration, that is, both the source filter for incoming traffic and the routing destination for outgoing traffic to this node.
The name of the external node.
The public key used by the external node.
The IP or hostname that other nodes use to connect to this external peer.
The UDP port that other nodes use to connect to this external peer. If unset, it defaults to 51820.
A comma-separated list of CIDRs (for example 198.51.100.0/24, or 0.0.0.0/0 to tunnel all IPv4 traffic). When selecting this node as a peer on other nodes, the list is used as the AllowedIPs setting in the WireGuard peer configuration, that is, both the source filter for incoming traffic and the routing destination for outgoing traffic to this node.
The name of the network interface on the Linux host. One to eight characters that must start with an alphanumeric character; the remaining characters may be alphanumeric or hyphens.
The IPv4 address in CIDR notation that should be configured on this interface (for example 198.51.100.1/24).
The IPv6 address in CIDR notation that should be configured on this interface (for example 2001:db8::1/64).
The listening port for this interface.
A list of peers that should be configured for that interface. All nodes that are part of the fabric can be selected as peers - the peer definition will be auto-generated from the configuration in the node.
When defining an interface, Proxmox VE automatically generates a private key for it in /etc/pve/priv/wg-keys.cfg upon saving the interface, and stores the matching public key alongside the interface in the fabric configuration so it can be shown in the Web UI when editing the node. Removing an interface from a node removes the corresponding private key from wg-keys.cfg as part of the same API call.
The fabric automatically generates routes through the WireGuard interface for every peer’s allowed IPs. For example, if interface wg0 has two peers with 198.51.100.0/24 and 203.0.113.0/24 as allowed IPs, then routes for both subnets are created. If the peer references the interface of a Proxmox VE node, that interface’s configured IP address is additionally added as a host route to the allowed IPs (for example, a peer node configured with 192.0.2.10/24 gets 192.0.2.10/32 added).
The fabric will autogenerate routes in the kernel routing table for all allowed IPs of a peer. By setting this option, no routes will be inserted into the kernel routing table.
Prefix lists and route maps let you control which routes are exchanged between the Proxmox VE nodes and their BGP, EVPN, OSPF, or OpenFabric peers, and modify attributes of those routes on the way in or out. The two work together: a prefix list classifies routes by their destination CIDR, while a route map chains match conditions and actions into an ordered policy.
If you have used ip prefix-list and route-map on FRR, Cisco IOS, or Juniper before, the Proxmox VE entities behave the same way. The Proxmox VE layer manages the configuration as cluster-wide entities under /etc/pve/sdn/ and translates them to FRR prefix-list and route-map stanzas when applying the SDN configuration. The merged FRR configuration ends up in /etc/frr/frr.conf on each node; any node-local additions in /etc/frr/frr.conf.local are appended on top.
Like the rest of the SDN configuration, edits go to a pending state first and become active only after you apply changes in the main SDN panel.
A prefix list is an ordered, named list of entries that each match a single destination CIDR (or a range of prefix lengths within that CIDR) and either permit or deny it. FRR evaluates entries in order of their sequence number and stops at the first match; if no entry matches, the prefix list denies implicitly, same as plain FRR.
Prefix lists are managed under Datacenter -> SDN -> Prefix Lists. Each list has a name and contains one or more entries with the following properties:
Order in which the entry is evaluated within the list. Must be unique per list and at least 1. If left empty on creation, Proxmox VE auto-assigns the next free sequence number using the convention max + 5, leaving room for later insertions.
Either permit or deny. There is an implicit deny at the end of each list.
Destination network in CIDR notation, for example 192.0.2.0/24 or 2001:db8::/32. The prefix 0.0.0.0/0 (and ::/0) is allowed and matches the default route.
Optional minimum prefix length. If set, the entry matches any subnet of prefix whose length is at least ge.
Optional maximum prefix length. If set, the entry matches any subnet of prefix whose length is at most le.
Without le or ge an entry matches the prefix exactly. With both set, it matches any subnet within the given length range. For example 10.0.0.0/8 ge 16 le 24 matches every /16 to /24 subnet inside 10.0.0.0/8. This is the standard FRR/Cisco semantics. Additional validation enforced by Proxmox VE: ge must not exceed le, neither may be smaller than the prefix’s own length, and both must stay within the address family bound (/32 for IPv4, /128 for IPv6).
|
|
The names only_default, only_default_v6, and loopbacks_ips are reserved. The BGP and EVPN controllers generate these automatically and they cannot be used as user-defined prefix list names. |
A route map is an ordered list of entries that match incoming or outgoing routes against criteria, optionally modify attributes such as the metric or local preference, and either permit (forward the route, possibly modified) or deny (drop the route). FRR evaluates entries in order; the first match controls the action. Every route map implicitly has an otherwise empty deny as its last entry, denying all routes that did not match an entry explicitly.
Route maps are managed under Datacenter -> SDN -> Route Maps. Each entry has the following properties:
Order in which the entry is evaluated, between 0 and 65535. Mandatory and must be unique within a route map. Leaving gaps between orders, for example assigning entries 10, 20, 30, leaves room for later insertions without renumbering.
Either permit or deny. permit lets the route pass (after any set actions); deny drops it.
One or more match conditions. Available keys are: route-type, vni, ip-address-prefix-list, ip6-address-prefix-list, ip-next-hop-prefix-list, ip6-next-hop-prefix-list, ip-next-hop-address, ip6-next-hop-address, metric, local-preference, peer, and tag. The four *-prefix-list keys reference a prefix list by name. tag matches a 32-bit route tag attribute, the same value that the tag set action assigns; this lets one entry mark routes with a tag and a later entry (in the same or another map) act on them. All match conditions of an entry must be true for it to apply.
One or more attributes to set when the entry matches. Available keys are ip-next-hop, ip-next-hop-peer-address, ip-next-hop-unchanged, ip6-next-hop, ip6-next-hop-peer-address, ip6-next-hop-prefer-global, local-preference, tag, weight, metric, and src. The metric value supports N, +N, -N (adjust), rtt, +rtt, -rtt, igp, and aigp. The tag value accepts an integer or the literal untagged.
Controls how evaluation continues after a match. Leave empty for the default behavior (terminate on match). on-match-next proceeds with the immediately following entry. on-match-goto <order> jumps to the first entry whose order is greater than or equal to the given value. continue <order> jumps to the entry with exactly that order. All three correspond to the FRR on-match / continue directives.
Optionally call another route map at the end of this entry’s processing. This corresponds to the FRR call directive.
|
|
A route map referenced by a controller cannot be deleted while it is still in use. Delete the reference from the controller first. The names MAP_VTEP_IN, MAP_VTEP_OUT, correct_src, and any name starting with pve_ are reserved for internally generated route maps. |
The BGP and EVPN controllers each take an incoming and an outgoing route map. Incoming maps run against routes received from a peer before installing them; outgoing maps run against routes before announcing them. Both fields are optional and selected from the route maps configured in the SDN.
The OSPF and OpenFabric fabrics take a Route Filter option that references a prefix list. When set, only routes whose destinations pass the prefix list are installed in the kernel routing table. Without an explicit filter, the fabric generates a default filter from the fabric’s own IPv4 Prefix and IPv6 Prefix CIDRs (an FRR access-list permitting those networks); configure a route filter when you need a different policy, for example to allow a default route learned from a border node.
Allow only the IPv4 default route, for example as a route filter on a node that should learn just an upstream default:
# /etc/pve/sdn/prefix-lists.cfg prefix-list: default-only entries action=permit,prefix=0.0.0.0/0,seq=10
Allow every subnet between /16 and /24 inside 10.0.0.0/8, and deny anything else within the 10.0.0.0/8 block:
prefix-list: internal-subnets entries action=permit,prefix=10.0.0.0/8,ge=16,le=24,seq=10 entries action=deny,prefix=10.0.0.0/8,le=32,seq=20
Outgoing BGP policy that lowers local-preference to 50 on routes matched by the internal-subnets prefix list, and otherwise leaves routes untouched. Each route map entry is its own section, with the order encoded as a suffix on the section ID:
# /etc/pve/sdn/route-maps.cfg route-map-entry: bgp-out_10 action permit match key=ip-address-prefix-list,value=internal-subnets set key=local-preference,value=50 route-map-entry: bgp-out_20 action permit
The trailing permit entry with no match is the FRR pattern for "allow everything else"; without it the implicit deny at the end of the route map would drop all unmatched routes.
BGP (Border Gateway Protocol) can be used as an eBGP unnumbered fabric. Each node has its own Autonomous System Number (ASN) and peers with its neighbors over physical interfaces without requiring IP addresses on the fabric links.
Configuration options:
IPv4 CIDR network range (e.g., 192.0.2.0/24) used to verify that all node IPv4 addresses (BGP router-IDs) are contained within this prefix.
IPv6 CIDR network range (e.g., 2001:db8::/64) used to verify that all node IPv6 addresses in the fabric are contained within this prefix.
Enable Bidirectional Forwarding Detection on all peering sessions in this fabric. BFD provides fast failure detection for links between nodes.
A prefix list applied to the fabric peer-group’s inbound direction. When set, this prefix list replaces the implicit "must be within the fabric prefix" filter. Useful to widen the accepted range (e.g. accept additional non-fabric prefixes redistributed via the fabric) or to use a more expressive match. References a prefix list defined under SDN.
A route map chained from the fabric peer-group’s implicit inbound filter via FRR’s call action. Only prefixes that have already passed the implicit prefix check (or the Route Filter override) reach this route map. Useful to transform or further narrow accepted routes (e.g. adjust community or local-preference), but cannot widen the accepted set. Use Route Filter to override the implicit filter. References a route map defined under SDN.
A route map applied to the fabric peer-group’s outbound direction. Used to transform or filter what is announced to fabric neighbors. References a route map defined under SDN.
Which other-protocol routes the BGP router on each node should redistribute into the fabric. Supported sources are connected, kernel, static, and ospf. An optional per-source route map can be attached to shape the redistributed routes before they are announced.
|
|
At least one of IPv4 Prefix or IPv6 Prefix must be configured. |
Options that are available on every node that is part of a fabric:
Select the node which will be added to the fabric. Only nodes that are currently in the cluster will be shown.
BGP Autonomous System Number for this node. It is recommended to use private ASN numbers (64512-65534 for 16-bit, 4200000000-4294967294 for 32-bit).
A unique IPv4 address for this node, also used as the BGP router-id. Required when the fabric has an IPv4 prefix configured; each node in the same fabric must have a different address.
IPv6 address for this node. If only an IPv6 address is configured (without an IPv4 address), the BGP router-id is derived from it via a FNV-1a hash.
Specify the interfaces used to establish peering connections with other BGP nodes. These interfaces run BGP unnumbered (no IP address assignment needed). When a node IP is configured, a dummy "loopback" interface is automatically created with that IP.
|
|
Unlike OSPF and OpenFabric, BGP unnumbered interfaces do not need IP addresses. Peering is established using IPv6 link-local addresses automatically. |
|
|
In an eBGP unnumbered fabric, directly peering nodes must have different ASNs for BGP sessions to establish. |
When configuring an EVPN controller, a BGP fabric can be selected as the underlay instead of manually specifying peer addresses. The default configuration runs the EVPN overlay (VTEP) sessions as iBGP, using the EVPN controller’s ASN for the router process. The per-node fabric ASN is automatically applied via local-as on the underlay neighbor group.
This means the EVPN controller ASN and the per-node fabric ASNs should be different. For example, with three nodes using ASNs 65001, 65002, and 65003 for the underlay, the EVPN controller could use ASN 65000 for the overlay.
Alternatively, setting BGP Mode to "external" on the EVPN controller runs the VTEP sessions as eBGP. The per-node fabric ASN is then used as the local ASN for the VTEP peer-group as well, so the VTEPs naturally peer with each other under different ASNs without needing a separate overlay ASN. The EVPN controller ASN is still used for auto-derived route-target values to keep them consistent across the fabric.
|
|
Using a BGP fabric for an EVPN underlay requires each node to have an IPv4 address, since EVPN uses it as the VTEP address. |
IP Address Management (IPAM) tools manage the IP addresses of clients on the network. SDN in Proxmox VE uses IPAM for example to find free IP addresses for new guests.
A single IPAM instance can be associated with one or more zones.
The default built-in IPAM for your Proxmox VE cluster.
You can inspect the current status of the PVE IPAM Plugin via the IPAM panel in the SDN section of the datacenter configuration. This UI can be used to create, update and delete IP mappings. This is particularly convenient in conjunction with the DHCP feature.
If you are using DHCP, you can use the IPAM panel to create or edit leases for specific VMs, which enables you to change the IPs allocated via DHCP. When editing an IP of a VM that is using DHCP you must make sure to force the guest to acquire a new DHCP leases. This can usually be done by reloading the network stack of the guest or rebooting it.
NetBox is an open-source IP Address Management (IPAM) and datacenter infrastructure management (DCIM) tool.
To integrate NetBox with Proxmox VE SDN, create an API token in NetBox as described here: https://docs.netbox.dev/en/stable/integrations/rest-api/#tokens
The NetBox configuration properties are:
The NetBox REST API endpoint: http://yournetbox.domain.com/api
An API access token
The SHA-256 fingerprint of the NetBox API. Can be retrieved with openssl x509 -in /etc/ssl/certs/netbox.crt -noout -fingerprint -sha256 when using the default certificate.
In phpIPAM you need to create an "application" and add an API token with admin privileges to the application.
The phpIPAM configuration properties are:
The REST-API endpoint: http://phpipam.domain.com/api/<appname>/
An API access token
An integer ID. Sections are a group of subnets in phpIPAM. Default installations use sectionid=1 for customers.
The DNS plugin in Proxmox VE SDN is used to define a DNS API server for registration of your hostname and IP address. A DNS configuration is associated with one or more zones, to provide DNS registration for all the subnet IPs configured for a zone.
You need to enable the web server and the API in your PowerDNS config:
api=yes api-key=arandomgeneratedstring webserver=yes webserver-port=8081
The PowerDNS configuration options are:
The REST API endpoint: http://yourpowerdnserver.domain.com:8081/api/v1/servers/localhost
An API access key
The default TTL for records
The DHCP plugin in Proxmox VE SDN can be used to automatically deploy a DHCP server for a Zone. It provides DHCP for all Subnets in a Zone that have a DHCP range configured. Currently the only available backend plugin for DHCP is the dnsmasq plugin.
The DHCP plugin works by allocating an IP in the IPAM plugin configured in the Zone when adding a new network interface to a VM/CT. You can find more information on how to configure an IPAM in the respective section of our documentation.
When the VM starts, a mapping for the MAC address and IP gets created in the DHCP plugin of the zone. When the network interfaces is removed or the VM/CT are destroyed, then the entry in the IPAM and the DHCP server are deleted as well.
|
|
Some features (adding/editing/removing IP mappings) are currently only available when using the PVE IPAM plugin. |
You can enable automatic DHCP for a zone in the Web UI via the Zones panel and enabling DHCP in the advanced options of a zone.
|
|
Currently only Simple Zones have support for automatic DHCP |
After automatic DHCP has been enabled for a Zone, DHCP Ranges need to be configured for the subnets in a Zone. In order to that, go to the Vnets panel and select the Subnet for which you want to configure DHCP ranges. In the edit dialogue you can configure DHCP ranges in the respective Tab. Alternatively you can set DHCP ranges for a Subnet via the following CLI command:
pvesh set /cluster/sdn/vnets/<vnet>/subnets/<subnet> -dhcp-range start-address=10.0.1.100,end-address=10.0.1.200 -dhcp-range start-address=10.0.2.100,end-address=10.0.2.200
You also need to have a gateway configured for the subnet - otherwise automatic DHCP will not work.
The DHCP plugin will then allocate IPs in the IPAM only in the configured ranges.
Do not forget to follow the installation steps for the dnsmasq DHCP plugin as well.
Currently this is the only DHCP plugin and therefore the plugin that gets used when you enable DHCP for a zone.
For installation see the DHCP IPAM section.
The plugin will create a new systemd service for each zone that dnsmasq gets deployed to. The name for the service is dnsmasq@<zone>. The lifecycle of this service is managed by the DHCP plugin.
The plugin automatically generates the following configuration files in the folder /etc/dnsmasq.d/<zone>:
This contains the default global configuration for a dnsmasq instance.
This file configures specific options for a subnet, such as the DNS server that should get configured via DHCP.
This file configures the DHCP ranges for the dnsmasq instance.
This file contains the MAC-address and IP mappings from the IPAM plugin. In order to override those mappings, please use the respective IPAM plugin rather than editing this file, as it will get overwritten by the dnsmasq plugin.
You must not edit any of the above files, since they are managed by the DHCP plugin. In order to customize the dnsmasq configuration you can create additional files (e.g. 90-custom.conf) in the configuration folder - they will not get changed by the dnsmasq DHCP plugin.
Configuration files are read in order, so you can control the order of the configuration directives by naming your custom configuration files appropriately.
DHCP leases are stored in the file /var/lib/misc/dnsmasq.<zone>.leases.
When using the PVE IPAM plugin, you can update, create and delete DHCP leases. For more information please consult the documentation of the PVE IPAM plugin. Changing DHCP leases is currently not supported for the other IPAM plugins.
SDN integrates with the Proxmox VE firewall by automatically generating IPSets which can then be referenced in the source / destination fields of firewall rules. This happens automatically for VNets and IPAM entries.
The firewall automatically generates the following IPSets in the SDN scope for every VNet:
Contains the CIDRs of all subnets in a VNet
Contains the IPs of the gateways of all subnets in a VNet
Contains the CIDRs of all subnets in a VNet, but excludes the gateways
Contains all DHCP ranges configured in the subnets in a VNet
When making changes to your configuration, the IPSets update automatically, so you do not have to update your firewall rules when changing the configuration of your Subnets.
Assuming the configuration below for a VNet and its contained subnets:
# /etc/pve/sdn/vnets.cfg
vnet: vnet0
zone simple
# /etc/pve/sdn/subnets.cfg
subnet: simple-192.0.2.0-24
vnet vnet0
dhcp-range start-address=192.0.2.100,end-address=192.0.2.199
gateway 192.0.2.1
subnet: simple-2001:db8::-64
vnet vnet0
dhcp-range start-address=2001:db8::1000,end-address=2001:db8::1999
gateway 2001:db8::1
In this example we configured an IPv4 subnet in the VNet vnet0, with 192.0.2.0/24 as its IP Range, 192.0.2.1 as the gateway and the DHCP range is 192.0.2.100 - 192.0.2.199.
Additionally we configured an IPv6 subnet with 2001:db8::/64 as the IP range, 2001:db8::1 as the gateway and a DHCP range of 2001:db8::1000 - 2001:db8::1999.
The respective auto-generated IPsets for vnet0 would then contain the following elements:
192.0.2.0/24
2001:db8::/64
192.0.2.1
2001:db8::1
192.0.2.0/24
2001:db8::/64
!192.0.2.1
!2001:db8::1
192.0.2.100 - 192.0.2.199
2001:db8::1000 - 2001:db8::1999
If you are using the built-in PVE IPAM, then the firewall automatically generates an IPset for every guest that has entries in the IPAM. The respective IPset for a guest with ID 100 would be guest-ipam-100. It contains all IP addresses from all IPAM entries. So if guest 100 is member of multiple VNets, then the IPset would contain the IPs from all VNets.
When entries get added / updated / deleted, then the respective IPSets will be updated accordingly.
|
|
When removing all entries for a guest and there are firewall rules still referencing the auto-generated IPSet then the firewall will fail to update the ruleset, since it references a non-existing IPSet. |
This section presents multiple configuration examples tailored for common SDN use cases. It aims to offer tangible implementations, providing additional details to enhance comprehension of the available configuration options.
Simple zone networks create an isolated network for guests on a single host to connect to each other.
|
|
connection between guests are possible if all guests reside on a same host but cannot be reached on other nodes. |
Create a simple zone named simple.
Add a VNet names vnet1.
Create a Subnet with a gateway and the SNAT option enabled.
This creates a network bridge vnet1 on the node. Assign this bridge to the guests that shall join the network and configure an IP address.
The network interface configuration in two VMs may look like this which allows them to communicate via the 10.0.1.0/24 network.
allow-hotplug ens19
iface ens19 inet static
address 10.0.1.14/24
allow-hotplug ens19
iface ens19 inet static
address 10.0.1.15/24
If you want to allow outgoing connections for guests in the simple network zone the simple zone offers a Source NAT (SNAT) option.
Starting from the configuration above, Add a Subnet to the VNet vnet1, set a gateway IP and enable the SNAT option.
Subnet: 172.16.0.0/24 Gateway: 172.16.0.1 SNAT: checked
In the guests configure the static IP address inside the subnet’s IP range.
The node itself will join this network with the Gateway IP 172.16.0.1 and function as the NAT gateway for guests within the subnet range.
When VMs on different nodes need to communicate through an isolated network, the VLAN zone allows network level isolation using VLAN tags.
Create a VLAN zone named myvlanzone:
ID: myvlanzone Bridge: vmbr0
Create a VNet named myvnet1 with VLAN tag 10 and the previously created myvlanzone.
ID: myvnet1 Zone: myvlanzone Tag: 10
Apply the configuration through the main SDN panel, to create VNets locally on each node.
Create a Debian-based virtual machine (vm1) on node1, with a vNIC on myvnet1.
Use the following network configuration for this VM:
auto eth0
iface eth0 inet static
address 10.0.3.100/24
Create a second virtual machine (vm2) on node2, with a vNIC on the same VNet myvnet1 as vm1.
Use the following network configuration for this VM:
auto eth0
iface eth0 inet static
address 10.0.3.101/24
Following this, you should be able to ping between both VMs using that network.
This example configures two QinQ zones and adds two VMs to each zone to demonstrate the additional layer of VLAN tags which allows the configuration of more isolated VLANs.
A typical use case for this configuration is a hosting provider that provides an isolated network to customers for VM communication but isolates the VMs from other customers.
Create a QinQ zone named qinqzone1 with service VLAN 20
ID: qinqzone1 Bridge: vmbr0 Service VLAN: 20
Create another QinQ zone named qinqzone2 with service VLAN 30
ID: qinqzone2 Bridge: vmbr0 Service VLAN: 30
Create a VNet named myvnet1 with VLAN-ID 100 on the previously created qinqzone1 zone.
ID: qinqvnet1 Zone: qinqzone1 Tag: 100
Create a myvnet2 with VLAN-ID 100 on the qinqzone2 zone.
ID: qinqvnet2 Zone: qinqzone2 Tag: 100
Apply the configuration on the main SDN web interface panel to create VNets locally on each node.
Create four Debian-bases virtual machines (vm1, vm2, vm3, vm4) and add network interfaces to vm1 and vm2 with bridge qinqvnet1 and vm3 and vm4 with bridge qinqvnet2.
Inside the VM, configure the IP addresses of the interfaces, for example via /etc/network/interfaces:
auto eth0
iface eth0 inet static
address 10.0.3.101/24
Configure all four VMs to have IP addresses from the 10.0.3.101 to 10.0.3.104 range.
Now you should be able to ping between the VMs vm1 and vm2, as well as between vm3 and vm4. However, neither of VMs vm1 or vm2 can ping VMs vm3 or vm4, as they are on a different zone with a different service-VLAN.
The example assumes a cluster with three nodes, with the node IP addresses 192.168.0.1, 192.168.0.2 and 192.168.0.3.
Create a VXLAN zone named myvxlanzone and add all IPs from the nodes to the peer address list. Use the default MTU of 1450 or configure accordingly.
ID: myvxlanzone Peers Address List: 192.168.0.1,192.168.0.2,192.168.0.3
Create a VNet named vxvnet1 using the VXLAN zone myvxlanzone created previously.
ID: vxvnet1 Zone: myvxlanzone Tag: 100000
Apply the configuration on the main SDN web interface panel to create VNets locally on each nodes.
Create a Debian-based virtual machine (vm1) on node1, with a vNIC on vxvnet1.
Use the following network configuration for this VM (note the lower MTU).
auto eth0
iface eth0 inet static
address 10.0.3.100/24
mtu 1450
Create a second virtual machine (vm2) on node3, with a vNIC on the same VNet vxvnet1 as vm1.
Use the following network configuration for this VM:
auto eth0
iface eth0 inet static
address 10.0.3.101/24
mtu 1450
Then, you should be able to ping between between vm1 and vm2.
The example assumes a cluster with three nodes (node1, node2, node3) with IP addresses 192.168.0.1, 192.168.0.2 and 192.168.0.3.
Create an EVPN controller, using a private ASN number and the above node addresses as peers.
ID: myevpnctl ASN#: 65000 Peers: 192.168.0.1,192.168.0.2,192.168.0.3
Create an EVPN zone named myevpnzone, assign the previously created EVPN-controller and define node1 and node2 as exit nodes.
ID: myevpnzone VRF VXLAN Tag: 10000 Controller: myevpnctl MTU: 1450 VNet MAC Address: 32:F4:05:FE:6C:0A Exit Nodes: node1,node2
Create the first VNet named myvnet1 using the EVPN zone myevpnzone.
ID: myvnet1 Zone: myevpnzone Tag: 11000
Create a subnet on myvnet1:
Subnet: 10.0.1.0/24 Gateway: 10.0.1.1
Create the second VNet named myvnet2 using the same EVPN zone myevpnzone.
ID: myvnet2 Zone: myevpnzone Tag: 12000
Create a different subnet on myvnet2`:
Subnet: 10.0.2.0/24 Gateway: 10.0.2.1
Apply the configuration from the main SDN web interface panel to create VNets locally on each node and generate the FRR configuration.
Create a Debian-based virtual machine (vm1) on node1, with a vNIC on myvnet1.
Use the following network configuration for vm1:
auto eth0
iface eth0 inet static
address 10.0.1.100/24
gateway 10.0.1.1
mtu 1450
Create a second virtual machine (vm2) on node2, with a vNIC on the other VNet myvnet2.
Use the following network configuration for vm2:
auto eth0
iface eth0 inet static
address 10.0.2.100/24
gateway 10.0.2.1
mtu 1450
Now you should be able to ping vm2 from vm1, and vm1 from vm2.
If you ping an external IP from vm2 on the non-gateway node3, the packet will go to the configured myvnet2 gateway, then will be routed to the exit nodes (node1 or node2) and from there it will leave those nodes over the default gateway configured on node1 or node2.
|
|
You need to add reverse routes for the 10.0.1.0/24 and 10.0.2.0/24 networks to node1 and node2 on your external gateway, so that the public network can reply back. |
If you have configured an external BGP router, the BGP-EVPN routes (10.0.1.0/24 and 10.0.2.0/24 in this example), will be announced dynamically.
If you have multiple gateway nodes, you should disable the rp_filter (Strict Reverse Path Filter) option, because packets can arrive at one node but go out from another node.
Create a drop-in sysctl file /etc/sysctl.d/99-rp-filter.conf:
net.ipv4.conf.default.rp_filter=0 net.ipv4.conf.all.rp_filter=0
To add IPSEC encryption on top of a VXLAN, this example shows how to use strongswan.
You`ll need to reduce the MTU by additional 60 bytes for IPv4 or 80 bytes for IPv6 to handle encryption.
So with default real 1500 MTU, you need to use a MTU of 1370 (1370 + 80 (IPSEC) + 50 (VXLAN) == 1500).
Install strongswan on the host.
apt install strongswan
Add configuration to /etc/ipsec.conf. We only need to encrypt traffic from the VXLAN UDP port 4789.
conn %default
ike=aes256-sha1-modp1024! # the fastest, but reasonably secure cipher on modern HW
esp=aes256-sha1!
leftfirewall=yes # this is necessary when using Proxmox VE firewall rules
conn output
rightsubnet=%dynamic[udp/4789]
right=%any
type=transport
authby=psk
auto=route
conn input
leftsubnet=%dynamic[udp/4789]
type=transport
authby=psk
auto=route
Generate a pre-shared key with:
openssl rand -base64 128
and add the key to /etc/ipsec.secrets, so that the file contents looks like:
: PSK <generatedbase64key>
Copy the PSK and the configuration to all nodes participating in the VXLAN network.