Introduction
Link Aggregation Control Protocol (LACP),
LACP, as specified in IEEE 802.3ad, implements dynamic link aggregation and de-aggregation,
allowing LACP-enabled switches at both ends to exchange Link Aggregation Control Protocol Data Units (LACPDUs).
LACP provides a standard negotiation mechanism that a Huawei switch can use to create and enable the aggregated link based on its configuration.
After an aggregated link is formed, LACP is responsible for maintaining the link. LACP adjusts the link if an aggregated link's status changes.
For example, in Figure 1, four interfaces on DeviceA should be connected to the corresponding interfaces on DeviceB,
and these interfaces are all bundled into an Eth-Trunk. However, one interface on DeviceA is connected to an interface on DeviceC.
As a result, DeviceA may send data destined for DeviceB to DeviceC. In manual mode, this fault would go undetected.
In this case, if LACP is enabled on DeviceA and DeviceB, the Eth-Trunk only selects active links (links connected to DeviceB) to forward data after negotiation.
Data sent by DeviceA destined for DeviceB only reaches DeviceB.
link1
DeviceA+--------------------------------+DeviceB
| link2 |
|--------------------------------|
| link3 |
|--------------------------------|
| link4
|--------------------------------+DeviceC
The concepts in LACP include:
LACP system priority:
LACP system priorities determine the sequence in which devices at two ends of an Eth-Trunk select active interfaces to join a LAG.
In order for a LAG to be established, both devices must select the same interfaces as active interfaces.
To achieve this, one device (with a higher priority) is responsible for selecting the active interfaces.
The other device (with a lower priority) then selects the same interfaces as active interfaces. In priority comparisons,
numerically lower values have higher priority.
LACP interface priority:
LACP interface priorities affect which interfaces of an Eth-Trunk are selected as active interfaces.
A smaller numerical value represents a higher priority. The interfaces with the highest LACP interface priority become active interfaces.
M:N backup of member interfaces:
LACP mode is also called M:N mode, where M refers to the number of active links and N refers to the number of backup links.
This mode guarantees high reliability and allows traffic to be load balanced among M active links.
In Figure 2, M+N links with the same attributes (in the same LAG) are set up between two devices.
When data is transmitted over the aggregated link, traffic is load balanced among M active links, with no data transmitted over N backup links.
Therefore, the actual bandwidth of the aggregated link is the sum of the M links' bandwidth,
and the maximum bandwidth of the aggregated link is the sum of the (M+N) links' bandwidth.
If one of the M links fails, LACP selects one of the N backup links to replace the faulty link.
The actual bandwidth of the aggregated link is still the sum of M links' bandwidth,
but the maximum bandwidth of the aggregated link is the sum of the (M+N-1) links' bandwidth.
LACP Implementation
# 先根据system priority选举出Actor,然后由Actor选择active port。
LACP-enabled switches exchange LACPDUs. An LACPDU contains the LACP system priority, MAC address, LACP interface priority, interface number, and operational key.
An Eth-Trunk in LACP mode is set up as follows:
1.After member interfaces are added to an Eth-Trunk in LACP mode, both ends send LACPDUs.
2.Devices at both ends determine the Actor and active links.
In Figure 2, when DeviceB receives LACPDUs from DeviceA, DeviceB checks and records information about DeviceA and compares LACP system priorities.
If the LACP system priority of DeviceA is higher than that of DeviceB, DeviceA becomes the Actor.
If DeviceA and DeviceB have the same LACP system priority, the device with a smaller MAC address becomes the Actor.
After the Actor is selected, both devices select active interfaces based on the interface priorities of the Actor.
If priorities of interfaces on the Actor are the same, interfaces with smaller interface numbers are selected as active interfaces.
An Eth-Trunk is established when both devices select the same interfaces as active interfaces. Active links then load balance traffic.
Load Balancing
# 根据流进行balancing,两端各自进行balancing,所以收发可能走不同的link。
Because an Eth-Trunk between two devices consists of multiple physical links bundled together,
an Eth-Trunk may transmit data frames of the same data flow over different physical links.
A potential problem arises in that the second data frame may arrive at the remote device earlier than the first data frame, resulting in out-of-order packets.
To prevent out-of-order packets, Eth-Trunk uses flow-based load balancing.
This mechanism uses the hash algorithm to calculate the address in a data frame
and generate a hash key based on which the system searches for the outbound interface in the Eth-Trunk forwarding table.
Each MAC or IP address corresponds to a hash key, so the system uses different outbound interfaces to forward data.
This mechanism ensures that frames of the same data flow are forwarded on the same physical link and implements load balancing of data flows.
Flow-based load balancing ensures that data is transmitted in the correct sequence, but cannot ensure efficient bandwidth usage.
You can set the load balancing mode based on traffic models. When a parameter of traffic changes frequently,
you can set the load balancing mode based on this parameter to ensure that the traffic is load balanced evenly.
For example, if IP addresses in packets change frequently, use the load balancing mode based on dst-ip, src-ip,
or src-dst-ip so that traffic can be properly load balanced among physical links.
If MAC addresses in packets change frequently and IP addresses are fixed, use the load balancing mode based on
dst-mac, src-mac, or src-dst-mac so that traffic can be properly load balanced among physical links.
capture lacp packet using tcpdump
# Activity, Aggregation, Synchronization, Collecting(表示已经准备好接收), Distributing(准备好发送)
[root@hp-dl388g8-22 ~]# tcpdump -i ens3f1np1 ether proto 0x8809 -evvv
dropped privs to tcpdump
tcpdump: listening on ens3f1np1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:57:23.097421 00:0f:53:21:68:31 (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
Actor Information TLV (0x01), length 20
System 00:0f:53:21:68:30 (oui Unknown), System Priority 100, Key 15, Port 2, Port Priority 255
State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
0x0000: 0064 000f 5321 6830 000f 00ff 0002 3d00
0x0010: 0000
Partner Information TLV (0x02), length 20
System b0:33:a6:2c:45:40 (oui Unknown), System Priority 127, Key 6, Port 2, Port Priority 127
State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
0x0000: 007f b033 a62c 4540 0006 007f 0002 3f00
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
[root@hp-dl388g8-22 ~]# tcpdump -i ens3f1np1 ether proto 0x8809 -evvv -Q in
dropped privs to tcpdump
tcpdump: listening on ens3f1np1, link-type EN10MB (Ethernet), capture size 262144 bytes
23:58:15.956375 b0:33:a6:2c:45:67 (oui Unknown) > 01:80:c2:00:00:02 (oui Unknown), ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
Actor Information TLV (0x01), length 20
System b0:33:a6:2c:45:40 (oui Unknown), System Priority 127, Key 6, Port 2, Port Priority 127
State Flags [Activity, Timeout, Aggregation, Synchronization, Collecting, Distributing]
0x0000: 007f b033 a62c 4540 0006 007f 0002 3f00
0x0010: 0000
Partner Information TLV (0x02), length 20
System 00:0f:53:21:68:30 (oui Unknown), System Priority 100, Key 15, Port 2, Port Priority 255
State Flags [Activity, Aggregation, Synchronization, Collecting, Distributing]
0x0000: 0064 000f 5321 6830 000f 00ff 0002 3d00
0x0010: 0000
Collector Information TLV (0x03), length 16
Max Delay 0
0x0000: 0000 0000 0000 0000 0000 0000 0000
Terminator TLV (0x00), length 0
Linux Bonding Info
# system priority: 系统优先级,数字越小优先级越高。高优先级的system成为Actor,并且选择active port。
# Aggregator ID: 聚合组的id,所有slave应该在同一个聚合组中。(如果slave的speed,duplex不同,那么就会在不同的聚合组中)
# port key: 对于某一端,具有相同key的端口才可以聚合
# port priority: 当不是所有端口都能满足聚合条件时,根据这个优先级,优先选择某些端口进行聚合
# port number: 序号
# port state: Activity, Aggregation, Synchronization, Collecting, Distributing 每个代表一个状态,相加的结果就是这个state
[root@hp-dl388g8-22 ~]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 100
System MAC address: 00:0f:53:21:68:30
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 6
Partner Mac Address: b0:33:a6:2c:45:40
Slave Interface: ens3f0np0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0f:53:21:68:30
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 100
system mac address: 00:0f:53:21:68:30
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: b0:33:a6:2c:45:40
oper key: 6
port priority: 127
port number: 1
port state: 63
Slave Interface: ens3f1np1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0f:53:21:68:31
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 100
system mac address: 00:0f:53:21:68:30
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: b0:33:a6:2c:45:40
oper key: 6
port priority: 127
port number: 2
port state: 63