devlink-dev:
1. devlink dev show - display devlink device attributes
# devlink dev show
pci/0000:01:00.0
2. devlink dev eswitch set - sets devlink device eswitch attributes
# devlink dev eswitch set pci/0000:5e:00.0 mode switchdev
3. devlink dev param show - display devlink device supported configuration parameters attributes
# devlink dev param show pci/0000:24:00.0:
name internal_error_reset type generic
values:
cmode runtime value true
cmode driverinit value true
name max_macs type generic
values:
cmode driverinit value 128
name region_snapshot_enable type generic
values:
cmode runtime value false
cmode driverinit value false
name enable_64b_cqe_eqe type driver-specific
values:
cmode driverinit value true
name enable_4k_uar type driver-specific
values:
cmode driverinit value false
4. devlink dev param set - set new value to devlink device configuration parameter
# devlink dev param show pci/0000:01:00.0 name fw_load_policy pci/0000:01:00.0:
name fw_load_policy type generic
values:
cmode driverinit value 0
# devlink dev param set pci/0000:01:00.0 name fw_load_policy value 1 cmode driverinit
# devlink dev param show pci/0000:01:00.0 name fw_load_policy
pci/0000:01:00.0:
name fw_load_policy type generic
values:
cmode driverinit value 1
5. devlink dev info - display device information
[root@mlxsw-sn2100-01 ~]# devlink dev info pci/0000:01:00.0
pci/0000:01:00.0:
driver mlxsw_spectrum
versions:
fixed:
hw.revision A0
fw.psid MT_3000111033
running:
fw.version 13.2008.1310
6. devlink dev flash - write device's non-volatile memory
upgrade firmware
devlink-port
devlink port show
Shows the state of all devlink ports on the system.
devlink port show pci/0000:01:00.0/1
Shows the state of specified devlink port.
[root@mlxsw-sn2100-01 ~]# devlink port show
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
pci/0000:01:00.0/1: type eth netdev enp1s0np7 flavour physical port 7 splittable true lanes 4
pci/0000:01:00.0/5: type eth netdev enp1s0np8 flavour physical port 8 splittable true lanes 4
pci/0000:01:00.0/9: type eth netdev enp1s0np5 flavour physical port 5 splittable true lanes 4
pci/0000:01:00.0/13: type eth netdev enp1s0np6 flavour physical port 6 splittable true lanes 4
pci/0000:01:00.0/17: type eth netdev enp1s0np3 flavour physical port 3 splittable true lanes 4
pci/0000:01:00.0/21: type eth netdev enp1s0np4 flavour physical port 4 splittable true lanes 4
pci/0000:01:00.0/25: type eth netdev enp1s0np1 flavour physical port 1 splittable true lanes 4
pci/0000:01:00.0/29: type eth netdev enp1s0np2 flavour physical port 2 splittable true lanes 4
pci/0000:01:00.0/33: type eth netdev enp1s0np10 flavour physical port 10 splittable true lanes 4
pci/0000:01:00.0/37: type eth netdev enp1s0np9 flavour physical port 9 splittable true lanes 4
pci/0000:01:00.0/41: type eth netdev enp1s0np12 flavour physical port 12 splittable true lanes 4
pci/0000:01:00.0/45: type eth netdev enp1s0np11 flavour physical port 11 splittable true lanes 4
pci/0000:01:00.0/49: type eth netdev enp1s0np14 flavour physical port 14 splittable true lanes 4
pci/0000:01:00.0/53: type eth netdev enp1s0np13 flavour physical port 13 splittable true lanes 4
pci/0000:01:00.0/57: type eth netdev enp1s0np16 flavour physical port 16 splittable true lanes 4
pci/0000:01:00.0/61: type eth netdev enp1s0np15 flavour physical port 15 splittable true lanes 4
[root@mlxsw-sn2100-01 ~]# devlink port show pci/0000:01:00.0/0
pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false
devlink port set pci/0000:01:00.0/1 type eth
Set type of specified devlink port to Ethernet.
devlink port split pci/0000:01:00.0/1 count 4
Split the specified devlink port into four ports.
devlink port unsplit pci/0000:01:00.0/1
Unplit the specified previously split devlink port.
devlink port health show
Shows status and configuration of all supported reporters registered on all devlink ports.
devlink port health show pci/0000:01:00.0/1 reporter tx
Shows status and configuration of tx reporter registered on pci/0000:01:00.0/1 devlink port.
[root@dell-per740-07 ~]# devlink port health show pci/0000:5e:00.0/1 reporter tx
pci/0000:5e:00.0:
reporter tx
state healthy error 0 recover 0 grace_period 500 auto_recover true auto_dump true
devlink-sb
1.When a packet enters the chip, it is assigned Switch Priority (SP), an internal identifier that informs how the packet is treated in context of other traffic.
There are two ways to determine SP.
1. Use 802.1p priority 1:1 map
2. Use below command to map DSCP and SP
$ lldptool -T -i <port> -V APP app=<SP>,5,<DSCP> # Insert rule.
$ lldptool -T -i <port> -V APP -d app=<SP>,5,<DSCP> # Delete rule.
$ lldptool -t -i <port> -V APP -c app # Show rules.
2.Afterwards, the packet is directed to a priority group (PG) buffer in the port's headroom based on its SP. The port's headroom buffer is used to store incoming packets on the port while the
go through the switch's pipeline and also to store packets in a lossless flow if they are not allowed to enter the switch's shared buffer. However, if there is no room for the packet in the
headroom, it gets dropped.
prioriy (PG) also called Ingress TC
Mapping an SP to a PG buffer:https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#priority-group-buffers
1.mlnx_qos -i swp1 --prio_tc=0,0,0,0,1,1,1,1
2.mlnx_qos -i swp1 --prio2buffer=0,0,0,0,1,1,1,1
3.lldptool -T -i swp1 -V ETS-CFG up2tc=0:0,1:1,2:2,3:3,4:4,5:5,6:6,7:7
3.The packet then goes through the switch's pipeline which determines its egress port.
The egress TC is determined based on the packet's 802.1p priority and the egress port's up2tc mapping.
4.Packets are admitted to the switch's shared buffer from the port's headroom and stay there until they are transmitted out of the switch.
The device has two types of pools, ingress and egress. The pools are used as containers for packets and allow a user to limit the amount of traffic:
From a port
From a {Port, PG buffer}
To a port
To a {Port, TC}
Admission Rules
Once out of the switch's pipeline, a packet is admitted into the shared buffer only if all four quotas mentioned above are below the configured threshold:
Ingress{Port}.Usage < Thres
Ingress{Port,PG}.Usage < Thres
Egress{Port}.Usage < Thres
Egress{Port,TC}.Usage < Thres
To configure a pool's size and threshold type, run:
devlink sb pool set pci/0000:03:00.0 pool 0 size 12401088 thtype dynamic
To see the current settings of a pool, run:
devlink sb pool show pci/0000:03:00.0 pool 0
The limit can be either a specified amount of bytes (static) or a percentage of the remaining free space (dynamic).
To set a static threshold, run:
devlink sb pool set pci/0000:03:00.0 pool 0 size 12401088 thtype static
To set a dynamic threshold, run:
$ devlink sb pool set pci/0000:03:00.0 pool 0 size 12401088 thtype dynamic
To bind packets originating from a {Port, PG} to an ingress pool, run:
$ devlink sb tc bind set pci/0000:03:00.0/1 tc 0 type ingress pool 0 th 9
$ devlink sb tc bind set swp1 tc 0 type ingress pool 0 th 9
To see the current settings of binding of {Port, PG} to an ingress pool, run:
$ devlink sb tc bind show swp1 tc 0 type ingress
swp1: sb 0 tc 0 type ingress pool 0 threshold 10
Similarly for egress, to bind packets directed to a {Port, TC} to an egress pool, run:
$ devlink sb tc bind set swp1 tc 0 type egress pool 4 th 9
It is possible to take a snapshot of the shared buffer usage with the following command:
$ devlink sb occupancy snapshot pci/0000:03:00.0
https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#shared-buffers
5.The packet stays in the shared buffer until it is transmitted from its designated egress port. The packet is queued for transmission according to its Traffic Class (TC). Once in its assigned
queue, the packet is scheduled for transmission based on the chosen traffic selection algorithm (TSA) employed on the TC and its various parameters. The mapping from SP to TC and TSA
configuration are discussed in the https://github.com/Mellanox/mlxsw/wiki/Quality-of-Service#traffic-scheduling
devlink-resoruce:
To show resources:
# devlink resource show pci/0000:01:00.0
[root@mlxsw-sn2100-01 ~]# devlink resource show pci/0000:01:00.0
pci/0000:01:00.0:
name kvd size 258048 unit entry dpipe_tables none
resources:
name linear size 98304 occ 0 unit entry size_min 0 size_max 159744 size_gran 128 dpipe_tables none
resources:
name singles size 16384 occ 0 unit entry size_min 0 size_max 159744 size_gran 1 dpipe_tables none
name chunks size 49152 occ 0 unit entry size_min 0 size_max 159744 size_gran 32 dpipe_tables none
name large_chunks size 32768 occ 0 unit entry size_min 0 size_max 159744 size_gran 512 dpipe_tables none
name hash_double size 65408 unit entry size_min 32768 size_max 192512 size_gran 128 dpipe_tables none
name hash_single size 94336 unit entry size_min 65536 size_max 225280 size_gran 128 dpipe_tables none
name span_agents size 3 occ 0 unit entry dpipe_tables none
name counters size 32000 occ 4 unit entry dpipe_tables none
resources:
name flow size 24576 occ 4 unit entry dpipe_tables none
name rif size 7424 occ 0 unit entry dpipe_tables none
name global_policers size 1000 unit entry dpipe_tables none
resources:
name single_rate_policers size 968 occ 0 unit entry dpipe_tables none
Change resource size
# devlink resource set pci/0000:01:00.0 /kvd/linear 131072
devlink-region:
devlink-region - devlink address region access
devlink region show - Show all supported address regions names, snapshots and sizes
devlink region del - Delete a snapshot specified by address-region name and snapshot ID
devlink region dump - Dump all the available data from a region or from snapshot of a region
devlink region read - Read from a specific region address for a given length
devlink-health: manage predefined health reproter
[root@mlxsw-sn2100-01 ~]# devlink health show
pci/0000:01:00.0:
reporter fw_fatal
state healthy error 0 recover 0 auto_dump true
devlink health recover - Initiate a recovery operation on a reporter.
devlink health diagnose - Retrieve diagnostics data on a reporter.
devlink health set pci/0000:00:09.0 reporter fw_fatal grace_period 3500
Set time interval between auto recoveries to minimum of 3500 msec on the specified device and reporter.
devlink-trap: manage predefined traps
# devlink trap set pci/0000:01:00.0 trap source_mac_is_multicast action drop
# devlink trap show
pci/0000:01:00.0:
name source_mac_is_multicast type drop generic true action drop group l2_drops
name vlan_tag_mismatch type drop generic true action drop group l2_drops
name ingress_vlan_filter type drop generic true action drop group l2_drops
name ingress_spanning_tree_filter type drop generic true action drop group l2_drops
name port_list_is_empty type drop generic true action drop group l2_drops
name port_loopback_filter type drop generic true action drop group l2_drops
name blackhole_route type drop generic true action drop group l3_drops
name non_ip type drop generic true action drop group l3_drops
name uc_dip_over_mc_dmac type drop generic true action drop group l3_drops
name dip_is_loopback_address type drop generic true action drop group l3_drops
name sip_is_mc type drop generic true action drop group l3_drops
name sip_is_loopback_address type drop generic true action drop group l3_drops
name ip_header_corrupted type drop generic true action drop group l3_drops
name ipv4_sip_is_limited_bc type drop generic true action drop group l3_drops
name ipv6_mc_dip_reserved_scope type drop generic true action drop group l3_drops
name ipv6_mc_dip_interface_local_scope type drop generic true action drop group l3_drops
name mtu_value_is_too_small type exception generic true action trap group l3_exceptions
name ttl_value_is_too_small type exception generic true action trap group l3_exceptions
name mc_reverse_path_forwarding type exception generic true action trap group l3_exceptions
name reject_route type exception generic true action trap group l3_exceptions
name unresolved_neigh type exception generic true action trap group l3_exceptions
name ipv4_lpm_miss type exception generic true action trap group l3_exceptions
name ipv6_lpm_miss type exception generic true action trap group l3_exceptions
name irif_disabled type drop generic false action drop group l3_drops
name erif_disabled type drop generic false action drop group l3_drops
name non_routable_packet type drop generic true action drop group l3_drops
name decap_error type exception generic true action trap group tunnel_drops
name overlay_smac_is_mc type drop generic true action drop group tunnel_drops
name ingress_flow_action_drop type drop generic true action drop group acl_drops
name egress_flow_action_drop type drop generic true action drop group acl_drops
name stp type control generic true action trap group stp
name lacp type control generic true action trap group lacp
name lldp type control generic true action trap group lldp
name igmp_query type control generic true action mirror group mc_snooping
name igmp_v1_report type control generic true action trap group mc_snooping
name igmp_v2_report type control generic true action trap group mc_snooping
name igmp_v3_report type control generic true action trap group mc_snooping
name igmp_v2_leave type control generic true action trap group mc_snooping
name mld_query type control generic true action mirror group mc_snooping
name mld_v1_report type control generic true action trap group mc_snooping
name mld_v2_report type control generic true action trap group mc_snooping
name mld_v1_done type control generic true action trap group mc_snooping
name ipv4_dhcp type control generic true action trap group dhcp
name ipv6_dhcp type control generic true action trap group dhcp
name arp_request type control generic true action mirror group neigh_discovery
name arp_response type control generic true action mirror group neigh_discovery
name arp_overlay type control generic true action trap group neigh_discovery
name ipv6_neigh_solicit type control generic true action trap group neigh_discovery
name ipv6_neigh_advert type control generic true action trap group neigh_discovery
name ipv4_bfd type control generic true action trap group bfd
name ipv6_bfd type control generic true action trap group bfd
name ipv4_ospf type control generic true action trap group ospf
name ipv6_ospf type control generic true action trap group ospf
name ipv4_bgp type control generic true action trap group bgp
name ipv6_bgp type control generic true action trap group bgp
name ipv4_vrrp type control generic true action trap group vrrp
name ipv6_vrrp type control generic true action trap group vrrp
name ipv4_pim type control generic true action trap group pim
name ipv6_pim type control generic true action trap group pim
name uc_loopback type control generic true action mirror group uc_loopback
name local_route type control generic true action trap group local_delivery
name external_route type control generic true action trap group external_delivery
name ipv6_uc_dip_link_local_scope type control generic true action trap group local_delivery
name ipv4_router_alert type control generic true action trap group local_delivery
name ipv6_router_alert type control generic true action trap group local_delivery
name ipv6_dip_all_nodes type control generic true action trap group ipv6
name ipv6_dip_all_routers type control generic true action trap group ipv6
name ipv6_router_solicit type control generic true action trap group ipv6
name ipv6_router_advert type control generic true action trap group ipv6
name ipv6_redirect type control generic true action trap group ipv6
name ptp_event type control generic true action trap group ptp_event
name ptp_general type control generic true action trap group ptp_general
name flow_action_sample type control generic true action mirror group acl_sample
name flow_action_trap type control generic true action trap group acl_trap
devlink-monitor
devlink monitor [dev, port, health, trap, trap-group, trap-policer]