Releases: strizhechenko/netutils-linux
RSS and RPS minor updates.
RSS
The rss-ladder tool now supports PCI-slot-based queue naming. I mean this:
30: 127355089 0 0 0 PCI-MSI-edge mlx5_comp0@pci:0000:01:00.0
31: 120112828 5482507 0 0 PCI-MSI-edge mlx5_comp1@pci:0000:01:00.0
32: 121978940 0 5524729 0 PCI-MSI-edge mlx5_comp2@pci:0000:01:00.0
33: 122736116 0 0 5465612 PCI-MSI-edge mlx5_comp3@pci:0000:01:00.0
How to tune it?
$ rss-ladder pci:0000:01:00.0
- distribute interrupts of pci:0000:01:00.0 (mlx5_async_eq) on socket 0
- distribute interrupts of pci:0000:01:00.0 (mlx5_cmd_eq) on socket 0
- distribute interrupts of pci:0000:01:00.0 (mlx5_comp) on socket 0
- pci:0000:01:00.0: queue mlx5_comp0@pci:0000:01:00.0 (irq 30) bound to CPU0
- pci:0000:01:00.0: queue mlx5_comp1@pci:0000:01:00.0 (irq 31) bound to CPU1
- pci:0000:01:00.0: queue mlx5_comp2@pci:0000:01:00.0 (irq 32) bound to CPU2
- pci:0000:01:00.0: queue mlx5_comp3@pci:0000:01:00.0 (irq 33) bound to CPU3
- distribute interrupts of pci:0000:01:00.0 (mlx5_pages_eq) on socket 0
It may not be perfect, but at least it works. Well, at least for the mlx5 driver.
RPS
The autorps tool no longer yells at you with a dreadful exception if you try to tune a multiqueue NIC. It just says that it may be a bad idea and that you should use the -f flag to really change RPS settings.
Also, some processors have inverted CPU masks in the rps_cpus file, so you could end up putting all processing on a foreign NUMA node. I don't know how these masks work and don't want to know, so the default behaviour now is to copy the mask from /sys/class/net/$dev/device/local_cpus instead of evaluating it.
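A rough sketch of that "copy, don't evaluate" default is shown here. It is not the actual autorps code: the function name copy_local_cpus_mask and the sysfs_root and dry_run parameters are invented for the illustration; only the sysfs file names (local_cpus, queues/rx-*/rps_cpus) are real.

```python
import glob
import os


def copy_local_cpus_mask(dev, sysfs_root="/sys/class/net", dry_run=True):
    """Copy the NIC's local_cpus mask into rps_cpus of every rx queue.

    Hypothetical helper: the mask is copied verbatim, never computed.
    """
    dev_dir = os.path.join(sysfs_root, dev)
    with open(os.path.join(dev_dir, "device", "local_cpus")) as mask_file:
        mask = mask_file.read().strip()
    pattern = os.path.join(dev_dir, "queues", "rx-*", "rps_cpus")
    queue_files = sorted(glob.glob(pattern))
    for queue_file in queue_files:
        if not dry_run:
            # Writing here requires root on a real system.
            with open(queue_file, "w") as out:
                out.write(mask)
    return mask, queue_files
```

With dry_run left at True, the function only reports which mask it would write and to which files, which makes it safe to try on a live box.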
Absolutely no new features. Useless release. Increased memory usage. More lines of code. Whyyyy
New class structure
There is a new class structure in the server-info utility and the netutils_linux_hardware package:
Server class manages five subsystems - CPU, Disk, Net, Memory and System.
Server (server-info) can collect (--collect), read (--show) and rate (--rate) data.
Before the refactoring there were 3 big classes: Reader, Parser (--show) and Assessor (--rate). They had duplicated data about subsystems. Well, there were no "subsystems", there were just a lot of functions with prefixes in those classes. Now all those functions live in their own subsystems, all subsystems have standardised API (that's very cool for Server class, it can just iterate over subsystems).
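To show why a standardised subsystem API is convenient for the Server class, here is a toy sketch. The class bodies, the parse/rate method names and the rating formulas are all made up for this example and are not taken from netutils_linux_hardware; only the idea of "Server iterates over subsystems" comes from the notes above.

```python
class Subsystem(object):
    """Hypothetical common API shared by every subsystem."""
    name = None

    def parse(self, raw):
        """Turn collected raw data into a plain dict (what --show prints)."""
        raise NotImplementedError

    def rate(self, parsed):
        """Turn parsed data into ratings (what --rate prints)."""
        raise NotImplementedError


class CPU(Subsystem):
    name = "cpu"

    def parse(self, raw):
        return {"CPU(s)": int(raw["cpus"])}

    def rate(self, parsed):
        # Toy scale: one point per logical CPU, capped at 10.
        return {"CPU(s)": min(10, parsed["CPU(s)"])}


class Memory(Subsystem):
    name = "memory"

    def parse(self, raw):
        return {"MemTotal": int(raw["mem_total_kb"])}

    def rate(self, parsed):
        # Toy scale: one point per GiB, at least 1, capped at 10.
        gib = parsed["MemTotal"] // (1024 * 1024)
        return {"MemTotal": max(1, min(10, gib))}


class Server(object):
    """Knows nothing about CPUs or RAM; it just iterates over subsystems."""
    subsystems = (CPU(), Memory())

    def show(self, raw):
        return {s.name: s.parse(raw[s.name]) for s in self.subsystems}

    def rate(self, raw):
        return {s.name: s.rate(s.parse(raw[s.name])) for s in self.subsystems}
```

Adding a new subsystem in this layout means writing one more class and appending it to the tuple; Server itself stays untouched.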
Folding
There is a new Folding class with all the folding logic/constants. Also, there are no more -f, -ff, -fff args; use --device, --subsystem, --server instead.
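For a feel of what the three folding levels do to a rating tree, here is a small sketch. It assumes that folding is a plain mean of means, which is a guess for illustration only; the constants DEVICE, SUBSYSTEM, SERVER and the fold function are hypothetical names, not the real Folding class.

```python
DEVICE, SUBSYSTEM, SERVER = 1, 2, 3  # hypothetical folding levels


def _average(node):
    """Collapse a nested dict of numbers into a single mean of means."""
    if not isinstance(node, dict):
        return node
    values = [_average(child) for child in node.values()]
    return sum(values) / float(len(values))


def fold(rates, level):
    """Fold a {subsystem: {key: rate-or-dict}} tree up to the given level."""
    if level == SERVER:
        return _average(rates)
    if level == SUBSYSTEM:
        return {name: _average(sub) for name, sub in rates.items()}
    if level == DEVICE:
        return {name: {key: _average(value) for key, value in sub.items()}
                for name, sub in rates.items()}
    return rates
```

At the device level, plain parameters such as CPU ratings pass through unchanged, while per-device dicts (for example each NIC) collapse into one number.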
Other things
Some code was simplified, and I restored the tests for server-info --rate. I also got rid of the six.iteritems dependency in a few places and just use .items(). There is no big data here, so I don't think 2-3 kbits of RAM matter more than code simplicity.
You can specify a subsystem you want to rate or show now
All new options available:
--cpu Show information about CPU
--memory Show information about RAM
--net Show information about network devices
--disk Show information about disks
--system Show information about system overall (rate only)
Example:
# server-info --rate --device --net --cpu
cpu:
  BogoMIPS: 5
  CPU MHz: 5
  CPU(s): 5
  Core(s) per socket: 10
  L3 cache: 7
  Socket(s): 10
  Thread(s) per core: 10
  Vendor ID: 10
net:
  eth1: 3.6666666666666665
  eth6: 9.666666666666666
  eth7: 9.666666666666666
It also works with --show:
# server-info --show --memory --disk
disk:
  vda:
    model: null
    size: 21474836480
    type: HDD
memory:
  devices:
    '0x1100':
      size: '512'
      speed: 0
      type: RAM
  size:
    MemFree: 78272
    MemTotal: 500196
    SwapFree: 0
    SwapTotal: 0
Also, if you run server-info without the necessary parameters, it shows a more human-friendly error and the --help output instead of a traceback with AssertionError.
Refactored server-info
Well, I failed the challenge to release it before the new year 2018. But better late than never!
Details are boring, you'd better look at the examples in the README!
- All the server-info-* utils have one entry point now.
- The old server-info-rate and server-info-show utils are deleted. server-info-collect is called via a wrapper until it is rewritten in Python. It's a separate issue.
- Added (and fixed) tests for the server-info --rate feature. It doesn't fail on any of the examples in ./tests/server-info-show.test/
- Yes, a utils call looks like this now: server-info --rate instead of server-info-rate
- You can collect data and optionally pack it into a tarball with server-info --directory <path-to-directory> --gzip. It will make <path-to-directory>.tar.gz with all the data, which you can take from the server for later analysis.
- New examples are already in the README.
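The packing step behind --gzip could look roughly like the sketch below. It uses only the standard tarfile module; the function name pack_directory is invented here, and this is not necessarily how server-info does it.

```python
import os
import tarfile


def pack_directory(directory):
    """Pack `directory` into `<directory>.tar.gz` and return the archive path."""
    directory = os.path.normpath(directory)
    archive = directory + ".tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory's own name,
        # so unpacking does not recreate the absolute path.
        tar.add(directory, arcname=os.path.basename(directory))
    return archive
```

The archive lands next to the collected directory, which matches the <path-to-directory>.tar.gz naming mentioned in the list.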
Folding in server-info-rate
You can now skip details of your server's rating this way:
Usage
- server-info-rate -f - shows entire device rate
- server-info-rate -ff - shows entire subsystem's rate
- server-info-rate -fff - shows entire server's rate
Example
$ server-info-rate -fff
WARNING: why do you use 20 years old hardware, dude?
Just a joke. For example:
➜ vscale-vm git:(folding) ✗ server-info-rate -ff
cpu: 4.5
disk: 1.0
memory: 1.0
net: 1.3333333333333333
system: 1.0
It can't be used directly via a server-info rate call; you have to go to /root/server/ before running server-info-rate. I know it's shitty usability, I'll fix it this week in 2.7.2.
DMI Decode and problems with py2.6+travis
Good news first
There is optional dmidecode support, now you can see how good your RAM is:
Bad news
Python 2.6 is so deeply deprecated that it's probably impossible to use pytest to run tests in Travis CI anymore. I tried to dig into the details, and the problem is probably not in pytest or Python but in pip: pip 8.0.1, installed by yum on CentOS 6, installed the latest version of pytest without any problems, while pip 9.0.1 refused. I don't know how to set the pip version for the Python 2.6 environment in Travis, so I just removed it from .travis.yml.
What does it mean? A higher probability of introducing bugs specific to Python 2.6: some modern syntax usage, etc. But is it a problem?
The main py2.6 users are CentOS 6 users. There is python3.4 in EPEL for CentOS 6, so it's still possible to use.
Also, there are Carbon Reductor 7 users. Well, most of them have old versions installed already and everything works. In Carbon Reductor 8 I will move netutils-linux from the py2.6 env to 3.4 in January, when I upgrade to a new version (from... 2.0?).
We (Carbon Soft) also offer help with migrating from Carbon Reductor 7 to Carbon Reductor 8. So probably no one will suffer from bugs (I hope), but feel free to create issues.
Boring release with just fixed bugs
Fixed:
- Highlighting of CPU/NIC could break rss-ladder
- NUMA and socket layouts were mixed up
- lscpu output could not be parsed correctly in python3, so everything that needs CPU topology thought that you had only one logical CPU
And WOW, THE FEATURE OF THE YEAR:
- you may pass a file with lscpu -p output to network-top for debugging purposes.
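For reference, lscpu -p prints comment lines starting with # followed by comma-separated rows whose first four columns are CPU, Core, Socket and Node. A minimal parser for a saved file might look like this; the function name parse_lscpu_p is hypothetical, this is not the network-top implementation, and the fallback of an empty Node column to 0 is a defensive assumption.

```python
def parse_lscpu_p(text):
    """Return {cpu: (core, socket, node)} parsed from `lscpu -p` output."""
    layout = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the explanatory header comments
        cpu, core, socket, node = line.split(",")[:4]
        # Assumption: treat an empty Node column as node 0.
        layout[int(cpu)] = (int(core), int(socket), int(node or 0))
    return layout
```

len(parse_lscpu_p(text)) gives the number of logical CPUs, which is a quick sanity check against the "only one logical CPU" symptom described in the fix list.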
New utility – snmptop
Here comes dat autoxps!
Well, the only difference between rps and xps tuning is the queue prefix (rx and tx)... So here are 25 lines changed, and you are now able to distribute packet transmission between CPUs even with a single-queue NIC!
Example:
# autoxps eth0
Using mask 'ff' for eth0-tx-0
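The "only the prefix differs" point can be seen in how the sysfs paths are built. This is just an illustration with a made-up helper name (queue_mask_path), not code from the project; the rps_cpus and xps_cpus file names are the real kernel ones.

```python
import posixpath

# rx queues are tuned through rps_cpus, tx queues through xps_cpus.
MASK_FILES = {"rx": "rps_cpus", "tx": "xps_cpus"}


def queue_mask_path(dev, queue, direction, sysfs_root="/sys/class/net"):
    """Build the sysfs path of the CPU mask file for one queue."""
    queue_dir = "%s-%d" % (direction, queue)
    return posixpath.join(sysfs_root, dev, "queues", queue_dir,
                          MASK_FILES[direction])
```

For eth0-tx-0 from the example above, this yields /sys/class/net/eth0/queues/tx-0/xps_cpus.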
New autorps utility doesn't afraid of difficulties!
Autorps didn't work on systems with multiple NUMA nodes or CPU sockets before, because it calculated the CPU mask from the total CPU count and wasn't aware of the CPU/NUMA topology.
It has been rewritten in python. Now you can:
- Use it on single-queue NICs in multi-NUMA systems. The CPU socket/NUMA node to bind network packet processing to will be chosen automatically (rss-ladder is able to do it too!) by reading /sys/class/net/$NIC/device/numa_node (fallback - 0).
- --force it to work with a multiqueue NIC.
- Pass a custom CPU mask: --cpu-mask=fe
- Pass a custom CPU list: --cpus 0 2 4 6 (at the end of the options).
- Test it before using with --dry-run: it will print something like Using mask 'fc0' for eth0-rx-0.
- Explicitly define the socket to bind queues to with --socket=1. Why would you ever need it? Because you may find out that moving this NIC to an external NUMA node gives you better performance than putting all your NICs on the device's local NUMA node (and you can't put the NIC in this NUMA node's PCI slot right now).
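Two pieces of the list above are easy to show in code: turning a CPU list into a hex mask, and reading numa_node with a fallback to 0. Both helpers below (cpus_to_mask, read_numa_node) are hypothetical names written for this illustration, not the autorps source. The kernel reports -1 in numa_node when a device has no NUMA affinity, so that case falls back to 0 as well.

```python
import os


def cpus_to_mask(cpus):
    """Build a hex CPU mask: bit N is set when CPU N is in the list."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return "%x" % mask


def read_numa_node(dev, sysfs_root="/sys/class/net"):
    """Read the NIC's NUMA node, falling back to 0 when it is unknown."""
    path = os.path.join(sysfs_root, dev, "device", "numa_node")
    try:
        with open(path) as node_file:
            node = int(node_file.read().strip())
    except (IOError, OSError, ValueError):
        return 0
    return node if node >= 0 else 0
```

For example, cpus_to_mask([0, 2, 4, 6]) gives '55', CPUs 1 to 7 give 'fe', and CPUs 6 to 11 give the 'fc0' seen in the --dry-run output.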
Also, I accidentally dropped a .pylintrc into the repo and fixed all the small PEP 8 violations and other code smells that landscape.io was hiding with default settings.