Ceph: ceph-volume lvm batch support #225
Comments
Sounds like a good idea. Some suggestions for implementation using the suggested schema (but should still apply in any case):
I think, ideally, if we're going to mix use of Hm... After skimming through The Literature1, I'm wondering if trying to interface over Anyway, the reason why I think it might be unnecessary is that...the following seems to suggest
Which seems to suggest to me that maybe there's another way of configuring it outside of using lvm batch. But given the general benefits from lvm batch, might as well use it. Alternatively...drive groups5 but I feel like that would definitely be pushing the scope of this role. 😅 I'll think about it some more but if you open a PR I'll take a look and evaluate.
0 https://github.com/lae/ansible-role-proxmox/blob/develop/tasks/ceph.yml#L39
It absolutely can be, and rather than trying to implement something asap I think it's a good idea to let this Issue brew for a while in case other ceph users would want to give feedback. For example, simply augmenting support for the Thanks for the schema pointers. I was also contemplating something like:
...which would retain more of the existing structure, but needlessly replicate information. And as well something like:
...which I suppose is a bit unintuitive.
Regarding this I can mention from brief testing that the
Right, yeah. By my interpretation those are just extra OSD daemons running for the same device (hence the worker term I used), which brings performance improvements. So it's a number that could theoretically be modified at any time without modifying the associated disk, I think.
Computer says no (but also...yes? regarding the ls-by-host part):
If one batch op has already carved the NVMe into ~4x900GB, then even if the user were willing to take the hit from a rebalance (or mitigate it with noout or whatever), the tool is not magical enough to rearrange the NVMe into e.g. the 2x1800GB attempted above. I dunno, maybe this is a bit more involved than I initially thought :) I pushed some stuff here but will now be AFK for quite some time before I can look at this further.
Oh! Okay, yeah, that does seem to indicate that my understanding of osds_from_devices from just the docs was incorrect. (Unfortunately I have no environment, or the funds to make one to be honest, to test/experiment with this.)
This is part feature request, part reminder for myself, as I could probably whip up a PR at some point.
According to The Literature (1*, 2*), putting a DB/WAL on an NVMe might not be the only way to utilize one. But when storing data on it, it might be sub-optimal to have a single OSD span the entire device - it is better to split it up at least a little bit.
Enter LVM batches:
A)
ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1
or with separate DB
B)
ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 --db-devices /dev/nvme0n1
or with multiple data cards
C)
ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 /dev/nvme3n1 --db-devices /dev/nvme0n1
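The three scenarios differ only in the list of data devices and the optional DB device. As a rough sketch of how a role could assemble them, here is a small shell helper. `build_batch_cmd` and its argument order are hypothetical and not part of ceph-volume or the role; it only prints the command line and does not run it.

```shell
# Hypothetical helper: print (do not execute) a ceph-volume lvm batch command.
# Usage: build_batch_cmd OSDS_PER_DEVICE "DATA_DEVICES" ["DB_DEVICES"]
build_batch_cmd() {
    osds_per_device=$1
    data_devices=$2
    db_devices=${3:-}   # optional; empty means no dedicated DB device

    cmd="ceph-volume lvm batch --osds-per-device $osds_per_device $data_devices"
    if [ -n "$db_devices" ]; then
        cmd="$cmd --db-devices $db_devices"
    fi
    printf '%s\n' "$cmd"
}

# Scenarios A), B) and C) from above:
build_batch_cmd 4 /dev/nvme2n1
build_batch_cmd 4 /dev/nvme2n1 /dev/nvme0n1
build_batch_cmd 4 "/dev/nvme2n1 /dev/nvme3n1" /dev/nvme0n1
```

Before running a generated command for real, it may be worth previewing it with the `--report` flag of `ceph-volume lvm batch`, which shows the planned layout without creating anything.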
I suppose the input data could look something like:
...but up for debate of course.
I have tested scenario A) by running the command manually server-side, then adding just
- device: /dev/nvme2n1
into pve_ceph_osds
then running the playbook. Works fine. This is on the latest PVE, Ceph Quincy, and the latest version of the role. But it would be handy to have the role control the batch creation, hence this Issue.

IIUC, when offloading the DB to a dedicated device, once a batch is built there is no way to add more devices later on, as all of the DB device space is used up and split evenly at build time. So adding a device to an existing batch would probably mean a teardown + rebuild of the entire batch, meaning it's better to have spare cluster capacity for such scenarios.
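To make the even split concrete, here is a back-of-the-envelope sketch. `db_slice_gb` is a hypothetical helper and the sizes are made-up examples, not measurements; it assumes the whole DB device is divided equally among all OSDs in the batch, as described above, and ignores LVM overhead.

```shell
# Hypothetical helper: approximate DB slice per OSD, in GB (integer division).
# Usage: db_slice_gb DB_DEVICE_SIZE_GB DATA_DEVICE_COUNT OSDS_PER_DEVICE
db_slice_gb() {
    echo $(( $1 / ($2 * $3) ))
}

# Assumed example: a 1600 GB DB device shared by one data device with 4 OSDs
db_slice_gb 1600 1 4   # prints 400

# Same DB device, but the batch was built with two data devices (8 OSDs)
db_slice_gb 1600 2 4   # prints 200
```

Under that assumption, going from the first layout to the second means every existing DB slice would have to shrink, which is why adding a data device later points to a rebuild of the batch rather than an in-place change.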
1* https://forum.proxmox.com/threads/recommended-way-of-creating-multiple-osds-per-nvme-disk.52252/
2* https://www.reddit.com/r/ceph/comments/jnyxgm/how_do_you_create_multiple_osds_per_disk_with/