USERDATA datasets removed #218
Comments
I discovered a similar problem, but luckily I caught it before zsys nuked my data. Check the value of the zsys property on your user datasets: I bet it contains a typo or some similar goofiness. More useful info in #81. I ended up just clearing the property, and I think I'm safe now. I may go back and fix this properly, but I need to see if I can rebuild some trust with zsys first. Auto-deleting datasets is not cool. Why not spit out a warning that you have orphaned datasets, and let the administrator confirm before deleting them?
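(A minimal sketch of that check and cleanup, assuming the property in question is com.ubuntu.zsys:bootfs-datasets as it appears in the histories below; pool and dataset names are examples:)
# Show which root each user dataset is associated with:
zfs get -r -s local com.ubuntu.zsys:bootfs-datasets rpool/USERDATA
# Clear the association so zsys stops treating the dataset as belonging to a (possibly deleted) root:
zfs inherit com.ubuntu.zsys:bootfs-datasets rpool/USERDATA/<user>_<id>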
I think I know what the root cause of my issue was. Some time before the 'purge' I had a problem updating the system (I ran apt upgrade and after a reboot the system did not start correctly), so I decided to restore a previous snapshot from GRUB. The snapshot was restored (and I was happy about 'how great this worked'), BUT after restoring ROOT from the snapshot, the root dataset received a completely different ID - and that was probably the problem - the home volumes were still linked to the old ROOT (which was no longer used because of the broken update). Because the old ROOT was not used and was later cleaned up by zsys, all user datasets became orphaned and were 'cleaned' a few days later. So I think the 'restore' executed from GRUB worked incorrectly.
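(If that is what happened, re-pointing the still-existing user datasets at the new root before zsys garbage-collects them should avoid the orphaning; a sketch with hypothetical dataset names, using the same property that shows up in the histories below:)
# Find the root dataset GRUB actually booted into:
zfs list -r -o name rpool/ROOT
# Re-associate the user dataset with that root so zsys no longer sees it as orphaned:
zfs set com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_<newid> rpool/USERDATA/<user>_<id>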
Earlier today I lost all snapshots of my user's home dataset (including those not created by zsys). I only found out because my hourly snapshot job complained.
Followed a bit later by an attempt on the dataset itself 😮 (luckily I am still logged in and the dataset is mounted).
I don't see anything related in the logs or in the command output. To me, automatic removal of snapshots not created by zsys is a no-go, not to mention the fact that it shouldn't even try to destroy any dataset without first prompting the user. Unfortunately, this is not the first issue I have with Zsys. For example, on a low-spec system to which I send ZFS snapshots with syncoid I'm bitten by #204. And for the system I'm talking about here, I'm starting to wonder whether Zsys is a blessing or a disaster: on the one hand, I could recover from a failed upgrade from Ubuntu 21.04 to 21.10; on the other hand, doing so involved a lot of manual work using a rescue image because somehow the cloning process had gone wrong. For now (see also #213), the balance points to: uninstall Zsys 😞. I think the concept of Zsys is great, but it should work and definitely not destroy a full dataset without asking.
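(A quick way to see what survived and what was destroyed, using only commands that appear elsewhere in this thread; pool and dataset names are examples:)
# List the snapshots that still exist under USERDATA:
zfs list -t snapshot -r -o name,creation rpool/USERDATA
# Show every destroy the pool has recorded, with timestamps:
zpool history -i rpool | grep destroy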
Just to get some clarity on this issue, a few questions from a novice zsys user:
This just happened to me:
root@ubuntu:/home/ubuntu# zpool history -i | grep USERDATA
2023-03-24.13:19:47 [txg:163] create rpool/USERDATA (274)
2023-03-24.13:19:47 [txg:164] set rpool/USERDATA (274) canmount=0
2023-03-24.13:19:47 [txg:164] set rpool/USERDATA (274) mountpoint=/
2023-03-24.13:19:47 zfs create rpool/USERDATA -o canmount=off -o mountpoint=/
2023-03-24.13:32:25 [txg:2360] create rpool/USERDATA/rscholz_7r5brd (2444)
2023-03-24.13:32:25 [txg:2361] set rpool/USERDATA/rscholz_7r5brd (2444) canmount=1
2023-03-24.13:32:25 [txg:2361] set rpool/USERDATA/rscholz_7r5brd (2444) mountpoint=/home/rscholz
2023-03-24.13:32:25 zfs create rpool/USERDATA/rscholz_7r5brd -o canmount=on -o mountpoint=/home/rscholz
2023-03-24.13:32:25 [txg:2362] set rpool/USERDATA/rscholz_7r5brd (2444) com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_vhqea7
2023-03-24.13:32:25 zfs set com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_vhqea7 rpool/USERDATA/rscholz_7r5brd
2023-03-24.13:32:25 [txg:2363] create rpool/USERDATA/root_7r5brd (1001)
2023-03-24.13:32:25 [txg:2364] set rpool/USERDATA/root_7r5brd (1001) canmount=1
2023-03-24.13:32:25 [txg:2364] set rpool/USERDATA/root_7r5brd (1001) mountpoint=/root
2023-03-24.13:32:25 zfs create rpool/USERDATA/root_7r5brd -o canmount=on -o mountpoint=/root
2023-03-24.13:32:25 [txg:2369] set rpool/USERDATA/root_7r5brd (1001) com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_vhqea7
2023-03-24.13:32:25 zfs set com.ubuntu.zsys:bootfs-datasets=rpool/ROOT/ubuntu_vhqea7 rpool/USERDATA/root_7r5brd
2023-04-05.13:04:13 [txg:224396] destroy rpool/USERDATA/rscholz_7r5brd (2444) (bptree, mintxg=1)
2023-04-05.13:04:15 [txg:224398] destroy rpool/USERDATA/root_7r5brd (1001) (bptree, mintxg=1)
Could it have played a role that I still had a second, older Ubuntu installation on a secondary SSD? Did it confuse the two rpools?
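(zpool history is per pool, so to see whether the second installation's pool was involved you have to query each imported pool separately; <other_pool> is a placeholder:)
# List all imported pools, then dump a specific pool's destroy records:
zpool list -H -o name
zpool history -i <other_pool> | grep destroy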
@randolf-scholz It is much worse than just confusing the two rpools. It's about destroying ANY pool visible to Ubuntu. I've spent a week trying to understand why all my USB backups get destroyed regularly. At first I blamed the backup tooling, but it's not any particular tool. This is really hard to believe, because users kinda make backups, don't they? And they run something like this to back up all the datasets/volumes/snapshots recursively:
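(The exact command isn't preserved here; presumably one naive recursive syncoid run per pool, something like the following - an illustration, not the commenter's actual invocation:)
# Naive full-pool replication - zsys later treats the copies as its own and garbage-collects them:
syncoid --recursive --no-sync-snap bpool backup_usb/unity_21_04/bpool
syncoid --recursive --no-sync-snap rpool backup_usb/unity_21_04/rpool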
Now guess what zsys will do with those replicated datasets. In practice it means that instead of just two recursive commands to back up both pools, I need the workaround below.
UPD:
# First run to init bpool:
syncoid --recursive --exclude=bpool/BOOT --no-sync-snap --sendoptions="raw p" --recvoptions=u bpool backup_usb/unity_21_04/bpool
# All subsequent runs to sync backups:
syncoid --recursive --no-sync-snap --sendoptions="raw p" --recvoptions=u bpool/BOOT backup_usb/unity_21_04/bpool/B
# sync persistent datasets recursively, skip ROOT/USERDATA to then sync them into R/U datasets to avoid zsys auto-managing them:
syncoid --recursive --exclude=rpool/ROOT --exclude=rpool/USERDATA --no-sync-snap --sendoptions="raw p" --recvoptions=u rpool backup_usb/unity_21_04/rpool
syncoid --recursive --no-sync-snap --sendoptions="raw p" --recvoptions=u rpool/ROOT backup_usb/unity_21_04/rpool/R
syncoid --recursive --no-sync-snap --sendoptions="raw p" --recvoptions=u rpool/USERDATA backup_usb/unity_21_04/rpool/U
UPD: Alternatively, importing the backup zpool as readonly also prevents the destruction:
zpool import -o readonly=on -N -R /backup backup_usb
But this won't allow you to sync new snapshots to it - so it's not quite a solution.
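(Another angle, assuming the trigger is the replicated com.ubuntu.zsys:bootfs-datasets property ending up on the backup pool: check for it there and clear it, so zsys's GC has nothing to latch onto; names as in the commands above:)
# See whether the replicated datasets carry the zsys association property:
zfs get -r -s received,local com.ubuntu.zsys:bootfs-datasets backup_usb
# If so, clear it recursively on the backup side:
zfs inherit -r com.ubuntu.zsys:bootfs-datasets backup_usb/unity_21_04/rpool/U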
@randolf-scholz Here's another related bug I hit. It was particularly funny because those few snapshots were the only snapshots left on the USB backup disk. The rest of the user homedir snaps had previously been carefully and silently destroyed by zsys.
For the record, I had my USERDATA datasets and snapshots destroyed last week. I was playing around with /etc/sysctl.conf and the vm.nr_hugepages setting, as one does, and I accidentally set too high a value, so my system would no longer boot due to an out-of-memory issue. After I was able to boot a live CD, access my zpool, fix sysctl.conf and reboot, I noticed that the ZFS history showed the USERDATA snapshots and datasets were explicitly deleted; all at the same timestamp, and the timestamp exactly matched the first time I booted the machine with the rogue sysctl setting and the memory crash.
I checked /var/log... and unfortunately those were nuked. If there are other systemd logs available, I may still have the db files to do an extract, but that's outside of my wheelhouse. This boot-time memory issue may be a test case that could be used in the future to ascertain one possible failure mode of zsys. I recall something similar happening several years back, but it was early in a new install and did not impact any data; I did not make any detailed notes at the time.
My recently retired system had zsys because it was originally built on Ubuntu 20.04 and upgraded to 22.04. I really miss zsys and I'll be glad to do anything to help if this project gets some attention again.
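(If the journal database files survived, the zsys units' logs can still be pulled out of them; a sketch, assuming the recovered files live under a hypothetical /mnt/recovered/var/log/journal:)
# Point journalctl at the recovered journal directory instead of the running system's:
journalctl -D /mnt/recovered/var/log/journal -u zsysd.service -u zsys-gc.service --no-pager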
In its current state zsys should not be used, period. Such a waste of good software :( |
Describe the bug
I have a RAIDZ1 setup, following this guide:
https://openzfs.github.io/openzfs-docs/Getting%20Started/Ubuntu/Ubuntu%2020.04%20Root%20on%20ZFS.html
ZSys decided to delete all datasets from USERDATA, causing all data to be lost.
zpool history -i | grep USERDATA
(...)
2021-10-21.20:24:14 [txg:1232311] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_ugsife (1483) rpool/USERDATA/struthio_wxxb0z@autozsys_ugsife
2021-10-21.20:24:15 [txg:1232320] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_moou2o (1293) rpool/USERDATA/struthio_wxxb0z@autozsys_moou2o
2021-10-21.20:24:16 [txg:1232453] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_dxz0dd (1421) rpool/USERDATA/struthio_wxxb0z@autozsys_dxz0dd
2021-10-21.20:24:17 [txg:1232462] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_fs1km6 (1541) rpool/USERDATA/struthio_wxxb0z@autozsys_fs1km6
2021-10-21.20:24:18 [txg:1232463] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_xo9itu (5424) rpool/USERDATA/struthio_wxxb0z@autozsys_xo9itu
2021-10-21.20:24:18 [txg:1232471] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_lsdu5q (1085) rpool/USERDATA/struthio_wxxb0z@autozsys_lsdu5q
2021-10-21.20:24:19 [txg:1232472] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_3ansly (938) rpool/USERDATA/struthio_wxxb0z@autozsys_3ansly
2021-10-21.20:24:20 [txg:1232473] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_7f2s0s (1651) rpool/USERDATA/struthio_wxxb0z@autozsys_7f2s0s
2021-10-21.20:24:21 [txg:1232474] destroy rpool/USERDATA/struthio_wxxb0z@autozsys_7m0nok (5495) rpool/USERDATA/struthio_wxxb0z@autozsys_7m0nok
2021-11-16.19:39:38 [txg:1275178] destroy rpool/USERDATA/struthio_wxxb0z (1688) (bptree, mintxg=1)
cat /var/log/syslog | grep -i zsys
Nov 15 18:29:12 titan systemd[1]: Starting ZSYS daemon service...
Nov 15 18:29:13 titan systemd[1]: Started ZSYS daemon service.
Nov 15 18:29:13 titan zsysctl[10855]: level=error msg="couldn't save state for user \"struthio\": user \"struthio\" doesn't exist"
Nov 15 18:29:13 titan systemd[9016]: zsys-user-savestate.service: Main process exited, code=exited, status=1/FAILURE
Nov 15 18:29:13 titan systemd[9016]: zsys-user-savestate.service: Failed with result 'exit-code'.
Nov 15 18:30:13 titan systemd[1]: zsysd.service: Deactivated successfully.
Nov 15 18:35:05 titan systemd[1]: Starting ZSYS daemon service...
Nov 15 18:35:06 titan systemd[1]: Started ZSYS daemon service.
Nov 15 18:35:07 titan zsysd[12945]: level=warning msg="[[0795e4d6:9ec78de4]] Couldn't destroy user dataset rpool/USERDATA/struthio_wxxb0z (due to rpool/USERDATA/struthio_wxxb0z): couldn't destroy \"rpool/USERDATA/struthio_wxxb0z\" and its children: cannot destroy dataset \"rpool/USERDATA/struthio_wxxb0z\": dataset is busy"
Nov 15 18:35:07 titan zsysctl[12939]: #033[33mWARNING#033[0m Couldn't destroy user dataset rpool/USERDATA/struthio_wxxb0z (due to rpool/USERDATA/struthio_wxxb0z): couldn't destroy "rpool/USERDATA/struthio_wxxb0z" and its children: cannot destroy dataset "rpool/USERDATA/struthio_wxxb0z": dataset is busy
Nov 15 18:35:08 titan systemd[1]: zsys-gc.service: Deactivated successfully.
If I see correctly, zsys tried to remove the dataset yesterday (but failed since I was working in it), so it deleted it today when I booted the PC for the first time.
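(To see when the garbage collector ran and what it logged around that time, the systemd units from the syslog excerpt above can be queried directly; a sketch:)
# Show all zsys-related units:
systemctl list-units --all 'zsys*'
# Pull the daemon and GC logs around the time of the destroy:
journalctl -u zsysd.service -u zsys-gc.service --since 2021-11-15 --until 2021-11-17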
To Reproduce
Not sure how to reproduce, since I didn't do anything special today.
Expected behavior
Not deleting user datasets.
For ubuntu users, please run and copy the following:
ubuntu-bug zsys --save=/tmp/report
Copy/paste the /tmp/report content below.
Screenshots
If applicable, add screenshots to help explain your problem.
Installed versions:
/etc/os-release
NAME="Ubuntu"
VERSION_ID="21.10"
VERSION="21.10 (Impish Indri)"
VERSION_CODENAME=impish
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=impish
zsysctl version
zsysctl 0.5.8
zsysd 0.5.8
Additional context
Add any other context about the problem here.