diff --git a/README.md b/README.md index cb9cde0..761cdf3 100644 --- a/README.md +++ b/README.md @@ -4,31 +4,182 @@ The definitive all-in-one [SnapRAID](https://github.com/amadvance/snapraid) scri There are many SnapRAID scripts out there, but none could fit my needs. So I took the best of them to start a new one. It is meant to be run periodically (e.g. daily) and do the heavy lifting, then send an email you will actually read. -It is highly customizable. -It has been tested with Debian 10 and OpenMediaVault 5. + +Supports single and dual parity configurations. + +It is customizable and has been tested with Debian 10 and OpenMediaVault 5. Contributions are welcome: there's always room for improvement! -This readme has some rough edges which will be smoothened over time. +_This readme has some rough edges which will be smoothened over time._ + +# Highlights + +## How it works +- After some preliminary checks, the script will execute `snapraid diff` to figure out if parity info is out of date, which means checking for changes since the last execution. +- One of the following will happen: + - If parity info is out of sync **and** the number of deleted or changed files exceed the threshold you have configured it **stops**. You may want to take a look to the output log. + - If parity info is out of sync **and** the number of deleted or changed files exceed the threshold, you can still **force a sync** after a number of warnings. It's useful If you often get a false alarm but you're confident enough. + - If parity info is out of sync **but** the number of deleted or changed files did not exceed the treshold, it **executes a sync** to update the parity info. +- When the parity info is in sync, either because nothing has changed or after a successfully sync, it runs the `snapraid scrub` command to validate the integrity of the data, both the files and the parity info. _Note that each run of the scrub command will validate only a configurable portion of parity info to avoid having a long running job and affecting the performance of the server._ +- When the script is done sends an email with the results, both in case of error or success. + +Pre-hashing is enabled by default to avoid silent read errors. It mitigates the lack of ECC memory. + +## A nice email report +This report produces emails that don't contain a list of changed files to improve clarity. + +You can re-enable full output in the email by switching the option `VERBOSITY` but the full report will always be available in `/tmp/snapRAID.out` and will be replaced after each run or deleted when the system is shut down if kept there. + +SMART drive report from SnapRAID is also included by default. + +Here's a sneak peek of the email report. + +```markdown +## [COMPLETED] DIFF + SYNC + SCRUB Jobs (SnapRAID on omv-test.local) + +SnapRAID Script Job started [Sat Jan 9 02:07:46 CET 2021] +Running SnapRAID version 11.5 +SnapRAID Script version 2.7.0 + +---------- + +## Preprocessing + +Configuration file found! Proceeding. +Testing that all parity files are present. +All parity files found. Continuing... + +---------- + +## Processing + +### SnapRAID TOUCH [Sat Jan 9 02:07:46 CET 2021] + +Checking for zero sub-second files. +No zero sub-second timestamp files found. +TOUCH finished [Sat Jan 9 02:07:46 CET 2021] + +### SnapRAID DIFF [Sat Jan 9 02:07:46 CET 2021] + +DIFF finished [Sat Jan 9 02:07:46 CET 2021] + +**SUMMARY of changes - Added [2] - Deleted [0] - Moved [0] - Copied [0] - Updated [0]** + +There are deleted files. The number of deleted files, (0), is below the threshold of (2). SYNC Authorized. +There are updated files. The number of updated files, (0), is below the threshold of (2). SYNC Authorized. + +### SnapRAID SYNC [Sat Jan 9 02:07:46 CET 2021] + +Self test... +Loading state from /srv/dev-disk-by-label-DISK1/snapraid.content... +Scanning disk DATA1... +Scanning disk DATA2... +Using 0 MiB of memory for the file-system. +Initializing... +Hashing... +SYNC_JOB--Everything OK +Resizing... +Saving state to /srv/dev-disk-by-label-DISK1/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK2/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK3/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK4/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK1/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK2/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK3/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK4/snapraid.content... +Verified /srv/dev-disk-by-label-DISK4/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK3/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK2/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK1/snapraid.content in 0 seconds +Syncing... +Using 32 MiB of memory for 32 cached blocks. + +DATA1 59% | *********************************** +DATA2 55% | ******************************** +parity 0% | +2-parity 0% | +raid 6% | +hash 5% | +sched 7% | +misc 17% | +|______________ +wait time (total, less is better) + +SYNC_JOB--Everything OK +Saving state to /srv/dev-disk-by-label-DISK1/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK2/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK3/snapraid.content... +Saving state to /srv/dev-disk-by-label-DISK4/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK1/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK2/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK3/snapraid.content... +Verifying /srv/dev-disk-by-label-DISK4/snapraid.content... +Verified /srv/dev-disk-by-label-DISK4/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK3/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK2/snapraid.content in 0 seconds +Verified /srv/dev-disk-by-label-DISK1/snapraid.content in 0 seconds +SYNC finished [Sat Jan 9 02:07:49 CET 2021] + +### SnapRAID SCRUB [Sat Jan 9 02:07:49 CET 2021] + +Self test... +Loading state from /srv/dev-disk-by-label-DISK1/snapraid.content... +Using 0 MiB of memory for the file-system. +Initializing... +Scrubbing... +Using 48 MiB of memory for 32 cached blocks. +SCRUB_JOB--Nothing to do +SCRUB finished [Sat Jan 9 02:07:49 CET 2021] + +---------- + +## Postprocessing + +SnapRAID SMART report: + +Temp Power Error FP Size +C OnDays Count TB Serial Device Disk + +---------- + + - - - SSD 0.0 00000000000000000001 /dev/sdb DATA1 + - - - - 0.0 01000000000000000001 /dev/sdc DATA2 + - - - SSD 0.0 02000000000000000001 /dev/sdd parity + - - - SSD 0.0 03000000000000000001 /dev/sde 2-parity + 0 - - - 0.0 - /dev/sda - + +The FP column is the estimated probability (in percentage) that the disk +is going to fail in the next year. + +Probability that at least one disk is going to fail in the next year is 0%. +All jobs ended. [Sat Jan 9 02:07:49 CET 2021] +Email address is set. Sending email report to example@example.com [Sat Jan 9 02:07:49 CET 2021] +``` + +## Customization +Many options can be changed to your taste, their behaviour is documented in the script config file. + +If you don't know what to do, I recommend using the default values and see how it performs. + +You can also change more advanced options such as mail binary (by default uses `mailx`), SnapRAID binary location, log file location. -# Features -[WIP] # Requirements - Markdown to have nice emails -- Hd-idle to spin down disks - [Link TBD] +- ~~Hd-idle to spin down disks - [Link TBD] - currently not required since spin down does not work properly.~~ # Installation -[WIP] -1. Install markdown `apt install python-markdown` -2. Place the script wherever you prefer e.g. `/usr/sbin/snapraid` -3. Give executable rights - `chmod +x snapraid-aio-script.sh` -4. Open the script and add your email address at line 43 -5. Tweak the script if needed +1. Install markdown `apt install python-markdown`. You can skip this step since the script will check and install it for you. +2. Download config file and script, then place wherever you prefer e.g. `/usr/sbin/snapraid` +3. Give executable rights to the main script - `chmod +x snapraid-aio-script.sh` +4. Edit the config file and add your email address at line 9 +5. Tweak the config file if needed 6. Schedule the script execution time # Known Issues -Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running. +- Hard disk spin down does not work: they are immediately woken up. The script probably does not handle this correctly while running. +- The report is not perfect, we can't be solve this because SnapRAID does not natively support Markdown. # Credits All rights belong to the respective creators. @@ -36,5 +187,5 @@ Thanks to: - [Zack Reed](https://zackreed.me/snapraid-split-parity-sync-script/) for most of the original script - [mtompkins](https://gist.github.com/mtompkins/91cf0b8be36064c237da3f39ff5cc49d) for most of the original script - [sburke](https://zackreed.me/snapraid-split-parity-sync-script/#comment-300) for the Debian 10 fix -- metagliatore (I don't think he's on Github) for removing the DIFF output from the email +- metagliatore (a friend, not on Github) for removing the DIFF output from the email - [ozboss](https://forum.openmediavault.org/wsc/index.php?user/27331-ozboss/) diff --git a/script-config.sh b/script-config.sh new file mode 100644 index 0000000..c6a7952 --- /dev/null +++ b/script-config.sh @@ -0,0 +1,82 @@ +#!/bin/bash +###################### +# USER VARIABLES # +###################### + +####################### USER CONFIGURATION START ####################### + +# address where the output of the jobs will be emailed to. +EMAIL_ADDRESS="youremailgoeshere" + +# Set the threshold of deleted files to stop the sync job from running. +# NOTE that depending on how active your filesystem is being used, a low +# number here may result in your parity info being out of sync often and/or +# you having to do lots of manual syncing. +DEL_THRESHOLD=500 +UP_THRESHOLD=500 + +# Set number of warnings before we force a sync job. +# This option comes in handy when you cannot be bothered to manually +# start a sync job when DEL_THRESHOLD is breached due to false alarm. +# Set to 0 to ALWAYS force a sync (i.e. ignore the delete threshold above) +# Set to -1 to NEVER force a sync (i.e. need to manual sync if delete threshold is breached) +SYNC_WARN_THRESHOLD=-1 + +# Set percentage of array to scrub if it is in sync. +# i.e. 0 to disable and 100 to scrub the full array in one go +# WARNING - depending on size of your array, setting to 100 will take a very long time! +SCRUB_PERCENT=5 +SCRUB_AGE=10 + +# Prehash Data To avoid the risk of a latent hardware issue, you can enable the "pre-hash" mode and have all the +# data read two times to ensure its integrity. This option also verifies the files moved inside the array, to ensure +# that the move operation went successfully, and in case to block the sync and to allow to run a fix operation. +# 1 to enable, any other values to disable +PREHASH=1 + +# Set the option to log SMART info. 1 to enable, any other value to disable +SMART_LOG=1 + +# Set verbosity of the email output. TOUCH and DIFF outputs will be kept in the email, producing a potentially huge email. Keep this disabled for optimal reading +# You can always check TOUCH and DIFF outputs using the TMP file. +# 1 to enable, any other values to disable +VERBOSITY=0 + +# Set if disk spindown should be performed. Depending on your system, this may not work. 1 to enable, any other values to disable +SPINDOWN=0 + +# Run snapraid status command to show array general information. +# Be aware the HTML output is pretty broken. +SNAP_STATUS=0 + +# location of the snapraid binary +SNAPRAID_BIN="/usr/bin/snapraid" +# location of the mail program binary +MAIL_BIN="/usr/bin/mailx" + +####################### USER CONFIGURATION END ####################### + +####################### SYSTEM CONFIGURATION ####################### +# Make changes only if you know what you're doing +###################### + +# Init variables +CHK_FAIL=0 +DO_SYNC=0 +EMAIL_SUBJECT_PREFIX="(SnapRAID on `hostname`)" +GRACEFUL=0 +SYNC_WARN_FILE="$CURRENT_DIR/snapRAID.warnCount" +SYNC_WARN_COUNT="" +TMP_OUTPUT="/tmp/snapRAID.out" +SNAPRAID_LOG="/var/log/snapraid.log" +SECONDS=0 #Capture time + +# Expand PATH for smartctl +PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin + +# Determine names of first content file... +CONTENT_FILE=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.content | head -n 1 | cut -d " " -f2` + +# Build an array of parity all files... +PARITY_FILES[0]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.parity | head -n 1 | cut -d " " -f2` +IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`) diff --git a/snapraid-aio-script b/snapraid-aio-script.sh similarity index 72% rename from snapraid-aio-script rename to snapraid-aio-script.sh index 658cfae..4c8834a 100644 --- a/snapraid-aio-script +++ b/snapraid-aio-script.sh @@ -1,130 +1,39 @@ #!/bin/bash ######################################################################## -# This is a helper script that keeps snapraid parity info in sync with -# your data and optionally verifies the parity info. Here's how it works: -# 1) Calls diff to figure out if the parity info is out of sync. -# 2) If parity info is out of sync, AND the number of deleted or changed files exceed -# X (each configurable), it triggers an alert email and stops. (In case of -# accidental deletions, you have the opportunity to recover them from -# the existing parity info. This also mitigates to a degree encryption malware.) -# 3) If parity info is out of sync, AND the number of deleted or changed files exceed X -# AND it has reached/exceeded Y (configurable) number of warnings, force -# a sync. (Useful when you get a false alarm above and you can't be bothered -# to login and do a manual sync. Note the risk is if its not a false alarm -# and you can't access the server to fix the issue before the job is run Y number of -# times... Well I hope you have other backups...) -# 4) If parity info is out of sync BUT the number of deleted files did NOT -# exceed X, it calls sync to update the parity info. -# 5) If the parity info is in sync (either because nothing changed or after it -# has successfully completed the sync job, it runs the scrub command to -# validate the integrity of the data (both the files and the parity info). -# Note that each run of the scrub command will validate only a (configurable) -# portion of parity info to avoid having a long running job and affecting -# the performance of the server. -# 6) Once all jobs are completed, it sends an email with the output to user -# (if configured). # # Project page: https://github.com/auanasgheps/snapraid-aio-script # -# Original by Zack Reed https://zackreed.me/snapraid-split-parity-sync-script/ -# + mtompkins https://gist.github.com/mtompkins/91cf0b8be36064c237da3f39ff5cc49d -# -SNAPSCRIPTVERSION="2.6.6" - +SNAPSCRIPTVERSION="2.7" ######################################################################## ###################### # USER VARIABLES # ###################### -####################### USER CONFIGURATION START ####################### - -# address where the output of the jobs will be emailed to. -EMAIL_ADDRESS="youremailgoeshere" - -# Set the threshold of deleted files to stop the sync job from running. -# NOTE that depending on how active your filesystem is being used, a low -# number here may result in your parity info being out of sync often and/or -# you having to do lots of manual syncing. -DEL_THRESHOLD=500 -UP_THRESHOLD=500 - -# Set number of warnings before we force a sync job. -# This option comes in handy when you cannot be bothered to manually -# start a sync job when DEL_THRESHOLD is breached due to false alarm. -# Set to 0 to ALWAYS force a sync (i.e. ignore the delete threshold above) -# Set to -1 to NEVER force a sync (i.e. need to manual sync if delete threshold is breached) -SYNC_WARN_THRESHOLD=-1 - -# Set percentage of array to scrub if it is in sync. -# i.e. 0 to disable and 100 to scrub the full array in one go -# WARNING - depending on size of your array, setting to 100 will take a very long time! -SCRUB_PERCENT=5 -SCRUB_AGE=10 - -# Prehash Data To avoid the risk of a latent hardware issue, you can enable the "pre-hash" mode and have all the -# data read two times to ensure its integrity. This option also verifies the files moved inside the array, to ensure -# that the move operation went successfully, and in case to block the sync and to allow to run a fix operation. -# 1 to enable, any other values to disable -PREHASH=1 - -# Set the option to log SMART info. 1 to enable, any other value to disable -SMART_LOG=1 - -# Set verbosity of the email output. TOUCH and DIFF outputs will be kept in the email, producing a potentially huge email. Keep this disabled for optimal reading -# You can always check TOUCH and DIFF outputs using the TMP file. -# 1 to enable, any other values to disable -VERBOSITY=0 - -# Set if disk spindown should be performed. Depending on your system, this may not work. 1 to enable, any other values to disable -SPINDOWN=0 - -# location of the snapraid binary -SNAPRAID_BIN="/usr/bin/snapraid" -# location of the mail program binary -MAIL_BIN="/usr/bin/mailx" - -function main(){ - - ###################### - # INIT VARIABLES # - ###################### - CHK_FAIL=0 - DO_SYNC=0 - EMAIL_SUBJECT_PREFIX="(SnapRAID on `hostname`)" - GRACEFUL=0 - SYNC_WARN_FILE="/tmp/snapRAID.warnCount" - SYNC_WARN_COUNT="" - TMP_OUTPUT="/tmp/snapRAID.out" - SNAPRAID_LOG="/var/log/snapraid.log" - # Capture time - SECONDS=0 - - # Build Services Array... - service_array_setup - - # Expand PATH for smartctl - PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin +# find the current path +CURRENT_DIR="$(dirname "${0}")" - # Determine names of first content file... - CONTENT_FILE=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.content | head -n 1 | cut -d " " -f2` +# import the config file for this script which contain user configuration +CONFIG_FILE=$CURRENT_DIR/script-config.sh +source $CONFIG_FILE - # Build an array of parity all files... - PARITY_FILES[0]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.parity | head -n 1 | cut -d " " -f2` - IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`) +######################################################################## -##### USER CONFIGURATION STOP ##### MAKE NO CHANGES BELOW THIS LINE #### +###################### +# MAIN SCRIPT # +###################### +function main(){ # create tmp file for output > $TMP_OUTPUT # Redirect all output to file and screen. Starts a tee process output_to_file_screen - + # timestamp the job echo "SnapRAID Script Job started [`date`]" echo "Running SnapRAID version $SNAPRAIDVERSION" - echo "SnapRAID Script version $SNAPSCRIPTVERSION" + echo "SnapRAID AIO Script version $SNAPSCRIPTVERSION" echo echo "----------------------------------------" mklog "INFO: ----------------------------------------" @@ -133,9 +42,30 @@ function main(){ mklog "INFO: SnapRAID Script version $SNAPSCRIPTVERSION" - # Remove any plex created anomalies echo "##Preprocessing" + # Check if script configuration file has been found + if [ ! -f $CONFIG_FILE ]; + then + echo "Script configuration file not found! The script cannot be run! Please check and try again!" + mklog "WARN: Script configuration file not found! The script cannot be run! Please check and try again!" + exit 1; + else + echo "Configuration file found! Proceeding." + mklog "INFO: Script configuration file found! Proceeding." + fi + + # install markdown if not present + if [ $(dpkg-query -W -f='${Status}' python-markdown 2>/dev/null | grep -c "ok installed") -eq 0 ]; + then + echo "**Markdown has not been found and will be installed.**" + mklog "WARN: Markdown has not been found and will be installed." + # super silent and secret install command + export DEBIAN_FRONTEND=noninteractive + apt-get install -qq -o=Dpkg::Use-Pty=0 python-markdown; + echo + fi + # sanity check first to make sure we can access the content and parity files mklog "INFO: Checking SnapRAID disks" sanity_check @@ -227,8 +157,8 @@ function main(){ # YES, first let's check if delete threshold has been breached and we have not forced a sync. if [ $CHK_FAIL -eq 1 -a $DO_SYNC -eq 0 ]; then # YES, parity is out of sync so let's not run scrub job - echo "Scrub job cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]" - mklog "INFO: Scrub job cancelled as parity info is out of sync (deleted or changed files threshold has been breached)." + echo "Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]" + mklog "INFO: Scrub job is cancelled as parity info is out of sync (deleted or changed files threshold has been breached)." else # NO, delete threshold has not been breached OR we forced a sync, but we have one last test - # let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC_JOB--" in the output). @@ -261,13 +191,24 @@ function main(){ echo "----------------------------------------" echo "##Postprocessing" - # Moving onto logging SMART info if enabled + # Show SnapRAID SMART info if enabled if [ $SMART_LOG -eq 1 ]; then echo $SNAPRAID_BIN smart close_output_and_wait output_to_file_screen + echo fi + + # Show SnapRAID Status information if enabled + if [ $SNAP_STATUS -eq 1 ]; then + echo + $SNAPRAID_BIN status + close_output_and_wait + output_to_file_screen + echo + fi + # Spinning down disks (Method 1: snapraid - preferred) if [ $SPINDOWN -eq 1 ]; then @@ -308,14 +249,14 @@ function main(){ echo echo "----------------------------------------" echo "##Total time elapsed for SnapRAID: $ELAPSED" - mklog "INFO: Total time elapsed for SnapRAID: $ELAPSED" + mklog "INFO: Total time elapsed for SnapRAID: $ELAPSED" # Add a topline to email body sed_me "1s:^:##$SUBJECT \n:" "${TMP_OUTPUT}" if [ $VERBOSITY -eq 1 ]; then send_mail_verbose else - send_mail + send_mail fi fi @@ -412,15 +353,15 @@ function chk_sync_warn(){ if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then # YES, lets force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script - echo "Number of warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run. [`date`]" - mklog "INFO: Number of warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run." + echo "Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run. [`date`]" + mklog "INFO: Number of threshold warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run." DO_SYNC=1 else # NO, so let's increment the warning count and skip the sync job ((SYNC_WARN_COUNT += 1)) echo $SYNC_WARN_COUNT > $SYNC_WARN_FILE - echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) warning(s) till forced sync. NOT proceeding with SYNC job. [`date`]" - mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) warning(s) till forced sync. NOT proceeding with SYNC job." + echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job. [`date`]" + mklog "INFO: $((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) threshold warning(s) until the next forced sync. NOT proceeding with SYNC job." DO_SYNC=0 fi else