SnapRAID Split Parity Sync Script
SnapRAID v11 adds a sweet new feature: split parity. In the past, adding a larger data disk always came with the issue of needing parity disks as large as or larger than your data disks. For example, let’s say you have an array made up of (4) 4TB data disks and (1) 4TB parity disk. What if you want to buy one of those 6 or 8TB disks to use in your array? Previously, you would have either chosen to use the new larger disk as your new parity disk, or risked having part of your new disk unprotected. With split parity, you can use the new 8TB disk as a data disk and then use (2) of your old 4TB disks, joined together as one complete set of parity (or, you could create parity in this scenario with (4) 2TB disks or even (8) 1TB disks). Pretty neat!
So, going forward, this would allow you to add 6 or 8TB data disks and have all your data protected, without having to buy an extra one or two larger disks just to use for parity. Now that we’ve discussed split parity, how can we automate syncing like we did with my previous script? We can’t use that script as-is because of the split parity files. So, I already had a modified version of my script, but when mtompkins presented his cleaned-up version of it, I thought I’d extend it for split parity and add a couple of extra functions. I present you now with the new split parity script (this version is set up for dual parity, with 4 disks used to complete split parity).
As a sidenote, I would love it if someone could provide a BASH method to read the snapraid.conf file and automatically build the array, rather than having to manually set that up in the config. I fear that with split parity, complex grepping may be over many users’ heads.
Here’s how I have the parity files set up from this example in my /etc/snapraid.conf file.
parity /mnt/split-parity/parity1-disk1/part1.parity,/mnt/split-parity/parity1-disk2/part2.parity
2-parity /mnt/split-parity/parity2-disk1/part1.parity,/mnt/split-parity/parity2-disk2/part2.parity
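With those two lines, the script’s auto-built PARITY_FILES array ends up holding four paths. As a quick sketch (using a temp copy of just these two example lines, parsed with the same pipeline the script below uses):

```shell
# Write the two example parity lines to a temp conf and parse them the
# same way the script below does.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
parity /mnt/split-parity/parity1-disk1/part1.parity,/mnt/split-parity/parity1-disk2/part2.parity
2-parity /mnt/split-parity/parity2-disk1/part1.parity,/mnt/split-parity/parity2-disk2/part2.parity
EOF
# Keep uncommented parity lines, take the second field, and split the
# comma-separated split-parity paths onto their own lines.
OLDIFS=$IFS
IFS=$'\n' PARITY_FILES=($(grep "^[^#;]" "$CONF" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'))
IFS=$OLDIFS
echo "${#PARITY_FILES[@]} parity files"   # → 4 parity files
```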
Here's the actual script...
#!/bin/bash
#######################################################################
# This is a helper script that keeps snapraid parity info in sync with
# your data and optionally verifies the parity info. Here's how it works:
# 1) Shuts down configured services
# 2) Calls diff to figure out if the parity info is out of sync.
# 3) If parity info is out of sync, AND the number of deleted or changed files exceeds
# X (each configurable), it triggers an alert email and stops. (In case of
# accidental deletions, you have the opportunity to recover them from
# the existing parity info. This also mitigates encryption malware to a degree.)
# 4) If parity info is out of sync, AND the number of deleted or changed files exceeds X
# AND it has reached/exceeded Y (configurable) number of warnings, force
# a sync. (Useful when you get a false alarm above and you can't be bothered
# to login and do a manual sync. Note the risk is if it's not a false alarm
# and you can't access the box before Y number of times the job is run to
# fix the issue... Well I hope you have other backups...)
# 5) If parity info is out of sync BUT the number of deleted files did NOT
# exceed X, it calls sync to update the parity info.
# 6) If the parity info is in sync (either because nothing changed or after it
# has successfully completed the sync job), it runs the scrub command to
# validate the integrity of the data (both the files and the parity info).
# Note that each run of the scrub command will validate only a (configurable)
# portion of parity info to avoid having a long running job and affecting
# the performance of the box.
# 7) Once all jobs are completed, it sends an email with the output to user
# (if configured).
#
# Inspired by Zack Reed (https://zackreed.me/articles/83-updated-snapraid-sync-script)
# Modified version of mtompkins version of my script (https://gist.github.com/mtompkins/91cf0b8be36064c237da3f39ff5cc49d)
#
#######################################################################
######################
# USER VARIABLES #
######################
####################### USER CONFIGURATION START #######################
# address where the output of the jobs will be emailed to.
EMAIL_ADDRESS="email_address@gmail.com"
# Set the threshold of deleted files to stop the sync job from running.
# NOTE that depending on how active your filesystem is being used, a low
# number here may result in your parity info being out of sync often and/or
# you having to do lots of manual syncing.
DEL_THRESHOLD=50
UP_THRESHOLD=500
# Set number of warnings before we force a sync job.
# This option comes in handy when you cannot be bothered to manually
# start a sync job when DEL_THRESHOLD is breached due to false alarm.
# Set to 0 to ALWAYS force a sync (i.e. ignore the delete threshold above)
# Set to -1 to NEVER force a sync (i.e. need to manually sync if delete threshold is breached)
#SYNC_WARN_THRESHOLD=3
SYNC_WARN_THRESHOLD=-1
# Set percentage of array to scrub if it is in sync.
# i.e. 0 to disable and 100 to scrub the full array in one go
# WARNING - depending on size of your array, setting to 100 will take a very long time!
SCRUB_PERCENT=8
SCRUB_AGE=10
# Set the option to log SMART info. 1 to enable, any other values to disable
SMART_LOG=1
# location of the snapraid binary
SNAPRAID_BIN="/usr/local/bin/snapraid"
# location of the mail program binary
MAIL_BIN="/usr/bin/mutt"
function main(){
######################
# INIT VARIABLES #
######################
CHK_FAIL=0
DO_SYNC=0
EMAIL_SUBJECT_PREFIX="(SnapRAID on `hostname`)"
GRACEFUL=0
SYNC_WARN_FILE="/tmp/snapRAID.warnCount"
SYNC_WARN_COUNT=""
TMP_OUTPUT="/tmp/snapRAID.out"
# Capture time
SECONDS=0
###############################
# MANAGE DOCKER CONTAINERS #
###############################
# Set to 0 to not manage any containers.
MANAGE_SERVICES=1
# Containers to manage (separated with spaces).
SERVICES='sabnzbd sonarr radarr lidarr plex unifi-controller'
# Build Services Array...
service_array_setup
# Expand PATH for smartctl
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# Determine the name of the first content file...
CONTENT_FILE=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.content | head -n 1 | cut -d " " -f2`
# Build an array of all parity files...
IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)
##### USER CONFIGURATION STOP ##### MAKE NO CHANGES BELOW THIS LINE ####
# create tmp file for output
> $TMP_OUTPUT
# Redirect all output to file and screen. Starts a tee process
output_to_file_screen
# timestamp the job
echo "SnapRAID Script Job started [`date`]"
echo
echo "----------------------------------------"
# Remove any Plex-created anomalies
echo "##Preprocessing"
# Stop any services that may write to the array during sync
if [ $MANAGE_SERVICES -eq 1 ]; then
echo "###Stop Services [`date`]"
stop_services
fi
# sanity check first to make sure we can access the content and parity files
sanity_check
echo
echo "----------------------------------------"
echo "##Processing"
# Fix timestamps
chk_zero
# run the snapraid DIFF command
echo "###SnapRAID DIFF [`date`]"
$SNAPRAID_BIN diff
# wait for the above cmd to finish, save output and open new redirect
close_output_and_wait
output_to_file_screen
echo
echo "DIFF finished [`date`]"
JOBS_DONE="DIFF"
# Get number of deleted, updated, and modified files...
get_counts
# sanity check to make sure that we were able to get our counts from the output of the DIFF job
if [ -z "$DEL_COUNT" -o -z "$ADD_COUNT" -o -z "$MOVE_COUNT" -o -z "$COPY_COUNT" -o -z "$UPDATE_COUNT" ]; then
# failed to get one or more of the count values, lets report to user and exit with error code
echo "**ERROR** - failed to get one or more count values. Unable to proceed."
echo "Exiting script. [`date`]"
if [ $EMAIL_ADDRESS ]; then
SUBJECT="$EMAIL_SUBJECT_PREFIX WARNING - Unable to proceed with SYNC/SCRUB job(s). Check DIFF job output."
send_mail
fi
exit 1;
fi
echo
echo "**SUMMARY of changes - Added [$ADD_COUNT] - Deleted [$DEL_COUNT] - Moved [$MOVE_COUNT] - Copied [$COPY_COUNT] - Updated [$UPDATE_COUNT]**"
echo
# check if the conditions to run SYNC are met
# CHK 1 - if files have changed
if [ $DEL_COUNT -gt 0 -o $ADD_COUNT -gt 0 -o $MOVE_COUNT -gt 0 -o $COPY_COUNT -gt 0 -o $UPDATE_COUNT -gt 0 ]; then
chk_del
if [ $CHK_FAIL -eq 0 ]; then
chk_updated
fi
if [ $CHK_FAIL -eq 1 ]; then
chk_sync_warn
fi
else
# NO, so let's skip SYNC
echo "No change detected. Not running SYNC job. [`date`] "
DO_SYNC=0
fi
# Now run sync if conditions are met
if [ $DO_SYNC -eq 1 ]; then
echo "###SnapRAID SYNC [`date`]"
$SNAPRAID_BIN sync -q
#wait for the job to finish
close_output_and_wait
output_to_file_screen
echo "SYNC finished [`date`]"
JOBS_DONE="$JOBS_DONE + SYNC"
# insert SYNC marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SCRUB job later
sed_me "s/^Everything OK/SYNC_JOB--Everything OK/g;s/^Nothing to do/SYNC_JOB--Nothing to do/g" "$TMP_OUTPUT"
# Remove any warning flags if set previously. This is done in this step to take care of scenarios when user
# has manually synced or restored deleted files and we will have missed it in the checks above.
if [ -e $SYNC_WARN_FILE ]; then
rm $SYNC_WARN_FILE
fi
echo
fi
# Moving onto scrub now. Check if user has enabled scrub
if [ $SCRUB_PERCENT -gt 0 ]; then
# YES, first let's check if delete threshold has been breached and we have not forced a sync.
if [ $CHK_FAIL -eq 1 -a $DO_SYNC -eq 0 ]; then
# YES, parity is out of sync so let's not run scrub job
echo "Scrub job cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]"
else
# NO, delete threshold has not been breached OR we forced a sync, but we have one last test -
# let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC_JOB--" in the output).
if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
# Sync ran but did not complete successfully so lets not run scrub to be safe
echo "**WARNING** - check output of SYNC job. Could not detect marker. Not proceeding with SCRUB job. [`date`]"
else
# Everything ok - let's run the scrub job!
echo "###SnapRAID SCRUB [`date`]"
$SNAPRAID_BIN scrub -p $SCRUB_PERCENT -o $SCRUB_AGE -q
#wait for the job to finish
close_output_and_wait
output_to_file_screen
echo "SCRUB finished [`date`]"
echo
JOBS_DONE="$JOBS_DONE + SCRUB"
# insert SCRUB marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SYNC job above
sed_me "s/^Everything OK/SCRUB_JOB--Everything OK/g;s/^Nothing to do/SCRUB_JOB--Nothing to do/g" "$TMP_OUTPUT"
fi
fi
else
echo "Scrub job is not enabled. Not running SCRUB job. [`date`] "
fi
echo
echo "----------------------------------------"
echo "##Postprocessing"
# Moving onto logging SMART info if enabled
if [ $SMART_LOG -eq 1 ]; then
echo
$SNAPRAID_BIN smart
close_output_and_wait
output_to_file_screen
fi
#echo "Spinning down disks..."
#$SNAPRAID_BIN down
# Graceful restore of services outside of trap - for messaging
GRACEFUL=1
if [ $MANAGE_SERVICES -eq 1 ]; then
restore_services
fi
echo "All jobs ended. [`date`] "
# all jobs done, let's send output to user if configured
if [ $EMAIL_ADDRESS ]; then
echo -e "Email address is set. Sending email report to **$EMAIL_ADDRESS** [`date`]"
# check if deleted count exceeded threshold
prepare_mail
ELAPSED="$(($SECONDS / 3600))hrs $((($SECONDS / 60) % 60))min $(($SECONDS % 60))sec"
echo
echo "----------------------------------------"
echo "##Total time elapsed for SnapRAID: $ELAPSED"
# Add a topline to email body
sed_me "1s/^/##$SUBJECT \n/" "${TMP_OUTPUT}"
send_mail
fi
#clean_desc
exit 0;
}
#######################
# FUNCTIONS & METHODS #
#######################
function sanity_check() {
if [ ! -e $CONTENT_FILE ]; then
echo "**ERROR** Content file ($CONTENT_FILE) not found!"
exit 1;
fi
echo "Testing that all parity files are present."
for i in "${PARITY_FILES[@]}"
do
if [ ! -e $i ]; then
echo "[`date`] ERROR - Parity file ($i) not found!"
echo "ERROR - Parity file ($i) not found!" >> $TMP_OUTPUT
exit 1;
fi
done
echo "All parity files found. Continuing..."
}
function get_counts() {
DEL_COUNT=$(grep -w '^ \{1,\}[0-9]* removed' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
ADD_COUNT=$(grep -w '^ \{1,\}[0-9]* added' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
MOVE_COUNT=$(grep -w '^ \{1,\}[0-9]* moved' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
COPY_COUNT=$(grep -w '^ \{1,\}[0-9]* copied' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
UPDATE_COUNT=$(grep -w '^ \{1,\}[0-9]* updated' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
}
function sed_me(){
# Close the open output stream first, then perform sed and open a new tee process and redirect output.
# We close stream because of the calls to new wait function in between sed_me calls.
# If we do not do this we try to close Processes which are not parents of the shell.
exec >&$out 2>&$err
sed -i "$1" "$2"
output_to_file_screen
}
function chk_del(){
if [ $DEL_COUNT -lt $DEL_THRESHOLD ]; then
# NO, delete threshold not reached, lets run the sync job
echo "There are deleted files. The number of deleted files, ($DEL_COUNT), is below the threshold of ($DEL_THRESHOLD). SYNC Authorized."
DO_SYNC=1
else
echo "**WARNING** Deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD)."
CHK_FAIL=1
fi
}
function chk_updated(){
if [ $UPDATE_COUNT -lt $UP_THRESHOLD ]; then
echo "There are updated files. The number of updated files, ($UPDATE_COUNT), is below the threshold of ($UP_THRESHOLD). SYNC Authorized."
DO_SYNC=1
else
echo "**WARNING** Updated files ($UPDATE_COUNT) exceeded threshold ($UP_THRESHOLD)."
CHK_FAIL=1
fi
}
function chk_sync_warn(){
if [ $SYNC_WARN_THRESHOLD -gt -1 ]; then
echo "Forced sync is enabled. [`date`]"
SYNC_WARN_COUNT=$(sed '/^[0-9][0-9]*$/!d;q' $SYNC_WARN_FILE 2>/dev/null)
SYNC_WARN_COUNT=${SYNC_WARN_COUNT:-0} #value is zero if file does not exist or does not contain what we are expecting
if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
# YES, lets force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
echo "Number of warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run. [`date`]"
DO_SYNC=1
else
# NO, so let's increment the warning count and skip the sync job
((SYNC_WARN_COUNT += 1))
echo $SYNC_WARN_COUNT > $SYNC_WARN_FILE
echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) warning(s) till forced sync. NOT proceeding with SYNC job. [`date`]"
DO_SYNC=0
fi
else
# NO, so let's skip SYNC
echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. NOT proceeding with SYNC job. [`date`]"
DO_SYNC=0
fi
}
function chk_zero(){
echo "###SnapRAID TOUCH [`date`]"
echo "Checking for zero sub-second files."
TIMESTATUS=$($SNAPRAID_BIN status | grep 'You have [1-9][0-9]* files with zero sub-second timestamp\.' | sed 's/^You have/Found/g')
if [ -n "$TIMESTATUS" ]; then
echo "$TIMESTATUS"
echo "Running TOUCH job to timestamp. [`date`]"
$SNAPRAID_BIN touch
close_output_and_wait
output_to_file_screen
echo "TOUCH finished [`date`]"
else
echo "No zero sub-second timestamp files found."
fi
}
function service_array_setup() {
if [ -z "$SERVICES" ]; then
echo "Please configure services"
else
echo "Setting up service array"
read -a service_array <<<$SERVICES
fi
}
function stop_services(){
for i in ${service_array[@]}; do
echo "Pausing Service - ""${i^}";
/usr/bin/docker pause $i
done
}
function restore_services(){
for i in ${service_array[@]}; do
echo "Unpausing Service - ""${i^}";
/usr/bin/docker unpause $i
done
if [ $GRACEFUL -eq 1 ]; then
return
fi
clean_desc
exit
}
function clean_desc(){
# Cleanup file descriptors
exec 1>&{out} 2>&{err}
# If interactive shell restore output
[[ $- == *i* ]] && exec &>/dev/tty
}
function prepare_mail() {
if [ $CHK_FAIL -eq 1 ]; then
if [ $DEL_COUNT -gt $DEL_THRESHOLD -a $DO_SYNC -eq 0 ]; then
MSG="Deleted Files ($DEL_COUNT) / ($DEL_THRESHOLD) Violation"
fi
if [ $DEL_COUNT -gt $DEL_THRESHOLD -a $UPDATE_COUNT -gt $UP_THRESHOLD -a $DO_SYNC -eq 0 ]; then
MSG="$MSG & "
fi
if [ $UPDATE_COUNT -gt $UP_THRESHOLD -a $DO_SYNC -eq 0 ]; then
MSG="$MSG Changed Files ($UPDATE_COUNT) / ($UP_THRESHOLD) Violation"
fi
SUBJECT="[WARNING] $SYNC_WARN_COUNT - ($MSG) $EMAIL_SUBJECT_PREFIX"
elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
# Sync ran but did not complete successfully so lets warn the user
SUBJECT="[WARNING] SYNC job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB_JOB-" $TMP_OUTPUT)" ]; then
# Scrub ran but did not complete successfully so lets warn the user
SUBJECT="[WARNING] SCRUB job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
else
SUBJECT="[COMPLETED] $JOBS_DONE Jobs $EMAIL_SUBJECT_PREFIX"
fi
}
function send_mail(){
# Format for markdown
sed_me "s/$/ /" "$TMP_OUTPUT"
$MAIL_BIN -s "$SUBJECT" "$EMAIL_ADDRESS" < $TMP_OUTPUT
#docker run --rm msmtptest bash -c 'printf "Subject: $SUBJECT" | msmtp $EMAIL_ADDRESS'
}
#Due to how process substitution and newer bash versions work, this function closes the output stream, which stops 'wait' from hanging on the tee process.
#If we did not do this and used a normal 'wait', the processes would wait forever, as newer bash versions wait for the process substitution to finish.
#Probably not the best way of 'fixing' this issue. Someone with more knowledge can provide better insight.
function close_output_and_wait(){
exec >&$out 2>&$err
wait $(pgrep -P "$$")
}
# Redirects output to file and screen. Open a new tee process.
function output_to_file_screen(){
# redirect all output to screen and file
exec {out}>&1 {err}>&2
# NOTE: Not preferred format but valid: exec &> >(tee -ia "${TMP_OUTPUT}" )
exec > >(tee -a "${TMP_OUTPUT}") 2>&1
}
# Set TRAP
trap restore_services INT EXIT
main "$@"
Hey Zack, is it still possible to use this script for just a standard dual parity setup? I plan on eventually using split parity in the future, but for now I was just wondering if I could use this new script after upgrading to SnapRAID v11. I’m using your older script at the moment on SnapRAID v10.
Great question. You can’t just drop this in and have it work, but with a very small modification, it should work fine. You would need to remove these lines.
and replace them with something like this instead.
or… if you have dual parity…
You just need to adjust the names of the parity files that you have set up in your snapraid.conf file, and that should be it 🙂
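The original code blocks didn’t survive in this copy of the post, but the idea was to replace the auto-built array with a hard-coded one. As a sketch (the paths here are placeholders; substitute the parity file paths from your own snapraid.conf):

```shell
# Hypothetical hard-coded parity arrays; paths are placeholders.
# Single parity:
PARITY_FILES=(/mnt/parity/snapraid.parity)
# ...or dual parity:
PARITY_FILES=(/mnt/parity1/snapraid.parity /mnt/parity2/snapraid.2-parity)
echo "${#PARITY_FILES[@]} parity file(s) configured"
```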
Thanks for the easy explanation. Can’t wait to give it a go.
Is there any reason it unpauses the containers twice?
It only unpauses once, unless you have an error in your script so it doesn’t exit gracefully. Here’s the output of my run last night.
FYI, it happened again. I don’t know what the heck is causing this, because I also get a “normal” email that shows it worked properly. The cron process seems to think it failed… and I guess it did, at least partially, since it hit the 2nd restart and that fails.
The 1st email fires off successfully, and then the 2nd fires off when it fails immediately afterwards, when it tries to restart again. I think it doesn’t send me the fail email if I manually run it, as it isn’t being run by cron, which is what causes the cron failure email to fire off.
OK, sorry to spam your comments, but I did find that somehow the trap keeps being triggered, though I have no idea how. I edited it to show “trap triggered” by replacing the function, and it triggered at the end again. So it looked like there is an issue with the clean_desc function on my system, but I can’t tell why. So I tried replacing the trap with…
trap 'err_report $LINENO' ERR
but now it isn’t triggering the trap. I did try changing the trap to a new function I had, to echo “trap triggered”, and it always did it at the end with clean_desc. Since that worked, for now I’ll keep it that way and see if it eventually triggers, so I can see which line it errors on.
So odd. I did a Notepad++ text compare and the only 2 changes were the services and the thresholds… yet I re-pasted it and it worked OK this time. Maybe a line break or something got messed up; no idea, because even Notepad++ couldn’t tell a difference. That was with a forced run, so we’ll see overnight, I guess.
Thanks for the info. Keep me posted.
Hi Zack and Dulanic,
First of all, thank you Zack for sharing this script!
To you both: Did you gain any more insight into this issue? In my testing it seems that the trap gets executed on a clean exit as well (I did not find any errors or weird exit values from commands, at least). Separating restore_services and clean_desc seems to fix it for me; see the patch below. It relies on the trap being executed on the intended exit call in main. This was tested on Ubuntu 18.04, bash version 4.4.20(1)-release. What are your thoughts on this: should the trap not be executed on the exit call in main (line 285 above)?
Also worth noting: during my debugging I ran shellcheck and corrected some of the warnings, but it did not make a difference in this case. The patch below might look a bit weird because of that, so please tell me if I should post one that is directly applicable to the script above.
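The patch itself isn’t reproduced in this copy of the comments; as a hedged sketch of the idea described (splitting the single INT/EXIT trap so service restoration and descriptor cleanup are handled separately), it might look like this:

```shell
# Hedged sketch; function names are the ones from the script above.
# Original:  trap restore_services INT EXIT
# Split version:
#   trap restore_services INT TERM   # restore containers only on interruption
#   trap clean_desc EXIT             # always clean up descriptors on exit
#
# Small demo that an EXIT trap fires exactly once on a clean exit:
out=$(bash -c 'clean(){ echo cleaned; }; trap clean EXIT; echo working')
echo "$out"
```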
This looks like a good approach. I just patched a test version of this script on my local version. I’m also on Ubuntu 18.04 with mine with a bash version 4.4.20(1)-release. The trap order seems to work fine, and logically makes sense. Thank you for this! I will continue to test, but this seems to be a good fix. I’d love to hear back from others with newer BASH versions to see if this remedies their issues.
Ok, thanks for looking into this and glad I could help!
So I have been using your script and mergerfs setup for a month now, and twice I have run into an issue where it says something along the lines of…
I have a feeling it is likely due to a file that mergerfs assigns to d4 that is either later moved or deleted between syncs. This is frustrating and the force empty sync takes forever. Is there any way around this?
FYI what is the right way to post code? lol I keep trying things but can’t figure it out.
You need to wrap code in open and close pre tags.
If you are running into needing to use a --force-empty sync, you really need to check the files in question. In this case, it means that all of the files that were on /mnt/data/disk4 have been removed or deleted. If you haven’t intentionally removed all content from that disk, something has gone wrong. For example, a disk that didn’t mount, or a script that deleted all the files, etc.
To put this in perspective, I have never had to run a --force-empty sync unless I have manually removed all files on a disk. I would do some more investigation.
It looks like it was because of torrents soon after a disk got full… it would sync, and then later on the files would be removed, and boom, that error showed up. What I did was add a DONOTDELETE.txt file to all of the drives to avoid the issue.
I would suggest handling all of your torrent downloading outside of SnapRAID. Or, just make a torrents folder for in process files, and then move the final, completed files to a location that SnapRAID is including. That way you don’t run the risk of a failed restore because of some missing files that were there during your last sync.
Hi Zack,
I am trying to use your script on CentOS 7 and have checked that all the paths are the same as on Ubuntu.
But when I run the script, I get the following error:
[root@media scripts]# sh snapraid_diff_n_sync.sh
snapraid_diff_n_sync.sh: line 121: syntax error near unexpected token `>'
snapraid_diff_n_sync.sh: line 121: ` exec > >(tee -ia "${TMP_OUTPUT}" ) 2>&1'
I only modified the script so I can use dual parity; for the rest, it is untouched.
Did you try the commented out method above to see if that works instead (obviously comment out the current exec line)?
Zack –
Thanks for your work on this; it’s been working very well for many months now! I have two related issues I need to ask you about. The background is that I’ve recently begun using Zoneminder for video surveillance. I have a single camera and keep recorded video for about 15 days. The way Zoneminder works is that it saves tens of thousands of files, sometimes hundreds of thousands, per day. Then, after 15 days, those files are deleted in order to save space. This happens daily with just one camera, and it would grow significantly as cameras are added.
You probably know where I’m going with this: your script sees these as massive changes to the system and tries to run the SnapRAID operation, but I’m receiving the following error:
…
remove zoneminder/events/2/17/07/10/.1119
remove zoneminder/events/2/17/07/10/.1120
remove zoneminder/events/2/17/07/10/.1121
remove zoneminder/events/2/17/07/10/.1122
remove zoneminder/events/2/17/07/10/.1123
613286 equal
63606 added
15725 removed
1 updated
0 moved
0 copied
0 restored
There are differences!
DIFF finished [Tue Jul 25 23:30:46 PDT 2017]
**ERROR** - failed to get one or more count values. Unable to proceed.
Exiting script. [Tue Jul 25 23:30:46 PDT 2017]
As you can see, 63k+ files were added and 15k+ removed, but I’m not sure why the script is erroring out. Can you help with this? I’ve set my delete threshold to 10,000 but it’s still too low.
The second question: is there any way to exclude a folder from being included in the delete threshold count? With that, I could exclude my Zoneminder folder and avoid triggering the threshold nightly. Also, since I have to increase the threshold, this opens an opportunity for the script to continue even if many files have been deleted in error.
Again thanks for your great work!
Chad
Thank you for the kind words! The only way to figure out why it’s erroring is to re-create the script by hand (run a snapraid diff, and then run the greps that the script runs). I would think it’s due to the massive number of adds/removes throwing my greps off.
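For example, the by-hand check might look like this (a sketch; the sample numbers below are taken from the diff summary posted above, saved to a throwaway temp file):

```shell
# Recreate the DIFF summary in a temp file and run the script's greps on it.
OUT=$(mktemp)
cat > "$OUT" <<'EOF'
  613286 equal
   63606 added
   15725 removed
       1 updated
       0 moved
       0 copied
       0 restored
EOF
# These are the exact pipelines from get_counts(); if snapraid's output
# lacks the leading spaces the pattern expects, the counts come back empty.
DEL_COUNT=$(grep -w '^ \{1,\}[0-9]* removed' "$OUT" | sed 's/^ *//g' | cut -d ' ' -f1)
ADD_COUNT=$(grep -w '^ \{1,\}[0-9]* added' "$OUT" | sed 's/^ *//g' | cut -d ' ' -f1)
echo "removed=$DEL_COUNT added=$ADD_COUNT"   # → removed=15725 added=63606
```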
As a sidenote, I would strongly suggest you manage Zoneminder outside of SnapRAID (either by moving it to a different setup, maybe a separate ZFS mirror array, or by excluding that path in SnapRAID). Rapidly changing files are not what SnapRAID is designed for at all. Rapidly changing filesystems can make recovery suffer/not work in the event of a disk failure.
Agreed. As I think about the use case of SnapRAID, it makes sense that it’s not designed for what I’m trying to do. Therefore I’m trying to exclude the folder, but it’s not working, and I think I’m not understanding the exclude option properly. In snapraid.conf I’m simply trying 'exclude /mnt/storage/zoneminder/', but it’s not working; neither is 'exclude /mnt/storage/zoneminder/*'. Reading through the docs, it seems as though it doesn’t exclude recursively through all the subfolders (and Zoneminder has a lot of them: one for each day AND timestamp). I saw a post where a user was able to exclude using 'exclude /rootfolder/subfolder/*/*.jpg' and 'exclude /rootfolder/subfolder/*/*/*.jpg' and so on, but that seems tedious.
Any idea on how to handle this situation? My /mnt/storage is my main array and I want to keep my video storage on that.
Good question. The exclude is actually really easy. It is a relative path inside the array. So, if your folder is at /mnt/storage/zoneminder, you would exclude that whole folder recursively by adding just this one line to your snapraid.conf file.
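The one-line exclude itself didn’t survive in this copy of the post; based on the SnapRAID manual’s exclude syntax (patterns are relative to the array root, and a trailing slash matches a directory and everything under it), it would look something like:

```
exclude /zoneminder/
```

The leading slash anchors the pattern at the root of each data disk, so the whole folder and all of its subfolders are skipped.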
That did it, thanks!
I’m glad that got it working for you 🙂
Hey Zack, great work all around! I was hoping to get some advice on how to modify this script. Does the output get written locally to a log file somewhere, or is it only emailed? If it is logged locally, where? If not, how would I set the output to be recorded locally? Also, I don’t want daily emails every time it runs successfully; I only want an email if it fails. The prepare_mail function has a number of if statements that change the subject based on warning conditions. I’d like to set something up so it only calls send_mail if the subject contains a warning, something along those lines. Any advice?
Hello! Thanks for the kind words. The script does write to a local file, you can see that in the INIT variables (/tmp/snapRAID.out). If you don’t want to receive emails unless it fails, you will need to edit that send_mail function to contain an if statement (I’m writing this from my phone, so this is untested).
Honestly, I like the nightly emails though. It ensures that the script ran correctly, and didn't silently fail. That way I always KNOW my data is safe. I hope that helps.
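For what it’s worth, here is an untested sketch of the kind of guard I had in mind (should_send is a hypothetical helper name, not part of the script):

```shell
# Hypothetical helper: only mail when the subject signals a problem.
should_send() {
  case "$1" in
    *WARNING*|*ERROR*) return 0 ;;
    *) return 1 ;;
  esac
}
# Inside send_mail you would then wrap the mail call, e.g.:
#   should_send "$SUBJECT" && $MAIL_BIN -s "$SUBJECT" "$EMAIL_ADDRESS" < $TMP_OUTPUT
should_send "[WARNING] SYNC job ran but did not complete successfully" && echo "would mail"
should_send "[COMPLETED] DIFF + SYNC Jobs" || echo "would skip"
```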
Thanks for the script.
Does this script spin down disks? If so, how do I disable this?
Yes, it does. You can see it on this line.
Just comment it out…
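For reference, the relevant lines near the end of main() in the listing above are these; with the leading # they are disabled, and removing the # would re-enable the spin-down:

```shell
# Disk spin-down, as it appears (disabled) in the script above:
#echo "Spinning down disks..."
#$SNAPRAID_BIN down
```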
I’m having an issue with line 127, where services need to be stopped… it says ‘missing’ and I’m not sure what to do. Everything else works; I’m using triple parity, so I disabled the service array setup.
# Stop any services that may inhibit optimum execution
if [ $MANAGE_SERVICES -eq 1 ]; then
echo "###Stop Services [`date`]"
stop_services
That line is for managing Docker containers. Are you using Docker containers, and if so, would you like to stop them? If so, you need to add the names of the services you’d like to stop to this line.
If you are not using Docker containers, just change this line from a 1 to a 0.
As a general rule, it’s a good idea to read through the commented lines in any script you are using, to try to understand roughly how it works. All of the configuration options are at the top, and I have tried to provide comments for every option.
I hope that helps,
I got the script working now… but when I do crontab -e and then add the script, the script runs but it doesn’t send any emails. Can you assist? Mutt is set up correctly. Emails do get sent if I run the script from a terminal.
# Run a SnapRAID diff and then sync
30 23 * * * /root/Backup/snapraid.sh
Have you chmod’d the snapraid.sh script to make it executable, and is it running under the root user’s crontab -e or your regular user’s (it needs to be root)?
Yes, the script is chmod +x. I set up a cron job through Webmin as a test to have it run immediately; the script runs, but no email is sent.
As I said in my email, the script appears to be working fine. It looks like you need to properly configure ssmtp to work with gmail and Mutt. Once that is done, the email should work fine 🙂
Hey, not sure if this helps but this might be a solution for supporting different parity setups
https://gist.github.com/nerdfury/7b5de21e8f8c54616feca73638f97fe1#file-snapraid-sh-L106
should work with parity, 2-parity and z-parity options
Hi, thanks for this. Can you please explain how I would adjust this for my single parity setup?
content /var/snapraid.content
content /mnt/disk-3tb1/snapraid.content
content /mnt/disk-3tb2/snapraid.content
content /mnt/disk-3tb3/snapraid.content
content /mnt/disk-3tb4/snapraid.content
content /mnt/disk-4tb1/snapraid.content
content /mnt/disk-6tb1/snapraid.content
content /mnt/disk-8tb1/snapraid.content
content /mnt/disk-8tb2/snapraid.content
content /mnt/disk-8tb3/snapraid.content
data d1 /mnt/disk-3tb1/
data d2 /mnt/disk-3tb2/
data d3 /mnt/disk-3tb3/
data d4 /mnt/disk-3tb4/
data d5 /mnt/disk-4tb1/
data d6 /mnt/disk-6tb1/
data d7 /mnt/disk-8tb1/
data d8 /mnt/disk-8tb2/
data d9 /mnt/disk-8tb3/
parity /mnt/parity/snapraid.parity
How about you just use the other version of my script 🙂
https://zackreed.me/updated-snapraid-sync-script/
Hi Zack,
First of all thanks for all your helpful posts. You saved me a ton of research.
Secondly, my question: you directed this guy to your old post. Does that mean this script is not optimal for a single parity setup? I have 4 3TB drives, with 1 of them as parity. Is the old script better for my setup?
I’m happy that this was helpful to you! This script will work fine, it just needs to be modified slightly to work with single parity.
Any issues using this script with Debian? I have had it working with Ubuntu, but no luck so far in Debian.
I get the following errors when executing the script:
/root/scripts/snapraid_diff_n_sync.sh: line 308: unexpected EOF while looking for matching `)'
/root/scripts/snapraid_diff_n_sync.sh: line 457: syntax error: unexpected end of file
I have adjusted it to dual parity as per your instructions in a previous post, but I don’t think that has anything to do with it?
Line 308: UPDATE_COUNT=$(grep -w '^ \{1,\}[0-9]* updated
Add a ' after each of the following: removed, added, moved, updated and copied. Move each command onto a single line. You should get the following:
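The repaired line would look roughly like this (a reconstruction, since the original code block was eaten by the blog's formatting; the sed/cut tail and the fake diff output below are illustrative assumptions):

```shell
# Fake snapraid diff output, just to exercise the pipeline
TMP_OUTPUT=$(mktemp)
printf '    3 updated\n   12 added\n' > "$TMP_OUTPUT"

# Reconstructed line: closing single quote restored after "updated",
# entire command on one line. The same fix applies to the removed,
# added, moved and copied counters.
UPDATE_COUNT=$(grep -w '^ \{1,\}[0-9]* updated' "$TMP_OUTPUT" | sed 's/^ *//g' | cut -d ' ' -f1)
echo "$UPDATE_COUNT"
```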
Hey,
First off, great script Zack, thank you for publishing it. It’s come in very handy.
Secondly, I was using this version of the script on Debian 9 and it worked without issue. It needed some minor formatting, but nothing major. I upgraded to Debian 10 and the wait commands got stuck waiting forever. I know the simple solution was probably to use the old version of the script, but I overlooked this until just now.
So if anybody out there is running Debian 10 or a newer version of bash (I’m not quite sure at what version this kicks in, but I’m using > 5.0 on Debian 10, so at least that version or greater) and is running into the same issue, I’d advise you to look here to see why:
https://unix.stackexchange.com/questions/530457/script-hanging-when-using-tee-and-wait-why
Save yourself the headache and use the old script.
https://zackreed.me/updated-snapraid-sync-script/
I didn’t realize that. It would be nice to actually fix this script so that it works. It seems like mosvy’s fix would be like this…
Sorry for the late reply. Here is how I have butchered your script but it seems to work: https://pastebin.com/PBzrBXq0
I have only tested the ‘happy path’ flow.
Your solution above solves the initial issue, but what was happening to me was that any time sed_me is called, it redirects again and opens a new tee process; then the next time we hit a wait, it’s back to square one and we wait forever. Does your fix above solve both problems?
What I think I’ve done is essentially kill the tee process before each wait and open a new one after waiting.
Hello 🙂 No, my solution above was just proposed based on the solution offered on Stack Exchange. You are correct, the sed_me function would put you right back at square one. And I wouldn’t call what you did a butchering; you made new functions, and it looks pretty good. Also, your assumption is correct: you are allowing the tee process to close before each wait. That’s why it is working. Good job!
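The pattern being described, letting tee see EOF and exit before a bare wait runs, can be sketched like this (a minimal sketch under bash, not the actual patched script):

```shell
LOG=$(mktemp)

exec 3>&1 4>&2                 # save the real stdout/stderr
exec > >(tee -a "$LOG") 2>&1   # start logging everything through tee

echo "work happening"
sleep 0.2 &                    # stand-in for the snapraid job

exec 1>&3 2>&4                 # restore fds: tee's pipe gets EOF, tee exits
wait                           # now returns once the background job finishes,
                               # instead of blocking forever on the tee procsub
echo "job finished" >> "$LOG"
```

In bash >= 5.0 a bare `wait` also waits for process substitutions, so without the fd restore the script hangs because tee never exits while the script’s stdout is still wired into it.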
Glad I could help. Thank you again for publishing 🙂
It took me about a month to realize I was not getting any emails from SnapRAID, and then I remembered it was about a month since I updated Proxmox; bash probably got updated too.
Your changes sorted it for me, thanks!
Hi, thanks for the great work! There’s a syntax error with the IFS variable on line 110 that renders the script unusable. I don’t know what that variable’s supposed to be, so I’m stuck.
Hello, I’d suggest that you read the comments below. I’m guessing that you are running a new version of BASH.
Nice script. My only suggestion is to either default to sync -h for hashing or to leave a comment. I feel that hashing is almost necessary if you aren’t using ECC memory.
Thank you! That’s super easy to add. I have just updated the script to add that as another user option.
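The option likely takes a shape along these lines (the variable name and structure are assumptions for illustration, not Zack's exact code; `snapraid sync -h` is the real pre-hash flag):

```shell
# 1 = run "snapraid sync -h" to pre-hash data before writing parity,
# a useful safety net on machines without ECC memory
PREHASH=1

if [ "$PREHASH" -eq 1 ]; then
  SYNC_CMD="snapraid sync -h"
else
  SYNC_CMD="snapraid sync"
fi
echo "$SYNC_CMD"
```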
Love the script, thanks! One thing I’m wondering is how to setup smtp authentication. I’m using Gmail, so I have to authenticate. Is this something I can setup?
Never mind! Found your article: https://zackreed.me/send-system-email-with-gmail-and-ssmtp/
I’m glad that you got it figured out 🙂
It looks like there is a typo in the script on line 116 which breaks the script:
The IFS= is never used in the script, I think. Should it even be there?
Line 16
IFS=
\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)
Thanks for letting me know. That is supposed to be there; WordPress just threw in a random line break after my last edit. It is supposed to be like this…
IFS stands for “internal field separator”. It is used by the shell to determine how to do word splitting, i. e. how to recognize word boundaries.
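Put back together, the mangled line would read like this (a reconstruction tested against the split-parity snapraid.conf lines shown in the post; the temp file here just stands in for /etc/snapraid.conf):

```shell
# Stand-in for /etc/snapraid.conf, using the parity lines from the post
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
parity /mnt/split-parity/parity1-disk1/part1.parity,/mnt/split-parity/parity1-disk2/part2.parity
2-parity /mnt/split-parity/parity2-disk1/part1.parity,/mnt/split-parity/parity2-disk2/part2.parity
EOF

# IFS=$'\n' makes the array split on newlines only, so paths containing
# spaces survive; the pipeline pulls every parity/2-parity/z-parity file
# out of the config, splitting comma-separated split-parity lists.
IFS=$'\n' PARITY_FILES=(`cat "$CONF" | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)
echo "${#PARITY_FILES[@]}"
```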
Hi (again) Zack,
When doing some debugging of another issue (see the comment originally made by Dulanic about dual un-pausing of Docker services), I found that when waiting for processes to exit, sometimes the value of the pgrep in “close_output_and_wait()” would be invalid (a “not a valid PID” error thrown from wait). This would happen, for example, with a diff that has no changes (and thus runs very briefly?). A temporary workaround for me was:
But preferably one would instead save the PID when spawning the process. Does this seem reasonable?
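The save-the-PID-at-spawn idea can be sketched like this (the function name is hypothetical):

```shell
# Stand-in for the snapraid diff/sync job
run_job() { sleep 0.2; }

run_job &
JOB_PID=$!        # captured the moment the job is spawned, so it is always
                  # valid, unlike a later pgrep that can come back empty if
                  # a short-lived diff has already exited
wait "$JOB_PID"
JOB_STATUS=$?
```

Waiting on an explicit PID also sidesteps the newer-bash behavior where a bare `wait` blocks on process substitutions.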
Maybe I’m dense, but I don’t have, or see, a close_output_and_wait function in my version of the script. Where did you use this? And, thanks for all the questions and ideas!
Oh, sorry for the confusion. I failed to mention that I incorporated the changes proposed by “sburke” in the comment from July 26, 2019 where this link was originally posted: https://pastebin.com/PBzrBXq0
That proposal has the “close_output_and_wait” function.
Hi gang, I’m using OpenMediaVault for my NAS software. I have 2 2TB drives and another 2 2TB drives for SnapRAID parity. I also have a 6TB hard drive that I back up directly to.
Would this script be an overkill for my usage? I have very little personal documents saved on my NAS. The storage is mostly for my movie and music collection.
If I can use this script, would anyone have a good tutorial on how I could make this script work in OMV? Right now, I just run a snapraid sync every 2 hours.
Thanks.
Hello, this script “should” work fine in OMV, as it is just a Debian-based distribution like Ubuntu. You’d need to paste the contents of this script into a file, change the variables to the correct email addresses and disk locations, and finally chmod +x your script and add it to your crontab. Before you do any of that, just get the script set up and try to run it outside of crontab first. Below, I saved the script to /root/scripts/SnapRAID-sync-script
Hi Zack, I’ve been using SnapRAID for a few years and everything was working as expected; now I had to set it up again with stronger encryption.
I copied over my scripts, set up ssmtp, etc., and the first snapraid sync took a while but finished without errors.
Now a test run of my old working snapraid_diff_n_sync.sh hangs after comparing.
I rechecked the file, but I do not see any errors. I also tried your new (April) modified script, but same outcome.
Running it with bash -x snapraid_diff_n_sync.sh shows it stuck at a wait.
Any ideas?
Hey, sorry for the formatting; I cannot edit the post.
My previous good running install was on 18.04, and then I upgraded to 20.04. I had backups of config files to compare changes, but I would never have guessed it had something to do with a bash upgrade, as described in https://zackreed.me/snapraid-split-parity-sync-script/#comment-300
Hi, I’ve been trying to figure out how to use your script for single parity.
If I’m not mistaken, wouldn’t the script (as it is today) work for both single parity and split parity setups?
As far as I can see, it does work on my single parity setup. Great script.
I get a small “error” at the bottom where it’s trying to restart the container services even though I’ve set MANAGE_SERVICES=0. I can’t really figure out why it is running; the config seems correct?
Hello 🙂 Yes, this script as written will work well with any level of parity or split parity. I’m not sure how Docker would be trying to restart services if MANAGE_SERVICES=0; those functions shouldn’t execute. I don’t see that behavior on my own Ubuntu 20.04 server.
You do not explain what the variables mean or do. I can guess some of them, but it would be good if you added that info.
CHK_FAIL=0
DO_SYNC=0
GRACEFUL=0
SECONDS=0
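A plausible reading of these flags, inferred from how scripts like this typically use them (assumptions, not the author's documentation; SECONDS, however, really is a bash builtin):

```shell
CHK_FAIL=0    # likely set to 1 when a safety threshold (e.g. deleted-file count) is exceeded
DO_SYNC=0     # likely set to 1 once the diff output shows changes that warrant a sync
GRACEFUL=0    # likely set to 1 when services were paused and need a graceful restart
SECONDS=0     # bash builtin: assigning 0 restarts the elapsed-seconds counter,
              # so "$SECONDS" later gives the runtime of the whole job
```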