SnapRAID Split Parity Sync Script

SnapRAID v11 adds a sweet new feature: split parity. In the past, adding larger data disks always came with the issue of needing parity disks as large as or larger than your data disks. For example, let’s say you have an array made up of (4) 4TB data disks and (1) 4TB parity disk. What if you want to buy one of those 6 or 8TB disks to use in your array? Previously, you could either use the new larger disk as your new parity disk, or risk having part of your new disk unprotected. With split parity, you can use the new 8TB disk as a data disk and then use (2) of your old 4TB disks, joined together as one complete set of parity (or, you could create parity in this scenario with (4) 2TB disks or even (8) 1TB disks). Pretty neat!

Going forward, this allows you to add 6 or 8TB data disks and have all your data protected, without having to buy an extra one or two larger disks just for parity. Now that we’ve discussed split parity, how can we automate syncing like we did with my previous script? We can’t use that script as is because of the split parity files. I already had a modified version of my script, but when mtompkins presented his cleaned-up version of it, I thought I’d extend that for split parity and add a couple of extra functions. I present you now with the new split parity script (this version is set up for dual parity, with 4 disks making up the split parity set).

As a sidenote, I would love it if someone could provide a BASH method to read the snapraid.conf file and automatically build the parity array rather than having to manually set that up in the config. I fear that with split parity, complex grepping may be over many users’ heads.
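In the meantime, here is a rough, untested sketch of the idea, assuming the plain "keyword value" layout of snapraid.conf with comma-separated split parity paths (as in the example below):

# Rough sketch: build PARITY_FILES straight from snapraid.conf with a
# while/read loop instead of a long grep pipeline. Untested - assumes no
# spaces in the parity paths and comma-separated split parity entries.
PARITY_FILES=()
while read -r keyword value _; do
  case "$keyword" in
    parity|[2-6]-parity|z-parity)
      IFS=',' read -ra parts <<< "$value"   # split the comma-separated split parity paths
      PARITY_FILES+=("${parts[@]}")
      ;;
  esac
done < /etc/snapraid.conf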

Here’s how I have the parity files set up for this example in my /etc/snapraid.conf file.

parity /mnt/split-parity/parity1-disk1/part1.parity,/mnt/split-parity/parity1-disk2/part2.parity
2-parity /mnt/split-parity/parity2-disk1/part1.parity,/mnt/split-parity/parity2-disk2/part2.parity
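For reference, the parity grep in the script below turns those two lines into a four-element array, which works out to the same thing as:

PARITY_FILES[0]=/mnt/split-parity/parity1-disk1/part1.parity
PARITY_FILES[1]=/mnt/split-parity/parity1-disk2/part2.parity
PARITY_FILES[2]=/mnt/split-parity/parity2-disk1/part1.parity
PARITY_FILES[3]=/mnt/split-parity/parity2-disk2/part2.parity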
Here's the actual script...

Bash
#!/bin/bash

#######################################################################
# This is a helper script that keeps snapraid parity info in sync with
# your data and optionally verifies the parity info. Here's how it works:
#   1) Shuts down configured services
#   2) Calls diff to figure out if the parity info is out of sync.
#   3) If parity info is out of sync, AND the number of deleted or changed files exceeds
#      X (each configurable), it triggers an alert email and stops. (In case of
#      accidental deletions, you have the opportunity to recover them from
#      the existing parity info. This also mitigates encryption malware to a degree.)
#   4) If parity info is out of sync, AND the number of deleted or changed files exceeds X,
#      AND it has reached/exceeded Y (configurable) number of warnings, force
#      a sync. (Useful when you get a false alarm above and you can't be bothered
#      to log in and do a manual sync. Note the risk: if it's not a false alarm
#      and you can't access the box before the job has run Y more times to
#      fix the issue... well, I hope you have other backups...)
#   5) If parity info is out of sync BUT the number of deleted files did NOT
#      exceed X, it calls sync to update the parity info.
#   6) If the parity info is in sync (either because nothing changed or after it
#      has successfully completed the sync job), it runs the scrub command to
#      validate the integrity of the data (both the files and the parity info).
#      Note that each run of the scrub command will validate only a (configurable)
#      portion of parity info to avoid having a long running job and affecting
#      the performance of the box.
#   7) Once all jobs are completed, it sends an email with the output to the user
#      (if configured).
#
#   Inspired by Zack Reed (https://zackreed.me/articles/83-updated-snapraid-sync-script)
#   Modified version of mtompkins version of my script (https://gist.github.com/mtompkins/91cf0b8be36064c237da3f39ff5cc49d)
#
#######################################################################

######################
#   USER VARIABLES   #
######################

####################### USER CONFIGURATION START #######################

# address where the output of the jobs will be emailed to.
EMAIL_ADDRESS="email_address@gmail.com"

# Set the threshold of deleted files to stop the sync job from running.
# NOTE that depending on how actively your filesystem is used, a low
# number here may result in your parity info being out of sync often and/or
# you having to do lots of manual syncing.
DEL_THRESHOLD=50
UP_THRESHOLD=500

# Set number of warnings before we force a sync job.
# This option comes in handy when you cannot be bothered to manually
# start a sync job when DEL_THRESHOLD is breached due to a false alarm.
# Set to 0 to ALWAYS force a sync (i.e. ignore the delete threshold above)
# Set to -1 to NEVER force a sync (i.e. you need to manually sync if the delete threshold is breached)
#SYNC_WARN_THRESHOLD=3
SYNC_WARN_THRESHOLD=-1

# Set percentage of array to scrub if it is in sync.
# i.e. 0 to disable and 100 to scrub the full array in one go
# WARNING - depending on size of your array, setting to 100 will take a very long time!
SCRUB_PERCENT=8
SCRUB_AGE=10

# Set the option to log SMART info. 1 to enable, any other values to disable
SMART_LOG=1

# location of the snapraid binary
SNAPRAID_BIN="/usr/local/bin/snapraid"
# location of the mail program binary
MAIL_BIN="/usr/bin/mutt"

function main(){

  ######################
  #   INIT VARIABLES   #
  ######################
  CHK_FAIL=0
  DO_SYNC=0
  EMAIL_SUBJECT_PREFIX="(SnapRAID on `hostname`)"
  GRACEFUL=0
  SYNC_WARN_FILE="/tmp/snapRAID.warnCount"
  SYNC_WARN_COUNT=""
  TMP_OUTPUT="/tmp/snapRAID.out"
  # Capture time
  SECONDS=0

  ###############################
  #   MANAGE DOCKER CONTAINERS  #
  ###############################
  # Set to 0 to not manage any containers.
  MANAGE_SERVICES=1

  # Containers to manage (separated with spaces).
  SERVICES='sabnzbd sonarr radarr lidarr plex unifi-controller'

  # Build Services Array...
  service_array_setup

  # Expand PATH for smartctl
  PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

  # Determine the name of the first content file...
  CONTENT_FILE=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.content | head -n 1 | cut -d " " -f2`

  # Build an array of all parity files...
  IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)

##### USER CONFIGURATION STOP ##### MAKE NO CHANGES BELOW THIS LINE ####

  # create tmp file for output
  > $TMP_OUTPUT

  # Redirect all output to file and screen. Starts a tee process
  output_to_file_screen

  # timestamp the job
  echo "SnapRAID Script Job started [`date`]"
  echo
  echo "----------------------------------------"

  # Remove any Plex-created anomalies
  echo "##Preprocessing"

  # Stop any services that may write to the array during sync
  if [ $MANAGE_SERVICES -eq 1 ]; then
    echo "###Stop Services [`date`]"
    stop_services
  fi

  # sanity check first to make sure we can access the content and parity files
  sanity_check

  echo
  echo "----------------------------------------"
  echo "##Processing"

  # Fix timestamps
  chk_zero

  # run the snapraid DIFF command
  echo "###SnapRAID DIFF [`date`]"
  $SNAPRAID_BIN diff
  # wait for the above cmd to finish, save output and open new redirect
  close_output_and_wait
  output_to_file_screen
  echo
  echo "DIFF finished [`date`]"
  JOBS_DONE="DIFF"

  # Get number of deleted, updated, and modified files...
  get_counts

  # sanity check to make sure that we were able to get our counts from the output of the DIFF job
  if [ -z "$DEL_COUNT" -o -z "$ADD_COUNT" -o -z "$MOVE_COUNT" -o -z "$COPY_COUNT" -o -z "$UPDATE_COUNT" ]; then
    # failed to get one or more of the count values, lets report to user and exit with error code
    echo "**ERROR** - failed to get one or more count values. Unable to proceed."
    echo "Exiting script. [`date`]"
    if [ $EMAIL_ADDRESS ]; then
      SUBJECT="$EMAIL_SUBJECT_PREFIX WARNING - Unable to proceed with SYNC/SCRUB job(s). Check DIFF job output."
      send_mail
    fi
    exit 1;
  fi
  echo
  echo "**SUMMARY of changes - Added [$ADD_COUNT] - Deleted [$DEL_COUNT] - Moved [$MOVE_COUNT] - Copied [$COPY_COUNT] - Updated [$UPDATE_COUNT]**"
  echo

  # check if the conditions to run SYNC are met
  # CHK 1 - if files have changed
  if [ $DEL_COUNT -gt 0 -o $ADD_COUNT -gt 0 -o $MOVE_COUNT -gt 0 -o $COPY_COUNT -gt 0 -o $UPDATE_COUNT -gt 0 ]; then
    chk_del

    if [ $CHK_FAIL -eq 0 ]; then
      chk_updated
    fi

    if [ $CHK_FAIL -eq 1 ]; then
      chk_sync_warn
    fi
  else
    # NO, so let's skip SYNC
    echo "No change detected. Not running SYNC job. [`date`] "
    DO_SYNC=0
  fi

  # Now run sync if conditions are met
  if [ $DO_SYNC -eq 1 ]; then
    echo "###SnapRAID SYNC [`date`]"
    $SNAPRAID_BIN sync -q
    #wait for the job to finish
    close_output_and_wait
    output_to_file_screen
    echo "SYNC finished [`date`]"
    JOBS_DONE="$JOBS_DONE + SYNC"
    # insert SYNC marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SCRUB job later
    sed_me "s/^Everything OK/SYNC_JOB--Everything OK/g;s/^Nothing to do/SYNC_JOB--Nothing to do/g" "$TMP_OUTPUT"
    # Remove any warning flags if set previously. This is done in this step to take care of scenarios when user
    # has manually synced or restored deleted files and we will have missed it in the checks above.
    if [ -e $SYNC_WARN_FILE ]; then
      rm $SYNC_WARN_FILE
    fi
    echo
  fi

  # Moving onto scrub now. Check if user has enabled scrub
  if [ $SCRUB_PERCENT -gt 0 ]; then
    # YES, first let's check if delete threshold has been breached and we have not forced a sync.
    if [ $CHK_FAIL -eq 1 -a $DO_SYNC -eq 0 ]; then
      # YES, parity is out of sync so let's not run scrub job
      echo "Scrub job cancelled as parity info is out of sync (deleted or changed files threshold has been breached). [`date`]"
    else
      # NO, delete threshold has not been breached OR we forced a sync, but we have one last test -
      # let's make sure if sync ran, it completed successfully (by checking for our marker text "SYNC_JOB--" in the output).
      if [ $DO_SYNC -eq 1 -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
        # Sync ran but did not complete successfully so lets not run scrub to be safe
        echo "**WARNING** - check output of SYNC job. Could not detect marker . Not proceeding with SCRUB job. [`date`]"
      else
        # Everything ok - let's run the scrub job!
        echo "###SnapRAID SCRUB [`date`]"
        $SNAPRAID_BIN scrub -p $SCRUB_PERCENT -o $SCRUB_AGE -q
        #wait for the job to finish
        close_output_and_wait
        output_to_file_screen
        echo "SCRUB finished [`date`]"
        echo
        JOBS_DONE="$JOBS_DONE + SCRUB"
        # insert SCRUB marker to 'Everything OK' or 'Nothing to do' string to differentiate it from SYNC job above
        sed_me "s/^Everything OK/SCRUB_JOB--Everything OK/g;s/^Nothing to do/SCRUB_JOB--Nothing to do/g" "$TMP_OUTPUT"
      fi
    fi
  else
    echo "Scrub job is not enabled. Not running SCRUB job. [`date`] "
  fi

  echo
  echo "----------------------------------------"
  echo "##Postprocessing"

  # Moving onto logging SMART info if enabled
  if [ $SMART_LOG -eq 1 ]; then
    echo
    $SNAPRAID_BIN smart
    close_output_and_wait
    output_to_file_screen
  fi

  #echo "Spinning down disks..."
  #$SNAPRAID_BIN down

  # Graceful restore of services outside of trap - for messaging
  GRACEFUL=1
  if [ $MANAGE_SERVICES -eq 1 ]; then
    restore_services
  fi

  echo "All jobs ended. [`date`] "

  # all jobs done, let's send output to user if configured
  if [ $EMAIL_ADDRESS ]; then
    echo -e "Email address is set. Sending email report to **$EMAIL_ADDRESS** [`date`]"
    # check if deleted count exceeded threshold
    prepare_mail

    ELAPSED="$(($SECONDS / 3600))hrs $((($SECONDS / 60) % 60))min $(($SECONDS % 60))sec"
    echo
    echo "----------------------------------------"
    echo "##Total time elapsed for SnapRAID: $ELAPSED"

    # Add a topline to email body
    sed_me "1s/^/##$SUBJECT \n/" "${TMP_OUTPUT}"
    send_mail
  fi

  #clean_desc

  exit 0;
}

#######################
# FUNCTIONS & METHODS #
#######################

function sanity_check() {
  if [ ! -e $CONTENT_FILE ]; then
    echo "**ERROR** Content file ($CONTENT_FILE) not found!"
    exit 1;
  fi

  echo "Testing that all parity files are present."
  for i in "${PARITY_FILES[@]}"
    do
      if [ ! -e $i ]; then
        echo "[`date`] ERROR - Parity file ($i) not found!"
        echo "ERROR - Parity file ($i) not found!" >> $TMP_OUTPUT
        exit 1;
      fi
  done
  echo "All parity files found. Continuing..."
}

function get_counts() {
  DEL_COUNT=$(grep -w '^ \{1,\}[0-9]* removed' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
  ADD_COUNT=$(grep -w '^ \{1,\}[0-9]* added' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
  MOVE_COUNT=$(grep -w '^ \{1,\}[0-9]* moved' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
  COPY_COUNT=$(grep -w '^ \{1,\}[0-9]* copied' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
  UPDATE_COUNT=$(grep -w '^ \{1,\}[0-9]* updated' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
}

function sed_me(){
  # Close the open output stream first, then perform sed and open a new tee process and redirect output.
  # We close stream because of the calls to new wait function in between sed_me calls.
  # If we do not do this we try to close Processes which are not parents of the shell.
  exec >&$out 2>&$err
  sed -i "$1" "$2"

  output_to_file_screen
}

function chk_del(){
  if [ $DEL_COUNT -lt $DEL_THRESHOLD ]; then
    # NO, delete threshold not reached, lets run the sync job
    echo "There are deleted files. The number of deleted files, ($DEL_COUNT), is below the threshold of ($DEL_THRESHOLD). SYNC Authorized."
    DO_SYNC=1
  else
    echo "**WARNING** Deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD)."
    CHK_FAIL=1
  fi
}

function chk_updated(){
  if [ $UPDATE_COUNT -lt $UP_THRESHOLD ]; then
    echo "There are updated files. The number of updated files, ($UPDATE_COUNT), is below the threshold of ($UP_THRESHOLD). SYNC Authorized."
    DO_SYNC=1
  else
    echo "**WARNING** Updated files ($UPDATE_COUNT) exceeded threshold ($UP_THRESHOLD)."
    CHK_FAIL=1
  fi
}

function chk_sync_warn(){
  if [ $SYNC_WARN_THRESHOLD -gt -1 ]; then
    echo "Forced sync is enabled. [`date`]"

    SYNC_WARN_COUNT=$(sed '/^[0-9][0-9]*$/!d;q' $SYNC_WARN_FILE 2>/dev/null)
    SYNC_WARN_COUNT=${SYNC_WARN_COUNT:-0} #value is zero if file does not exist or does not contain what we are expecting

    if [ $SYNC_WARN_COUNT -ge $SYNC_WARN_THRESHOLD ]; then
      # YES, lets force a sync job. Do not need to remove warning marker here as it is automatically removed when the sync job is run by this script
      echo "Number of warning(s) ($SYNC_WARN_COUNT) has reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing a SYNC job to run. [`date`]"
      DO_SYNC=1
    else
      # NO, so let's increment the warning count and skip the sync job
      ((SYNC_WARN_COUNT += 1))
      echo $SYNC_WARN_COUNT > $SYNC_WARN_FILE
      echo "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) warning(s) till forced sync. NOT proceeding with SYNC job. [`date`]"
      DO_SYNC=0
    fi
  else
    # NO, so let's skip SYNC
    echo "Forced sync is not enabled. Check $TMP_OUTPUT for details. NOT proceeding with SYNC job. [`date`]"
    DO_SYNC=0
  fi
}

function chk_zero(){
  echo "###SnapRAID TOUCH [`date`]"
  echo "Checking for zero sub-second files."
  TIMESTATUS=$($SNAPRAID_BIN status | grep 'You have [1-9][0-9]* files with zero sub-second timestamp\.' | sed 's/^You have/Found/g')
  if [ -n "$TIMESTATUS" ]; then
    echo "$TIMESTATUS"
    echo "Running TOUCH job to timestamp. [`date`]"
    $SNAPRAID_BIN touch
    close_output_and_wait
    output_to_file_screen
    echo "TOUCH finished [`date`]"
  else
    echo "No zero sub-second timestamp files found."
  fi
}

function service_array_setup() {
  if [ -z "$SERVICES" ]; then
    echo "Please configure serivces"
  else
    echo "Setting up service array"
    read -a service_array <<<$SERVICES
  fi
}

function stop_services(){
  for i in ${service_array[@]}; do
    echo "Pausing Service - ""${i^}";
    /usr/bin/docker pause $i
  done
}

function restore_services(){
  for i in ${service_array[@]}; do
    echo "Unpausing Service - ""${i^}";
    /usr/bin/docker unpause $i
  done

  if [ $GRACEFUL -eq 1 ]; then
    return
  fi

  clean_desc

  exit
}

function clean_desc(){
  # Cleanup file descriptors
  exec 1>&"$out" 2>&"$err"

  # If interactive shell restore output
  [[ $- == *i* ]] && exec &>/dev/tty
}

function prepare_mail() {
  if [ $CHK_FAIL -eq 1 ]; then
    if [ $DEL_COUNT -gt $DEL_THRESHOLD -a $DO_SYNC -eq 0 ]; then
      MSG="Deleted Files ($DEL_COUNT) / ($DEL_THRESHOLD) Violation"
    fi

    if [ $DEL_COUNT -gt $DEL_THRESHOLD -a $UPDATE_COUNT -gt $UP_THRESHOLD -a $DO_SYNC -eq 0 ]; then
      MSG="$MSG & "
    fi

    if [ $UPDATE_COUNT -gt $UP_THRESHOLD -a $DO_SYNC -eq 0 ]; then
      MSG="$MSG Changed Files ($UPDATE_COUNT) / ($UP_THRESHOLD) Violation"
    fi

    SUBJECT="[WARNING] $SYNC_WARN_COUNT - ($MSG) $EMAIL_SUBJECT_PREFIX"
  elif [ -z "${JOBS_DONE##*"SYNC"*}" -a -z "$(grep -w "SYNC_JOB-" $TMP_OUTPUT)" ]; then
    # Sync ran but did not complete successfully so lets warn the user
    SUBJECT="[WARNING] SYNC job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
  elif [ -z "${JOBS_DONE##*"SCRUB"*}" -a -z "$(grep -w "SCRUB_JOB-" $TMP_OUTPUT)" ]; then
    # Scrub ran but did not complete successfully so lets warn the user
    SUBJECT="[WARNING] SCRUB job ran but did not complete successfully $EMAIL_SUBJECT_PREFIX"
  else
    SUBJECT="[COMPLETED] $JOBS_DONE Jobs $EMAIL_SUBJECT_PREFIX"
  fi
}

function send_mail(){
  # Format for markdown
  sed_me "s/$/  /" "$TMP_OUTPUT"
  $MAIL_BIN -s "$SUBJECT" "$EMAIL_ADDRESS" < $TMP_OUTPUT
  #docker run --rm msmtptest bash -c 'printf "Subject: $SUBJECT" | msmtp $EMAIL_ADDRESS'
}

# Due to how process substitution and newer bash versions work, this function stops the output stream,
# which keeps 'wait' from hanging on the tee process.
# If we do not do this and use a normal 'wait', the processes will wait forever, as newer bash versions
# will wait for the process substitution to finish.
# Probably not the best way of 'fixing' this issue. Someone with more knowledge can provide better insight.
function close_output_and_wait(){
  exec >&$out 2>&$err
  wait $(pgrep -P "$$")
}

# Redirects output to file and screen. Open a new tee process.
function output_to_file_screen(){
  # redirect all output to screen and file
  exec {out}>&1 {err}>&2
  # NOTE: Not preferred format but valid: exec &> >(tee -ia "${TMP_OUTPUT}" )
  exec > >(tee -a "${TMP_OUTPUT}") 2>&1
}

# Set TRAP
trap restore_services INT EXIT

main "$@"

          
        

70 Responses

  1. woodensoul2k says:

    Hey Zack, is it still possible to use this script for just a standard dual parity setup? I plan on eventually using split parity in the future, but for now I was just wondering if I could use this new script after upgrading to Snapraid V11. I’m using your older script at the moment on Snapraid V10.

    • Zack says:

      Great question. You can’t just drop this in and have it work, but with a very small modification, it should work fine. You would need to remove these lines.

        # Build an array of parity all files...
        PARITY_FILES[0]=`cat /etc/snapraid.conf | grep "^[^#;]" | grep parity | head -n 1 | cut -d " " -f 2 | cut -d "," -f 1`
        PARITY_FILES[1]=`cat /etc/snapraid.conf | grep "^[^#;]" | grep parity | head -n 1 | cut -d " " -f 2 | cut -d "," -f 2`
        PARITY_FILES[2]=`cat /etc/snapraid.conf | grep "^[^#;]" | grep 2-parity | head -n 1 | cut -d " " -f 2 | cut -d "," -f 1`
        PARITY_FILES[3]=`cat /etc/snapraid.conf | grep "^[^#;]" | grep 2-parity | head -n 1 | cut -d " " -f 2 | cut -d "," -f 2`
      

      and replace them with something like this instead.

      PARITY_FILES[0]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.parity | head -n 1 | cut -d " " -f2`
      

      or… if you have dual parity…

      PARITY_FILES[0]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.parity | head -n 1 | cut -d " " -f2`
      PARITY_FILES[1]=`grep -v '^$\|^\s*\#' /etc/snapraid.conf | grep snapraid.2-parity | head -n 1 | cut -d " " -f2`
      

      You just need to adjust the names of the parity files that you have setup in your snapraid.conf file, and that should be it 🙂

  2. woodensoul2k says:

    Thanks for the easy explanation. Can’t wait to give it a go.

  3. Dulanic says:

    Is there any reason it unpauses the containers twice?

    Unpausing Service - Plex
    plex
    All jobs ended. [Sun Feb  5 09:08:48 CST 2017] 
    Email address is set. Sending email report to **user@gmail.com** [Sun Feb  5 09:08:48 CST 2017]
    ----------------------------------------
    ##Total time elapsed for SnapRAID: 0hrs 0min 47sec
    Unpausing Service - Plex
    Error response from daemon: Container 811652fcf99c25021c6c5bf65bafcc69d3ec47f3be22b75e6d91ac5db4c95154 is not paused
    
    • Zack says:

      It only unpauses once, unless you have an error in your script so it doesn’t exit gracefully. Here’s the output of my run last night.

      ##[COMPLETED] DIFF Jobs (SnapRAID on loki)
      SnapRAID Script Job started [Sun Feb  5 23:30:01 EST 2017]
      
      ----------------------------------------
      ##Preprocessing
      ###Stop Services [Sun Feb  5 23:30:01 EST 2017]
      Pausing Service - Nzbget
      nzbget
      Pausing Service - Sonarr
      sonarr
      Testing that all parity files are present.
      All parity files found. Continuing...
      
      ----------------------------------------
      ##Processing
      ###SnapRAID TOUCH [Sun Feb  5 23:30:02 EST 2017]
      Checking for zero sub-second files.
      No zero sub-second timestamp files found.
      ###SnapRAID DIFF [Sun Feb  5 23:30:18 EST 2017]
      Loading state from /var/snapraid/content...
      Comparing...
      
         79918 equal
             0 added
             0 removed
             0 updated
             0 moved
             0 copied
             0 restored
      No differences
      
      DIFF finished [Sun Feb  5 23:32:13 EST 2017]
      
      **SUMMARY of changes - Added [0] - Deleted [0] - Moved [0] - Copied [0] - Updated [0]**
      
      No change detected. Not running SYNC job. [Sun Feb  5 23:32:13 EST 2017]
      Scrub job is not enabled. Not running SCRUB job. [Sun Feb  5 23:32:13 EST 2017]
      
      ----------------------------------------
      ##Postprocessing
      Spinning down disks...
      Spindown...
      Spundown device '/dev/sdi' for disk 'parity' in 39 ms.
      Spundown device '/dev/sdk' for disk 'parity' in 69 ms.
      Spundown device '/dev/sdh' for disk 'd07' in 608 ms.
      Spundown device '/dev/sdb' for disk 'd08' in 628 ms.
      Spundown device '/dev/sda' for disk 'd05' in 633 ms.
      Spundown device '/dev/sdg' for disk 'd01' in 695 ms.
      Spundown device '/dev/sdj' for disk '2-parity' in 713 ms.
      Spundown device '/dev/sdf' for disk 'd02' in 722 ms.
      Spundown device '/dev/sdl' for disk '2-parity' in 729 ms.
      Spundown device '/dev/sde' for disk 'd03' in 734 ms.
      Spundown device '/dev/sdc' for disk 'd06' in 941 ms.
      Spundown device '/dev/sdd' for disk 'd04' in 945 ms.
      Unpausing Service - Nzbget
      nzbget
      Unpausing Service - Sonarr
      sonarr
      All jobs ended. [Sun Feb  5 23:32:14 EST 2017]
      Email address is set. Sending email report to **rubylaser@gmail.com** [Sun Feb  5 23:32:14 EST 2017]
      
      ----------------------------------------
      ##Total time elapsed for SnapRAID: 0hrs 2min 13sec
      
      • Dulanic says:

        FYI, it happened again. I don’t know what the heck is causing this… because I also get a “normal” email that shows it worked properly. The cron process seems to think it failed… and I guess it did, at least partially, since it hits the 2nd restart and those unpauses fail.

        The 1st email fires off successfully, and then the 2nd fires off when it fails immediately afterwards, when it tries to restart the services again. I don’t think it sends me the failure email if I run the script manually, since then it isn’t being run by cron and the cron failure email never fires off.

         ##Postprocessing
        Spinning down disks...
        Spindown...
        Spundown device '/dev/sdf' for disk 'parity' in 435 ms.
        Spundown device '/dev/sdd' for disk 'd3' in 484 ms.
        Spundown device '/dev/sde' for disk 'd5' in 604 ms.
        Spundown device '/dev/sdb' for disk 'd1' in 640 ms.
        Spundown device '/dev/sdc' for disk 'd2' in 645 ms.
        Spundown device '/dev/sdg' for disk 'd4' in 874 ms.
        Spundown device '/dev/sdh' for disk '2-parity' in 1285 ms.
        Unpausing Service - Deluge
        deluge
        Unpausing Service - Sonarr
        sonarr
        Unpausing Service - Quassel-core
        quassel-core
        Unpausing Service - Crashplan
        crashplan
        Unpausing Service - Jackett
        jackett
        Unpausing Service - Hydra
        hydra
        Unpausing Service - Plexpy
        plexpy
        Unpausing Service - Muximux
        muximux
        Unpausing Service - Letsencrypt
        letsencrypt
        Unpausing Service - Couchpotato
        couchpotato
        Unpausing Service - Mariadb
        mariadb
        Unpausing Service - Sabnzbd
        sabnzbd
        Unpausing Service - Plex
        plex
        Unpausing Service - Nextcloud
        Error response from daemon: Container 4cfbaa7990fc236e5e3b1ccc8641d21b2775b1193bc238378bce513b01dc583e is not paused
        All jobs ended. [Mon Feb  6 23:36:21 CST 2017]
        Email address is set. Sending email report to **dulanic@gmail.com** [Mon Feb  6 23:36:21 CST 2017]
        
        ----------------------------------------
        ##Total time elapsed for SnapRAID: 0hrs 6min 20sec
        Unpausing Service - Deluge
        Error response from daemon: Container da7333422bc93d1a648463e1993c957a68866cdec3e42c1932158e76ed2f8647 is not paused
        Unpausing Service - Sonarr
        Error response from daemon: Container 33e3823a6122747f6b6c25597c03b667afc021fba1386b1a21138558528e8271 is not paused
        Unpausing Service - Quassel-core
        Error response from daemon: Container a1acee7c40c6a95741a83d7c7832d72a94ea104e5d63ea2bc43b9415ee256652 is not paused
        Unpausing Service - Crashplan
        Error response from daemon: Container 30b077d70125481e435a4c6651fb790ba2720f6c8b14996a1b2a44508faaa41e is not paused
        Unpausing Service - Jackett
        Error response from daemon: Container e743af5de3e3f04488658034c6be261daa6094da0e7c852d4e01b2066948ee7b is not paused
        Unpausing Service - Hydra
        Error response from daemon: Container 864e63f16b50854d81b8801bb3bf94116b2efda1ecde5043243fe4d0bc9a275d is not paused
        Unpausing Service - Plexpy
        Error response from daemon: Container 622def42a89d6931268f1933247e0a7e4fe90134fa0c2c2443b1da6d0e967233 is not paused
        Unpausing Service - Muximux
        Error response from daemon: Container a9bfa9ef1f6e366fce8d06b84499b17f68334bd819cecf783f2e85b885e45b10 is not paused
        Unpausing Service - Letsencrypt
        Error response from daemon: Container 40b90fce88dfb3339ca245d7f0604460c145424dedf36b583c61e19b4208974b is not paused
        Unpausing Service - Couchpotato
        Error response from daemon: Container af4e00b3237d814df315ce647e864644fc76a713604bdf4a1905b29d18ab4409 is not paused
        Unpausing Service - Mariadb
        Error response from daemon: Container acd11d391990de2b2fb0ea1d8551e41850ff288c2c2d9c17549884139d6a7e7f is not paused
        Unpausing Service - Sabnzbd
        Error response from daemon: Container 4d48e3801a726464df38385d11045029232143f8f6ba9d6d990384b9d721842b is not paused
        Unpausing Service - Plex
        Error response from daemon: Container 811652fcf99c25021c6c5bf65bafcc69d3ec47f3be22b75e6d91ac5db4c95154 is not paused
        Unpausing Service - Nextcloud
        Error response from daemon: Container 4cfbaa7990fc236e5e3b1ccc8641d21b2775b1193bc238378bce513b01dc583e is not paused
        • Dulanic says:

          OK, sorry to spam your comments, but I did find that somehow the trap keeps being triggered, though I have no idea how. I edited it to echo “trap triggered” by replacing the function, and it triggered at the end again. So it looks like there is an issue w/ the clean_desc function on my system, but I can’t tell why. So I tried replacing the trap w/…

          trap 'err_report $LINENO' ERR

          but now it isn’t triggering the trap. I did try changing the trap to a new function I made that just echoes “trap triggered”, and it always fired at the end w/ clean_desc. Since that works for now I’ll keep it this way, and see if it eventually does trigger so I can see which line it errors on.
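          For anyone copying that approach: err_report isn’t part of the script above, so you’d need to define something like this small helper (hypothetical, just to log the failing line) next to the trap.

          # Hypothetical helper for the ERR trap above (not in the original script):
          # logs the line number and exit code that tripped the trap.
          function err_report() {
            echo "**TRAP** error on line $1 (exit code $?)" >> "$TMP_OUTPUT"
          }
          trap 'err_report $LINENO' ERR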

  4. Dulanic says:

    So odd, I did a Notepad++ text compare and the only 2 changes were the services and the thresholds… yet I repasted it and it worked OK this time. Maybe a line break or something got messed up, no idea, because even Notepad++ couldn’t tell a difference. That was with a forced run, so we’ll see overnight I guess.

    • Zack says:

      Thanks for the info. Keep me posted.

      • EmanuelW says:

        Hi Zack and Dulanic,

        First of all, thank you Zack for sharing this script!

        To you both: Did you gain any more insight into this issue? In my testing it seems that the trap gets executed on a clean exit as well (I did not find any errors or weird exit values from commands, at least). Separating the service restore and clean_desc seems to fix it for me, see the patch below. It relies on the trap being executed on the intended exit call in main. This was tested on Ubuntu 18.04, bash version 4.4.20(1)-release. What are your thoughts on this, should the trap not be executed on the exit call in main (row 285 above)?

        Also worth noting: during my debugging I ran shellcheck and corrected some of the warnings, but it did not make a difference in this case. The patch below might look a bit weird because of that, so please tell me if I should post one that is directly applicable to the script above.

        diff --git a/snapraid_diff_n_sync.sh b/snapraid_diff_n_sync.sh
        index 395b21b..b088b7e 100755
        --- a/snapraid_diff_n_sync.sh
        +++ b/snapraid_diff_n_sync.sh
        @@ -82,7 +82,7 @@ function main(){
           CHK_FAIL=0
           DO_SYNC=0
           EMAIL_SUBJECT_PREFIX="(SnapRAID on $(hostname))"
        -  GRACEFUL=0
        +  SERVICES_STOPPED=0
           SYNC_WARN_FILE="/tmp/snapRAID.warnCount"
           SYNC_WARN_COUNT=""
           TMP_OUTPUT="/tmp/snapRAID.out"
        @@ -261,7 +261,6 @@ function main(){
           $SNAPRAID_BIN down
        
           # Graceful restore of services outside of trap - for messaging
        -  GRACEFUL=1
           if [ $MANAGE_SERVICES -eq 1 ]; then
             restore_services
           fi
        @@ -284,8 +283,7 @@ function main(){
             send_mail
           fi
        
        -  clean_desc
        -
        +  # Exit with success, letting the trap handle cleanup of file descriptors
           exit 0;
         }
        
        @@ -404,22 +402,18 @@ function stop_services(){
           for i in "${service_array[@]}"; do
             echo "Pausing Service - ""${i^}";
             docker pause "$i"
        +    SERVICES_STOPPED=1
           done
         }
        
         function restore_services(){
        -  for i in "${service_array[@]}"; do
        +  if [ $SERVICES_STOPPED -eq 1 ]; then
        +    for i in "${service_array[@]}"; do
               echo "Unpausing Service - ${i^}";
        -    docker unpause "$i"
        -  done
        -
        -  if [ $GRACEFUL -eq 1 ]; then
        -    return
        +      docker unpause "$i"
        +      SERVICES_STOPPED=0
        +    done
           fi
        -
        -  clean_desc
        -
        -  exit
         }
        
         function clean_desc(){
        @@ -430,6 +424,12 @@ function clean_desc(){
           [[ $- == *i* ]] && exec &>/dev/tty
         }
        
        +function final_cleanup(){
        +  restore_services
        +  clean_desc
        +  exit
        +}
        +
         function prepare_mail() {
           if [ $CHK_FAIL -eq 1 ]; then
             if [ "$DEL_COUNT" -gt $DEL_THRESHOLD ] && [ $DO_SYNC -eq 0 ]; then
        @@ -479,6 +479,6 @@ function output_to_file_screen(){
         }
        
         # Set TRAP
        -trap restore_services INT EXIT
        +trap final_cleanup INT EXIT
        
         main "$@"
        \ No newline at end of file
        
        • Zack says:

          This looks like a good approach. I just patched a test copy of my local version of the script. I’m also on Ubuntu 18.04, with bash version 4.4.20(1)-release. The trap order seems to work fine, and logically makes sense. Thank you for this! I will continue to test, but this seems to be a good fix. I’d love to hear back from others with newer BASH versions to see if this remedies their issues.

  5. Dulanic says:

    So I have been using your script and mergerfs setup for a month now, and I have run into an issue twice where it says something along the lines of…

    WARNING! All the files previously present in disk 'd4' at dir '/mnt/data/disk4/'
    are now missing or rewritten!
    This could happen when some disks are not mounted
    in the expected directory.
    If you want to 'sync' anyway, use 'snapraid --force-empty sync'.

    I have a feeling it is likely due to a file that mergerfs assigns to d4 that is either later moved or deleted between syncs. This is frustrating and the force empty sync takes forever. Is there any way around this?

    • Dulanic says:

      FYI what is the right way to post code? lol I keep trying things but can’t figure it out.

    • Zack says:

      If you find yourself needing to use --force-empty sync, you really need to check the files in question. In this case, it means that all of the files that were on /mnt/data/disk4 have been removed or deleted. If you haven’t intentionally removed all content from that disk, something has gone wrong. For example, a disk that didn’t mount, or a script that deleted all the files, etc.

      To put this in perspective, I have never had to run --force-empty sync unless I had manually removed all of the files on a disk. I would do some more investigation.

      • Dulanic says:

        It looks like it was because of torrents soon after a disk got full… it would sync, and then later on the files would be removed and boom, that error showed up. What I did was add a DONOTDELETE.txt file to all of the drives to avoid the issue.

        • Zack says:

          I would suggest handling all of your torrent downloading outside of SnapRAID. Or, just make a torrents folder for in-process files, and then move the final, completed files to a location that SnapRAID is including. That way you don’t run the risk of a failed restore because of missing files that were there during your last sync.
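          For example, if the in-progress downloads lived in a folder named torrents-incomplete at the root of the array (a made-up path, adjust to your layout), one exclude line in snapraid.conf keeps them out of the parity:

            exclude /torrents-incomplete/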

  6. oxzhor says:

    Hi Zack,

    I am trying to use your script on CentOS 7 and have checked that all the paths are the same as on Ubuntu.
    But when I run the script I get the following error:

    [root@media scripts]# sh snapraid_diff_n_sync.sh
    snapraid_diff_n_sync.sh: line 121: syntax error near unexpected token `>'
    snapraid_diff_n_sync.sh: line 121: ` exec > >(tee -ia "${TMP_OUTPUT}" ) 2>&1'

    I only modified the script so I can use dual parity; for the rest it is untouched.

    • Zack says:

      Did you try the commented out method above to see if that works instead (obviously comment out the current exec line)?

      exec &> >(tee -ia "${TMP_OUTPUT}" )
      
  7. chad says:

    Zack –

    Thanks for your work on this, it’s been working very well for many months now! I have two related issues I need to ask you about. The background is that I’ve recently begun using Zoneminder for video surveillance. I have a single camera and keep recorded video for about 15 days. The way Zoneminder works is that it saves tens of thousands of files, sometimes hundreds of thousands of files per day. Then, after 15 days, those files are deleted in order to save space. This operation happens daily with just one camera. This would experience significant growth as cameras are added.

    You probably know where I’m going with this – your script sees these as massive changes to the system and tries to run the Snapraid operation but I’m receiving the following error:


    remove zoneminder/events/2/17/07/10/.1119
    remove zoneminder/events/2/17/07/10/.1120
    remove zoneminder/events/2/17/07/10/.1121
    remove zoneminder/events/2/17/07/10/.1122
    remove zoneminder/events/2/17/07/10/.1123

    613286 equal
    63606 added
    15725 removed
    1 updated
    0 moved
    0 copied
    0 restored
    There are differences!

    DIFF finished [Tue Jul 25 23:30:46 PDT 2017]
    **ERROR** - failed to get one or more count values. Unable to proceed.
    Exiting script. [Tue Jul 25 23:30:46 PDT 2017]

    As you can see, 63k+ files added, 15k+ files deleted but I’m not sure why the script is erroring out. Can you help with this? I’ve set my delete threshold to 10,000 but it’s still too low.

    The second question: is there any way to exclude a folder from being included in the delete threshold count? With that I could exclude my Zoneminder folder and avoid triggering the threshold nightly. Also, since I have to increase the threshold, this opens the opportunity for the script to continue even if many files have been deleted in error.

    Again thanks for your great work!

    Chad

    • Zack says:

      Thank you for the kind words! The only way to figure out why it’s erroring is to re-create what the script does by hand (run a snapraid diff, and then run the greps that the script runs). I would think it’s due to the massive number of adds/removes throwing my greps off.
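      Something along these lines, using the same temp file and grep patterns as the script, should show which count is coming back empty:

      # run the diff by hand into the same temp file the script uses...
      snapraid diff > /tmp/snapRAID.out 2>&1
      # ...then try each of the script's count greps; whichever returns nothing is the culprit
      grep -w '^ \{1,\}[0-9]* removed' /tmp/snapRAID.out | sed 's/^ *//g' | cut -d ' ' -f1
      grep -w '^ \{1,\}[0-9]* added' /tmp/snapRAID.out | sed 's/^ *//g' | cut -d ' ' -f1
      grep -w '^ \{1,\}[0-9]* updated' /tmp/snapRAID.out | sed 's/^ *//g' | cut -d ' ' -f1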

      As a sidenote, I would strongly suggest you manage Zoneminder outside of SnapRAID (either move those files to a different setup, maybe a ZFS mirror, or exclude that path in SnapRAID). Rapidly changing files are not what SnapRAID is designed for at all. A rapidly changing filesystem can make recovery suffer or not work in the event of a disk failure.

      • chad says:

        Agreed. As I think about the use case of SnapRAID, it makes sense that it’s not designed for what I’m trying to do. Therefore I’m trying to exclude the folder, but it’s not working, and I think I’m not understanding the exclude option properly. In snapraid.conf I’m simply trying ‘exclude /mnt/storage/zoneminder/’, but it’s not working, and neither is ‘exclude /mnt/storage/zoneminder/*’. Reading through the docs, it seems as though it doesn’t exclude recursively through all the subfolders (and Zoneminder has a LOT of them – one for each day AND timestamp). I saw a post where a user was able to exclude using ‘exclude /rootfolder/subfolder/*/*.jpg’ and ‘exclude /rootfolder/subfolder/*/*/*.jpg’ and so on, but that seems tedious.

        Any idea on how to handle this situation? My /mnt/storage is my main array and I want to keep my video storage on that.

        • Zack says:

          Good question. The exclude is actually really easy. It is a relative path in the array. So, if your folder is at /mnt/storage/zoneminder, you would exclude that whole folder recursively by adding just this one line to your snapraid.conf file.

          exclude /zoneminder/
          
  8. chad says:

    That did it, thanks!

  9. blotsome says:

    Hey Zack, great work all around! I was hoping to get some advice on how to modify this script. Does the output get written locally to a log file somewhere, or is it only e-mailed? If it is logged locally, where? If not, how would I set the output to be recorded locally? Also, I don’t want daily e-mails every time it runs successfully; I only want an e-mail if it fails. The prepare_mail function has a number of if statements that change the subject based on warning conditions. I’d like to set something up so it only calls send_mail if the subject contains a warning? Something along those lines. Any advice?

    • Zack says:

      Hello! Thanks for the kind words. The script does write to a local file, you can see that in the INIT variables (/tmp/snapRAID.out). If you don’t want to receive emails unless it fails, you will need to edit that send_mail function to contain an if statement (I’m writing this from my phone, so this is untested).

      function send_mail(){
        # Format for markdown
        sed_me "s/$/  /" "$TMP_OUTPUT"
        if [[ $SUBJECT != *"[COMPLETED]"* ]]; then
           $MAIL_BIN -s "$SUBJECT" "$EMAIL_ADDRESS" < $TMP_OUTPUT
        fi
      }
      

      Honestly, I like the nightly emails though. It ensures that the script ran correctly, and didn't silently fail. That way I always KNOW my data is safe. I hope that helps.

  10. kocane says:

    Thanks for the script.

    Does this script spin down disks? If so, how do I disable this?

    • Zack says:

      Yes it does. You can see on this line.

        echo "Spinning down disks..."
        $SNAPRAID_BIN down
      

      Just comment it out…

        #echo "Spinning down disks..."
        #$SNAPRAID_BIN down
      
  11. clayboyx says:

    I’m having an issue with line 127 where services need to be stopped… it says ‘missing’. Not sure what to do; everything else works. I’m using triple parity, so I disabled the service array setup.
    # Stop any services that may inhibit optimum execution
    if [ $MANAGE_SERVICES -eq 1 ]; then
      echo "###Stop Services [`date`]"
      stop_services

    • Zack says:

      That line is for managing Docker containers. Are you using Docker containers, and if so, would you like to stop them? If so, you need to add the names of the services you’d like to stop to this line.

      SERVICES='nzbget sonarr' 

      If you are not using Docker containers, just change this line from a 1 to a 0.

      MANAGE_SERVICES=1

      As a general rule, it’s a good idea to read through the commented lines in any script you are using to try to understand roughly how it works. All of the configuration options are at the top and I have tried to provide comments for every option.

      I hope that helps,

  12. clayboyx says:

    I got the script working now… but when I do crontab -e and add the script, the script runs but it doesn’t send any emails… can you assist? Mutt is set up correctly. Emails do get sent if I run the script from the terminal.

    # Run a SnapRAID diff and then sync
    30 23 * * * /root/Backup/snapraid.sh

    • Zack says:

      Have you chmod’d the snapraid.sh script to make it executable, and is it running under the root user’s crontab -e or your regular user’s (it needs to be root)?
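      For example, using the path from your crontab entry, these are the two things I’d double check:

      chmod +x /root/Backup/snapraid.sh   # make sure the script itself is executable
      sudo crontab -l                     # the entry needs to be in root's crontab, not a regular user's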

  13. clayboyx says:

    Yes, the script is chmod +x. I set up a cronjob through Webmin as a test to have it run immediately; the script runs but no email is sent.

    • Zack says:

      As I said in my email, the script appears to be working fine. It looks like you need to properly configure ssmtp to work with gmail and Mutt. Once that is done, the email should work fine 🙂

  14. nerdfury says:

    Hey, not sure if this helps but this might be a solution for supporting different parity setups

    https://gist.github.com/nerdfury/7b5de21e8f8c54616feca73638f97fe1#file-snapraid-sh-L106

    should work with parity, 2-parity and z-parity options

  15. kiwijunglist says:

    Hi, thanks for this. Can you please explain how I would adjust this for my single parity setup?

    content /var/snapraid.content
    content /mnt/disk-3tb1/snapraid.content
    content /mnt/disk-3tb2/snapraid.content
    content /mnt/disk-3tb3/snapraid.content
    content /mnt/disk-3tb4/snapraid.content
    content /mnt/disk-4tb1/snapraid.content
    content /mnt/disk-6tb1/snapraid.content
    content /mnt/disk-8tb1/snapraid.content
    content /mnt/disk-8tb2/snapraid.content
    content /mnt/disk-8tb3/snapraid.content

    data d1 /mnt/disk-3tb1/
    data d2 /mnt/disk-3tb2/
    data d3 /mnt/disk-3tb3/
    data d4 /mnt/disk-3tb4/
    data d5 /mnt/disk-4tb1/
    data d6 /mnt/disk-6tb1/
    data d7 /mnt/disk-8tb1/
    data d8 /mnt/disk-8tb2/
    data d9 /mnt/disk-8tb3/

    parity /mnt/parity/snapraid.parity

  16. Sejrup says:

    Any issues using this script with Debian? I have had it working with Ubuntu, but no luck so far in Debian.

    Get the following errors when executing the script:
    /root/scripts/snapraid_diff_n_sync.sh: line 308: unexpected EOF while looking for matching `)'
    /root/scripts/snapraid_diff_n_sync.sh: line 457: syntax error: unexpected end of file

    I have adjusted it to dual parity as per your instructions in a previous post, but I don’t think that has anything to do with it?

    Line 308: UPDATE_COUNT=$(grep -w ‘^ \{1,\}[0-9]* updated

    • sburke says:

      Add a ' after each of the following => removed, added, moved, updated and copied, and move each command back onto a single line. You should end up with the following:

      DEL_COUNT=$(grep -w '^ \{1,\}[0-9]* removed' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
      ADD_COUNT=$(grep -w '^ \{1,\}[0-9]* added' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
      MOVE_COUNT=$(grep -w '^ \{1,\}[0-9]* moved' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
      COPY_COUNT=$(grep -w '^ \{1,\}[0-9]* copied' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
      UPDATE_COUNT=$(grep -w '^ \{1,\}[0-9]* updated' $TMP_OUTPUT | sed 's/^ *//g' | cut -d ' ' -f1)
      
  17. sburke says:

    Hey,
    First off, great script Zack, thank you for publishing it. It’s come in very handy.
    Secondly, I was using this version of the script on Debian 9 and it worked without issue. It needed some minor formatting, but nothing major. I upgraded to Debian 10 and the wait commands got stuck waiting forever. I know the simple solution would probably be to use the old version of the script, but I overlooked this until just now.

    So if anybody out there is running Debian 10 or a newer version of bash (not quite sure at what version this kicks in, but I’m using > 5.0 on Debian 10, so at least that version or greater) and is running into the same issue, I’d advise you to look here as to why:
    https://unix.stackexchange.com/questions/530457/script-hanging-when-using-tee-and-wait-why

    Save yourself the headache and use the old script.
    https://zackreed.me/updated-snapraid-sync-script/

    • Zack says:

      I didn’t realize that. It would be nice to actually fix this script so that it works. It seems like mosvy’s fix would be like this…

        # redirect all output to screen and file
        > $TMP_OUTPUT
        exec {out}>&1 {err}>&2
        exec > >(tee -a "${TMP_OUTPUT}") 2>&1
      
        # timestamp the job
        echo "SnapRAID Script Job started [`date`]"
        echo
        echo "----------------------------------------"
      
        # Remove any Plex-created anomalies
        echo "##Preprocessing"
      
        # Stop any services that may inhibit optimum execution
        if [ $MANAGE_SERVICES -eq 1 ]; then
          echo "###Stop Services [`date`]"
          stop_services
        fi
      
        # sanity check first to make sure we can access the content and parity files
        sanity_check
      
        echo
        echo "----------------------------------------"
        echo "##Processing"
      
        # Fix timestamps
        chk_zero
      
        # run the snapraid DIFF command
        echo "###SnapRAID DIFF [`date`]"
        $SNAPRAID_BIN diff
        # wait for the above cmd to finish
        exec >&$out 2>&$err
        wait $(pgrep -P "$$")
        echo
        echo "DIFF finished [`date`]"
        JOBS_DONE="DIFF"
      
      • sburke says:

        Sorry for the late reply. Here is how I have butchered your script but it seems to work: https://pastebin.com/PBzrBXq0
        I have only tested the ‘happy path’ flow.
        Your above solution solves the initial issue, but what was happening to me was that any time sed_me is called, it redirects again and opens a new tee process, and then the next time we hit a wait, it’s back to square one and we wait forever. Does your above fix solve both problems?
        What I think I’ve done is essentially kill the tee process before each wait and open a new one after waiting.
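        In other words, roughly this pattern around every long-running snapraid call (a sketch pieced together from the helper functions in the script above, not my exact code):

        exec >&$out 2>&$err                      # close the redirection so the tee process can exit
        wait $(pgrep -P "$$")                    # wait now returns instead of hanging on tee
        exec > >(tee -a "${TMP_OUTPUT}") 2>&1    # reopen a fresh tee for the next block of output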

        • Zack says:

          Hello 🙂 No, my above solution was just proposed based on the fix offered on Stack Exchange. You are correct, the sed_me function would put you right back to square one. And I wouldn’t call what you did butchering. You made new functions, and it looks pretty good. Also, your assumption is correct: you are allowing the tee process to close before each wait. That’s why it is working. Good job!

    • realbosselarsson says:

      It took me about a month to realize I was not getting any mails from Snapraid and then I remembered it was about a month since I updated Proxmox, bash probably got updated too.
      Your changes sorted it for me, thanks!

  18. MMauro says:

    Hi, thanks for the great work! There’s a syntax error with the IFS variable on line 110 that renders the script unusable. I don’t know what that variable’s supposed to be, so I’m stuck.

  19. Egleu says:

    Nice script. My only suggestion is to either default to sync -h for hashing or to leave a comment. I feel that hashing is almost necessary if you aren’t using ECC memory.
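    For anyone who wants to try that, it’s a small change to the SYNC line in the script (SnapRAID’s -h / --pre-hash option hashes the new data before the parity is computed):

      $SNAPRAID_BIN sync -h -q   # add -h (--pre-hash) to the existing sync call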

  20. svh1985 says:

    Love the script, thanks! One thing I’m wondering is how to set up SMTP authentication. I’m using Gmail, so I have to authenticate. Is this something I can set up?

  21. svh1985 says:

    It looks like there is a typo in the script on line 116 which breaks the script:
    The IFS= is never used in the script I think, should it even be there?

    Line 116
    IFS=
    \n’ PARITY_FILES=(`cat /etc/snapraid.conf | grep “^[^#;]” | grep “^\([2-6z]-\)*parity” | cut -d ” ” -f 2 | tr ‘,’ ‘\n’`)

    • Zack says:

      Thanks for letting me know. That is supposed to be there. WordPress just threw in a random line break after my last edit. It is supposed to be like this…

      IFS=$'\n' PARITY_FILES=(`cat /etc/snapraid.conf | grep "^[^#;]" | grep "^\([2-6z]-\)*parity" | cut -d " " -f 2 | tr ',' '\n'`)

      IFS stands for “internal field separator”. It is used by the shell to determine how to do word splitting, i.e. how to recognize word boundaries.
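      A quick toy example (not from the script) of the effect: with IFS set to a newline, word splitting of the command substitution happens only at line breaks, so an entry containing spaces stays in one array element.

      IFS=$'\n' FILES=(`printf '/mnt/parity one/part1.parity\n/mnt/parity two/part2.parity'`)
      echo "${#FILES[@]}"   # prints 2; with the default IFS it would print 4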

  22. EmanuelW says:

    Hi (again) Zack,

    When doing some debugging of another issue (see the comment originally made by Dulanic about the double un-pausing of docker services) I found that when waiting for processes to exit, sometimes the value of the pgrep in “close_output_and_wait()” would be invalid (a “not a valid PID” error thrown from wait). This would happen, for example, with a diff that has no changes (and thus runs very briefly?). A temporary workaround for me was:

    function close_output_and_wait(){
      _PID=$(pgrep -P "$$")
      exec >&"$out" 2>&"$err"
      wait "$_PID"
    }
    

    But preferably one would instead save the PID when spawning the process. Does this seem reasonable?

    • Zack says:

      Maybe I’m dense, but I don’t have, or see, a close_output_and_wait function in my version of the script. Where did you use this? And, thanks for all the questions and ideas!

  23. mbourd25 says:

    Hi gang, I’m using OpenMediaVault for my NAS software. I have 2 2TB data drives plus another 2 2TB drives for SnapRAID parity. I also have a 6TB hard drive that I back up to directly.

    Would this script be overkill for my usage? I have very few personal documents saved on my NAS. The storage is mostly for my movie and music collection.

    If I can use this script, would anyone have a good tutorial on how I could make this script work in OMV? Right now, I just run a snapraid sync every 2 hours.

    Thanks.

    • Zack says:

      Hello, this script “should” work fine in OMV, as it is just a Debian-based distribution like Ubuntu. You’d need to paste the contents of this script into a file, change the variables to the correct email address and disk locations, and finally chmod +x your script and add it to your crontab. Before you do any of that, just get the script set up and try to run it outside of crontab first. Below, I’ve saved the script to /root/scripts/SnapRAID-sync-script:

      sudo -i
      chmod +x /root/scripts/SnapRAID-sync-script
      /root/scripts/SnapRAID-sync-script
      
  24. mon0 says:

    Hi Zack, I’ve been using SnapRAID for a few years and everything was working as expected; now I had to set it up again with stronger encryption.
    I copied over my scripts, set up ssmtp etc., and the first snapraid sync took a while but finished without errors.
    Now a test run of my old working snapraid_diff_n_sync.sh hangs after comparing.

    ##Processing
    ###SnapRAID TOUCH [Mon 05 Oct 2020 04:43:02 PM EEST]
    Checking for zero sub-second files.
    No zero sub-second timestamp files found.
    ###SnapRAID DIFF [Mon 05 Oct 2020 04:43:06 PM EEST]
    Loading state from /var/snapraid.content...
    Comparing...
    
      157042 equal
           0 added
           0 removed
           0 updated
           0 moved
           0 copied
           0 restored
    No differences
    

    I rechecked the file but do not see any errors. I also tried your new (April) modified script, but with the same outcome.

    Running it with bash -x snapraid_diff_n_sync.sh shows a wait.

    + echo '###SnapRAID TOUCH [Mon 05 Oct 2020 04:43:28 PM EEST]'
    ###SnapRAID TOUCH [Mon 05 Oct 2020 04:43:28 PM EEST]
    + echo 'Checking for zero sub-second files.'
    Checking for zero sub-second files.
    ++ /usr/local/bin/snapraid status
    ++ sed 's/^You have/Found/g'
    ++ grep 'You have [1-9][0-9]* files with zero sub-second timestamp\.'
    + TIMESTATUS=
    + '[' -n '' ']'
    + echo 'No zero sub-second timestamp files found.'
    No zero sub-second timestamp files found.
    ++ date
    + echo '###SnapRAID DIFF [Mon 05 Oct 2020 04:43:31 PM EEST]'
    ###SnapRAID DIFF [Mon 05 Oct 2020 04:43:31 PM EEST]
    + /usr/local/bin/snapraid diff
    Loading state from /var/snapraid.content...
    Comparing...
    
      157042 equal
           0 added
           0 removed
           0 updated
           0 moved
           0 copied
           0 restored
    No differences
    + wait
    

    Any ideas?

  25. mon0 says:

    hey sorry for the formatting, cannot edit post

  26. punq says:

    Hi, been trying to figure out how to use your script for single parity.

    If I’m not mistaken, wouldn’t the script (as it is today) work for both single parity and split parity setups?

    • punq says:

      As far as I can see it does work on my single parity setup. Great script.

      I get a small “error” at the bottom where it’s trying to restart the container services even though I’ve set MANAGE_SERVICES=0. I can’t really figure out why it is running; it seems correct??

      • Zack says:

        Hello 🙂 Yes, this script as written will work well with any level of parity or split parity. I’m not sure how docker would be trying to restart services if MANAGE_SERVICES=0. Those functions shouldn’t execute. I don’t see that behavior on my own Ubuntu 20.04 server.

  27. lockheed says:

    You do not explain what the variables mean or do. I can guess some of them but it would be good if you added that info.
    CHK_FAIL=0
    DO_SYNC=0
    GRACEFUL=0
    SECONDS=0
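    For what it’s worth, from reading the init section of the script, those four appear to work like this (my reading of the code, not an official description):

    CHK_FAIL=0    # set to 1 when the deleted/updated files threshold check fails
    DO_SYNC=0     # set to 1 once the script decides a SYNC should actually run
    GRACEFUL=0    # set to 1 when services are restored normally in main, so restore_services returns instead of doing the trap-style cleanup/exit
    SECONDS=0     # bash built-in seconds counter; reset so the elapsed-time report starts from zero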
