automatically backup xTuple DB on loinux host

 

megabill's picture

What I would like to set up is a cron job on linux that creates a backup of all the DB's in Postgres before the regular system backup to the NAS appliance runs. Does anyone have such a script that they would be willing to share?

Thanks in advance.

megabill's picture
User offline. Last seen 1 week 6 days ago. Offline
Joined: 10/06/2009
Here is what I have done

I have set up the following to see how well it would work. It should.

1. I created a shell script in /root called pg-backup as follows:

#!/bin/bash
#PG BACKUP
#Script to completely dump all postgres databases prior to
#daily system backup
#
# July 25,2010
# Megatechnologies Inc.
#
PGUSER=postgres
PGPASSWORD=postgres

cd /opt/xTuple/postgresql
bin/pg_dumpall > dailydump.sql
exit 0

2. Added a task to the cron table for root to execute this script daily one hour before the daily system backup kicks off.

So far, this seems to work. I do not particularly like having a password in clear text in a script. However, this script resides in root's folder and as such should be somewhat safe from prying eyes.

Of course, I am always open to other methods and ideas. Does anyone have an alternate plan?

sbuttgereit496's picture
User offline. Last seen 1 week 3 days ago. Offline
Joined: 04/14/2009
My approach

Hi--

I do something very similar to Megabill, though a bit more involved. There are a couple key objectives that I try to always accomplish when I implement PostgreSQL backups.

  • Don't use privileged accounts for automated processes, except insofar as is necessary.
  • Backup on the PostgreSQL server directly, ideally to disks that do not host the PostgreSQL tablespaces, WAL files, etc. and during slow load times.
  • Retain some number of prior backups in a couple of different places... just in case!

Its also worth noting that this process is for complete backups. Most xTuple databases are of a size where incremental backups where there isn't enough time/disk/bandwidth savings to worry about anything like incrementals. Naturally, your needs may vary from this and if so, you'll need to alter this approach for that circumstance.

So to prepare for running backups I do a couple of things to prepare for the process:

  1. I create a Linux user account which will act as the 'service account' ('backup_user' from here out). If you use cron to run the server-side script, the backup_user is the account which will own the cron job... everything will be done under this user's security context. The backup_user is a regular Linux user with no special privileges; this user must be able to find and execute the PostgreSQL binaries, access the directories where the backups are stored and be able to find and execute the backup script itself.
  2. I create a backup group in Linux. In my own systems, I have a remote system that comes in later to pickup backup files for off-site storage. That system uses a different user account to log into the database server with, again so I can track exactly what each process is doing. I have the backup_user and this remote system's user share a Linux group so that I can give group level read/write permissions to the backup files without being wide open.
  3. I create a PostgreSQL role in the database server which is how the backup_user will log into the database ('backup_role' from here out). Unlike the backup_user, the backup_role has a great deal of permissions granted in the PostgreSQL database. In essence, this role needs to be able to read everything; if you choose to just have the superuser account (usually 'postgres') to serve as the backup_role, I still recommend creating a new role specially for the backup process and granting that new role superuser status. That way if you're logging and reviewing logs appropriately, you'll be able to see any suspicious activity by the backup_role.
  4. I create some directories to hold the backup files and related logging (collectively, the 'backup directories'). For me, I use something like /var/backup and then I have subdirectories /var/backup/progs, /var/backup/outbound and /var/backup/logs. The purpose of each directory will be made clear later.
  5. Next I create a special file called the .pgpass file. This is the recommended way to store the PostgreSQL user and password in a file for access to the databases. The PostgreSQL binaries know to look for this file and ensure that it's got the correct permissions before accepting it. Complete directions on how to create a .pgpass file can be found at: http://www.postgresql.org/docs/current/static/libpq-pgpass.html. In this case, the .pgpass file is stored in the backup_user's home directory.
  6. After successfully testing everything by running the backup script from the command line as the backup_user, I create the scheduled job which will run the script every night. I personally do this via PgAgent, but there's no reason that cron or some other scheduler couldn't do the same job; so long as it can run the script as the backup_user.

So with those preliminaries out of the way below is the backup process I use and the Linux script that runs the backups.

The basic process in the script is this:

  1. Script starts and cleans up (deletes) old files from the backup directories. 'Old files' are those that are older in days than the TOTAL_DAYS_KEPT variable in the script. The log file is also started in the logs directory of the backup directories.
  2. Next the actual backup runs. As I mentioned before, this is a complete backup of the cluster using pg_dumpall: this way I'm sure to get the global objects such as the PostgreSQL roles in addition to the data in the databases. The backup file itself is saved to the root of the backup directories. This is my 'on-server' backup store, limited still by the TOTAL_DAYS_KEPT value.
  3. Since I'm using other scripts to copy these files to machines in other locations, I compress the backup files. I use zip instead of, say, tar/gzip simply because its more universally common; since these PostgreSQL backups are not platform specific, why assume that I'll only necessarily restore to a Linux machine later? Anyway, its a small point, but that's why zip.
  4. I make a copy of the compressed backup file to the outbound directory and set permissions to allow my remote backup system to be able to access these files. The point of the outbound directory is to allow my other, off-site system to be able to know what files it's retrieved or not. My off site system transfers all files in the outbound directory and then once they're copied successfully, the off-site system deletes them out of the outbound directory so it doesn't see them again. I still have copies of these files in the root of my backup directories after the remote system completes its transfer/delete process.
  5. The logging is completed and closed off. Script is done. An hour or so later after this step, the remote system comes to pickup the files in the outbound directory.

Naturally, that stuff about remote systems may very well not apply to you. Once you have the backup files, you could go to tape, other disk, etc. with only minor modifications to the script below. I just find the remote system pickup process to be very efficient related to the infrastructure that I have available to me. Below is my backup script.

<br />
#!/bin/bash<br />
###################################################<br />
##<br />
##   PostgreSQL Backup Script<br />
##<br />
##   Distributed under the PostgreSQL License:<br />
##<br />
##   Copyright (c) 2010, Lima Buttgereit Holdings LLC d/b/a Muse Systems<br />
##   <a href="http://www.musesystems.com" title="www.musesystems.com">www.musesystems.com</a><br />
##<br />
##   Permission to use, copy, modify, and distribute this software and<br />
##   its documentation for any purpose, without fee, and without a<br />
##   written agreement is hereby granted, provided that the above<br />
##   copyright notice and this paragraph and the following two paragraphs<br />
##   appear in all copies.<br />
##<br />
##   IN NO EVENT SHALL LIMA BUTTGEREIT HOLDINGS LLC BE LIABLE TO ANY<br />
##   PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL<br />
##   DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS<br />
##   SOFTWARE AND ITS DOCUMENTATION, EVEN IF LIMA BUTTGEREIT HOLDINGS LLC<br />
##   HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.<br />
##<br />
##   LIMA BUTTGEREIT HOLDINGS LLC SPECIFICALLY DISCLAIMS ANY WARRANTIES,<br />
##   INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF<br />
##   MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE SOFTWARE<br />
##   PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND LIMA BUTTGEREIT<br />
##   HOLDINGS LLC HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,<br />
##   UPDATES, ENHANCEMENTS, OR MODIFICATIONS.<br />
##<br />
###################################################</p>
<p># Set some needed variables<br />
BACKUP_PATH=<br />
<the root path to your backups>   #parent directory for the backups<br />
LOG_PATH=$BACKUP_PATH/logs   #where the backup process log files are kept<br />
OUTBOUND=$BACKUP_PATH/outbound   #where the 'export' backup files kept<br />
POSTGRES_PATH=/opt/postgresql/current/bin  #path the PostgreSQL binaries<br />
PATH=$PATH:$POSTGRES_PATH<br />
HOST=<your host address here>   #the hostname or IP address pf your PostgreSQL server<br />
PORT=5432   #the port that the PostgreSQL instance listens on.<br />
DBUSER=<backup postgresql role>   #The database role to connect as for the backup process<br />
DBNAME=<database to connect to>   #The database name to connect to, usually 'postgres'.<br />
BACKUP_GROUP=backup   #a linux group which will own the export backup files.<br />
TOTAL_DAYS_KEPT=2   #the number of days to keep archival backup copies.<br />
YEAR=$(date +%Y)<br />
MONTH=$(date +%m)<br />
DAY=$(date +%d)<br />
TIME=$(date +%H%M)<br />
DATE_STAMP=$YEAR-$MONTH-$DAY-$TIME   #a date stamp used in filenames and such.</p>
<p># Delete old backups & logs<br />
echo "Starting backup process..."  > $LOG_PATH/$DATE_STAMP-backup-log.log<br />
echo "Deleting old files." $(date) >> $LOG_PATH/$DATE_STAMP-backup-log.log<br />
find $BACKUP_PATH -type f -mtime $TOTAL_DAYS_KEPT -delete</p>
<p># Make backup<br />
echo "Starting pg_dumpall." $(date) >> $LOG_PATH/$DATE_STAMP-backup-log.log<br />
pg_dumpall -h $HOST -U $DBUSER -l $DBNAME -p $PORT -f $BACKUP_PATH/$DATE_STAMP-postgresql.sql >>$LOG_PATH/$DATE_STAMP-backup-log.log 2>&1</p>
<p># Compress backup and copy to outbound queue<br />
echo "Backup complete, compressing." $(date) >> $LOG_PATH/$DATE_STAMP-backup-log.log<br />
zip $BACKUP_PATH/$DATE_STAMP-postgresql.zip $BACKUP_PATH/$DATE_STAMP-postgresql.sql >>$LOG_PATH/$DATE_STAMP-backup-log.log 2>&1<br />
cp $BACKUP_PATH/$DATE_STAMP-postgresql.zip $OUTBOUND/$DATE_STAMP-postgresql.zip</p>
<p># Clean-up and Exit<br />
rm $BACKUP_PATH/$DATE_STAMP-postgresql.sql<br />
chown :$BACKUP_GROUP $OUTBOUND/$DATE_STAMP-postgresql.zip<br />
chmod 770 $OUTBOUND/$DATE_STAMP-postgresql.zip<br />
echo "Process Finished." $(date) >> $LOG_PATH/$DATE_STAMP-backup-log.log<br />
exit 0<br />

Hopes this helps to give the community some ideas related to the possibilities for backing up.

Regards,
Steven C. Buttgereit
Muse Systems
www.musesystems.com

sbuttgereit496's picture
User offline. Last seen 1 week 3 days ago. Offline
Joined: 04/14/2009
One extra note...

I forgot to include the purpose of the '/var/backup/progs' directory I mentioned. It just holds the backup script itself. By no means is it necessary to keep the script in the backup directories; I just do it so I can have everything close at hand. Probably better practice to keep it in the backup_user's home dierctory or somewhere in /opt.