Python: a script for incremental or full file backups

Author: | 14/10/2014
 

Creates a full copy of all data under /var/www/vhosts/ on Mondays, and on the other days an "incremental" backup of only the files changed during the last 24 hours.

Packs and compresses the data into a tar.bz2 archive.
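
The "changed in the last 24 hours" rule and the tar.bz2 packing can be sketched roughly as follows. This is a minimal sketch, assuming Python 3; the helper names `changed_files()` and `make_incremental_archive()` are illustrative and are not taken from the script itself.

```python
import os
import tarfile
import time


def changed_files(top, max_age=86400, now=None):
    """Yield files under `top` modified less than `max_age` seconds before `now`."""
    now = time.time() if now is None else now
    for root, _dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)
            try:
                if now - os.stat(path).st_mtime < max_age:
                    yield path
            except OSError:
                # The file vanished or is unreadable: skip it.
                continue


def make_incremental_archive(top, archive_path, max_age=86400):
    """Pack recently changed files from `top` into a bzip2-compressed tar."""
    with tarfile.open(archive_path, mode="w:bz2") as tar:
        for path in changed_files(top, max_age):
            tar.add(path)
```

Note that an mtime-based selection like this does not record deleted files, which is presumably why "incremental" is in quotes.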

A separate directory is created for each day; 4 full backups and 7 daily backups are kept.
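
The retention rule works because directory names start with an ISO-style date, so plain string ordering is also chronological ordering. A small sketch of that idea, where `split_retention()` is a hypothetical helper and not a function from the script:

```python
import heapq


def split_retention(dirnames, keep):
    """Return (kept, expired) for date-named directories like '2014-10-13-files'.

    Names that begin with YYYY-MM-DD sort chronologically as strings,
    so the `keep` largest names are the `keep` newest backups.
    """
    kept = heapq.nlargest(keep, dirnames)
    expired = sorted(set(dirnames) - set(kept))
    return kept, expired
```

With `keep=7` for daily and `keep=4` for weekly directories this mirrors the counts mentioned above; the expired names are the candidates for deletion.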

MySQL databases are backed up by a second script, described in "Python: a MySQL/MariaDB database backup script".

The layout of the backup directory looks like this:

[simterm]

# tree -L 3 -d /home/setevoy/backups/
/home/setevoy/backups/
├── freeproxy
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
├── hudeem
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
├── profy
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
├── rtfm
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
├── domain
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
├── worlddesign
│   ├── daily
│   │   ├── 2014-10-13-files
│   │   └── 2014-10-14-files
│   └── weekly
│       └── 2014-10-13-files
└── zabbix
    ├── daily
    │   ├── 2014-10-13-files
    │   └── 2014-10-14-files
    └── weekly
        └── 2014-10-13-files

[/simterm]
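
A path in this tree is just the backup root, the user name, the backup type and a dated directory name joined together. A minimal sketch, assuming a POSIX system; `backup_dir_for()` is an illustrative name, and the `when` parameter is added here only to make the function testable:

```python
import os
import time


def backup_dir_for(backdir, user, bkptype, when=None):
    """Build <backdir>/<user>/<bkptype>/<YYYY-MM-DD>-files for the given day."""
    when = time.localtime() if when is None else when
    dated = time.strftime('%Y-%m-%d', when) + '-files'
    return os.path.join(backdir, user, bkptype, dated)
```

For example, `backup_dir_for('/home/setevoy/backups/', 'rtfm', 'weekly')` called on 13 October 2014 would give the `rtfm/weekly/2014-10-13-files` directory shown in the tree.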

Each of these directories holds an archive with the site's contents for that date:

[simterm]

# ls -lh /home/setevoy/backups/rtfm/weekly/2014-10-13-files/
total 168M
-rw-r--r-- 1 root root 168M Oct 13 23:09 rtfm.co.ua.tar.bz2

[/simterm]
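
To sanity-check such an archive without unpacking it, the standard `tarfile` module can list its members. The following is only a sketch, assuming Python 3; `archive_summary()` is a hypothetical helper that is not part of the backup script:

```python
import tarfile


def archive_summary(path):
    """Return (number_of_files, total_uncompressed_bytes) for a tar archive."""
    # "r:*" lets tarfile detect the compression (bz2 here) automatically.
    with tarfile.open(path, mode="r:*") as tar:
        files = [member for member in tar.getmembers() if member.isfile()]
    return len(files), sum(member.size for member in files)
```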

The script was written for one specific layout of virtual host directories:

[simterm]

# tree -L 2 -d /var/www/vhosts/
/var/www/vhosts/
├── freeproxy
│   └── freeproxy.in.ua
├── hudeem
│   └── hudeem.kiev.ua
├── profy
│   └── profy.kiev.ua
├── rtfm
│   └── rtfm.co.ua
├── domain
│   ├── forum.domain.kiev.ua
│   ├── postfixadmin.domain.org.ua
│   ├── domain.org.ua
│   └── webmail.domain.org.ua
├── worlddesign
│   └── worlddesign.org.ua
└── zabbix
    └── zabbix.domain.org.ua

[/simterm]
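
Walking this two-level layout (user directory, then virtual host directories) might look like the sketch below. It assumes Python 3, and `list_vhosts()` is an illustrative name that combines what the script does in several separate functions:

```python
import os


def list_vhosts(vhosts_path, skip=('.htpasswd',)):
    """Map each user directory to the virtual host directories inside it."""
    result = {}
    for user in sorted(os.listdir(vhosts_path)):
        user_path = os.path.join(vhosts_path, user)
        # Ignore skipped names and anything that is not a directory.
        if user in skip or not os.path.isdir(user_path):
            continue
        result[user] = sorted(
            name for name in os.listdir(user_path)
            if os.path.isdir(os.path.join(user_path, name))
        )
    return result
```

Applied to the tree above, the `domain` key would map to its four virtual hosts and every other user to a single one.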

It is run from cron every day at 4 a.m., because compression puts a heavy load on the CPU:

[simterm]

# crontab -l
00 */12 * * * /usr/bin/sa-learn --spam /var/vmail/domain.kiev.ua/[email protected]/.Junk/
00 */12 * * * /usr/bin/sa-learn --ham /var/vmail/domain.kiev.ua/[email protected]/cur/

0 04 * * * /home/setevoy/opt/files_backup.py

[/simterm]
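
Since cron starts the script every day, the script itself has to decide whether the run is a weekly or a daily one. The script compares the weekday abbreviation from `time.strftime('%a')` with `'Mon'`; a variant that does not depend on weekday names at all could look like this sketch (`backup_type()` is a hypothetical name, and the `today` parameter exists only for testing):

```python
import datetime


def backup_type(today=None):
    """Return 'weekly' on Mondays and 'daily' on every other day."""
    today = datetime.date.today() if today is None else today
    # date.weekday() counts from Monday == 0 regardless of locale.
    return 'weekly' if today.weekday() == 0 else 'daily'
```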

After each run it sends an email reporting that the backup has finished and how much disk space is in use:

[simterm]

# cat /var/vmail/domain.org.ua/[email protected]/cur/1413272219.M851594P3122.venti.domain.org.ua,S=829,W=855:2,a
Return-path: <[email protected]>
Envelope-to: [email protected]
Delivery-date: Tue, 14 Oct 2014 10:36:59 +0300
Received: from [127.0.0.1] (helo=venti.domain.org.ua)
        by mx0.domain.org.ua with esmtp (Exim 4.72)
        (envelope-from <[email protected]>)
        id 1XdwfD-0000oG-Dr
        for [email protected]; Tue, 14 Oct 2014 10:36:59 +0300
Content-Type: multipart/mixed; boundary="===============8677965755135430447=="
MIME-Version: 1.0
Subject: daily backup report
From: Backup Manager <[email protected]>
To: root <[email protected]>

--===============8677965755135430447==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit



    daily backup on venti.domain.org.ua finished.
    Total used disk space: 341 MB.


--===============8677965755135430447==--

[/simterm]
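
A report like this can also be built with the newer `email.message.EmailMessage` API. The sketch below assumes Python 3.6 or later, produces a plain-text message rather than a multipart one, and uses placeholder addresses in the usage note; `build_report()` is an illustrative name, not the script's `sendmail()`:

```python
from email.message import EmailMessage


def build_report(bkptype, host, total_mb, sender, recipient):
    """Build (but do not send) a plain-text backup report message."""
    msg = EmailMessage()
    msg['Subject'] = '%s backup report' % bkptype
    msg['From'] = sender
    msg['To'] = recipient
    msg.set_content(
        '%s backup on %s finished.\n'
        'Total used disk space: %s MB.\n' % (bkptype, host, total_mb)
    )
    return msg
```

Sending would then be a matter of `smtplib.SMTP('localhost', 25).send_message(msg)`, given a message built with, say, `backup@example.org` as a stand-in sender and a local MTA listening on port 25.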

All functions in the script itself are documented:

[simterm]

# pydoc /home/setevoy/opt/files_backup.py | head -n 30
Help on module files_backup:

NAME
    files_backup - Create full (at Monday) or incremental backup files [VHOSTSPATH].

FILE
    /home/setevoy/opt/files_backup.py

DESCRIPTION
    Creation date: 13.10.2014
    Last modify date: 14.10.2014

FUNCTIONS
    arch_users_list(path)
        Call with VHOSTSPATH as argument.
        See 'run_backup()' help.

    arch_users_path(path, userlist)
        Call with VHOSTSPATH and userlist[] from 'arch_users_list()'.
        See 'run_backup()' help.

    arch_vhosts_names(vhosts_paths)
        Call with vhosts_paths[] from 'arch_users_path()'.
        See 'run_backup()' help.

    back_dir_create(user, day, bkptype)
        Creates new directory for each day;
        depending on data recieved from 'dir_bkptype()'
        can create 'weekly' or 'daily' directory.

[/simterm]
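
pydoc builds this help from the module and function docstrings. One way to check that no function was left without a docstring is sketched below, assuming Python 3; `undocumented_functions()` is a hypothetical helper, not something the script provides:

```python
import inspect


def undocumented_functions(module):
    """Return names of functions defined in `module` that lack a docstring."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        # Only look at functions defined in the module itself, not imported ones.
        if obj.__module__ == module.__name__ and not inspect.getdoc(obj)
    )
```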

And the script itself:

#!/usr/bin/env python

'''
Create full (at Monday) or incremental backup files [VHOSTSPATH].

Creation date: 13.10.2014
Last modify date: 14.10.2014
'''

import tarfile
import os
import time
import sys
import heapq
import shutil
import logging
import smtplib
import mimetypes
import email
import email.mime.application
import socket

VHOSTSPATH = '/var/www/vhosts/'
BACKDIR = '/home/setevoy/backups/'
LOGPATH = '/var/log/backup.log'

# os.environ['HOSTNAME'] does not work when run from cron
# HOST = os.environ['HOSTNAME']
HOST = socket.gethostname()

logging.basicConfig(format = '%(filename)s - %(levelname)-3s [%(asctime)s] %(message)s ', filename=LOGPATH, level=logging.DEBUG)

def dir_bkptype(day):

    '''On Monday - create full weekly backup with 'full_backup()';
       other days - incremental backup with 'inc_backup()';
       also used in 'back_dir_create()' to create BACKDIR/weekly or daily;
       also used in 'clear_old_dirs()' for deleting old directories.'''

    if day == 'Mon':
        bkptype = 'weekly'
    else:
        bkptype = 'daily'

    logging.info('Today is %s, will use backuptype %s.' % (day, bkptype))

    return(bkptype)

def back_dir_create(user, bkptype):

    '''Creates new directory for each day;
       depending on data received from 'dir_bkptype()'
       can create 'weekly' or 'daily' directory.'''

    dirname = os.path.join(BACKDIR, user, bkptype, '%s-files/' % time.strftime('%Y-%m-%d'))

    if not os.path.exists(dirname):
        os.makedirs(dirname)
        logging.info('Directory %s created' % dirname)
        return(dirname)

def arch_users_list(path):

    '''Call with VHOSTSPATH as argument.
       See 'run_backup()' help.'''

    userlist = []

    for user in os.listdir(path):
        if not user == '.htpasswd':
            userlist.append(user)

    return(userlist)

def arch_users_path(path, userlist):

    '''Call with VHOSTSPATH and userlist[] from 'arch_users_list()'.
       See 'run_backup()' help.'''

    vhosts_paths = []

    for user in userlist:
        userdirpath = os.path.join(path, user)
        vhosts_paths.append(userdirpath)
    return(vhosts_paths)

def arch_vhosts_names(vhosts_paths):

    '''Call with a single user directory path from 'arch_users_path()';
       returns names of the virtual host directories inside it.
       See 'run_backup()' help.'''

    vhosts_names = []

    for hostname in os.listdir(vhosts_paths):
        hostnamepath = os.path.join(vhosts_paths, hostname)
        if os.path.isdir(hostnamepath):
            vhosts_names.append(hostname)
    return(vhosts_names)

def inc_backup(dir_to_backup, backupname):

    '''Incremental backup. Call with [hostname] as dir_to_backup,
       and [archname] as backupname in 'run_backup()'.
       Walks through files and directories in [hostname],
       looks for data, modified in last 86400 seconds (24 hours).'''

    now = time.time()
    cutoff = 86400

    logging.info('Creating archive %s' % backupname)

    out = tarfile.open(backupname, mode='w:bz2')

    for root, dirs, files in  os.walk(dir_to_backup, followlinks=True):
        for file in files:
            file_to_backup = os.path.join(root, file)
            try:
                filemodtime = os.stat(file_to_backup).st_mtime
                if now - filemodtime < cutoff:
                    logging.debug('Adding file: %s' % file_to_backup)
                    out.add(file_to_backup)
                    logging.debug('File modified: %s' % time.ctime(os.path.getmtime(file_to_backup)))
            except OSError as error:
                logging.critical('ERROR: %s' % error)

    logging.info('Done.')
    out.close()

def full_backup(dir_to_backup, backupname):

    '''Full backup. Call with [hostname] as dir_to_backup,
       and [archname] as backupname in 'run_backup()'.
       Add all files and directories in [hostname] to [archname].'''

    logging.info('Creating archive %s' % backupname)
    out = tarfile.open(backupname, mode='w:bz2')
    try:
        logging.info('Adding directory: %s' % dir_to_backup)
        out.add(dir_to_backup)
    except OSError as error:
        logging.critical('ERROR: %s' % error)

    logging.info('Done.')
    out.close()

def clear_old_dirs(user, day, count, host, bkptype):

    '''Deletes old directories. Depending on [count] - can store 7
       (for daily) or 4 (for weekly) copies, including today's archive.'''

    dirname = os.path.join(BACKDIR, user, bkptype)

    last = heapq.nlargest(count, os.listdir(dirname))

    for i in last:
        logging.info('Saving previous data backups: %s/%s' % (host, i))

    for old_dir in os.listdir(dirname):

        if old_dir not in last:
            logging.info('Deleting old data backups: %s/%s' % (host, old_dir))
            dir_to_del = os.path.join(dirname, old_dir)
            shutil.rmtree(dir_to_del)

def get_size(path):

    '''Calculate all files size in [BACKDIR] directory.
       1048576 - for Megabytes, 1024 - for Kilobytes.'''

    total_size = 0

    for root, dirs, files in os.walk(path):
        for file in files:
            fp = os.path.join(root, file)
            total_size += os.path.getsize(fp)

    return(total_size / 1048576)

def sendmail(bkptype):

    '''Send email report to [to] with used disk space information.'''

    total_size = get_size(BACKDIR)

    sender = '[email protected]'
    to = ['[email protected]']

    msg = email.mime.Multipart.MIMEMultipart()

    msg['Subject'] = ('%s backup report' % bkptype)
    msg['From'] = 'Backup Manager <[email protected]>'
    msg['To'] = 'root <[email protected]>'

    body = email.mime.Text.MIMEText("""

    %s backup on %s finished.
    Total used disk space: %s MB.

    """ % (bkptype, HOST, total_size))

    msg.attach(body)

    smtpconnect = smtplib.SMTP('localhost:25')
    #smtpconnect.set_debuglevel(1)
    smtpconnect.sendmail(sender, to, msg.as_string())
    smtpconnect.quit()
    logging.info('Email sent from [%s] to %s.' % (sender, to))

def run_backup():

    '''userlist - data from 'arch_users_list()' in list:
                  ['rtfm', 'profy', 'freeproxy', 'worlddesign', 'domain', 'zabbix', 'hudeem'];

       userpaths - data from 'arch_users_path()' in list:
                  ['/var/www/vhosts/rtfm', '/var/www/vhosts/profy', ... '/var/www/vhosts/hudeem'];

       curday - return week day, like Sun, Mon etc;

       bkptype - backup type from 'dir_bkptype()' like 'weekly' or 'daily';

       backup_dir - directory, created by 'back_dir_create()'
                  like: /home/setevoy/backups/temp_new_test/hudeem/daily/2014-10-13-files/;

       virtual_hosts - data from arch_vhosts_names like: ['hudeem.kiev.ua'].'''

    userlist = arch_users_list(VHOSTSPATH)
    userpaths = arch_users_path(VHOSTSPATH, userlist)

    curday = time.strftime('%a')

    bkptype = dir_bkptype(curday)

    for hostdir, username in zip(userpaths, userlist):
        os.chdir(hostdir)
        logging.info('Working in: %s and user %s' % (os.getcwd(), username))

        backup_dir = back_dir_create(username, bkptype)

        if backup_dir:

            virtual_hosts = arch_vhosts_names(hostdir)

            for host in virtual_hosts:
                archname = (backup_dir + host + '.tar.bz2')
                # if used together with the MySQL backup script,
                # double the counts (4 -> 8, 7 -> 14),
                # because the database backup directories are stored there too
                if bkptype == 'weekly':
                    clear_old_dirs(username, curday, 4, host, bkptype)
                    full_backup(host, archname)
                else:
                    clear_old_dirs(username, curday, 7, host, bkptype)
                    inc_backup(host, archname)

        else:
            logging.info('Directory already present, skip.')

    sendmail(bkptype)

if __name__ == "__main__":
    starttime = time.strftime('%Y-%m-%d %H:%M:%S')
    logging.info('Backup started at: %s' % starttime)
    run_backup()
    finishtime = time.strftime('%Y-%m-%d %H:%M:%S')
    logging.info('Backup finished at: %s' % finishtime)
    logging.info('Total used size in %s - %s MB' % (BACKDIR, get_size(BACKDIR)))
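
The script as listed appears to target Python 2: for example, `get_size()` relies on `/` being integer division to report whole megabytes. Under Python 3 the same calculation would need `//`. A minimal sketch of that adjustment follows; the name `get_size_mb()` and the symlink check are additions made here, not part of the original:

```python
import os


def get_size_mb(path):
    """Return the total size of regular files under `path` in whole megabytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            # Skip symlinks so that a dangling link cannot raise OSError.
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    # Floor division keeps the result an integer on Python 2 and 3 alike.
    return total // 1048576
```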

The log looks like this:

[simterm]

files_backup.py - INFO [2014-10-13 23:06:29,145] Backup started at: 2014-10-13 23:06:29
files_backup.py - INFO [2014-10-13 23:06:29,147] Today is Mon, will use backuptype weekly.
files_backup.py - INFO [2014-10-13 23:06:29,147] Working in: /var/www/vhosts/rtfm and user rtfm
files_backup.py - INFO [2014-10-13 23:06:29,147] Directory /home/setevoy/backups/rtfm/weekly/2014-10-13-files/ created
files_backup.py - INFO [2014-10-13 23:06:29,147] Saving previous data backups: rtfm.co.ua/2014-10-13-files
files_backup.py - INFO [2014-10-13 23:06:29,148] Creating archive /home/setevoy/backups/rtfm/weekly/2014-10-13-files/rtfm.co.ua.tar.bz2
files_backup.py - INFO [2014-10-13 23:06:29,149] Adding directory: rtfm.co.ua
files_backup.py - INFO [2014-10-13 23:09:54,873] Done.

[/simterm]
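
The shape of these lines comes from the format string passed to `logging.basicConfig()` in the script. The sketch below reuses that format string with a stream handler so it can be tried without writing to /var/log; `make_logger()` and the logger name `backup_demo` are illustrative assumptions:

```python
import io
import logging

# Same format string as in the script's logging.basicConfig() call.
FORMAT = '%(filename)s - %(levelname)-3s [%(asctime)s] %(message)s '


def make_logger(stream):
    """Return a logger that writes records in the script's format to `stream`."""
    logger = logging.getLogger('backup_demo')
    logger.setLevel(logging.DEBUG)
    logger.handlers = []          # drop handlers left over from earlier calls
    logger.propagate = False      # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(FORMAT))
    logger.addHandler(handler)
    return logger
```

Calling `make_logger(io.StringIO()).info('Today is %s, will use backuptype %s.', 'Mon', 'weekly')` yields a line of the same shape as the second entry in the log above, with the calling file's name in place of `files_backup.py`.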

The script was written mostly as Python practice, so as not to forget the language completely, since I rarely get to write it at work.