We had a blackout over the weekend while I was away, so I decided to add a few more tools to keep me informed. See, when the power goes out, the server can’t exactly let me know that it’s down, can it?
Fortunately, I have a slice at SliceHost which never goes down, so I decided to use that to monitor our in-house server. I put together a simple Python script to do the job. It requests the HTTP headers of the home page of each of our sites, and if it gets anything other than a 200 status code (or can’t connect at all), it emails me and sends a TXT to my cellphone.
The script itself is run by the cron daemon every 15 minutes, but it won’t overwhelm my phone because it has a built-in message delay: once an alert goes out, no further alerts are sent until MESSAGE_DELAY minutes have passed.
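Stripped of the lock file, the message delay is just a timestamp comparison. Here is a minimal sketch of that idea in Python 3. The function name `should_send` and its arguments are made up for illustration and do not appear in the script below, which stores the "last sent" time as the lock file's modification time instead.

```python
from datetime import datetime, timedelta


def should_send(last_sent, now, delay_minutes):
    """Return True if an alert may go out at time `now`.

    last_sent is the datetime of the previous alert, or None if no
    alert has been sent yet (hypothetical convention for this sketch).
    """
    if last_sent is None:
        return True
    # Same strict comparison as the script: now must be past the next send time.
    return now > last_sent + timedelta(minutes=delay_minutes)
```

With cron firing every 15 minutes and a delay of 24 * 60 minutes, a call 15 minutes after the last alert returns False, while a call 25 hours later returns True.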
#!/usr/bin/env python
# Written for Python 2 (httplib, urlparse, print statement).
import httplib, smtplib
from urlparse import urlparse
from datetime import datetime, timedelta
import os, time

# Modify your settings here
SITES = ( "http://www.site.com", ) # Sites to monitor
MAIL_FROM = "email@example.com"
MAIL_TO = ["Twoway.*********@messaging.nextel.com","firstname.lastname@example.org"]
MESSAGE_DELAY = 24 * 60 # The time between alerts in minutes
LOCK_FILE = "remote_mon_lock" # Lock file used for alert staggering

def send_alert(status, site):
    """
    Send an email alert to MAIL_TO if the status is not 200.
    Use a lock file to ensure that messages don't get sent more
    frequently than the MESSAGE_DELAY, to prevent large TXT bills.
    """
    if status != 200:
        if not os.path.exists(LOCK_FILE):
            # No alert sent yet: create the lock file and backdate it so
            # the delay check below passes immediately.
            old_mod_date = datetime.now() - timedelta(minutes=MESSAGE_DELAY + 1)
            old_timestamp = time.mktime(old_mod_date.timetuple())
            f = open(LOCK_FILE, 'w')
            f.close()
            os.utime(LOCK_FILE, (old_timestamp, old_timestamp))
        # The lock file's modification time marks when the last alert went out.
        mod_date = datetime.fromtimestamp(os.path.getmtime(LOCK_FILE))
        next_send_time = mod_date + timedelta(minutes=MESSAGE_DELAY)
        if datetime.now() > next_send_time:
            # Assumes a mail server is listening on localhost; change as needed.
            s = smtplib.SMTP("localhost")
            s.sendmail(MAIL_FROM, MAIL_TO, "%s - %s" % (str(status), site))
            s.quit()
            os.utime(LOCK_FILE, None)  # Reset the delay timer to now

for site in SITES:
    url = urlparse(site)
    error = ""
    try:
        conn = httplib.HTTPConnection(url.netloc)
        # Use a HEAD request to get the status code
        conn.request("HEAD", url.path or "/")
        status = conn.getresponse()
        error = status.status
        print "%d : %s : %s" % (status.status, site, datetime.now())
    except Exception:
        print "Connection Failed : %s : %s" % (site, datetime.now())
        error = "Connection Failed"
    # Pass the status code (or the failure message) on to the alert logic
    send_alert(error, site)
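
The script above relies on `httplib`, `urlparse`, and the `print` statement, all of which are Python 2 only. On Python 3, the status check might look roughly like the sketch below. It is not a drop-in replacement: the function name `check_status` and the 10-second `timeout` are assumptions chosen for illustration, and the HTTPS branch is an addition the original script does not have.

```python
import http.client
from urllib.parse import urlparse


def check_status(site, timeout=10):
    """Send a HEAD request to `site` and return the HTTP status code.

    Returns the string "Connection Failed" if the request cannot be
    completed, mirroring the error value used by the original script.
    """
    url = urlparse(site)
    # Pick the connection class from the URL scheme (HTTPS is an addition).
    if url.scheme == "https":
        conn = http.client.HTTPSConnection(url.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(url.netloc, timeout=timeout)
    try:
        # HEAD fetches only the headers, which is enough to read the status.
        conn.request("HEAD", url.path or "/")
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers refused connections, DNS failures, and timeouts.
        return "Connection Failed"
    finally:
        conn.close()
```

Calling `check_status(site)` for each entry in SITES and handing the result to `send_alert` would reproduce the main loop. The explicit timeout keeps a hung server from stalling the cron job.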