Response from Site5

I just received a response from Site5 regarding the recent unplanned outage. There are still a few points of contention I’d like to iron out with them, but for now I’ll post our most recent interaction:

#####

Dear Peter,

Thank you for taking the time to contact the Site5 management team with your concerns. I am happy to hear that the issue has already been resolved and I am extremely sorry for any inconveniences that this server’s recent downtime may have caused for you. As per your request I have reviewed your ticket and I will do my very best to thoroughly address all of your questions:

1) What was the cause of this issue? Specifically, what “problem” occurred with the IP bindings and why did it occur?

– Essentially, the on-duty systems administrator (David K.) reported that he needed to reboot the server to restart services after making some tweaks to lower the server’s load. When the server was rebooted there was a very odd error where the IPs didn’t bind to the server. He noticed this almost immediately and went to work repairing it. He restarted ipaliases, which is how you would normally fix this issue; however, only some IPs came online and it showed no errors whatsoever. Therefore, David believed the problem was resolved because there were no errors and most of the IP addresses did bind to the server without a problem.

2) Did this issue affect only prwdot.org, or were other customers’ domains affected?

– Well when the server was rebooted, everyone was down. But the reboot itself doesn’t take very long. When the server came back online and had the IP bind issue, I believe most clients were down for about a total of an hour.

3) Was this issue detected by Site5’s monitoring systems?

– Yes, David K. detected and reported this issue almost immediately.

4) If this issue was detected by Site5’s monitoring systems, why was it not addressed immediately?

– It certainly was, but it was an odd problem where the system showed no errors and only certain clients still experienced problems until they reported them.

5) If this issue was *not* detected by Site5’s monitoring systems, why was it not detected?

– Not applicable.

6) What will you be doing in the future to ensure that this type of issue does not come up again?

– This should never have happened in the first place. If the server does need to be rebooted again, the server has a comment on it from our staff explaining what happened the first time so that this specific problem can be avoided in the future. Whoever reboots the server will be responsible for making sure that the IPs properly bind to the server.

7) What will you be doing in the future to ensure that this type of issue is properly monitored and responded to in a timely manner?

– Unfortunately, due to the nature of this incident, even though the problem was properly monitored and we responded very quickly, some clients experienced a prolonged outage. Besides noting what happened the first time and making sure that the staff team knows what to do if it happens again, there is not much more that we can do. On the other hand, we have recently hired two additional support staff members to help cover our late night and early morning weekend shifts which are usually stretched pretty thin. This means that all tickets and problems that happen over the weekend will be responded to faster and more thoroughly which will help prevent delayed responses to support tickets and will allow us to quickly resolve more and more issues. We are working very hard on improving Site5’s overall level of customer service and this is a huge (necessary) step in the right direction.

Once again, Peter, I am extremely sorry for the downtime that this server problem has caused for you. I see that you already have a ticket open with the billing department and that is good – you should receive credit for this outage. If you have any additional questions or comments about this issue, or if you would simply like to discuss the situation further, please do let me know. Thank you kindly for your continued patience and understanding. I hope that you have a wonderful week!

Best regards,

Brendan Diaz
Customer Service / Retention Lead
Site5 Internet Solutions, Inc.

#####

I’ll keep you posted. Hopefully they will respond positively to some suggestions I’ve made regarding their monitoring practices.
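
To illustrate the kind of post-reboot check that answer 6 describes (making sure the IPs actually bind), here is a minimal Python sketch. It is hypothetical and says nothing about how Site5's tooling works: the function names and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions made for illustration. The idea is simply to try binding a socket to each expected address and report the ones that fail, rather than trusting that a service restart printed no errors.

```python
import socket


def is_bound_locally(ip):
    """Return True if this host can bind a socket to `ip`.

    Binding to an address that is not assigned to any local interface
    normally fails with OSError, which is what we rely on here.
    """
    family = socket.AF_INET6 if ":" in ip else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as s:
            s.bind((ip, 0))  # port 0 lets the OS pick any free port
    except OSError:
        return False
    return True


def missing_ips(expected, check=is_bound_locally):
    """Return the addresses in `expected` that fail the `check`."""
    return [ip for ip in expected if not check(ip)]


if __name__ == "__main__":
    # Hypothetical list of addresses the server is supposed to hold.
    expected = ["192.0.2.10", "192.0.2.11"]
    for ip in missing_ips(expected):
        print("NOT BOUND:", ip)
```

Run after a reboot, a check like this would flag a partial failure (some addresses up, some not) even when the restart itself reports no errors. The `check` parameter is there so the comparison logic can be exercised without touching the network.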
