Down with the Sickness – A Quick Explanation on Downtime Over the Past Month

February 4, 2009

In late December and early January, many of you were understandably upset and asked for explanations of our instability and frequent downtime during what was a rather shaky holiday season for Plurk. Admittedly, we caught the sniffles, and those holiday colds are not so easy to shake off. Of course, like all our users, we hate downtime, especially since all of us here at the A-Team are Plurk zealots much like you. Hopefully this quick recap will shed some light on the problems behind the downtime and slow service over the past month and explain what we’ve done to counter them.

In summary, these were some of the issues we faced:

  • We had servers scattered around our datacenter. This often made it difficult for various servers to talk to each other easily (extra overhead, multiple points of failure, etc.).
  • Disk and filesystem space issues.
  • MySQL replication issues, database log storage constraints.
  • Server migrations and consolidations.

What have we done to rectify these problems?

  • We consolidated all our servers into connected racks that sit next to each other. During the January 13 (GMT) downtime, we migrated most of our previously scattered servers.
  • Enabled more proactive server monitoring. We’ve tuned our notification thresholds and started to monitor a whole slew of new parameters to alert us early so we can quickly fix potential problems before they get out of hand.
  • Updated MySQL and optimized our tables. We now monitor disk usage and purge binary logs as soon as log space usage reaches a certain threshold.
  • Made our cache servers more robust so you’re served content faster without long wait times.
  • Tuned the kernel to increase network performance.
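
For the curious, here is a rough idea of what threshold-based binary log purging can look like. This is only a minimal sketch in Python, not our actual tooling: the function names, the file names, and the byte limit are all made up for illustration. It assumes you already have the list of binary logs with their sizes, oldest first, and it only builds the MySQL `PURGE BINARY LOGS TO` statement rather than running it.

```python
from typing import List, Optional, Tuple


def pick_purge_target(logs: List[Tuple[str, int]], max_bytes: int) -> Optional[str]:
    """Return the name of the oldest log to KEEP, or None if no purge is needed.

    logs: hypothetical (name, size_in_bytes) pairs, ordered oldest first.
    max_bytes: hypothetical threshold for total binary log disk usage.
    """
    total = sum(size for _, size in logs)
    if total <= max_bytes:
        return None  # still under the threshold, nothing to do

    # Drop the oldest logs one by one until usage is back under the
    # threshold. The newest log is never dropped.
    for index, (_, size) in enumerate(logs[:-1]):
        total -= size
        if total <= max_bytes:
            return logs[index + 1][0]

    # Even the newest log alone exceeds the threshold: keep only that one.
    return logs[-1][0]


def purge_statement(keep_from: str) -> str:
    """Build the MySQL statement that removes every log older than keep_from."""
    # Assumption: every replica has already read the logs being removed.
    return f"PURGE BINARY LOGS TO '{keep_from}';"


if __name__ == "__main__":
    example_logs = [
        ("mysql-bin.000001", 400),
        ("mysql-bin.000002", 400),
        ("mysql-bin.000003", 400),
    ]
    target = pick_purge_target(example_logs, max_bytes=900)
    if target is not None:
        print(purge_statement(target))
```

In a real setup, a check like this would run on a schedule and would also confirm that no replica still needs the older logs before purging anything.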

We’re continuing to work every day on improving our infrastructure and managing our growth to keep issues to a bare minimum in the future. Of course, with how passionate you all are, I know even a few minutes of downtime can seem like an eternity, so when we do hit snags and have prolonged outage windows, we get just as panicky as you, if not more so. All that said, we’ve been sailing very smoothly (knock on wood!) and look forward to a wonderful February with you all!

Posted by rlim