Will Larson's Lessons from Digg v4's Catastrophic Launch 2018-07-26 11:54
Digg's move from v3.5 to v4 was a catastrophic launch, but it was actually a pretty ambitious one. Digg had been devastated by Google’s Panda algorithm update, and launching v4 was its chance to return to its rightful place among the giants of the Internet.
The launch had no rollback plan, and unexpected scaling problems followed:
- Cassandra was a bottleneck -> they implemented a write-through cache in memcache
- however, the MyNews page still broke every four hours
- they rewrote MyNews in Redis and kept deleting the excess data in secret to keep the site running
- it took a month to track down the bug in the Python Tornado backend service: an API that used mutable default values for its arguments, like
def get_user_by_names_or_ids(names=[], ids=[]):
This causes a memory leak: Python evaluates default values once, when the function is defined, so if you mutate those parameters, the mutations persist across invocations. The accumulated data even blew up the memcache clusters.
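To make the pitfall concrete, here is a minimal sketch of how a mutable default accumulates state, next to the usual None-sentinel fix. The function bodies and the "_buggy" / "_fixed" suffixes are assumptions for illustration; the real Digg code is not shown in the post.

```python
def get_user_by_names_or_ids_buggy(names=[], ids=[]):
    # Hypothetical body. The default lists are created once, at definition
    # time, so every call without arguments mutates the same shared lists.
    names.append("anonymous")
    ids.append(0)
    return names, ids


def get_user_by_names_or_ids_fixed(names=None, ids=None):
    # None acts as a sentinel; a fresh list is built on every call, and
    # caller-supplied lists are copied so they are never mutated either.
    names = [] if names is None else list(names)
    ids = [] if ids is None else list(ids)
    names.append("anonymous")
    ids.append(0)
    return names, ids


for _ in range(3):
    buggy_names, _ids = get_user_by_names_or_ids_buggy()
    fixed_names, _ids = get_user_by_names_or_ids_fixed()

print(len(buggy_names))  # 3: the shared default grew on every call
print(len(fixed_names))  # 1: each call started from an empty list
```

In a long-running service, the buggy version's default lists keep growing for the lifetime of the process, which fits the kind of slow leak described above.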