Sunday, December 18, 2011

About Facebook's CI

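There seems to be a story going around that "Facebook has no test servers; every developer develops and releases directly on the production servers." No developer, at least, would take that at face value. I figured they must be managing things solidly with CI, so I looked into it a bit.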


First,


  • http://www.quora.com/What-kind-of-automated-testing-does-Facebook-do
  • For our PHP code, we have a suite of a few thousand test classes using the PHPUnit framework. They range in complexity from simple true unit tests to large-scale integration tests that hit our production backend services. The PHPUnit tests are run both by developers as part of their workflow and continuously by an automated test runner on dedicated hardware. Our developer tools automatically use code coverage data to run tests that cover the outstanding edits in a developer sandbox, and a report of test results is automatically included in our code review tool when a patch is submitted for review.
  • For browser-based testing of our Web code, we use the Watir framework. We have Watir tests covering a range of the site's functionality, particularly focused on privacy; there are tons of "user X posts item Y and it should/shouldn't be visible to user Z" tests at the browser level. (Those privacy rules are, of course, also tested at a lower level, but the privacy implementation being rock-solid is a critical priority and warrants redundant test coverage.)


In other words: they do test properly.
This is the first I've heard of Watir. I'll take a look at its documentation sometime.
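
The quote mentions using code coverage data to run only the tests that cover a developer's outstanding edits. Here is a minimal sketch of that idea in Python, not Facebook's actual tooling: the coverage map, the test names, the file paths, and the `select_tests` function are all hypothetical, made up for illustration.

```python
from typing import Dict, Iterable, List, Set


def select_tests(coverage: Dict[str, Set[str]], edited_files: Iterable[str]) -> List[str]:
    """Return (sorted) the tests whose recorded coverage touches any edited file."""
    edited = set(edited_files)
    return sorted(test for test, files in coverage.items() if files & edited)


# Hypothetical coverage data: test name -> source files it exercised last run.
COVERAGE = {
    "PrivacyTest": {"lib/privacy.php", "lib/user.php"},
    "FeedTest": {"lib/feed.php", "lib/user.php"},
    "PhotoTest": {"lib/photo.php"},
}

if __name__ == "__main__":
    # Editing lib/user.php should trigger only the tests that cover it.
    print(select_tests(COVERAGE, ["lib/user.php"]))  # ['FeedTest', 'PrivacyTest']
```

A real setup would presumably build the map from the coverage output of earlier test runs and take the edited files from the version control diff; here both are hard-coded.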

Next,

  • http://framethink.blogspot.com/2011/01/how-facebook-ships-code.html
  • ops team runs code releases by gradually rolling code out
  • facebook has around 60,000 servers
  • there are 9 concentric levels for rolling out new code
  • [CORRECTION thx epriest] "The nine push phases are not concentric. There are three concentric phases (p1 = internal release, p2 = small external release, p3 = full external release). The other six phases are auxiliary tiers like our internal tools, video upload hosts, etc."
  • the smallest level is only 6 servers
  • e.g., new tuesday release is rolled out to 6 servers (level 1), ops team then observes those 6 servers and make sure that they are behaving correctly before rolling forward to the next level.
  • if a release is causing any issues (e.g., throwing errors, etc.) then push is halted.  the engineer who committed the offending changeset is paged to fix the problem.  and then the release starts over again at level 1.
  • so a release may go thru levels repeatedly:  1-2-3-fix. back to 1. 1-2-3-4-5-fix.  back to 1.  1-2-3-4-5-6-7-8-9.
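The smallest level consists of 6 servers. Only the terminology differs: in effect, p1 = test, p2 = staging, p3 = production.
Once you have that many servers, all sorts of tricks become necessary!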
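
The "push, observe, halt, fix, start over at level 1" loop from the quoted list can be sketched roughly as follows. This is only an illustration under assumptions: the levels are treated as a simple ordered sequence (the correction in the list notes that the real nine phases are not all concentric), and `roll_out`, `is_healthy`, and `page_engineer` are hypothetical names, not anything Facebook has published.

```python
from typing import Callable, List


def roll_out(
    num_levels: int,
    is_healthy: Callable[[int, int], bool],
    page_engineer: Callable[[int, int], None],
    max_attempts: int = 10,
) -> List[List[int]]:
    """Push level by level; on a failure, page for a fix and restart at level 1.

    Returns, for each attempt, the list of levels that were reached.
    """
    history: List[List[int]] = []
    for attempt in range(1, max_attempts + 1):
        reached: List[int] = []
        for level in range(1, num_levels + 1):
            reached.append(level)  # code is now live on this level
            if not is_healthy(level, attempt):
                page_engineer(level, attempt)  # halt the push and get it fixed
                history.append(reached)
                break  # next attempt starts over at level 1
        else:
            history.append(reached)  # every level passed: release is complete
            return history
    raise RuntimeError("release did not stabilize within max_attempts")


# Hypothetical scenario: attempt 1 breaks at level 3, attempt 2 breaks at level 5.
FAILURES = {1: 3, 2: 5}
paged: List[tuple] = []

if __name__ == "__main__":
    result = roll_out(
        9,
        is_healthy=lambda level, attempt: FAILURES.get(attempt) != level,
        page_engineer=lambda level, attempt: paged.append((level, attempt)),
    )
    for levels in result:
        print("-".join(str(n) for n in levels))
```

With this made-up failure scenario, the attempts reach levels 1-2-3, then 1-2-3-4-5, then 1 through 9, which matches the example sequence in the quoted list.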

Kaneko (・ε・)