
Facebook releases “Scribe” to the open source community

November 5, 2008 by MK  
Filed under Tech News

Facebook is known for playing it safe and has not been much of an open source supporter. In recent months, though, the company has been quite vocal about its commitment to open source, and rightly so. On Friday, Facebook announced that a piece of internally created software, called “Scribe,” would be released to the open source community.

So what is Scribe? Well, per a post on Facebook’s blog, it’s been instrumental in helping Facebook handle the enormous amounts of data that come through its servers. As the page for Scribe says, “If you use the site, you’ve used Scribe.” More specifically, it’s a “server for aggregating log data streamed in real time from a large number of servers…designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine,” which means that the average Facebook user won’t have much use for the newly open-sourced product.

According to Facebook, they were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. Facebook used a variety of technologies for the different use cases, and all of them were bursting at the seams. So they decided to build a unified system (called Scribe) to handle all of these cases.

The Scribe servers are arranged in a directed graph, but each server only knows about the next server in the graph. This flexible topology allows for things like adding an extra layer of fan-in if the system grows too large, and batching messages before sending them between datacenters, but without having any code that explicitly needs to understand datacenter topology, only a simple configuration.
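To make the "each server only knows the next server" idea concrete, here is a minimal Python sketch. It is not Scribe's actual code or API: the Node class, its next_hop and batch_size fields, and the server names are all hypothetical, chosen only to illustrate the topology described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (category, message)


@dataclass
class Node:
    """Hypothetical stand-in for one server in the graph (not Scribe's API)."""
    name: str
    next_hop: Optional["Node"] = None  # the only part of the graph a node knows
    batch_size: int = 1                # forward once this many messages are buffered
    buffer: List[Message] = field(default_factory=list)
    stored: List[Message] = field(default_factory=list)

    def log(self, category: str, message: str) -> None:
        self.receive([(category, message)])

    def receive(self, messages: List[Message]) -> None:
        if self.next_hop is None:
            # Last node in the graph: keep the data.
            self.stored.extend(messages)
            return
        self.buffer.extend(messages)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send everything buffered so far to the next hop in one batch.
        if self.next_hop is not None and self.buffer:
            batch, self.buffer = self.buffer, []
            self.next_hop.receive(batch)


# The topology lives entirely in this wiring, i.e. in configuration.
central = Node("central")
aggregator = Node("aggregator", next_hop=central, batch_size=3)
web1 = Node("web1", next_hop=aggregator)
web2 = Node("web2", next_hop=aggregator)

web1.log("access", "GET /home")
web2.log("access", "GET /profile")
web1.log("perf", "render_ms=42")
```

Adding another layer of fan-in in this sketch means constructing one more Node and pointing some next_hop values at it; the forwarding logic itself does not change.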

When you’re building something that looks like a logging system there are a lot of things people expect: logging levels and rules about when they get sent, timestamping and ordering of messages, schemas for common messages, etc. Facebook decided that this was a can of worms that shouldn’t be mixed up with the asynchronous and mostly reliable delivery of data, so they made the data model very simple. A message is two strings: a category and the actual message. The category is the description of what the message is about, and the expectation is that messages of the same category end up in the same place. The message is the actual data to be logged. Facebook also doesn’t have any a priori list of categories that must be maintained. If you create a new category, it shows up in a new file. This follows the Unix philosophy of doing exactly one thing and doing it well, and it has definitely paid off in ease of use and development. They started with four or five use cases in mind and now they have hundreds, but they didn’t have to modify the Scribe source for any of them.
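The two-string data model can be sketched in a few lines of Python. This is an illustration only, under assumptions of my own: the store function and the "category.log" file naming are hypothetical and are not how Scribe actually lays out its files. The point is that no list of categories is registered anywhere, so a new category simply produces a new file.

```python
import os
import tempfile


def store(log_dir: str, category: str, message: str) -> str:
    """Append a message to the file for its category (hypothetical naming scheme)."""
    path = os.path.join(log_dir, category + ".log")
    # Opening in append mode creates the file the first time a category is seen.
    with open(path, "a", encoding="utf-8") as f:
        f.write(message + "\n")
    return path


log_dir = tempfile.mkdtemp()

store(log_dir, "access", "GET /home")
store(log_dir, "access", "GET /profile")
store(log_dir, "brand_new_category", "first message")  # never declared in advance

files = sorted(os.listdir(log_dir))
```

Messages with the same category end up in the same file, and the storage code never interprets the message string itself.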

What’s particularly interesting to me about Scribe is the fact that it was built using another open source tool developed by Facebook called Thrift. According to Facebook, Thrift is a software framework for scalable cross-language services development.

The release of Scribe is also, in a sense, a message to some of the critics who’ve been skeptical of Facebook’s ability to keep its infrastructure humming along at a reasonable cost now that it has more than 100 million active users sending messages and uploading photos around the clock. By releasing Scribe as open source, Facebook is effectively saying, “Not only can we come up with something to run our site efficiently, we’ll let you see it, too.”

I am not really sure how helpful it will be for us developers. Hopefully Facebook will have more open source releases in the coming months.
