Friends of friends – performant social network implementations

I did some research on this because I was curious how Facebook handles its huge amount of data and searches it quickly; I’ve seen people complain about custom-made social network scripts becoming slow as their user base grows. After I did some benchmarking myself with just 10k users and 2.5 million friend connections – not even bothering with group permissions, likes, and wall posts – it quickly turned out that a plain SQL approach is flawed. So I spent some time searching the web for how to do it better and came across this official Facebook article:

I really recommend watching the presentation in the first link above before you continue reading. It’s probably the best explanation of how FB works behind the scenes that you can find.

The video and article tell you a few things:

  • They’re using MySQL at the very bottom of their stack
  • Above the SQL DB there is the TAO layer, which contains at least two levels of caching and uses graphs to describe the connections.
  • I could not find anything on what software / DB they actually use for their cached graphs. I think I’ve read somewhere that they used memcache, but I don’t know whether they’re still using it. I doubt it.

Let’s take a look at this; friend connections are at the top left:

[Image: graph diagram]

This is a graph. It doesn’t tell you how to build it in SQL; there are several ways to do that, and this site covers a good number of different approaches.

Also consider that you have to do more complex queries than just friends of friends, for example when you want to filter all locations around a given coordinate that you and your friends of friends like. A graph is the perfect solution here.

I can’t tell you how to build it so that it will perform well but it clearly requires some trial and error and benchmarking.
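To make the graph idea a bit more concrete, here is a minimal in-memory sketch in JavaScript. It is only an illustration under my own assumptions, not how TAO works and not tuned for performance: friendships are undirected, user ids are plain numbers, and the names `buildGraph` and `friendsOfFriends` are made up for this example. Leaving the user and their direct friends out of the result is a design choice, not a requirement.

```javascript
// Adjacency list: user id -> Set of friend ids (friendships are undirected).
function buildGraph(pairs) {
  const graph = new Map();
  const link = (a, b) => {
    if (!graph.has(a)) graph.set(a, new Set());
    graph.get(a).add(b);
  };
  for (const [a, b] of pairs) {
    link(a, b);
    link(b, a);
  }
  return graph;
}

// Friends of friends of `user`, excluding the user and their direct friends.
function friendsOfFriends(graph, user) {
  const direct = graph.get(user) || new Set();
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of graph.get(friend) || []) {
      if (candidate !== user && !direct.has(candidate)) {
        result.add(candidate);
      }
    }
  }
  return result;
}

// Tiny hypothetical data set: each pair is one friendship.
const graph = buildGraph([[1, 2], [1, 3], [2, 4], [3, 4], [3, 5], [2, 3]]);
console.log([...friendsOfFriends(graph, 1)].sort((a, b) => a - b)); // [ 4, 5 ]
```

The lookup touches only the friend sets of the user's direct friends, so its cost depends on how many friends people have rather than on the total number of users. Whether a real graph store behaves like that at scale is exactly the part that needs benchmarking.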

Here is my disappointing test for just finding friends of friends:

DB schema:

Friends of friends query:

I really recommend creating some sample data with at least 10k user records, each with at least 250 friend connections, and then running this query. On my machine (i7 4770k, SSD, 16 GB RAM) the result was ~0.18 seconds for that query. Maybe it can be optimized; I’m not a DB genius (suggestions are welcome). However, if this scales linearly with the number of users, you’re already at 1.8 seconds for just 100k users and 18 seconds for 1 million users.

This might still sound OKish for ~100k users, but consider that you just fetched friends of friends and didn’t do any more complex query like “show me only posts from friends of friends + do the permission check on whether I’m allowed or NOT allowed to see some of them + do a subquery to check if I liked any of them”. You want to let the DB check whether you have already liked a post, or you’ll have to do it in code. Also consider that this is not the only query you run and that you have more than one active user at the same time on a more or less popular site.

I’ve started experimenting with OrientDB to run the graph queries and to map my edges to the underlying SQL DB. If I ever get it done, I’ll write an article about it.

Conclusion: Implementing a social network is easy but making sure it performs well is clearly not – IMHO.

Loading static JSON data from the rendered page into AngularJS

In our scenario we didn’t want to build a whole single-page application. Instead, we needed to deal with the case where a form was posted to the server but the data didn’t save for some reason, so the page is rendered again and the data is shown again as well. This caused the Angular controller to lose the data it had set.

I was looking for a way to prevent this and to inject any kind of data into my Angular app. I finally came up with a small directive that reads the JSON data, decodes it, and assigns it to a given scope variable.

This is the small directive that will load your data into your controller’s scope:
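A minimal sketch of what such a directive could look like, assuming AngularJS 1.x. This is an illustration, not necessarily the exact code from the project: the directive name `jsonData`, the helper `applyJsonToScope`, the module name `app`, and the markup in the comment are all hypothetical.

```javascript
// Hypothetical helper: decode a JSON string and assign it to scope[targetName].
// Returns true on success and false if the text is not valid JSON.
function applyJsonToScope(scope, targetName, jsonText) {
  try {
    scope[targetName] = JSON.parse(jsonText);
    return true;
  } catch (err) {
    return false; // JSON.parse threw, so the scope was not modified
  }
}

// Directive factory (assumed name: jsonData). Assumed markup in the rendered page:
//   <script type="application/json" json-data="formData">{"name":"x"}</script>
function jsonDataDirective() {
  return {
    restrict: 'A',
    link: function (scope, element, attrs) {
      // attrs.jsonData holds the name of the scope variable to fill;
      // element.text() is the raw JSON the server rendered into the page.
      applyJsonToScope(scope, attrs.jsonData, element.text());
    }
  };
}

// Register only when AngularJS is loaded, so the file also runs in plain Node.
// 'app' is an assumed, already existing module name.
if (typeof angular !== 'undefined') {
  angular.module('app').directive('jsonData', jsonDataDirective);
}
```

Keeping the parsing in a plain function makes it easy to check without a browser. The link function runs after the controller has been instantiated, so the controller should not expect the data to be there in its constructor.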

In your application’s code, in this case PHP, you can now do this:

If anyone knows a better way to deal with this scenario, I’m open to suggestions and criticism!

CakePHP3: When to use Elements, Helpers or View Cells?

This has been asked on Stack Overflow, and I decided to turn my long answer into a blog post as well. This article explains when each of the different view parts is the best fit. This might be opinionated, but it is based on writing clean and reusable code.

Elements

Use them when you need to repeat presentation-related stuff, usually HTML, a lot. For example, I have a project in which three tables use records of an addresses table. The part of each of these three forms that contains the address data is an element. There is no logic in it at all, and I think that if all you have is a very simple if that wraps an element call or something similar, you don’t need to come up with a helper for that; it’s just too basic. Elements are basically “dumb” snippets that repeat throughout the site.

Helpers

Use them to encapsulate view logic; if possible, don’t put HTML or other presentation-related things in them. For example, let a helper do something and, depending on the result, use an element for that result type to render the data: `return $this->_View->element('items/' . $type . '_item');`

You might say now “But the Form- and HtmlHelper contain HTML…”. Well, they do, but look at how their code deals with it. If you look at the HtmlHelper, for example, you’ll see a property $_defaultConfig[1]:

These are the template strings that are used to generate the HTML output. This separates the markup pretty nicely from the actual code that generates the final output. Take a look at the FormHelper as well; it uses widgets to render more complex output. See the section “Adding Custom Widgets”[3] of the official documentation.

So this works fine with element-like pieces of markup. As a rule of thumb, I would say that if your markup is longer than what you see there, make it an element and call it from within the helper, or make it a widget.

View Cells

Think of view cells as “Mini MVC” stacks that have a view and can load multiple models. They’re IMHO similar to AngularJS directives, if you’re familiar with those. See this article for an example [2]. I really suggest reading it; it explains them and their use cases in detail.

I haven’t done much with them yet, but they can be used to replace requestAction() calls, for example. You won’t “pollute” your controller with methods that are not intended to be accessed by a request. Taken from the linked article above:

One of the most ill-used features of CakePHP is View::requestAction(). Developers frequently use this all over their applications, causing convoluted cases where you need to figure out if you are within a web request or an internal action request, cluttering controllers. You also need to invoke a new CakePHP request, which can add some unneeded overhead.

Disclaimer

The above reflects my personal view on these things; there is no ultimate and final rule for how you have to use these three things. The goal is always clean and reusable code and proper separation of concerns. How you achieve that is up to you; you’ve got the tools.

[1]: http://api.cakephp.org/3.0/source-class-Cake.View.Helper.HtmlHelper.html#54
[2]: http://josediazgonzalez.com/2014/03/20/view-cells/
[3]: http://book.cakephp.org/3.0/en/views/helpers/form.html#adding-custom-widgets