This post is dedicated to the question of whether you need to load balance the Topology service app in a multi-farm scenario where one farm consumes services from another farm.
I will try to reference the various blog posts that are about the same subject and at the same time show you a detailed diagram that outlines it all.
The discussion I want to trigger here is twofold:
  • First of all, do we need to load balance the Topology service to achieve higher availability?
  • Secondly, I have seen contradictions in the blog articles. Some people claim the Topology service is responsible for load balancing across service app instances, whereas other people claim that these endpoint URLs are stored in the cache of the service app proxies on the consumer farm.
Now let me stop throwing terms around and start right at the beginning by describing a common real-world scenario.

Real World Scenario: cross farm services

Assume you want to centralize your Search, User Profiles and corporate taxonomies. What would you do? Possibly you are planning to host different farms, like a dedicated farm for your intranet, a specialized farm for your departments where they can launch apps, and a commodity farm for basic, standard team sites. To allow these farms to share the services you will need to introduce another farm, often referred to as the enterprise services farm.
Imagine you have built an enterprise services farm which has two application servers. Take an example: the Managed Metadata service app. Typically you will have a service instance running on each application server. Both URIs will be published by the Topology service app to each consuming farm. Secondly, the consuming farms will push these URIs to the URI cache of each service app proxy. This may sound complicated, but if you look at the diagram below you might be better able to understand it.

Example: detailed overview Shared Services

There are some steps involved before you are able to consume services between farms. You will need to set up a trust and then publish the services.
After you have done that you are able to consume the services on the Consuming farm.

To illustrate: if you have a web part that makes a call to the Managed Metadata service app, it will use the service app proxy to determine the endpoint of the app. The first time, the service app proxy will retrieve the URIs from the Topology service app on the Publishing farm. After that it will cache the endpoints in its own URI cache. Its own load-balancing component then works with both URIs. In case one of them does not respond, that URI will be taken offline.
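The caching and failover behaviour described above can be sketched in a few lines of Python. This is only a toy model of the idea, not SharePoint code: the class name `ProxyEndpointCache`, the round-robin ordering and the use of `ConnectionError` to signal a dead endpoint are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ProxyEndpointCache:
    """Hypothetical model of the URI cache held by a service app proxy."""

    endpoints: List[str]                      # URIs cached after the first lookup
    offline: Set[str] = field(default_factory=set)
    _next: int = 0                            # round-robin cursor (assumed strategy)

    def pick(self) -> str:
        """Return the next endpoint that is not marked offline."""
        for _ in range(len(self.endpoints)):
            uri = self.endpoints[self._next % len(self.endpoints)]
            self._next += 1
            if uri not in self.offline:
                return uri
        raise RuntimeError("no online endpoints left in the cache")

    def call(self, send: Callable[[str], str]) -> str:
        """Send a request; take an endpoint offline if it does not respond."""
        while True:
            uri = self.pick()                 # raises once every endpoint is offline
            try:
                return send(uri)
            except ConnectionError:
                self.offline.add(uri)
```

Note that nothing in this sketch talks to the Topology service once `endpoints` has been filled, which is exactly the behaviour that is disputed later in this post.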

Secondly, there is a timer job, called the Application Addresses Refresh Job, which calls the Topology service app on the Publishing farm every 15 minutes. On the Publishing farm, the Topology service app discovers which endpoint URLs are available and returns those URLs to the Consuming farm. At the Consuming farm these URLs are pushed to the local service app proxies in case they are updated or deleted.
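As a rough sketch of what such a refresh pass does to the cached URIs, assuming the cache is a simple mapping from service name to a list of URIs: the function name `refresh_proxy_caches` and the rule that services missing from the answer keep their old entries are my own assumptions, not documented behaviour.

```python
from typing import Dict, Iterable, List

REFRESH_INTERVAL_SECONDS = 15 * 60  # the 15-minute interval mentioned above


def refresh_proxy_caches(
    caches: Dict[str, List[str]],
    published: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return new caches reflecting the URIs reported by the publishing farm.

    For every service in `published`, deleted URIs are dropped and new ones
    are appended. Services not mentioned keep their cached URIs (assumption).
    """
    updated = dict(caches)  # shallow copy; the input lists are never mutated
    for service, uris in published.items():
        fresh = list(uris)
        kept = [u for u in caches.get(service, []) if u in fresh]
        added = [u for u in fresh if u not in kept]
        updated[service] = kept + added
    return updated
```

For example, with a cache of `{"mms": ["uri-a", "uri-b"]}` and a published list of `{"mms": ["uri-b", "uri-c"]}`, the refreshed cache for `mms` contains `uri-b` and `uri-c`.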

The Contradictions

If you have been reading some blog posts you might already be wondering if I am telling you the truth. Fact is, I have stumbled upon some contradictions.

The post written by Russ Maxwell: http://blogs.msdn.com/b/russmax/archive/2010/05/06/sharepoint-2010-shared-service-architecture-part-2.aspx, shows you that after the proxies have cached the URIs locally, they reach out directly to the Service Apps, not requesting the URIs first through the Topology service app.

But some posts, like the one from Steve Peschka: http://blogs.technet.com/b/speschka/archive/2011/01/04/additional-info-on-load-balancing-the-sharepoint-2010-topology-service.aspx and the one from Josh Gavant: http://blogs.msdn.com/b/besidethepoint/archive/2010/12/08/load-balancing-the-sharepoint-2010-topology-service.aspx make it appear as if the Topology service is called every time a service app proxy needs to connect to the Publishing farm. If that were the case, then we might have a single point of failure!

Single point of failure

The Topology service is often published by a server NetBIOS name and a port number (32844, I believe). That would indeed mean that we have a single point of failure.

Now, what happens if this service app goes down? In my opinion it means that the Application Addresses Refresh Job will not be able to update the URI endpoints anymore. Secondly, you would not be able to publish new service apps to consuming farms anymore. Finally, at the consuming farm, service apps would still be running fine, as the service app endpoint URLs are fetched from the URI cache of the proxy. The internal load-balancing component of the service app proxy would recognize if an endpoint is down and take it offline automatically.
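To make that hypothesis concrete, here is a minimal sketch of a refresh that falls back to the cached endpoints when the Topology service cannot be reached. It models my assumption only, not verified SharePoint behaviour; `refresh_or_keep` and `fetch_published` are hypothetical names.

```python
from typing import Callable, List, Tuple


def refresh_or_keep(
    cached: List[str],
    fetch_published: Callable[[], List[str]],
) -> Tuple[List[str], bool]:
    """Return (endpoints, refreshed).

    If the call to the Topology service fails, the previously cached
    endpoints stay in use and `refreshed` is False.
    """
    try:
        return list(fetch_published()), True
    except ConnectionError:
        # Topology service unreachable: keep serving from the stale cache.
        return list(cached), False
```

If this is how it really works, an outage of the Topology service only means the cache slowly goes stale: new or removed endpoints on the Publishing farm are not picked up until the service is back.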

Summarized
What does this all mean? In my opinion it would mean that if the Topology service app on the Publishing farm goes down, you would still have some time to fix the issue and bring it back online. I do not think that the service apps that are consumed will be interrupted. Although I must admit I haven’t tried it out myself yet.

Your input here!

The question is, do we need to load balance the Topology service app on the Publishing farm? Personally, I would say that if your farm is offering services globally 24×7, then yes. You probably do not want to get out of bed in case things go wrong. On the other hand, it all depends on your service level and the choice is up to you.
Also, the contradictions still remain: is the Topology service being called every time by the service app proxies on the consuming farm, or do they rely on their URI cache?

Fact is, I may have it all wrong! ;-) To be on the safe side, I will go through the tests of bringing down the Topology service and have a close look at the ULS log, filter: category = Topology, message = WcfSendrequest.
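For anyone who wants to run the same test, a small Python script can apply that filter to a ULS log file. This is a sketch that assumes the log is tab-separated with a header row containing `Category` and `Message` columns; the function name `filter_uls` is made up, and the match on the message is case-insensitive because I am not sure about the exact casing of the text.

```python
from typing import Dict, Iterable, Iterator


def filter_uls(
    lines: Iterable[str],
    category: str = "Topology",
    needle: str = "WcfSendRequest",
) -> Iterator[Dict[str, str]]:
    """Yield ULS rows whose Category matches and whose Message contains needle.

    Assumes a tab-separated file whose first line is the column header.
    Column values are stripped because they may be padded with spaces.
    """
    header = None
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
        if header is None:
            header = fields          # first line names the columns
            continue
        row = dict(zip(header, fields))
        if (row.get("Category") == category
                and needle.lower() in row.get("Message", "").lower()):
            yield row
```

Pass it an open log file from the consuming farm and print the `Message` field of each row. If requests to the Topology service show up for every service call, rather than only around the 15-minute refresh, the URI cache theory is wrong.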

I have written this blog post to get your input on this! Did you try this out yourself? What are your experiences? Input is more than welcome. Thanks!


Check it out: Servé’s SharePoint Blog