One of the features that Lync 2010 introduced was DNS load balancing. In this scheme, a DNS query for an FQDN can return multiple entries, and Lync sends requests to one of the pool endpoints. There’s some decent information on TechNet about the basic idea with Lync servers, but there’s very little out there on how to use this scheme with UCMA applications. In theory, it’s supposed to work the same as for servers, but it’s not necessarily evident how to get there.
Provisioning your app
A lot of the time, we’ll run New-CsTrustedApplicationPool and pass the machine FQDN in as the pool FQDN, usually because the app is only going to run on one server at a time. In this case, though, we’ll have a pool FQDN and a separate ComputerFqdn, like this:
PS C:\Users\cbardon> New-CsTrustedApplicationPool -Identity ucmapool.rnddev.computer-talk.com -Registrar lync2010.rnddev.computer-talk.com -Site 1 -ComputerFqdn ucma1.rnddev.computer-talk.com -RequiresReplication $false
This creates a new pool FQDN and adds the first machine to it. Next, you’ll need to add the rest of the machines to the pool like this:
PS C:\Users\cbardon> New-CsTrustedApplicationComputer -Pool ucmapool.rnddev.computer-talk.com -Identity ucma2.rnddev.computer-talk.com
Repeat this for each of the servers that you want in the pool. Then configure applications and endpoints the same way as with a single-computer pool; those steps work exactly as before.
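If the pool has more than a couple of machines, the repeat step can be scripted instead of typed once per server. This is only a minimal sketch, assuming the pool created above and the Lync Server Management Shell; the ucma3 host name is made up for illustration.

```powershell
# Pool FQDN from the example above.
$pool = 'ucmapool.rnddev.computer-talk.com'

# Machines to add after the first one. ucma3 is a hypothetical extra server;
# replace the list with your own machine FQDNs.
$computers = @(
    'ucma2.rnddev.computer-talk.com',
    'ucma3.rnddev.computer-talk.com'
)

foreach ($computer in $computers) {
    # Adds one machine to the existing trusted application pool.
    New-CsTrustedApplicationComputer -Pool $pool -Identity $computer
}
```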
A catch for manual provisioning
Now, if you’re using automatic provisioning, you should be able to skip to the next section. If you noticed the -RequiresReplication $false flag in the pool configuration, though, you’ll have realized that this example uses manual provisioning, which is useful when your app server can’t be joined to the Lync domain. Manual provisioning means specifying some extra parameters when creating your platform, including the GRUU. When you created your application, you may have noticed that the output looked something like this:
Note that I have a service GRUU, as well as ComputerGRUUs for each machine in the pool. For a single-computer pool these are the same, but now each individual machine in the pool has its own GRUU as well as the one for the service as a whole. When creating the platform, use the computer GRUU on each app server. You’ll also want to use the pool FQDN as the application FQDN.
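If you no longer have that output handy, the GRUUs can be read back from the Lync Server Management Shell. This is a rough sketch only: it assumes the objects returned by Get-CsTrustedApplication expose properties named TrustedApplicationPoolFqdn, ApplicationId, ServiceGruu and ComputerGruus, so check those names against what your own deployment prints.

```powershell
# Pool FQDN from the example above.
$pool = 'ucmapool.rnddev.computer-talk.com'

# List each trusted application in the pool with its service GRUU and the
# per-machine GRUUs. The property names here are assumptions; verify them
# against the output in your environment.
Get-CsTrustedApplication |
    Where-Object { $_.TrustedApplicationPoolFqdn -eq $pool } |
    Select-Object ApplicationId, ServiceGruu, ComputerGruus |
    Format-List
```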
The next deviation in the procedure comes when you request a certificate for your application servers. Normally, the subject name of the cert needs to be the FQDN of the application server, but in this case, the subject needs to be the pool FQDN. The certificate you request should be the same on all application servers (so mark the keys as exportable), and should contain the pool FQDN and individual machine FQDNs as Subject Alt Name entries. This creates a bit of a maintenance headache for adding new capacity, but it’s reasonably easy to request new certs if you control the CA. What you’ll end up with is something that looks like this in the local machine store:
and the SANs:
If you’re using the web enrolment tools to request certificates from a Windows CA, you can specify the SANs by putting something like this:
in the Attributes field. I always forget the syntax of this one…
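As a memory aid, the format generally documented for that field is san:dns=name1&dns=name2, with one dns= entry per name. The helper below is a small sketch that builds the string from a list of names; the function name New-SanAttribute is made up for this post, and the format itself is worth confirming against your CA before relying on it.

```powershell
# Builds the text to paste into the Attributes field on the web enrolment page.
# Assumed format: san:dns=<name1>&dns=<name2>...
function New-SanAttribute {
    param(
        [Parameter(Mandatory = $true)]
        [string[]] $DnsNames
    )
    # Prefix each name with "dns=" and join the parts with "&".
    $parts = $DnsNames | ForEach-Object { "dns=$_" }
    return 'san:' + ($parts -join '&')
}

# Pool FQDN first, then each machine in the pool.
New-SanAttribute -DnsNames @(
    'ucmapool.rnddev.computer-talk.com',
    'ucma1.rnddev.computer-talk.com',
    'ucma2.rnddev.computer-talk.com'
)
```

For the two-machine pool in this post, that produces a single line starting with san:dns=ucmapool.rnddev.computer-talk.com followed by an &dns= entry for each app server.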
Fun with DNS
At this point, the configuration for Lync and your app is done, so all that remains is the DNS configuration. Normally, this involves an A record for the app server FQDN, but with load balancing there are a few other things that need to change. Note: I’m writing this using a Windows Server 2008 R2 DNS server, so the settings may be different if you have a different DNS server. Basically, we need two things in DNS: an entry for each server, and an entry for the pool that resolves to each server’s IP address. In my example, I have this in DNS for my pool machines:
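The same records can also be created from the command line. The following is a sketch rather than my actual configuration: it assumes dnscmd is available on the DNS server, that the zone is rnddev.computer-talk.com, and it uses placeholder addresses from the 192.0.2.0/24 documentation range that you would swap for your real IPs.

```powershell
# Run on the DNS server. Zone name assumed from the FQDNs used in this post.
$zone = 'rnddev.computer-talk.com'

# Placeholder addresses (documentation range); replace with your real IPs.
$servers = @{
    'ucma1' = '192.0.2.11'
    'ucma2' = '192.0.2.12'
}

foreach ($name in $servers.Keys) {
    # One A record for the app server itself...
    dnscmd /RecordAdd $zone $name A $servers[$name]
    # ...and one A record for the pool name pointing at the same address.
    dnscmd /RecordAdd $zone 'ucmapool' A $servers[$name]
}
```

The pool name ends up with one A record per machine, which is what lets a single query return multiple entries.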
Now, by the end of this, you want to be able to go to any machine in your network and do this:
Note that the ping command went to different servers each time, and that the nslookup command returned both entries in different orders. This is important, because it means that DNS is working the way it’s supposed to. Unfortunately, the defaults in DNS might cause it to not work this way, so here’s what you may need to change:
DNS Server properties
Right click on the DNS server and bring up the advanced property page:
You’ll want to make sure that Round Robin is enabled and that netmask ordering is disabled. Disabling netmask ordering isn’t strictly essential, but it’s a good idea if you want a “real” load balancing scenario. Netmask ordering is an optimization: if you’re in a subnet (say 201.X) and a DNS query returns two entries in different subnets (e.g. 201.1 and 202.1), the server biases towards the closer result. The result of this is that an nslookup query will return the entries in the same order every time.
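If you’d rather not click through the property page, the same two settings can be changed with dnscmd. This is a minimal sketch, assuming it is run on the DNS server itself; in dnscmd, netmask ordering goes by the name LocalNetPriority.

```powershell
# Enable round robin rotation of multiple A records for the same name.
dnscmd /Config /RoundRobin 1

# Disable netmask ordering (called LocalNetPriority in dnscmd).
dnscmd /Config /LocalNetPriority 0

# Dump the server settings so you can confirm both values took effect.
dnscmd /Info
```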
Time to Live
In most cases, DNS caching isn’t a problem. The address for a service rarely changes, so for efficiency Windows remembers the DNS results for particular lookups. Most DNS servers also remember results from the servers they forward requests to, which can make it very difficult to get a change to propagate when you want to make one. For load balancing it’s even more troubling, since the scheme depends on returning different results for each query. With caching enabled, the test with successive ping requests would send every request to the same server; you can verify this by running ipconfig /flushdns to clear the local resolver cache between pings, or by disabling the cache service completely. The other way to ensure that your DNS records get re-queried each time is to set the Time To Live on the records themselves. For some reason, this setting is hidden in the Windows DNS server. Under the View menu, check “Advanced”:
And then open your pool entries. You should see a new field for TTL at the bottom of the page:
The TTL is set to an hour by default. Change this value to 0. You may need to clear the DNS cache on the DNS server and the Lync server, but at this point you should be able to ping your pool FQDN and get different results back each time.
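Clearing the caches and re-testing looks roughly like this. It is a sketch, assuming the pool FQDN used throughout this post; note that the first command belongs on the DNS server and the rest on whichever machine you are testing from.

```powershell
# On the DNS server: clear the server-side cache.
dnscmd /ClearCache

# On the test machine (and the Lync server): clear the local resolver cache.
ipconfig /flushdns

# Query the pool name a few times. With round robin on and a TTL of 0,
# the order of the returned addresses should rotate between queries.
1..4 | ForEach-Object {
    nslookup ucmapool.rnddev.computer-talk.com
}
```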
Finally load balancing?
Now you’re able to start your application instances, which should both register with Lync. Place a call with both instances running, and one of them will answer. Shut an instance down, and your call is answered by the other instance. This certainly makes your app fault tolerant: any time an instance is down, calls will go to the other instances. But is this actually balancing any load? If you modify your app so you can identify which instance you’re talking to, you’ll notice something odd: calls almost always go to the same instance of your app. There are some answers to why this happens, but that’s a blog post in and of itself. Part 2 will go into how load balancing actually works in Lync, and some techniques to get it to work the way you want it to.