Building a standalone Lync server, or, how to write UCMA applications on a plane

One of the difficult things about writing applications using UCMA is the fact that you need to connect to Lync in order to run or debug any of your code.  In fact, since you can’t connect UCMA applications through the edge server, you need direct access to the front end, which probably means VPN connectivity for any remote work.  On top of that, if you want to be able to provision and debug things on the server side, you’ll need administrative access to the Lync server, so it’s likely that there’ll be a separate development lab environment set up apart from your company’s everyday Lync deployment.  In the ideal case, each developer would have access to their own personal Lync sandbox, since then they could write and test whatever they needed to without impacting anyone else. 

Over the past few years, I’ve run into a few people that have built monster laptops that ran Hyper-V and a full Lync stack, but I’d never tried putting one together myself.  Last week though, I finally got the chance, and while it does work, there are a few pitfalls that I found while trying to get everything going. 

First, the hardware.  For this application, I picked up a Samsung 700G7A gamer laptop from Best Buy for about $2200.  I needed something that had a decent processor (Core i7), enough memory (16 GB), and a bunch of hard disk space (1.5 TB).  Having built a few Hyper-V servers before, I probably wouldn’t go with much less than 16 GB of memory, especially if you want to run Lync, Exchange, perhaps SharePoint, and then do development on top of that.  The dual hard drives are probably optional, but you will need a lot of storage for something like this, so 500 GB is probably a good minimum.

Once I got my hands on the hardware and did a quick check to make sure that nothing was broken from the factory, I proceeded to wipe out all of the OEM software and disk partitions and install Windows Server 2008 R2 (Standard) as the new host OS.  Since all of the servers you’ll need to run are 64-bit only, that means running Hyper-V, since Windows Virtual PC can’t run an x64 guest.  We’ll also need a domain controller for Lync, so the host, in addition to being the main development server, will also be the DC. 

The initial install of the server OS was fairly straightforward, but once it finished, I ran into the first problem.  Most of the time, after installing a new server, connecting to Windows Update will manage to grab all the relevant signed drivers for a system.  In this case though, there was a whole list of known and unknown devices in the device manager that had no drivers, including the graphics card, Bluetooth module, and wireless adapter.  I suppose we’ve been spoiled by having easily updated drivers from a central location for a few years now, since it wasn’t that long ago that every piece of hardware had its own install disc, so the solution here was to search the web for all the relevant drivers.  For the most part, the Windows 7 drivers did work on Server 2008 R2, but in a few cases there were no drivers available.  Fortunately, enough did end up working that I don’t think it’ll be a major issue; the only features on the laptop that aren’t working as advertised now are the mute key and the wireless toggle.  The takeaway here though is that before installing a server OS on a consumer laptop, take careful inventory of the hardware, and have as many drivers handy as possible.

As for features on the host OS, another discovery was that even with the driver installed, Windows Server will not connect to a wireless network out of the box.  Under the server manager, you need to add the “Wireless LAN service” feature to enable it.  While you’re at it, add “Desktop Experience”, “.NET 3.5.1” and “Telnet client” to the system (Telnet is still useful for troubleshooting connectivity). 

Once those are installed, it’s time to start with roles on the main OS.  First, install Active Directory Domain Services (you’ll need to do this on its own) and create a brand new domain, which in my case was “icedemo1.local”.  Once this role is set up, you can also go ahead and make sure that Web Server, DNS Server, Active Directory Certificate Services, and Hyper-V are all installed on the host.  This will probably take a bunch of reboots, so it’s a good thing to do while you have something else to keep busy with.

Eventually, after all the roles are installed, patched, and ready to go, you can start with the virtual machines.  I created three guest servers to start with: a Lync Standard Edition front end, a UCMA application server, and an Exchange server with Unified Messaging.  Creating those is fairly standard Hyper-V/Lync deployment (install the server OS, join the new icedemo1.local domain, and then follow the Lync/Exchange install guides), so I won’t go into too much detail, but what is interesting about this isolated scenario is the way that networking is set up.  Since I wanted to have everything on its own isolated subnet (I created IPs in the 10.0.146.X range), I assigned static IPs to all my guests and created entries in the host DNS.  I still wanted the host (and guests) to be able to connect to the internet if it was available though.  Since my host was a DC, it needed a static IP (10.0.146.1), and I could easily have assigned an extra static IP to connect to the internet, but the catch would be manually reconfiguring this IP for every network that I wanted to get access to.  Ideally, I wanted to have a connection set up with DHCP for the internet, as well as a static IP for the virtual machines, which I managed to do using internet connection sharing.

Set up the main adapter (wired or wireless) for DHCP.  Then, in Hyper-V, create a new virtual network with the “Internal Only” option, assign this new virtual network adapter the static IP address, and connect the VMs to this adapter.  Note that this virtual adapter should also have the preferred DNS server set to the static IP address (e.g. 10.0.146.1) and not the default loopback address (127.0.0.1); if you don’t do this, then DNS queries on the host OS will fail if you’re not connected to a network.  Finally, on the main adapter, allow internet connection sharing with the virtual network: on the Sharing tab, check off both “Allow other network users to connect to the internet through this computer’s Internet Connection” and “Allow other network users to control or disable the shared Internet connection”.  Unfortunately, enabling connection sharing on one connection/adapter disables it on others, so it’s not simple to flip between wired and wireless networks, but other than this one limitation, everything appears to work as expected. 

After setting up Lync and Exchange, as well as some client software on the host OS, I had a small isolated domain that I could finally run UC apps against.  The Lync client on the host server could place a call into a UCMA application, and I could send IMs between two virtual clients.  Multiple voice endpoints, however, were another matter.  If you need to be able to have two users in a voice call at the same time, it may not work as well as you’d expect.  While there are options in remote desktop now to redirect microphone and speakers from a remote machine, I have never been able to get this to work with a virtual machine on Hyper-V.  Windows Virtual PC on Windows 7 has a great feature where USB devices can be redirected to a VM, so I’ve been able to have two USB speakerphones dedicated to different Lync instances that way, but this feature doesn’t exist on Hyper-V yet.  I also tried a couple of different USB sharing utilities, but none of them were able to share my Jabra conference phone to a virtual machine. 

So far, the only way I’ve managed to get two endpoints on an audio call at the same time is to log two users into the host operating system and switch between them.  This mixes the audio from both calls through the same device though, so it won’t sound great, even if you have each session’s Lync using a different audio device.  Still, if you only need to get the users in a particular state to test a scenario, then this will suffice. 

In the end, this is a great way to set up a simple Lync development or demo environment that can run off the grid, and also a good training tool to give to someone to learn Lync configuration in a sandbox environment.  It’s not that much of a stretch to include a connection to an external gateway to make and receive PSTN calls, and you could probably even set up an edge server and federate to other domains (although without a DMZ).  As for improvements, USB redirection would be great to have, as would a better solution for sharing internet connectivity between the host and guests (although this may just be something I haven’t found yet).  I’d also be interested to try again with a high end machine from Dell or Lenovo to see if the driver situation is any better, since the Samsung I’m using was obviously never intended to run a server OS. 


Troubleshooting the Lync Mobile Client

Last night, I noticed that Lync wasn’t able to sign in on my phone for some reason, even though it had been working fine earlier.  What’s interesting is that I’d seen earlier in the day that under the settings there was a Diagnostic Logging option, similar to what exists on the desktop client.  I’d turned it on already (after all, who doesn’t like diagnostics), but I hadn’t figured out how it actually saved/sent logs yet.  As it turns out, if you go to the about page, there’s a “Send Diagnostic Logs” button.  I’d seen this kind of thing before on the Tanjay phones, and that created an ETL file that got put on a SharePoint server, so I expected something similarly convoluted here.  Instead, I got instructions about an image. 

As it turns out, the client team takes all the logging information and embeds it into a JPEG.  When you send logs, it asks you to attach an image to an email that looks like this:

[Image: the picture that the mobile client asks you to attach to the email]

Once you get the email, save the image, change the extension to .log, open it in a text editor, and you’ll see the logging information right below the binary image data.  As it turns out, the problem was a 500 getting returned from the mobility web service (which an iisreset solved), but the key here was figuring out how to get the logs out of the mobile client.  I’m not sure how end users are going to cope with attaching images to emails or even enabling logging (it’s disabled by default), but it’s great that the feature exists for admins, and it’s actually kind of an ingenious way to get log data off a mobile client. 
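The rename-and-open trick can also be scripted.  What follows is only a minimal sketch, assuming that the log text is appended after the JPEG end-of-image marker (the bytes FF D9) and that it is plain ASCII/UTF-8 text; neither detail is confirmed by anything official, and the class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class MobileLogExtractor
{
    // Returns the text that follows the LAST JPEG end-of-image marker (FF D9),
    // or an empty string if no marker is found.
    // Assumption: the log is appended after the image data as UTF-8/ASCII text.
    public static string ExtractTrailingText(byte[] fileBytes)
    {
        // Scan backwards so embedded thumbnails (which carry their own FF D9)
        // don't cut the search short.
        for (int i = fileBytes.Length - 2; i >= 0; i--)
        {
            if (fileBytes[i] == 0xFF && fileBytes[i + 1] == 0xD9)
            {
                int start = i + 2;
                return Encoding.UTF8.GetString(fileBytes, start, fileBytes.Length - start);
            }
        }
        return string.Empty;
    }
}
```

To use it, read the saved attachment with File.ReadAllBytes and pass the bytes in.  Scanning from the end is safe for text logs because the byte 0xFF never occurs in valid UTF-8, so the last marker found belongs to the image rather than to the log.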


Deploying the Lync Mobility service

Last week, the bits for the Lync mobility service were released, and yesterday the Windows Phone client went live in the marketplace.  The official documentation that comes along with the download is pretty good in this case, and there are already some other blog posts that cover the deployment in detail, which is great to see.  While it would have been nice not to need an extra server component, it does make sense that you’d need to abstract the SIP stack away from the client, and the way that piece works makes sense.  What I don’t understand though is the need for yet another autodiscover service. 

For the mobility clients to work, they’ll try to access lyncdiscover.domain.com (or lyncdiscoverinternal from inside your network), which is a new service that returns the location of the Lync server mobility service.  Now, why that address couldn’t be inferred from the existing DNS SRV records, I’m not quite sure yet, but the bottom line is that this new service requires new reverse proxy rules and certificate changes.  Fortunately, there’s a way to set things up that doesn’t require any changes to the public certificates on the reverse proxy, but it still seems like more work could have been done to make the deployment more streamlined.  Of course, given this new service, it’s quite possible that the next release of Lync will do away with the SRV records completely, and just use the discovery service, which seems to resemble the one in Exchange, for everything.  Either way, getting mobile clients up and running isn’t really a trivial change. 

As for the deployment itself, most of the documentation is relatively clear on how things work.  What isn’t explicit is whether or not you need to restart any of the Lync servers at any point (other than to install CU4).  I did notice, however, that after getting to the verification step (running Test-CsMcxP2PIM) on Friday, I was getting authentication failures.  I rebooted the front end over the weekend though, and on Monday, the same test worked, so perhaps a restart of the service is required after all.  Also, it’s worth noting that you can’t run Test-CsMcxP2PIM using the same account for the sender and receiver (as Ken Lasko and I figured out while we were both trying to deploy the service).  Of course, with anything edge related, the hardest piece to get right is all of the firewall and NAT rules that exist between the internal and external networks, and making sure that traffic is allowed where it needs to be.  Internal users now need to be able to resolve (and connect to) the external web services address, and if you’re bypassing certificate changes on the reverse proxy, port 80 might have to be opened and allowed through the DMZ on the IP that lyncdiscover is listening on.  I piggybacked on the same IP that all the other external web services were listening on, and it also looks like using the Exchange Autodiscover IP will be a convenient way to get things working quickly. 


Troubleshooting UCMA applications-Lync, DNS entries, and failed calls

One of the most frustrating things that you can run into when working with UCMA is starting your application, placing a call (or IM) to it, and having that call fail.  As far as your code is concerned, everything is great.  Calls worked yesterday, calls work for other developers, but for some reason, your call is failing.  At this point, the only real way to track down what’s going on is to go on the server and run OCSLogger.exe and trace the call.

At this point, you may notice that Lync is routing calls to the wrong IP address, which can happen for several reasons, usually to do with something like switching from wired to wireless, or going on or off a VPN.  If you go to a command prompt and run an nslookup on your machine, you may even see the same, incorrect address coming back.  If you run the same nslookup from the Lync server, you’ll almost definitely see the wrong address coming back from DNS. 

The solution to this is reasonably simple.  From your dev machine, run “ipconfig /registerdns”, which forces your DNS registration to update with your actual IP address.  Then run “ipconfig /flushdns” and then nslookup to verify that the new address is being returned.  Do the same on the Lync server (flush DNS and nslookup) to verify that everyone does indeed know what the correct address for your machine is.  Now, try your call again. 

Surprise: your call will probably fail this time too!  Even though the OS knows what your IP address is now, Lync appears to keep its own DNS cache that I have yet to find a way to clear.  The good news is that it will time out (it appears to take about 15 minutes), so calls will eventually work again.  If you’re really impatient though, you can restart the front end service at this point, although this isn’t always an option. 

Now, the likelihood of this coming up in production is pretty minimal, but in development I’ve run into it all the time.  It’s a fairly simple fix once you know what to look for.  Note that if anyone knows how to clear the internal Lync DNS cache from PowerShell, please let me know in the comments.

Posted in Unified Communications

Binding customization and throttles-Load testing a PollingDuplexHttp WCF service-Part 2

Part 1 of this entry already covered a load testing app for a service, and in this part I’m looking at some of the binding problems I found when I started hammering the service with it.

I’d already run into the MaxSessionsPerAddress throttle in my original prototype, but the MaxEstablishingSessions throttle didn’t come into play until I started load testing with more than 10 clients at once.  The tester hit this throttle because it created a large number of proxies at the same time.  If this throttle is exceeded, an HTTP error is returned by the service (Service too busy); however, it’s worth noting that this does NOT fault the proxy.  The tests in the library all have a retry mechanism whereby they will retry the operation if an HTTP error is received (optionally waiting between attempts).  In most cases, I put a 1 second retry delay between attempts at the service.

The default value for this throttle is at 10 sessions, but what if you want to customize it?  Well, it turns out that it’s not exposed on PollingDuplexHttpBinding, so it means creating a custom binding.  As it turns out, this can be easier than you’d think:

PollingDuplexHttpBinding b= new PollingDuplexHttpBinding(PollingDuplexHttpSecurityMode.TransportWithMessageCredential, System.ServiceModel.Channels.PollingDuplexMode.MultipleMessagesPerPoll);
b.Security.Transport.ClientCredentialType = HttpClientCredentialType.Digest;
b.Namespace = "http://www.computer-talk.com/Service";
b.SendTimeout = new TimeSpan(0, 0, 10);  //timeout for sending messages to the client channel. 
b.InactivityTimeout = new TimeSpan(0, 5, 0);  //minimum message frequency-if no traffic in this interval, channel is faulted. 
b.ServerPollTimeout = new TimeSpan(0, 10, 0);
b.ReceiveTimeout = new TimeSpan(0, 10, 0);
b.UseTextEncoding = false;
b.MaxReceivedMessageSize = int.MaxValue;  

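//customize the binding: wrap it in a CustomBinding so that the polling duplex
//element (and the throttles on it) can be reached

CustomBinding bc = new CustomBinding(b);
foreach (BindingElement e in bc.Elements)
{
    PollingDuplexBindingElement pollingElement = e as PollingDuplexBindingElement;
    if (pollingElement != null)
    {
        //raise the throttles from their defaults
        pollingElement.MaxPendingSessions = 1000;
        pollingElement.MaxPendingMessagesPerSession = 1000;
    }
}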

You create your PollingDuplexHttpBinding (b) with all the security, encoding, and polling options that you normally would, but then just create a custom binding based on the original, add your new throttles, and use that binding instead of the standard one.  Best of all, there are no code changes on the client, and everything else seems to work.

Now, the question is, should you change that value?  Here are some tests I ran using the default value for MaxPendingSessions.  The number of retries is the number of times a service call failed because of a “service busy” response, and each had a 1s delay followed by another attempt.  The number of threads is the number of simultaneous clients started against the service:

# Threads   Average time   Max time   Retries
10          2              2.05       0
50          3              4.76       21 (1x)
100         6              8.73       90 (1x), 18 (2x)

As you can see, large numbers of clients resulted in large numbers of retries.  Once I changed the throttle to 1000 though, here’s what happened:

# Threads   Average time   Max time   Retries
10          1              1.72       0
50          3              3.74       0
100         14             23.75      0

All of these tests were doing the same thing, so as the number of threads scales up, the performance of the service as a whole actually starts to decrease: at 100 threads, raising the throttle took the average time from 6 to 14 and the maximum from 8.73 to 23.75.  True, a slightly higher value (maybe 20 or 30) may work out to be optimal, but it appears that letting the calls fail and retry might be the better option.

Note that this throttle only really affects new sessions.  Another test I tried had threads create a proxy, wait, and then invoke a second method.  In that case, once the first method had been successfully called, the client didn’t get any more HTTP failures from the service.  In cases where there’s a network interruption between your clients and service though, you could see this throttle come into play, so the important takeaway here is to build suitable retry logic into the client when accessing a service like this.
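As a rough illustration of that kind of retry logic, here is a minimal synchronous sketch.  The RetryHelper name and its parameters are hypothetical, it retries on any exception (a real client would probably narrow that to communication errors), and a Silverlight client would have to apply the same counting inside its Completed handler because the generated proxies are asynchronous.

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // Runs operation and returns its result. If it throws, waits for delay and
    // tries again, up to maxRetries additional attempts. When every attempt
    // fails, the last exception is rethrown to the caller.
    public static T Execute<T>(Func<T> operation, int maxRetries, TimeSpan delay,
                               Action<Exception, int> onRetry = null)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                if (attempt >= maxRetries)
                    throw;                      // out of retries: surface the failure
                if (onRetry != null)
                    onRetry(ex, attempt + 1);   // let the caller log or count the retry
                Thread.Sleep(delay);
            }
        }
    }
}
```

A call such as RetryHelper.Execute(() => CallService(), 3, TimeSpan.FromSeconds(1)) would then mirror the 1 second delay used in the tests above, where CallService stands in for whatever operation is being retried.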

Posted in Silverlight, WCF

Load testing a PollingDuplexHttp WCF service-Part 1

One of the challenges of writing services that are designed to be consumed by hundreds (or thousands) of clients at once is testing under any sort of load.  Sure, you can test every method to see how it responds one at a time, and maybe you’ve got three or four testers hitting it at once, but how do you test the real world load (or even the extreme load) of a whole pile of clients using the service at the same time?  The simple solution would appear to be to just create a simple load test client that creates multiple threads and uses each of these threads to create connections to the service.  Sure, the scenario isn’t completely accurate, since you’re going to create some degree of latency on the client with a multithreaded app, but it’s probably going to give you a good idea of where the bottlenecks exist.

This was the approach I took with a PollingDuplexHttp service, and since the binding was only available in Silverlight, the load test client needed to be a Silverlight app as well.  The structure of a basic load test looked something like this:

[Image: diagram of the basic load test structure]

The test application would start a whole pile of threads, connect to the service, and then time how long it takes to invoke a method.  This basic idea worked, but it turns out that there’s a throttle that gets in the way.

MaxSessionsPerAddress

The MaxSessionsPerAddress throttle will come up if you enable tracing on your service (which is well documented here), and it caps out at 10 sessions.  I couldn’t find this in any of the documentation, but after digging a little deeper, I found that it’s a non-configurable throttle designed to prevent DoS attacks.  In normal usage, this makes sense, but in this case, it makes things more complicated.

Load tester v2

To get around this problem, I morphed the load tester into something that looks more like this:

[Image: diagram of the revised load tester]

The test controller launches N Silverlight applications, which each run T threads.  Tests are loaded from a test library DLL, and results are recorded in a database using another simple service.  This way, I can run hundreds of clients from one controller, and run multiple controllers on multiple machines to generate higher load.  Each of the Silverlight projects can take everything as URI parameters to initialize the tests (test name, number of threads, synchronized start time, etc.), so it’s pretty simple to seed this to multiple machines to start at a predetermined time.  Yes, it’s not a full-on enterprise app architecture, but it’s something I put together in a day, and it did its job.
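To make the URI-parameter idea concrete, here is a small sketch of how a test run could be described in a query string.  The parameter names (test, threads, start) and the LoadTestSettings class are assumptions made for illustration and are not the names the actual tester used.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class LoadTestSettings
{
    public string TestName { get; set; }
    public int ThreadCount { get; set; }
    public DateTime StartTimeUtc { get; set; }

    // Parses a query string such as
    //   ?test=DemoTest&threads=10&start=2011-12-15T14:30:00Z
    // The parameter names "test", "threads", and "start" are hypothetical.
    public static LoadTestSettings FromQuery(string query)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string[] pairs = query.TrimStart('?').Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            string[] parts = pair.Split(new[] { '=' }, 2);
            string value = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : "";
            values[Uri.UnescapeDataString(parts[0])] = value;
        }

        // Defaults apply when a parameter is missing or malformed.
        var settings = new LoadTestSettings { TestName = "", ThreadCount = 1, StartTimeUtc = DateTime.UtcNow };

        string raw;
        int threads;
        DateTime start;
        if (values.TryGetValue("test", out raw))
            settings.TestName = raw;
        if (values.TryGetValue("threads", out raw) &&
            int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out threads))
            settings.ThreadCount = threads;
        if (values.TryGetValue("start", out raw) &&
            DateTime.TryParse(raw, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal, out start))
            settings.StartTimeUtc = start;
        return settings;
    }
}
```

Passing the start time as UTC keeps controllers on different machines (and possibly different time zones) agreeing on the same instant, provided their clocks are reasonably in sync.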

Within the load tester, I’ve created a load test library that collects all the test methods.  My load test object looks like this:

public class LoadTest
{

    public string TestName { get; set; }
    public DateTime TestStartTime { get { return m_testStartTime; } set { m_testStartTime = value; } }
    private DateTime m_testStartTime = new DateTime();
    public DateTime TestEndTime { get { return m_testEndTime; } set { m_testEndTime = value; } }
    private DateTime m_testEndTime;
    public String ResultString { get { return m_resultString; } set { m_resultString = value; } }
    private string m_resultString;
    public TimeSpan Duration
    {
        get { return TestEndTime - TestStartTime; }
    }

    public int RetryCount { get; set; }
    public int MaxRetries { get; set; }

    public ManualResetEvent DoneEvent = new ManualResetEvent(false);

    public Func<LoadTest,string> TestFn;
    public DateTime StartTime { get { return m_startTime; } set { m_startTime = value; } }
    private DateTime m_startTime = new DateTime();  //used to coordinate start across N threads
    public LoadTest(Func<LoadTest,string> fn)
    {
        TestFn = fn;
        StartTime = DateTime.Now;  //set before starting the thread so it is populated by the time the test runs
        m_thread = new Thread(new ThreadStart(ThreadFn));
        m_thread.Start();
    }

    public string AddRetryStatus(string msg)
    {
        RetryCount++;
        return "\r\ngot error-**retrying(" + RetryCount + "):" + msg;
    }

    public ManualResetEvent TestDone = new ManualResetEvent(false);

    public static service.IServiceClient CreateProxy()
    {
        EndpointAddress address = new EndpointAddress("https://chrislaptop.corp.computer-talk.com:8085/iceWCFService");
        PollingDuplexHttpBinding b = new PollingDuplexHttpBinding(PollingDuplexHttpSecurityMode.TransportWithMessageCredential, System.ServiceModel.Channels.PollingDuplexMode.MultipleMessagesPerPoll);

        b.InactivityTimeout = new TimeSpan(0, 10, 0);
        b.CloseTimeout = new TimeSpan(0, 10, 0);
        b.SendTimeout = new TimeSpan(0, 10, 0);
        b.ReceiveTimeout = new TimeSpan(0, 10, 0);

        b.MaxReceivedMessageSize = long.MaxValue;

        //need to register to use the client HTTP stack, and not the browser stack
        //as per http://blogs.msdn.com/b/silverlightws/archive/2010/12/15/pollingduplex-using-multiplemessagesperpoll-issue-in-latest-sl4-gdrs.aspx
        //since there's a Silverlight bug...

        bool httpResult = WebRequest.RegisterPrefix("http://", WebRequestCreator.ClientHttp);
        bool httpsResult = WebRequest.RegisterPrefix("https://", WebRequestCreator.ClientHttp);

        service.IServiceClient proxy = new service.IServiceClient(b, address);
        return proxy;
    }

    private void ThreadFn()
    {
        TestStartTime = DateTime.Now;
        ResultString = TestFn.Invoke(this);  //do whatever the test method says
        TestEndTime = DateTime.Now;
        TestDone.Set();
    }
    private System.Threading.Thread m_thread;
}

To use this class, I construct objects with delegates like this:

public static LoadTest DemoTest()
{
    LoadTest t1 = new LoadTest((t) =>
    {
        try
        {
            string returnText = "";
            service.IServiceClient c = LoadTest.CreateProxy();

            c.TestMethodCompleted += (sndr, ea) =>
            {
                if (ea.Error != null)
                {
                    returnText += t.AddRetryStatus(ea.Error.Message);
                    Thread.Sleep(1000);
                    c.TestMethodAsync();
                }
                else
                {
                    if (ea.Result != null)
                        returnText += "\r\n Got Result " + ea.Result.ToString();
                    else
                        returnText += "\r\n return is null!";
                    t.DoneEvent.Set();
                }
            };

            c.TestMethodAsync();
            t.DoneEvent.WaitOne();
            return returnText;
        }
        catch (Exception ex)
        {
            return "Got Exception: " + ex.ToString();
        }
    });
    t1.TestName = "DemoTest";
    return t1;
}

The framework then takes care of starting threads, submitting results, and all the other plumbing that a tester like this would need.
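The thread-starting part of that plumbing is not shown here, so the following is only a guess at its general shape: a sketch, under the assumption that each thread simply sleeps until a shared start time and then times its own work.  SynchronizedRunner and its parameters are hypothetical names.

```csharp
using System;
using System.Threading;

public static class SynchronizedRunner
{
    // Starts threadCount threads. Each one waits until startTimeUtc, runs
    // work(index), and records how long that call took. The method returns
    // once every thread has finished.
    public static TimeSpan[] Run(int threadCount, DateTime startTimeUtc, Action<int> work)
    {
        var durations = new TimeSpan[threadCount];
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            int index = i;  // per-thread copy of the loop variable for the closure
            threads[i] = new Thread(() =>
            {
                TimeSpan wait = startTimeUtc - DateTime.UtcNow;
                if (wait > TimeSpan.Zero)
                    Thread.Sleep(wait);     // hold here until the agreed start time

                DateTime begin = DateTime.UtcNow;
                work(index);
                durations[index] = DateTime.UtcNow - begin;  // each thread writes only its own slot
            });
            threads[i].Start();
        }

        foreach (Thread t in threads)
            t.Join();
        return durations;
    }
}
```

Sleeping until a wall-clock time is only as accurate as the machine clocks and the thread scheduler allow, which is one more reason the measured times should be read as relative rather than absolute numbers.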

While this testing approach is convenient, it does have some drawbacks:

1. Having such a large number of requests initiating from the same machine will cause load on the client side as well. 10 test runners each running 10 threads will likely not respond as well on the client side as 100 individual clients. This may cause execution times to be higher than actual performance.

2. Creating the proxy for the first time is a longer operation than using an existing proxy. If a test in the library creates a proxy, runs a single method, and then closes the proxy, the establishment time will be a factor in the test. This is demonstrated by the test results.

3. This framework does not take into account or control for network conditions between the client and the service being tested.

4. The probability that under normal operation that many clients would be creating sessions at once is low (likely only in a case where a network connection is broken).

5. Probably something else I didn’t think of…

Of course, even with these in mind, I’m still able to get a worst case scenario for the service, and see how different scenarios stack up relative to each other.

Obviously, I’ve left out a lot of details about how to build a load tester, but this should give you a basic idea of what I put together.  I’ll post a follow up to this later in the week on some of the results that I came away with.

Posted in Silverlight, WCF

WCF PollingDuplexHttp Services, Silverlight, and the task parallel library-lessons learned

The PollingDuplexHttp binding has been in the Silverlight SDK for a while now, and it’s a great way to have a service that can send messages to a client.  While Silverlight does support sockets and net.tcp, the restriction to ports 4502-4534 can be problematic, since firewall ports may need to be opened on both client machines and in the data center.  With duplex HTTP though, everything goes through port 443 (or 80 if you don’t care about encryption), so all your traffic is likely to just make it through, which helps ease deployment headaches.  If you’re not familiar with it, there are some great blog posts out there that you should check out that describe it in detail.

That being said, the binding isn’t without its issues, and having built some services with the binding I’ve figured out a few things that would have been nice to know going into the process:

A simple duplex service:

To start with, consider a service like this one:

    [ServiceContract(CallbackContract=typeof(IDuplexClient))]
    public interface ITestService
    {
        [OperationContract]
        string GetCurrentTimeString();
        [OperationContract]
        DateTime GetCurrentTime();
        [OperationContract]
        void RequestTimeUpdates();
        [OperationContract]
        void StopTimeUpdates();
    }

    [ServiceContract]
    public interface IDuplexClient
    {
        [OperationContract(IsOneWay = true)]
        void PublishTime(DateTime time);
    }

This service has a couple of methods to get the current time, as well as ones to start and stop time updates.  The callback contract has a one-way method that the service can call to notify the client that the time has changed.  A sample implementation of the RequestTimeUpdates() method might look like this:

List<IDuplexClient> clients = new List<IDuplexClient>();
public void RequestTimeUpdates()
{
    // Grab the client callback channel. 
    //this works-each client generates a unique callback channel...
    IDuplexClient c= OperationContext.Current.GetCallbackChannel<IDuplexClient>();
    lock (clients)
    {
        if (clients.Contains(c) == false)
            clients.Add(c);
    }
}

In this case, the duplex channel is unique per client proxy, and storing it on a list is one possibility (more on this later).  Another method could do something like this to publish to all registered Silverlight clients:

private void Publish()
{
    lock(clients)
    {
        foreach (IDuplexClient c in clients)
            c.PublishTime(DateTime.Now);
    }
}

Now, at this point the service will work, but there are a couple of questions I started to wonder about.  For example, what happens if a client registers, and then exits without calling your obviously named deregistration method?  As it turns out, if you do this, you’ll get a timeout on the method call, which blocks your service.  To handle that case, you can try something like this:

private void Publish()
{
    lock(clients)
    {
        List<IDuplexClient> toRemove = new List<IDuplexClient>();
        foreach (IDuplexClient c in clients)
        {
            try
            {
                c.PublishTime(DateTime.Now);
            }
            catch (Exception ex)
            {
                toRemove.Add(c);
            }
        }
        foreach (IDuplexClient c in toRemove)
            clients.Remove(c);
    }
}

Yes, this is a bit extreme, but now ANY exception when calling the duplex event will result in the client getting removed from the callback subscriptions, so at most you’ll time out once per dead client (and not time out perpetually).  Of course, now the problem is that if you have, say, 10 clients, and 9 of them exit without unregistering, then these timeouts (which default to 60 seconds) happen sequentially.  A third iteration of the publish method might look like this:

private void Publish()
{
    lock (clients)
    {
        List<IDuplexClient> toRemove = new List<IDuplexClient>();
        Action<IDuplexClient> invokeAction = (callback) =>
        {
            try
            {
                callback.PublishTime(DateTime.Now);
            }
            catch (Exception)
            {
                // This delegate runs on several threads at once and List<T>
                // is not thread-safe, so guard the shared list.
                lock (toRemove)
                {
                    toRemove.Add(callback);
                }
            }
        };
        System.Threading.Tasks.Parallel.ForEach<IDuplexClient>(clients, invokeAction);

        foreach (IDuplexClient r in toRemove)
            clients.Remove(r);
    }
}

Now, by invoking the callbacks in parallel, you can get down to roughly one 60 second timeout in the best case (Parallel.ForEach runs on thread pool threads and doesn’t promise one thread per client, so a large number of dead clients can still take longer).  This means that in our previous example, instead of the 10th client having to wait up to 9 minutes for the event, it should receive its event almost immediately.  The service, however, will still be blocked for up to 60 seconds while the publish method fires, which, depending on your architecture, might mean yet another level of threading.  On top of that, the code to fire an event is starting to get messy, and it would be a good idea to avoid duplicating it for a service with a large number of events.
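To get a feel for the difference without standing up a WCF service, here is a rough sketch that times a sequential loop against a parallel one.  Everything in it is made up for illustration: CallDeadClient is a hypothetical stand-in for a dead proxy (it just sleeps for the "timeout" and then throws), and the timeout is passed in so you can use milliseconds rather than a full 60 seconds.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutCostDemo
{
    // Hypothetical dead proxy: blocks for the "timeout", then fails.
    private static void CallDeadClient(int timeoutMs)
    {
        Thread.Sleep(timeoutMs);
        throw new TimeoutException();
    }

    // One dead client after another: the waits add up.
    public static long SequentialMs(int deadClients, int timeoutMs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < deadClients; i++)
        {
            try { CallDeadClient(timeoutMs); }
            catch (TimeoutException) { }
        }
        return watch.ElapsedMilliseconds;
    }

    // All dead clients handed to the thread pool: the waits can overlap,
    // although how many actually run at once is up to the scheduler.
    public static long ParallelMs(int deadClients, int timeoutMs)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.For(0, deadClients, i =>
        {
            try { CallDeadClient(timeoutMs); }
            catch (TimeoutException) { }
        });
        return watch.ElapsedMilliseconds;
    }
}
```

Calling TimeoutCostDemo.SequentialMs(9, 100) and TimeoutCostDemo.ParallelMs(9, 100) on a multi-core machine should show the sequential version taking about nine times the timeout, and the parallel one noticeably less.  The exact parallel number depends on how many threads the pool is willing to hand out.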

The Callback Manager

To combat these problems, I’ve wrapped these ideas into a callback manager for duplex services:

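using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.ComponentModel;
using System.Threading.Tasks;

public class CallbackManager<KEYTYPE, VALUETYPE>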
{
    internal CallbackManager()
    {

    }
    public bool AddSubscription(KEYTYPE key, VALUETYPE value)
    {
        // GetOrAdd avoids the race between checking for the key and adding it.
        ConcurrentDictionary<VALUETYPE, bool> subscribers =
            m_dict.GetOrAdd(key, k => new ConcurrentDictionary<VALUETYPE, bool>());
        return subscribers.TryAdd(value, true);
    }

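    public bool RemoveSubscription(KEYTYPE key, VALUETYPE value)
    {
        // TryGetValue is a single atomic lookup; ContainsKey followed by the
        // indexer can throw if another thread removes the key in between.
        ConcurrentDictionary<VALUETYPE, bool> subscribers;
        if (m_dict.TryGetValue(key, out subscribers))
        {
            bool b;
            return subscribers.TryRemove(value, out b);
        }
        return false;
    }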
    //remove ALL subscribers for a key
    public bool RemoveSubscription(KEYTYPE key)
    {
        // TryRemove already reports whether the key was present.
        ConcurrentDictionary<VALUETYPE, bool> removed;
        return m_dict.TryRemove(key, out removed);
    }
    public ICollection<KEYTYPE> GetKeys()
    {
        return m_dict.Keys;
    }

    //if you fire an event, it will only ever be for one key...
    private ICollection<VALUETYPE> GetSubscribers(KEYTYPE key)
    {
        ConcurrentDictionary<VALUETYPE, bool> subscribers;
        if (m_dict.TryGetValue(key, out subscribers))
            return subscribers.Keys;
        return new List<VALUETYPE>(); //empty
    }
    private ICollection<KeyValuePair<VALUETYPE, bool>> GetSubscribersEx(KEYTYPE key)
    {
        ConcurrentDictionary<VALUETYPE, bool> subscribers;
        if (m_dict.TryGetValue(key, out subscribers))
            return subscribers; //the dictionary is itself a collection of key/value pairs
        return new List<KeyValuePair<VALUETYPE, bool>>(); //empty
    }

    public void FireEvent(KEYTYPE key,Action<VALUETYPE,bool> action)
    {
        try
        {
            using (BackgroundWorker bg = new BackgroundWorker())
            {
                bg.DoWork += (sender, ea) =>
                    {
                        Action<KeyValuePair<VALUETYPE, bool>> a = callback =>
                            {
                                try
                                {
                                    action(callback.Key, callback.Value);
                                }
                                catch (Exception ex)
                                {
                                    Logger.LogError(ErrorLevel.Information, "Exception sending to callback channel-removing subscription-" + ex.ToString());
                                    RemoveSubscription(key, callback.Key);
                                }
                            };
                        Parallel.ForEach<KeyValuePair<VALUETYPE, bool>>(GetSubscribersEx(key), a);
                    };
                bg.RunWorkerAsync();
            }
        }
        catch (Exception ex)
        {
            Logger.LogError(ErrorLevel.Information, "Exception invoking callback-" + ex.ToString());
        }
    }

    private ConcurrentDictionary<KEYTYPE, ConcurrentDictionary<VALUETYPE, bool>> m_dict = new ConcurrentDictionary<KEYTYPE, ConcurrentDictionary<VALUETYPE, bool>>();
}

This class contains methods to manage a collection of subscribers, as well as the main FireEvent method that wraps some of the earlier findings into one generic method.  It takes a delegate for the callback event that you want to fire, and then calls those delegates in parallel.  It also invokes the parallel loop on a background worker thread, which ensures that the service will not block when firing an event.  One side effect of this is that firing multiple events could result in multiple attempts on the same dead proxy, but the code handles these cases, and it’s better than the alternative.
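If you want to play with the idea outside of a WCF service, here is a trimmed-down sketch of the same pattern.  To be clear about what is assumed: MiniCallbackManager, Subscribe, Count and FireEventAsync are names I made up for this example, it uses Task.Run in place of the BackgroundWorker above, and it hands back the Task purely so that a caller (or a test) can wait for the background work if it wants to.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, stripped-down variant of the callback manager, for illustration only.
public class MiniCallbackManager<TKey, TSubscriber>
{
    private readonly ConcurrentDictionary<TKey, ConcurrentDictionary<TSubscriber, bool>> subscriptions =
        new ConcurrentDictionary<TKey, ConcurrentDictionary<TSubscriber, bool>>();

    public bool Subscribe(TKey key, TSubscriber subscriber)
    {
        return subscriptions
            .GetOrAdd(key, k => new ConcurrentDictionary<TSubscriber, bool>())
            .TryAdd(subscriber, true);
    }

    public int Count(TKey key)
    {
        ConcurrentDictionary<TSubscriber, bool> subscribers;
        return subscriptions.TryGetValue(key, out subscribers) ? subscribers.Count : 0;
    }

    // Returns right away; the Task completes once every subscriber has been tried.
    public Task FireEventAsync(TKey key, Action<TSubscriber> action)
    {
        return Task.Run(() =>
        {
            ConcurrentDictionary<TSubscriber, bool> subscribers;
            if (!subscriptions.TryGetValue(key, out subscribers))
                return;

            Parallel.ForEach(subscribers.Keys, subscriber =>
            {
                try
                {
                    action(subscriber);
                }
                catch (Exception)
                {
                    // Treat any failure as a dead subscriber and drop it.
                    bool ignored;
                    subscribers.TryRemove(subscriber, out ignored);
                }
            });
        });
    }
}
```

With plain integers standing in for callback channels, subscribing 1, 2 and 3 under one key and then firing an action that throws for subscriber 2 should leave two subscribers behind once the returned Task has finished.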

It’s also interesting to note that the delegate that gets passed into the method must itself get wrapped in another delegate (i.e. I can’t just hand the incoming action to Parallel.ForEach), since the wrapper is where each subscriber’s exception gets caught.  Without it, a failed callback would escape the loop instead, and the dead subscription would never get removed.
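Here is a small sketch of what that wrapping buys you, with no WCF involved.  The names are invented for the example: Send is a hypothetical stand-in for a callback in which odd-numbered clients behave like dead proxies.  When an iteration throws, Parallel.ForEach stops starting new iterations and reports the failures as a single AggregateException after the loop, so the per-subscriber try/catch has to live inside the delegate that the loop runs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WrapperDemo
{
    // Hypothetical callback: odd ids behave like dead proxies.
    public static void Send(int clientId)
    {
        if (clientId % 2 == 1)
            throw new TimeoutException("client " + clientId + " did not respond");
    }

    // Passing the action straight through: failures only surface after the
    // loop, bundled together, and some clients may never have been called.
    public static bool UnwrappedThrows(IEnumerable<int> clients)
    {
        try
        {
            Parallel.ForEach<int>(clients, Send);
            return false;
        }
        catch (AggregateException)
        {
            return true;
        }
    }

    // Wrapping the action: each failure is handled next to the client that
    // caused it, and every client still gets its call.
    public static ICollection<int> WrappedFailures(IEnumerable<int> clients)
    {
        var failed = new ConcurrentDictionary<int, bool>();
        Parallel.ForEach<int>(clients, id =>
        {
            try
            {
                Send(id);
            }
            catch (Exception)
            {
                failed.TryAdd(id, true);
            }
        });
        return failed.Keys;
    }
}
```

For clients 1 through 4, the unwrapped version should report that the loop threw, while the wrapped version should come back with clients 1 and 3 as the failures.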

One other thing to note: the ConcurrentDictionary class gets used here with a second (bool) type parameter simply because the library doesn’t have a concurrent set or list type that lets you remove a specific item (ConcurrentBag, for example, only lets you take out an arbitrary one).  I’m not sure why, but for the meantime I just added the dummy second type, because the concurrent classes are incredibly useful.  You can modify the collection while it’s being iterated elsewhere, and you aren’t at the mercy of someone forgetting a lock.
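A quick sketch of the "dictionary as a set" trick, and of why it matters here.  The class and method names are made up for the example.  A plain List throws as soon as it is changed in the middle of a foreach, while a ConcurrentDictionary with a throwaway bool value lets you drop entries while you are walking over it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class ConcurrentSetDemo
{
    // A plain List<T> rejects changes made while it is being enumerated.
    public static bool ListThrowsWhenModified()
    {
        var list = new List<string> { "alice", "bob", "carol" };
        try
        {
            foreach (string name in list)
                list.Remove(name);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // ConcurrentDictionary<T, bool> used as a set: the bool is a dummy value,
    // and entries can be removed while the dictionary is being enumerated.
    public static int RemoveWhileEnumerating()
    {
        var set = new ConcurrentDictionary<string, bool>();
        set.TryAdd("alice", true);
        set.TryAdd("bob", true);
        set.TryAdd("carol", true);

        foreach (KeyValuePair<string, bool> entry in set)
        {
            if (entry.Key != "alice")
            {
                bool ignored;
                set.TryRemove(entry.Key, out ignored);
            }
        }
        return set.Count;
    }
}
```

The same property is what lets FireEvent remove a dead subscription from inside the parallel loop without taking a lock around the whole publish.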

So finally, this means that firing an event in the service code looks something like this:

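    // Created once (for example, as a field of the service), not on every publish.
    m_TimeUpdateChannels = new CallbackManager<string, IDuplexClient>();

    // Later, whenever there is something to publish:
    m_TimeUpdateChannels.FireEvent("key", (callback, b) =>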
    {
        //do something else if I want
        callback.PublishTime(DateTime.Now);
    });

Which is much cleaner than having to write an individual method to fire each event.

So, with a scheme like this in place, writing a duplex service has gotten a whole lot easier to manage.  If you do need to keep track of subscribers, this at least encapsulates it a little, and it keeps most of the WCF plumbing in one place, meaning that the service can concentrate on business logic.
