This weekend we’re rolling out our new PBX (phone system). The old system was an integrated hardware solution that had started to hold us back, not only through a lack of features but also through multiple failures after automatically installed updates. The old system was somewhat scalable, but as we added more and more multi-line telephones, it became clear that it was never designed for more than a few phones handling a handful of simultaneous calls. The settings were limited, there was no multi-language support, and it was hard to set up time schedules that forward calls to an answering machine outside business hours. It was possible, but it never felt like it was done the right way. Music on hold was also limited to a single short clip.
The new system runs on our own hardware and can easily be moved to more powerful hardware when needed. We have more control over updates, and in theory we can expand to as many phones and concurrent calls as the hardware can handle. Our new phone system is based on FreePBX and runs on an operating system of our own choosing. Our development team can easily extend its functionality when needed, and has already done so. We’ve written a module that makes it possible to change the music on hold from the interactive voice response (IVR), just as the language and the caller ID (CID) could already be changed from within the IVR or anywhere else in the call flow. We’ve also written a module to register incoming calls, and we’re planning a full integration with our customer relationship management (CRM) solution.
So far, we’ve been installing, configuring, extending and testing the new system on a test setup. This weekend we’re taking it live by switching over the trunks and shutting down the old system.
Everything looked fine and worked as intended, until our telephone provider added the second telephone number to the trunk on their end. From that moment on, outgoing calls worked on both numbers, but incoming calls no longer worked. Once the trunk is registered, the message callers hear changes from “the number you’ve dialed is not in use” to “the number you’ve dialed is not responding”. We looked at the network traffic between our server and the telephone service provider. We regularly receive keep-alive SIP packets from the provider, but when one of our telephone numbers is dialed, no packets from the provider arrive at our server. It looks like either something is wrong on their side of the trunk, or the SIP packets they send for incoming calls somehow never reach us, while the keep-alive packets do.
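To tell the provider’s keep-alives apart from actual call setup while reading a capture, it helps to look at the first line of each SIP payload. The helper below is a minimal sketch, not part of tcpdump or FreePBX: `classify_sip_payload` is a hypothetical name, it expects the raw UDP payload (for example as exported from a capture), and it assumes the keep-alives are either bare CRLF datagrams or SIP OPTIONS requests, which is common but not guaranteed for every provider.

```python
from collections import Counter


def classify_sip_payload(payload: bytes) -> str:
    """Roughly classify a raw UDP payload seen on a SIP port (hypothetical helper)."""
    # CRLF-only datagrams are a common keep-alive that carries no SIP message at all.
    if not payload.strip(b"\r\n"):
        return "keepalive"
    # The first line of a SIP message is either a status line ("SIP/2.0 200 OK")
    # or a request line ("INVITE sip:... SIP/2.0").
    first_line = payload.lstrip(b"\r\n").splitlines()[0].decode("ascii", errors="replace")
    parts = first_line.split()
    if first_line.startswith("SIP/2.0"):
        return "response " + (parts[1] if len(parts) > 1 else "?")
    method = parts[0] if parts else ""
    if method == "OPTIONS":
        # Assumption: the provider uses OPTIONS requests as keep-alive "pings".
        return "keepalive"
    if method == "INVITE":
        # An INVITE is the request that starts an incoming call.
        return "incoming-call"
    return method.lower() or "unknown"


if __name__ == "__main__":
    # Made-up payloads for illustration only.
    captured = [
        b"\r\n\r\n",
        b"OPTIONS sip:pbx.example.com SIP/2.0\r\nVia: SIP/2.0/UDP 192.0.2.10\r\n\r\n",
        b"INVITE sip:100@pbx.example.com SIP/2.0\r\nVia: SIP/2.0/UDP 192.0.2.10\r\n\r\n",
    ]
    print(Counter(classify_sip_payload(p) for p in captured))
```

A capture taken during a test call that contains only `keepalive` entries and no `incoming-call` entry would match what we observed: the trunk stays alive, but the INVITE never shows up.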
We spent almost the whole weekend tracking down the problem with tcpdump and trying just about every PJSIP setting combination in existence. Several times, inbound calls suddenly started working again, only to stop working a few minutes to a few hours later.
In the tcpdump output, we noticed both UDP checksum errors and missing packets. We had seen similar checksum errors in the past on networks other than our own, and they didn’t seem to be the cause: the checksum errors always came from the same device and didn’t appear to have any effect.
We decided to run SIP asymmetrically: while keeping outbound traffic on the default SIP port (5060), we moved inbound SIP traffic to a non-standard port. After rebooting the FreePBX server, the problem disappeared and did not come back. There were no more missing packets, and all test calls came through without any interruption. It seems that either the internet service provider or the modem they provided randomly passes and blocks incoming SIP traffic. That is either an anti-competitive measure (the old phone system was provided by our ISP) or a bug.
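For readers unfamiliar with the UDP checksum errors mentioned above: over IPv4, the UDP checksum is the 16-bit ones’ complement of the ones’ complement sum over a pseudo-header (source address, destination address, protocol number and UDP length), the UDP header and the payload, as defined in RFC 768. The sketch below recomputes it from fields read out of a capture. The function names are our own, it handles IPv4 only, and it is meant to illustrate the check that tcpdump reported on, not to replace it.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones' complement sum, padding odd-length input with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        # Fold the carry back into the low 16 bits (end-around carry).
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum_ipv4(src_ip: str, dst_ip: str, src_port: int,
                      dst_port: int, payload: bytes) -> int:
    """Compute the UDP checksum a sender should place in the header (IPv4 only)."""
    length = 8 + len(payload)  # UDP header is 8 bytes
    # Pseudo-header: source IP, destination IP, zero byte, protocol 17 (UDP), UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    # UDP header with the checksum field set to zero while computing.
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    checksum = ~ones_complement_sum16(pseudo + header + payload) & 0xFFFF
    # A computed value of zero is transmitted as 0xFFFF, because 0 means "no checksum".
    return checksum or 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    payload: bytes, received_checksum: int) -> bool:
    """Check a received UDP checksum against the recomputed value."""
    if received_checksum == 0:
        return True  # the sender did not compute a checksum, which IPv4 allows
    expected = udp_checksum_ipv4(src_ip, dst_ip, src_port, dst_port, payload)
    return expected == received_checksum
```

A received checksum of zero means the sender skipped the checksum altogether, which is permitted for UDP over IPv4, so the sketch does not treat it as an error.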