Kaloom's Startup Journey - Part 1
Posted by Laurent Marchand ● Aug 10, 2020 10:21:34 AM
Part I of a 2-part blog
When we founded Kaloom, we embarked upon our journey with the vision of creating a truly P4-programmable, cloud- and open standards-based software-defined networking fabric. Our name, Kaloom, comes from the Latin word "caelum" which means "cloud."
Armed solely with our intrepid crew of talented and experienced engineers, we left our well-paying, secure jobs at one of the world’s largest incumbent networking vendors and set off on our adventure. This two-part blog is the story of why we founded Kaloom and where we stand today.
In our previous blog, “SDN’s Struggle to Summit the Peak of Inflated Expectations,” Sonny outlined the challenges posed by the initial vision of SDN and the attempt to solve them with VM-based overlays. Upcoming 5G deployments will only amplify those challenges, creating a pressing need for lightweight container-based solutions that satisfy 5G’s cost, space and power requirements.
5G Amplifying SDN’s Challenges
With theoretical peak data rates up to 100 times faster than current 4G networks, 5G enables incredible new applications and use cases such as autonomous vehicles, AR/VR, IoT/IIoT and the ability to share real-time medical imaging – just to name a few. Telecom, cloud and data center providers alike look forward to 5G forming a robust edge “backbone” for the industrial internet to drive innovative new consumer apps, enterprise use cases and increased revenue streams.
These new 5G-enabled apps require extremely low latency, which demands a distributed edge architecture that puts applications close to their data source and end users. When we founded Kaloom, incumbents were still selling the same technology platforms and architectural approach as they had with 4G. However, the economics for 5G are dramatically different.
The Pain Points are the Main Points – Costs, Revenues and Latency
For example, service providers’ revenues from smartphone users running 3G/4G were about $50 per user per month. Connected cars, however, will only generate about $1 or $2 per month each, and installing 5G requires very large upfront investments (CAPEX), not only in antennas but also in the backend servers, storage and networking switches required to support these apps. So, it was clear to us that the cost point for deploying 5G had to be much lower than for 4G. Even if 5G rollouts cost the same as 4G rollouts, service providers still could not economically justify them with revenues throttling down to $1 or $2 per month.
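As a rough back-of-the-envelope illustration of the revenue gap described above, the sketch below compares the monthly figures quoted in this post. The function name, and the idea of holding the cost-to-revenue ratio constant, are assumptions made purely for illustration; they are not a model Kaloom published.

```python
# Monthly revenue figures quoted in the post (USD per connection).
SMARTPHONE_REVENUE = 50.0
CONNECTED_CAR_REVENUE_LOW = 1.0
CONNECTED_CAR_REVENUE_HIGH = 2.0


def required_cost_reduction(old_revenue: float, new_revenue: float) -> float:
    """Factor by which per-connection cost must shrink to keep the same
    cost-to-revenue ratio (an illustrative assumption, not a Kaloom model)."""
    if new_revenue <= 0:
        raise ValueError("new_revenue must be positive")
    return old_revenue / new_revenue


best_case = required_cost_reduction(SMARTPHONE_REVENUE, CONNECTED_CAR_REVENUE_HIGH)
worst_case = required_cost_reduction(SMARTPHONE_REVENUE, CONNECTED_CAR_REVENUE_LOW)
print(f"Per-connection cost must fall by roughly {best_case:.0f}x to {worst_case:.0f}x")
```

Under that simplifying assumption, a connection that brings in $1 or $2 instead of $50 has to be served at a small fraction of the 4G cost, which is the gap the rest of this post is about.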
The low latency requirement is another pain point, mainly because 5G-enabled high-speed apps won’t work properly with low throughput and high latency. Many of these new applications require an end-to-end latency below 10 milliseconds. Unfortunately, typical public clouds are unable to fulfill such requirements.
To take advantage of lower costs via centralization, most hyperscale cloud providers have massive regional deployments, with only about six to 10 major facilities serving all of North America. If your cloud hub is 1,000 miles away, even data traveling at the speed of light over fiber optics does not arrive fast enough to meet the low latency requirements of 5G apps such as connected cars. For example, the latency from New York to Amazon Web Services or Microsoft Azure in Northern Virginia is greater than 20 milliseconds.
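To make the 1,000-mile example concrete, here is a minimal sketch of the propagation delay over fiber. It assumes a refractive index of about 1.47 for silica fiber and a perfectly straight route, and it ignores switching, queueing and processing time, so real-world latency would be higher. The constant and function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.47       # assumption: typical value for silica fiber
KM_PER_MILE = 1.609344


def one_way_delay_ms(distance_miles: float,
                     refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Propagation delay in milliseconds over a straight fiber run."""
    distance_km = distance_miles * KM_PER_MILE
    speed_in_fiber_km_s = SPEED_OF_LIGHT_KM_S / refractive_index
    return distance_km / speed_in_fiber_km_s * 1000.0


def round_trip_delay_ms(distance_miles: float) -> float:
    """Request plus response: twice the one-way propagation delay."""
    return 2.0 * one_way_delay_ms(distance_miles)


hub_distance_miles = 1_000
print(f"one way:    {one_way_delay_ms(hub_distance_miles):.1f} ms")
print(f"round trip: {round_trip_delay_ms(hub_distance_miles):.1f} ms")
```

With these assumptions the one-way delay comes out to roughly 8 milliseconds and the round trip to roughly 16 milliseconds, so propagation alone exceeds a 10-millisecond round-trip budget before any equipment adds its own delay.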
Today’s VM-based network overlays rely on dumb, or unmanaged, switches, while all the upper-layer functions such as firewalling, load balancing and the 5G UPF run on VMs in servers. Because these functions previously could not run at Tbps speeds on switches, they were put on x86 servers. High-end servers cost about $25,000 and consume 1,000 watts of power each. Given those costs and that power consumption, we could see that the numbers were way off from what 5G providers needed to run a viable business. Rather than run switching and routing in VMs on x86, Kaloom’s vision was to run all networking functions at Tbps speeds and hit the power and cost points that 5G deployments need in order to generate positive returns. This is why Kaloom decided to use containers, specifically open source Kubernetes, and the programmability of P4-enabled chips as the basis for our solutions.
Solving the latency problem required a paradigm shift to the distributed edge, which places compute, storage and networking resources closer to the end user. Because of these constraints, we were convinced the edge would become far more important. As seen in the figure below, there are many edges.
Kaloom’s Green Logo and Telco’s Central Office Constraints
While large regional cloud facilities were not built for the new distributed edge paradigm, service providers’ legacy Central Office (CO) architectures are even more ill-suited for the shift. Containing an odd mishmash of old and new equipment from each decade going back at least 50 years, these facilities are also typically near the limit of their space, power and cooling capacity.
There are about 25,000 legacy COs in the US, and they need to be converted, or updated in some way, to sustain low latency networks that support 5G’s shiny new apps. Today, perhaps 10 percent of data is processed outside the CO and data center infrastructure. In a decade, this will grow to 25 percent, which means telcos have their work cut out for them.
This major buildup of new 5G-supporting edge infrastructure will have an extremely negative impact on the environment in terms of energy consumption. According to the Linux Foundation’s State of the Edge 2020 report, by 2028 this edge infrastructure will consume 102,000 megawatts of power, and over $700 billion in cumulative CAPEX will be spent within the next decade on edge IT infrastructure and data center facilities. The power consumption of a single 5G base station alone is three times that of its 4G LTE predecessor; 5G needs three times the base stations for the same coverage as LTE due to higher frequencies; and a 5G base station costs four times as much as an LTE one, according to mobile executives and an analyst quoted in an article at Light Reading. Upgrading a CO’s power supply involves permits, closing and digging up streets and running new high-capacity wiring to the building, which is very costly. Unless all of society’s power sources suddenly turned sustainable, our founding team could see we were going to have to create a technology that could dramatically and rapidly reduce the power needed to provide these new 5G services and apps to consumers and enterprises.
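To see how the base station figures above compound, the short sketch below multiplies the per-station factors by the station-count factor. It assumes the three multipliers are independent and apply uniformly across a network, which the quoted sources do not state, so treat the result as an illustration rather than a forecast. All names are illustrative.

```python
# Multipliers quoted above, each relative to 4G LTE.
POWER_PER_STATION = 3   # a 5G base station draws 3x the power
STATION_COUNT = 3       # 3x the base stations for the same coverage
COST_PER_STATION = 4    # a 5G base station costs 4x as much


def network_multiplier(per_station_factor: float, station_count_factor: float) -> float:
    """Network-wide multiplier, assuming the two factors compound independently."""
    return per_station_factor * station_count_factor


power_multiplier = network_multiplier(POWER_PER_STATION, STATION_COUNT)
cost_multiplier = network_multiplier(COST_PER_STATION, STATION_COUNT)
print(f"Power for the same coverage: about {power_multiplier}x LTE")
print(f"Base station spend for the same coverage: about {cost_multiplier}x LTE")
```

Under that assumption, covering the same area with 5G takes about nine times the base station power and about twelve times the base station spend of LTE.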
Have you noticed that Kaloom’s logo is green?
This is the reason why. We have set out to solve all the issues facing telcos as they work to bring their infrastructure into the 21st century, including power.
P4 and Containers vs. ASICs and VM-Overlays
Where we previously worked, we were using a lot of switches that were not programmable. We were changing hardware every three to four years to keep pace with changes in the mobile, cloud and networking industries that constantly require more network, compute and storage resources. So, we were periodically throwing away and replacing very expensive hardware. We thought, why not bring the same flexibility to hardware by building a 100 percent programmable networking fabric? This, again, was driven by costs.
Before we left our jobs, we were part of a team running 4G Evolved Packet Gateway (EPG) using virtual machines (VMs) on Red Hat’s software platforms. Fortunately, we were asked to investigate the difference in the performance and other characteristics of containers versus VMs. That’s when we realized that, by using containers, the same physical servers could sustain several times more applications than those running VM-based overlay networks.
This meant that the cost per user and cost per Gbps could be lowered dramatically. Knowing the economics of 5G would require this vastly reduced cost point, containers were the clear choice. The fact that they were more lightweight in terms of code meant that we could use less storage to house the networking apps and less compute to run them. They’re also more easily portable and quicker to instantiate. When a VM crashes it can take four or five minutes to resume, but you can restore a container in a few seconds.
Additionally, because VM-based overlays sit on top of the fabric, they have no direct interaction with the network underlay below in terms of visibility and control. If there is congestion or a bottleneck in the underlying hardware fabric, the overlays may not know when, where, or how bad the problem is. That matters because delivering Quality of Service (QoS) for service providers and their end users is a key consideration.
We remain committed to addressing these pain points: reducing networking’s costs, shrinking 5G’s growing energy footprint so it has a less negative impact on the environment, and delivering much-needed software control and programmability. Looking to solve all the networking challenges facing service providers’ 5G deployments, we definitely had our work cut out for us.
I remember when we first met with investors, they asked us to brief some analysts. Initially, the analysts thought we were “mad,” and so the investors said they couldn’t support our decision to launch or our vision of the future of networking. Tune in next week for Part II of this blog to find out what happens next!