Hyper Open Edge Cloud

Edge Computing for Industrial Automation and Control

The business of industrial automation and control will move away over time from existing, highly profitable, proprietary product lines towards much more flexible yet less profitable open hardware and Free Software. Future profits will eventually emerge from online services based on industrial big data, personalisation, legal regulation and trade secrets, powered by increasingly smart sensors and gateways at the edge.
  • Last Update:2022-05-04
  • Version:003
  • Language:en

With SlapOS, Nexedi was one of the inventors of Edge Computing about 10 years ago. Since then we have acquired experience in the deployment of Edge Computing in various industries: web, wind energy, and radio networks. In this article we share some of the lessons we have learnt while creating SlapOS and operating it for ten years. In conclusion, we provide some views on the evolution of the automation and control industry with Edge Computing and on how the Nexedi stack can support this evolution.

TLDR: "open edge whiteboard gateway" enhances existing proprietary automation and control systems with smart services at the edge; disruptive automation and control system product line entirely based on Free Software and Open Hardware replaces legacy hardware; new data oriented centralised services generate most profits; OCP costs 10 times less than public clouds; Nexedi stack is ready to support the future of automation and control industry.

SlapOS: why do we still use proprietary clouds? 

SlapOS is a service orchestration system that can be used to operate decentralised Cloud Computing systems and, among them, Edge Computing systems. It covers the complete lifecycle of a service: build, provisioning, configuration, monitoring, disaster recovery, accounting, billing, etc. SlapOS covers all types of deployments: bare metal, virtual machines, containers, etc. Back in 2008, it primarily focused on low cost, resource efficient, high-performance bare metal deployment. Since then, it has also been used to operate big data cloud architectures with virtualisation and network isolation such as Teralab, one of the most successful big data providers in France.

With SlapOS, anyone can build and operate their own commercial cloud computing or edge computing system in less than 48 hours. No additional component or software is needed to start competing with public cloud providers such as Amazon (AWS), Microsoft (Azure) or Alibaba (Aliyun). SlapOS can also act as a Network Management System (NMS) and deploy Amarisoft's 4G/5G fully virtualised radio access network (including PHY) that runs on a generic PC and standard RRU. With SlapOS, anyone can become their own 4G/5G telecommunication provider too.

Pricewise, deployment of SlapOS often cuts costs by a factor of 10 compared to traditional solutions used by enterprise IT. At the same time, it provides a level of flexibility and simplicity that is unmatched.

The great denial: broken connectivity

Most companies considering Edge Computing tend to ignore - and even deny - the existence of a big problem in real-world TCP/IP networks: they are randomly disconnected, unreliable and lack resiliency. The more devices, the more sites, the more APIs, the more trouble. Compared to traditional control systems, Edge Computing based on TCP/IP poses the risk of a major regression of quality and safety for the industry.

We learnt this fact the hard way when we created SlapOS. We are now aware of which solutions must be deployed to overcome this situation.

Since its early days, SlapOS has been based on the idea that every device or edge node in the world will eventually get a reliable IPv6 address. It was a great design choice because IPv6 provides a globally unique and routable address to every connected thing and can thus greatly simplify machine-to-machine communication.

However, we greatly overestimated the actual maturity of IPv6 in 2008. Even in 2018, the situation had not improved as much as we would have expected:

  • 10% of routes in China are disconnected;
  • 1% of routes in Europe are disconnected;
  • DPI used by governments (ex. China, Iran, etc.) or by telcos (ex. Orange) can break TCP/IP either on purpose or by mistake;
  • firewalls used by large corporations can break TCP/IP either on purpose or by mistake;
  • many IPv6 routers in production do not implement the IPv6 protocol properly and even break TCP/IP;
  • home gateways from some ISPs randomly drop connectivity for some of the IPv6 addresses they previously allocated to the user equipment;
  • public clouds are partly disconnected for a yearly average of about 8 hours.

The situation we describe above for IPv6 also applies in many respects to IPv4. It is ignored or denied by most vendors of Edge Computing and IoT because their architecture cannot be reliable under such conditions:

  • MQTT messages sent by a device are never received by the data lake and lost forever;
  • events sent over HTTP by the provisioning platform to a device are never received and software running on the device is thus wrongly configured;
  • alarms generated by the maintenance application are sent over XMPP but are never received, or are received too late, by the end user, leading to a potential industrial incident.

All those problems are quite typical of service architectures based on an event-based, imperative design. The larger they get and the more APIs they expose, the more unstable they become, because it is simply impossible to take into account through feedback loops all the exceptional states that the system can fall into, especially if it has many components that are often upgraded. One of the most famous examples of the technical failure of such event-based, imperative design is the software called "OpenStack" which, even after a billion-dollar investment and years of development, is still unstable.

If Edge Computing for control systems is built on an event-based, imperative design, it will end up with the same instability.

There are of course solutions to the lack of resiliency of most TCP/IP networks. The one we created, called re6st, consists of deploying, through a mesh of random tunnels, a kind of Internet on top of the Internet with a latency-optimising routing protocol called "Babel". It drastically reduces, if not completely eliminates, the problem of broken TCP/IP connectivity. It can be used anywhere in the world, including in China through a local partner, Grandenet. re6st has been a key factor in the technical success of SlapOS. Without re6st, it would have been impossible, for example, to host SlapOS nodes on the networks of Free, OVH, Eircom or in China.
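The benefit of routing over a mesh of tunnels can be illustrated with a minimal sketch. It is not the re6st or Babel implementation: node names and latencies are hypothetical, and it uses a centralised Dijkstra search for brevity, whereas Babel is a distributed distance-vector protocol. It only shows that when one tunnel breaks, another route with the lowest remaining latency can still be found.

```python
import heapq

def lowest_latency_route(links, source, target):
    """Return (total_latency_ms, path) for the lowest-latency route, or None.

    `links` maps each node to a dict {neighbour: latency_ms}.
    """
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        latency, node, path = heapq.heappop(queue)
        if node == target:
            return latency, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, cost in links.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (latency + cost, neighbour, path + [neighbour]))
    return None  # no tunnel leads to the target

# Hypothetical mesh of tunnels (latencies in milliseconds).
mesh = {
    "paris": {"frankfurt": 10, "tokyo": 250},
    "frankfurt": {"paris": 10, "shanghai": 220, "tokyo": 230},
    "tokyo": {"paris": 250, "frankfurt": 230, "shanghai": 30},
    "shanghai": {"frankfurt": 220, "tokyo": 30},
}
route = lowest_latency_route(mesh, "paris", "shanghai")

# Simulate a broken tunnel between frankfurt and shanghai.
degraded = {node: dict(neighbours) for node, neighbours in mesh.items()}
del degraded["frankfurt"]["shanghai"]
del degraded["shanghai"]["frankfurt"]
fallback = lowest_latency_route(degraded, "paris", "shanghai")
```

In this toy mesh the traffic is simply relayed through another node once the preferred tunnel disappears; the application keeps using the same destination address.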

Another technology, called fluentd and created by Treasure Data of Japan, uses buffering so that data transferred over unreliable networks has a 99.999% chance of reaching the data lake. It is an important component of Nexedi's Wendelin architecture. Thanks to buffering, a device connected through a single Internet access that goes down is capable of holding the data it has to send to a data lake until its Internet access is restored.
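The buffering principle can be sketched in a few lines. This is not fluentd code: the `BufferedSender` class and the `uplink` function are hypothetical, and the buffer lives in memory only, whereas a production collector would typically also persist it to disk.

```python
from collections import deque

class BufferedSender:
    """Keep records in a local queue until the uplink accepts them."""

    def __init__(self, send, max_records=10000):
        self._send = send  # callable(record); raises ConnectionError when the link is down
        self._queue = deque(maxlen=max_records)  # oldest records are dropped when full

    def emit(self, record):
        self._queue.append(record)
        self.flush()

    def flush(self):
        """Deliver queued records in order; stop at the first failure."""
        while self._queue:
            try:
                self._send(self._queue[0])
            except ConnectionError:
                return False  # keep the record for the next attempt
            self._queue.popleft()
        return True

    def pending(self):
        return len(self._queue)

# Hypothetical uplink that fails while the Internet access is down.
link = {"up": False}
received = []

def uplink(record):
    if not link["up"]:
        raise ConnectionError("uplink down")
    received.append(record)

sender = BufferedSender(uplink)
sender.emit({"sensor": "temp", "value": 21.5})
sender.emit({"sensor": "temp", "value": 21.7})
pending_while_down = sender.pending()  # both records are still held locally

link["up"] = True
sender.flush()  # the backlog is delivered in the original order
```

A record is removed from the queue only after the send call returns without error, so a disconnection in the middle of a flush loses nothing that is still in the buffer.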

The last technology to consider is the use of a transactional network protocol. This includes transactional https, an implementation of the https protocol that guarantees that an https request will either return an error or process an API request consistently and entirely. Very few platforms actually provide a transactional implementation of https. Platforms that do not implement transactional https may, for example, return an error yet partly process an incoming API request: this leads to an inconsistent state of the system and to the absence of determinism. The numerous platforms based on non-transactional NoSQL databases or non-transactional application servers are unable to provide transactional https and should therefore never be used for industrial automation and control applications. With the Nexedi stack, all core platforms involved in managing industrial processes (ERP5, SlapOS, Wendelin) implement transactional https.
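The "all or nothing" behaviour can be illustrated with a small sketch based on Python's standard sqlite3 module. It is an assumption-laden toy, not the way ERP5, SlapOS or Wendelin implement it: the `stock` table, the `handle_transfer` function and the status codes are hypothetical. The point is that a request which fails halfway leaves no partial change behind.

```python
import sqlite3

def handle_transfer(conn, source, target, quantity):
    """Process a hypothetical API request atomically: all changes or none."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE stock SET quantity = quantity - ? WHERE item = ?",
                (quantity, source),
            )
            remaining = conn.execute(
                "SELECT quantity FROM stock WHERE item = ?", (source,)
            ).fetchone()[0]
            if remaining < 0:
                raise ValueError("insufficient stock")
            conn.execute(
                "UPDATE stock SET quantity = quantity + ? WHERE item = ?",
                (quantity, target),
            )
    except ValueError as error:
        return 409, str(error)  # error returned, nothing was modified
    return 200, "ok"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, quantity INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("tank_a", 5), ("tank_b", 0)])
conn.commit()

ok = handle_transfer(conn, "tank_a", "tank_b", 3)       # fully processed
failed = handle_transfer(conn, "tank_a", "tank_b", 10)  # rejected and rolled back
state = dict(conn.execute("SELECT item, quantity FROM stock"))
```

After the rejected request, the first UPDATE has been undone by the rollback: the caller receives an error and the stored state is exactly what it was before the request.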

A common state must be shared to ensure stability

To reach stability in a distributed system, one of the most important principles to follow is to introduce a common shared state in the system. This state can be centralised in a single database or decentralised through a hierarchy of databases or through a kind of distributed database. We use here the term "database" in a very broad sense, equivalent to a network protocol with indefinite persistence: it includes relational databases, file systems, blockchains, etc.

Once a common state is defined, all components in the system can refer to that common state and autonomously converge to it. 

This idea was inspired by various observations:

  • the concept of a digital whiteboard used in a chemical plant in Japan as early as 1993 to let distributed control system components share their state and mutual requests;
  • the concept of promises and autonomous systems introduced in 1993 by Prof. Burgess who is also the author of cfengine;
  • the concepts of grid computing coordination introduced to us by Prof. Cérin around 2007.

As the original author of ERP5, we were also inspired by the idea in ERP5 that every management process follows an order/delivery structure which keeps track, through workflow states, of the current common state of the ordering/delivering process.

As a system designed to manage devices and "orders/deliveries" of services at the Edge, SlapOS uses ERP5 to share a common state between all components, which order and deliver various value-added services to one another and publish their mutual addresses on a common whiteboard. To ensure that it is scalable, SlapOS implements a recursive architecture: SlapOS can deploy SlapOS. A global Edge Computing system can be built by aggregating or federating small Edge Computing systems without limit.
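The principle of autonomous convergence towards a shared state can be sketched as follows. This is a simplified illustration, not the SLAP protocol: the service names, the version numbers and the `converge` and `apply_actions` helpers are hypothetical. Each node reads the desired state from the whiteboard, compares it with its own state and derives the actions needed, instead of relying on events that may never arrive.

```python
def converge(desired, actual):
    """Return the actions a node must take so that `actual` matches `desired`."""
    actions = []
    for service, version in desired.items():
        if service not in actual:
            actions.append(("install", service, version))
        elif actual[service] != version:
            actions.append(("upgrade", service, version))
    for service, version in actual.items():
        if service not in desired:
            actions.append(("remove", service, version))
    return actions

def apply_actions(actions, actual):
    """Return the node state after the actions have been carried out."""
    new_state = dict(actual)
    for verb, service, version in actions:
        if verb == "remove":
            del new_state[service]
        else:
            new_state[service] = version
    return new_state

# Hypothetical shared state published on the whiteboard, and one node's local state.
whiteboard = {"collector": 2, "router": 1}
node = {"collector": 1, "legacy_agent": 1}

actions = converge(whiteboard, node)
node = apply_actions(actions, node)
second_pass = converge(whiteboard, node)  # nothing left to do once converged
```

Because the comparison can be repeated at any time, a node that was disconnected simply runs it again when connectivity returns and ends up in the same state as if it had never been offline.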

Other approaches for sharing a common state are possible. Some people have considered blockchains, but their practical scalability (not much more than a few transactions per second) is far from the scalability of the MariaDB transactional database (a billion transactions per second). Despite various claims, it does not seem that a scalable blockchain exists, unless it abandons its transactional nature at the expense of consistency. Rather than Bitcoin, we believe in alternate implementations of the "order/delivery" model of ERP5 using various forms of transactional databases or programming languages, possibly combined with distributed data distribution protocols.

Cost of data centre does matter

Many companies rely on public clouds from companies such as Amazon, IBM, Microsoft, Alibaba, Google, Rackspace, etc. and use a virtualised architecture. A public cloud is 5 to 10 times more expensive than a low cost dedicated server from companies such as OVH, Online or Hetzner. Virtualised block storage also degrades database performance by an order of magnitude.

Most companies in big data are perfectly aware of this situation and prefer to build their own server infrastructure, or to rely on low cost dedicated servers without virtualisation, rather than use virtualised public clouds. Companies such as Facebook or Google even design their own hardware to cut costs compared to products from IBM, HP, Dell, Supermicro, etc. The Open Compute Project (OCP) is now capable of delivering excellent hardware that lets any company create its own infrastructure at a total cost which is about half that of OVH, Online or Hetzner, or about 10 times less than Amazon AWS. This is what we now use at Nexedi. We also offer it to our customers together with SlapOS (see: "Turnkey Enterprise Cloud Solution with 2880 x86 cores priced 205.000€ announced by Horizon Computing and Nexedi").

Let us now explain why saving costs matters.

First, one should be aware that the highest cost of IT is the cost of skilled engineers: from 100.000€ per year in Europe or South America up to 500.000€ in California. Costs for skilled engineers in China and India are in between. Skilled engineers are also the scarcest resource. Unskilled engineers are useless for Big Data problems or scalability.

Let us now compare the costs of three approaches for purchasing a hosting infrastructure with 1500 x86 Xeon cores:

  • about 100.000€ with OCP hardware and host-attached SSD storage (an equivalent of 33.000€ yearly amortisation);
  • about 1.000.000€ with traditional hardware and storage area network (an equivalent of 333.000€ yearly amortisation);
  • about 700.000€ per year with AWS on demand public cloud.

Even if we include the service to operate the hardware (36.000€ per year), hosting and electricity for two racks (10.000€ per year in the same data centre as one of those where Amazon is hosting their servers), the yearly difference of cost is equivalent to one skilled engineer in California or 5 in Europe.
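The arithmetic behind this comparison can be checked with a short calculation. The figures are those quoted above; the assumption, made here for simplicity, is that the operation service and hosting costs are added to the OCP option only, since the text gives no equivalent figures for the other two options.

```python
# Yearly costs in euros, using the figures quoted in the text.
ENGINEER_EUROPE = 100_000
ENGINEER_CALIFORNIA = 500_000

yearly_cost = {
    # amortisation + operation service + hosting and electricity for two racks
    "ocp": 33_000 + 36_000 + 10_000,
    "traditional": 333_000,    # amortisation only (assumption: no other figure given)
    "aws_on_demand": 700_000,
}

def engineers_funded(saving, yearly_salary):
    """Number of engineers that a yearly saving could pay for."""
    return saving / yearly_salary

saving_vs_aws = yearly_cost["aws_on_demand"] - yearly_cost["ocp"]
in_california = engineers_funded(saving_vs_aws, ENGINEER_CALIFORNIA)
in_europe = engineers_funded(saving_vs_aws, ENGINEER_EUROPE)

print(f"Yearly saving versus AWS: {saving_vs_aws} EUR")
print(f"Engineers funded: {in_california:.1f} in California, {in_europe:.1f} in Europe")
```

Under these assumptions the saving is a little over 600.000€ per year, which is the same order of magnitude as the engineer equivalents given above.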

Going the way of public clouds is thus equivalent to wasting money in services with no added value rather than investing in skilled engineers to create future products and services.

The picture is even worse when we look at the P/L:

                          Public Cloud   Traditional hardware   OCP Infrastructure
Expense: infrastructure   700.000€       333.000€               33.000€
Expense: engineers        200.000€       200.000€               200.000€
New business revenue      300.000€       300.000€               300.000€
P/L                       (600.000€)     (233.000€)             67.000€

In this example, engineers cost 200.000€ per year and their R&D leads to a revenue of 300.000€ per year. The company that wisely selects cost-efficient hardware makes profits and keeps on focusing on R&D.

The company that selects public clouds or traditional hardware loses money and - because it refuses to change its purchasing habits - tries to reduce the loss by:

  • stopping the project;
  • asking engineers to find a way to use less hardware;
  • replacing skilled engineers with lower cost but unskilled engineers.

In each case, the company that selects public clouds or traditional hardware goes in a direction that creates much less value and future profits than the company that relies on OCP.

Bridging legacy: the autonomy box

The industrial automation and control industry has traditionally been based on proprietary hardware, proprietary operating systems, proprietary software and proprietary networking. Beckhoff was one of the first automation companies to adopt a more open approach based on Windows and Ethernet, and to develop an ecosystem around its EtherCAT protocol.

Another characteristic of the automation and control industry is that it designs proprietary embedded systems with "just enough RAM and CPU". This is a legacy culture from times when memory and processing power were the most expensive and scarce resources in a product. Even though engineering has nowadays become the most expensive and scarce resource in a product, this legacy culture remains.

The consequences for automation and control products are numerous:

  • a poor networking stack (ex. no IPv6, no buffering);
  • poor multi-tenancy (ex. no way to let third parties configure the product);
  • incompatible programming standards and APIs;
  • very little flexibility to provision third-party algorithms;
  • not enough hardware resources to run smart algorithms that can provide autonomy.

Yet, automation and control products are tested, mature, reliable and proven.

This situation implicitly defines what an ideal Edge gateway for industrial automation and control could be: an Edge gateway that provides legacy products with all the features that their design prevents them from providing. This includes:

  • resilient IPv6 connectivity and low latency routing (re6st);
  • multi-protocol buffered data collection (fluentd);
  • whiteboard for stable M2M communication based on transactional https (SlapOS);
  • multi-tenant provisioning of smart algorithms based on standard POSIX API (SlapOS);
  • multi-tenant proxy REST API for remote configuration (SlapOS);
  • local GPU, a lot of RAM and a lot of CPU;
  • compatibility with legacy devices and protocols.

Everything in this Edge gateway should be Open Source / Free Software, with the probable exception of the software that is needed to communicate with legacy devices through proprietary protocols. Although open source implementations of some protocols exist (ex. Etherlab for EtherCAT), other protocols will likely require signing Non-Disclosure Agreements (NDA) and distributing a proprietary compatibility layer (driver, library).

Edge at the micro-controller: Free Software matters 

Beyond legacy, the future of the automation and control industry may well depend on open hardware and Free Software. Companies such as Seeed and Olimex already distribute a wide range of open platforms that include everything one needs to create their own automation and control device. Leading vendors in the automation and control industry could actually embrace this movement and launch a dedicated brand for a new product line entirely based on Free Software and open hardware.

Current types of systems fall into three categories:

Relying on a Unix-like system (Linux, BSD, etc.) is in our opinion the best approach to consider for Edge Computing applied to industrial automation and control. The flexibility provided by GNU/Linux and the wide range of options to support real-time or on-board A.I. make it, in our opinion, a no-brainer compared with non GNU/Linux based approaches.

In rare cases where power consumption or cost really matters, it could make sense to rely on solutions such as ESP32 combined with flexible development environments such as micropython. However, integrating such devices in a true multi-tenant Edge environment still poses some challenges, which Nexedi is currently working on with a version of SlapOS' SLAP protocol designed for micro-controllers.

Nevertheless, an approach based on Free Software and Open Source Hardware can greatly simplify development tasks thanks to modern programming languages and easy integration of C libraries. If further connectivity is required beyond Wifi and Bluetooth, it is possible to imagine releasing an open source implementation of NB-IoT using software defined radio technologies inspired by those which were demonstrated by Amarisoft.

With everything implemented in software with full source code access and no dependency on specific chipset vendors, industrial automation and control products remain fully trustworthy, resilient and independent of the aggressive business tactics which are common in the IT economy to take away the value generated by the digitisation of traditional industries.

Amarisoft has implemented NB-IoT protocol entirely in software

Accelerating the future of industrial automation and control with Nexedi stack

Edge Computing is an opportunity for vendors of industrial automation and control products to make a strategic move towards open standards and the service economy. This move could be embodied in a new dedicated brand focusing on the following products and services:

  • an edge gateway that brings resiliency, smart algorithms, self-convergence and multi-tenancy to legacy products;
  • open hardware and Free Software for industrial automation and control;
  • value-added online services operated on a cost-efficient infrastructure.

The hardware business of industrial automation and control will then move over time from existing, highly profitable, proprietary product lines towards much less profitable edge gateways and open hardware based on Free Software. Meanwhile, value-added digital online services such as predictive maintenance, test and certification, personalisation, aggregated data analysis, etc. will generate future profits, as long as infrastructure costs are kept low enough. Online services relying on data aggregation, trade secrets or regulations will be very difficult to copy as a result of the winner-takes-all nature and positive feedback of e-commerce described by Prof. Hal Varian.

It is thus mandatory to move fast, cut infrastructure costs and stay in control of both data and software.

Nexedi stack can accelerate this transformation in multiple ways:

  • SlapOS is one of the very few - if not the only - technology of Edge Computing that works reliably thanks to 10 years of experience;
  • SlapOS combined with OCP can be used to cut costs by a factor of 10 compared to public clouds;
  • Wendelin is the only industrial platform for out-of-core data sciences in Python;
  • ERP5 can already handle billing batches of a million invoices per day;
  • NEO distributed transactional database can store petabytes of IoT data;
  • re6st solves growing connectivity problems of the Internet.

All these technologies have already been used in Germany by Woelfel for the monitoring and data analysis of hundreds of wind turbines, construction sites, etc.

It would only take a few months to launch a new brand of general purpose automation and control products based on open hardware designs from Olimex, SlapOS at the edge, Wendelin for industrial Big Data online services, re6st for resilient connectivity and fluentd for interoperability.