FPGA Clustering for HPC, Blockchain Enabling Network Sharing, Intelligent Wireless Battery Management

Xilinx talks about its new Alveo data center accelerator card and clustering solution for FPGAs in this episode of our Embedded Edge podcast. Also: Weaver Labs on how its software lets telecoms providers use blockchain to manage and monetize virtualized networks; and intelligent wireless battery management with Dukosi.

[FULL TRANSCRIPT AVAILABLE BELOW]

Welcome to Embedded Edge with Nitin, a podcast show hosted by Nitin Dahad that brings to life the stories behind embedded systems, technologies and products. It’s the show where you’ll hear from both engineers and executives on some of the topical news and challenges in the world of embedded systems design.

Here’s your host, Editor-in-chief of embedded.com, Nitin Dahad.

NITIN DAHAD

Hello, welcome to this edition of Embedded Edge with Nitin. In this episode, I look at an API-driven clustering solution for deploying FPGAs at massive scale in high performance computing and for big data workloads; the digitization of telecoms assets, enabling shared network infrastructure, addressing interoperability and secure authentication for private network operators to monetize excess network capacity; and I talk about battery management, with a new cell-based wireless battery monitoring architecture which claims it will transform the safety and efficiency of battery systems in electric vehicles, industrial transport and energy storage.

On the FPGA front, we’ve seen a lot happening in the industry, with Renesas entering the FPGA market with a new family of entry-level devices targeting high-volume applications needing less than 5,000 logic gates, with initial device sizes costing less than 50 cents in volume and with standby power of less than 20 microamps.

And at the other end of the scale, Xilinx used the SC21 Supercomputing conference to introduce its Alveo U55C data center accelerator card and a new standards-based, API-driven clustering solution for deploying FPGAs at massive scale. The company said by enabling clustering of hundreds of Alveo cards and enabling high-level programmability of both the application and the cluster, this new card makes scaling out Alveo compute capabilities to target high performance computing, or HPC, workloads easier and more efficient than before.

You can hear my interview with Nathan Chang of Xilinx in the first clip in this podcast.

With many of our industry events starting to go hybrid with physical events which are also streaming live, I took the opportunity to visit the Cambridge Wireless International Conference which this year was held amongst the planes, including Concorde, at the aviation museum at Duxford, near Cambridge in the U.K.

There I spoke to Maria Lema, co-founder of Weaver Labs, about how their software layer enables more sharing of network infrastructure assets through the marketplace platform they have created for virtualized networks. Addressing interoperability and secure authentication was a key part of their platform, she said, along with an internal cryptocurrency system that allows private network operators to monetize excess network capacity on their infrastructure assets.

In the third and final segment of this podcast, we turn to the topic of wireless battery management, in which Joel Sylvester, CTO and founder of Dukosi, tells us how smart battery management using wireless communication of battery performance data will disrupt the battery management industry. He says that current systems still use battery management solutions based on 1990s technology, and explains how his company is hoping to change that.

First, we go to Nathan Chang, HPC product manager for data centers at Xilinx. Hello Nathan.

NATHAN CHANG

Hi Nitin, I’m glad to be here.

NITIN DAHAD

So, you have a new Alveo card, the U55C. Before we go into that, can you just explain a little bit about the background: why it was necessary and how long it’s been in the works?

NATHAN CHANG

Sure. In the world of HPC we’ve been throwing large amounts of compute at our problems for quite some time now, but we’re starting to see that compute isn’t always the bottleneck. Actually, it’s more often than not a different bottleneck. It tends to be the memory bandwidth. Compute problems are memory bandwidth bound.

So we took a card and we were able to slim it down to a single slot and then also double the HBM on that card.

But more importantly, we provided the ability to scale out across these cards. Being able to create large clusters, hundreds of cards, and to be able to target all the HBM on those cards. Anybody that has ever bought an Alveo card has been able to see that there are QSFP ports on these cards and just like any other card, this card comes with two 100 Gbps QSFP ports which means that you have 200 Gbps of bandwidth.

But unlocking that bandwidth, unlocking the ability to cluster across these cards has always been a pretty big endeavor for our community, right? Developers had to create teams and then go and create their own clustering designs to meet their needs.

Now we’re coming forward with an open standards-based clustering package, meaning that we’ll be leveraging RoCE v2 and data center bridging, all over converged Ethernet, with 200 Gbps of bandwidth in each card.

And this means that in existing infrastructure in data centers, you’ll be able to put these cards in existing servers and leverage them on existing Ethernet networks. If the network is lossy, sure, you can do that; you can put these cards on there. But if you’re able to create a separate lossless Alveo network using RoCE v2 and data center bridging, you’ll be able to compete with InfiniBand in performance and latency.

NITIN DAHAD

I think you talked about enabling access to this capability for a wider set of software developers. So just explain that a little bit as well.

NATHAN CHANG

Yes, absolutely. When people look at Alveos they understand that, compared to commoditized compute, you have a couple of key features: the ability to pipeline and parallelize more, the ability to create your own custom memory hierarchies, the ability to create custom data movement and precondition data between blocks of math and compute. All of those features are now going to scale out across more and more cards, creating more and more opportunities to address bigger workloads.

NITIN DAHAD

Also, I think you were saying there’s only a small developer community that can actually work with Alveo cards, so you’re actually enabling wider access to more developers. Is that right?

NATHAN CHANG

Yes, and I think that’s the other key point. And thank you for bringing that up. Not only are we making room for bigger workloads, but we’re also ensuring that Vitis is available to more of the development community. No longer do you need to understand RTL, no longer do you need to understand the Verilog. You’re able to program Alveo cards and target Alveo boards with existing high level languages like C, C++ and Python.

NITIN DAHAD

And now you’ve been working with some people already, like CSIRO in Australia for a couple of years, and there are some examples coming out from there. Also, I think you talked about health tech and astronomy. How is this enabling some advantage compared to what was done before?

NATHAN CHANG

Yes, so CSIRO and the Square Kilometre Array. They’ve built a gigantic, massive square kilometer 131,000 radio antenna hub array to actually observe and catalog the origins of history. We’ve been working with them to help them design a cluster to be able to not only take in all that data but process it in line, en route to a larger HPC cluster. And that took 420 U55Cs networked together to handle 15 terabits per second of throughput.
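As a rough check on the figures quoted above, the short Python sketch below works out the average throughput each card would carry. It assumes the load is spread evenly across the 420 cards and that the units are decimal (1 Tbps = 1,000 Gbps); neither assumption is stated in the interview, and the real cluster may distribute data quite differently.

```python
# Figures quoted in the interview.
CARDS = 420             # U55C cards in the cluster
TOTAL_TBPS = 15.0       # aggregate throughput, terabits per second
PORTS_PER_CARD = 2      # QSFP ports per card
PORT_GBPS = 100.0       # line rate per QSFP port

def per_card_gbps(total_tbps: float, cards: int) -> float:
    """Average throughput per card in Gbps, ASSUMING an even spread."""
    return total_tbps * 1000.0 / cards

avg = per_card_gbps(TOTAL_TBPS, CARDS)
line_rate = PORTS_PER_CARD * PORT_GBPS

print(f"average per card: {avg:.1f} Gbps")
print(f"share of {line_rate:.0f} Gbps line rate: {avg / line_rate:.0%}")
```

Under those assumptions each card averages roughly 36 Gbps, comfortably within the 200 Gbps offered by its two QSFP ports.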

Another example is with TigerGraph. They’re taking on larger and larger graphs, and the algorithms that they need to accelerate, like cosine similarity, or something larger like Louvain modularity, or even more complex like maximum independent set, just require bigger math and bigger graphs. So they’re going to need more cards to be able to query larger portions of the graph across HBM.
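For readers unfamiliar with the first algorithm named above, cosine similarity scores how closely two vectors point in the same direction. The sketch below is plain Python running on a CPU, purely to show the arithmetic; it is not TigerGraph’s or Xilinx’s implementation, and the vertex names and feature vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

# Hypothetical feature vectors for three graph vertices.
alice = [1, 0, 2, 1]
bob = [2, 0, 4, 2]      # same direction as alice
carol = [0, 3, 0, 0]    # shares no features with alice

print(cosine_similarity(alice, bob))
print(cosine_similarity(alice, carol))
```

On a large graph this calculation is repeated across a very large number of vertex pairs, which is why memory bandwidth rather than raw compute tends to dominate.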

NITIN DAHAD

It’s going to be available via Xilinx as well as distributors. And then you’re going to make the clustering IP available open source sometime next year, is that right?

NATHAN CHANG

Yes, we’ll provide it in private preview. You can work with your sales people. You can work with your reseller to get ahold of us and our FAE network, but we’ll definitely make sure we’re working towards ensuring that it’s available early next year.

NITIN DAHAD

OK, well Nathan, thank you very much.

NATHAN CHANG

Yes, thanks Nitin.

NITIN DAHAD

Next, we talk to Weaver Labs, with this interview from the recent Cambridge Wireless International Conference.

I’m here at the Cambridge Wireless International Conference in Duxford and I’m here with Maria Lema of Weaver Labs. She’s the founder. Hello Maria.

MARIA LEMA

Hi.

NITIN DAHAD

Maria, tell me about Weaver Labs, what do you do?

MARIA LEMA

We create a software product called Cell-Stack. This software sits on top of telecom infrastructure. It aggregates it all in a shared pool of resources with the objective of creating a network-as-a-service marketplace.

What we do is we try and solve the problem of interoperability across infrastructure that is owned by different players and also different technologies such as IoT, Wi-Fi, 5G, satellite and also security. Adding a security layer on top that enables all of these bits and pieces of infrastructure to be interconnected in a secure way.

NITIN DAHAD

So let’s look at two things. First of all, interoperability, and then we’ll talk about security. On interoperability, there’s an example I think you have connecting network devices on LoRaWAN and Globalstar.

MARIA LEMA

Exactly, so we were approached with a very big challenge a few years ago by a company who wanted to provide a network to humanitarian services when a disaster happened and there was no access to connectivity. The point was to be able to track the humanitarian aid that was arriving into that area.

So what we did was we created an IoT mesh network with the LoRaWAN technology, but in order to get that information out to the Internet we needed to use a satellite gateway. The thing is that Globalstar and LoRaWAN don’t speak the same language. They are different protocols, and the payloads of their packets don’t really match.

But using our software layer on top, we were able to make these two systems interoperate and transport the messages across two very different systems, and we do the same with 5G networks, satellite backhauls, and other technologies.

NITIN DAHAD

And then tell me about the security. So how is that being enabled?

MARIA LEMA

So you know how before we used to have these networks where one vendor suits everything, and we used to have this security by obscurity: one system, completely closed. Now we’re moving to different providers for different bits and pieces of the network, and everything communicates using open APIs.

And that is a very big security problem because you have all sorts of open doors in software. So the way we do it is by applying a zero trust architecture. We break up all of the pieces of the network, and each element of the network is itself an agent that connects to a peer-to-peer layer. By using zero trust architecture, what we do is break perimeter models that go beyond that little element of the network.

So for two systems, two elements of this network, to be able to talk to each other, they need to authenticate and be authorized into this network. That is the zero trust approach. We leverage public key infrastructure based on blockchain technology to create the zero trust layer, and this is a huge topic for Open RAN, for example, because before we used to have just one network provider and now we have five. So imagine the complexity.
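The rule described above, that every message between two network elements must be both authenticated and authorized, can be sketched in a few lines of Python. This is an illustration only, not Weaver Labs’ implementation. To stay within the standard library it swaps in HMAC with shared secrets as a stand-in for public-key signatures, and it does not model the blockchain-backed PKI at all. The element names, secrets and registry layout are hypothetical.

```python
import hashlib
import hmac

# Hypothetical registry: element id -> (credential, peers it may talk to).
# A real deployment would hold public keys or certificates here,
# not shared secrets.
REGISTRY = {
    "ran-du-1": (b"secret-du-1", {"core-amf"}),
    "core-amf": (b"secret-amf", {"ran-du-1"}),
}

def sign(element_id: str, message: bytes, key: bytes) -> str:
    """Tag a message so the receiver can check who sent it."""
    data = element_id.encode() + b"|" + message
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def accept(sender: str, receiver: str, message: bytes, tag: str) -> bool:
    """Accept a message only if the sender is authenticated AND authorized."""
    entry = REGISTRY.get(sender)
    if entry is None:
        return False                    # unknown element: no default trust
    key, allowed_peers = entry
    expected = sign(sender, message, key)
    if not hmac.compare_digest(expected, tag):
        return False                    # authentication failed
    return receiver in allowed_peers    # authorization check

tag = sign("ran-du-1", b"attach-request", b"secret-du-1")
print(accept("ran-du-1", "core-amf", b"attach-request", tag))
print(accept("rogue-node", "core-amf", b"attach-request", tag))
```

The point of the sketch is that nothing is trusted because of where it sits in the network: an unknown sender, a bad tag, or a peer outside the allowed set is rejected on every message.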

NITIN DAHAD

Exactly, and that leads us very nicely into the token product that you’ve launched.

MARIA LEMA

Yes, because we are a blockchain company, we used blockchain to enable this marketplace because we think that the big part of the inefficiencies of the telecom supply chain is one single element of command and control.

So breaking that and enabling decentralization is going to help this democratization of connectivity. But for that we need an ecosystem currency and to transact value within this ecosystem we use a token. At Weaver Labs we enable this by using the Adino token, which is our own native token, into the platform.

We haven’t launched it yet, but we are soon to be launching it in the platform.

NITIN DAHAD

And what would you say are the biggest challenges to enable private networks right now?

MARIA LEMA

Skill set.

Mobile networks are a very complex system to install, and now that we are even talking about five or six different providers to create one network, it becomes even more complex. And the fact that all these private networks are owned by stakeholders that are not necessarily network providers makes it really difficult for them to own 5G networks.

So our mission is to make it easier for them to own networks and get plugged into a system by creating this larger marketplace.

NITIN DAHAD

And lastly, what’s next for Weaver Labs, what are you doing next? I think you’ve got some things coming up.

MARIA LEMA

Yes, we’re really excited about delivering our smart junctions 5G network in Manchester next year. And we really hope that that’s going to be a huge success by showing how a transport authority (Transport for Greater Manchester) owns and operates a 5G network by using a sustainable business model in leveraging this marketplace. And all sorts of other exciting things coming after that.

NITIN DAHAD

And I think what you’re enabling with their infrastructure is allowing them to, let’s say, sell off excess capacity on that. Is that right?

MARIA LEMA

Exactly, yes, so there is a capital investment that is leveraged by one smart city application which is provided by the city, and then there is a lot of excess capacity that can be used for other public sector use cases or even private sector applications running on top. So the idea is to leverage this network-as-a-service model to create a sustainable revenue model for them.

NITIN DAHAD

And that monetization is through an internal ecosystem using cryptocurrency, is that right?

MARIA LEMA

Exactly, so the idea is to use blockchain technology: smart contracts to transact the value exchange, and a native currency such as ours.

NITIN DAHAD

Thank you very much Maria.

MARIA LEMA

Thank you very much.

NITIN DAHAD

Finally, I spoke to Joel Sylvester, CTO and founder of Dukosi, a firm that delivers battery management data directly from the battery cell using wireless to eliminate masses of cable harnesses. He describes the technology.

JOEL SYLVESTER

What we’ve developed is a cell monitoring device for use in very large high voltage lithium-ion battery packs, of the sort that you’ll find in electric vehicles, electric buses, marine applications, grid energy storage applications, and so on. Basically, anything that requires a large battery pack these days is moving to or has moved to lithium-ion chemistries and you need to monitor those cells very closely. There’s a lot of energy in them. You need to pay very close attention to what the cell voltage is, what the temperature is, and so on in order to keep the pack safe and to make them last as long as you possibly can.

What we’ve developed is a silicon chip and the software to go with it that allows you to monitor the voltage, the currents, the temperature, and many other characteristics of individual lithium-ion cells.

And then when you group hundreds or possibly thousands of these cells together into a battery system, we have a near-field communications technology that allows you to connect all of those chips across the whole of the battery system into a single battery communications network. That network reports back to a radio manager device, which is also one of our products, and then the data is used by the battery management system to ensure the battery’s safety, to control it, switch it on, switch it off, do thermal management, and all the other things that these large battery packs do.

So it’s a chip and software to go with it.

NITIN DAHAD

How is that different to what’s out there at the moment?

JOEL SYLVESTER

The problem has been around for, you know, 20-odd years. You need to make measurements on lithium-ion cells. But actually, the way that is done, if you were to look at what’s on the market now from, you know, some of the big-name semiconductor companies, the devices look almost exactly the same as those that were available in the late 1990s. It’s not really evolved very much in that time. The way that the technology has gone elsewhere is trying to address more and more cells at the same time, so 12 cells, 14 cells, 16 cells, and that’s taking them down a particular route of trying to go to higher and higher voltages.

Our product just does 1 cell at a time. So you need a lot more of them. But it makes the measurements on that cell really, really well. We’ve got industry-leading accuracy on the measurements. We can measure temperature on every cell. We can run algorithms on the cells to tell you what the state of charge is or the state of health, or many other characteristics of the lithium-ion cells. We can do that really, really well at 1 cell and then you can connect them all together very easily into a battery network. No additional connectors. No wiring harnesses or all of the other stuff there. That’s all gone.
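To give a feel for what a per-cell state-of-charge algorithm involves, the sketch below uses coulomb counting, a common textbook method that integrates the measured current over time. The interview does not say which algorithms Dukosi runs, so this should not be read as their method. The function name, the sign convention and the 50 Ah cell capacity are all assumptions made for illustration.

```python
def update_soc(soc: float, current_a: float, dt_s: float,
               capacity_ah: float) -> float:
    """Coulomb counting: subtract the charge drawn over dt_s seconds.

    soc is a fraction between 0.0 and 1.0.
    Positive current_a means discharge (ASSUMED sign convention).
    """
    delta = current_a * dt_s / 3600.0 / capacity_ah
    return min(1.0, max(0.0, soc - delta))

# Hypothetical 50 Ah cell, starting at 80% charge,
# discharged at 25 A for 30 minutes in one-second steps.
soc = 0.80
for _ in range(1800):
    soc = update_soc(soc, current_a=25.0, dt_s=1.0, capacity_ah=50.0)

print(f"state of charge after 30 minutes: {soc:.2f}")
```

Coulomb counting drifts if the current measurement is even slightly off, which is one reason measurement accuracy at the cell matters so much for estimates like this.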

That battery network then tells you everything you need to know about the battery system. You’re taking away cables, you’re taking away connectors, you’re taking away all the mechanical structures you require to support them and make sure they don’t kind of abrade. You’re moving the measurements, the sensor, right to the point where you need to make the measurements.

If you think of a battery pack module being the size of a suitcase, if you’ve got a board at one end of that suitcase, making measurements on cells at the other end, then that’s a lot of wiring. There’s a lot of opportunity there for interference to degrade the quality of the measurements that you’re making.

We are right on the cell. You’re not going to get a better measurement. The analog signals, because they are right close on the cell, the analog measurements are as good as they’re going to possibly get.

And at the other side, with the radio frequency communications we’re using, you’re up at 2.5 GHz, so you are way, way above where almost all of the noise in a powertrain, say, exists.

So rather than trying to communicate at the same kind of frequencies as the inverters screaming, you’re many, many orders of magnitude above that in frequency.
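The size of that frequency gap can be estimated with a quick calculation. The 2.5 GHz figure comes from the interview; the inverter switching frequency is not given, so the 20 kHz used below is an assumed, illustrative value. The comparison also only covers the switching fundamental, while real inverter noise includes harmonics that extend higher.

```python
import math

RADIO_HZ = 2.5e9              # carrier frequency quoted in the interview
ASSUMED_INVERTER_HZ = 20e3    # ASSUMPTION: illustrative switching frequency

def orders_of_magnitude(high_hz: float, low_hz: float) -> float:
    """Base-10 logarithm of the ratio between two frequencies."""
    return math.log10(high_hz / low_hz)

ratio = RADIO_HZ / ASSUMED_INVERTER_HZ
gap = orders_of_magnitude(RADIO_HZ, ASSUMED_INVERTER_HZ)

print(f"ratio: {ratio:,.0f}x, about {gap:.1f} orders of magnitude")
```

With that assumed switching frequency the radio link sits about five orders of magnitude above the inverter fundamental.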

NITIN DAHAD

You call it disruptive. What is the disruptive bit, the wireless?

JOEL SYLVESTER

The wireless part is really kind of the enabler. There are other wireless BMS solutions out there at the moment, but actually, other than doing away with wiring, they are more complex than the systems they’re replacing.

We have simplified the monitoring technology considerably, and the disruptive aspect of it is that, because you can associate 1 cell monitor with one cell, it gives you opportunities then to change the way in which you manufacture battery packs.

If all you’re doing is building a battery management system and applying it to an existing pack, yes, it’s simpler. And we would say that it’s more reliable, with better quality measurements and stuff like that, but that’s just better than the existing solution.

What’s disruptive is when you put the chip onto the cell. Now you’ve got an intelligent cell that you can configure into battery packs of any size, shape, configuration. You can create multiple battery products using the same cells, the same intelligent cells. That’s more disruptive because it changes the way in which the battery industry is going to approach the way that they monitor and manage their batteries.

The thing that always comes up first is this: getting rid of wiring harnesses. The pack manufacturers hate them. All they do is reduce the reliability and create safety issues, and they’re expensive to design, manufacture and install, so getting rid of wiring harnesses is always the first one. After that, it’s the quality of the measurements. We can make a temperature measurement on every cell in exactly the same manner and position on every cell. That allows them to improve the performance of their battery packs.

Previously it has always been a tradeoff between the number of sensors and the amount of wiring and how well they understand the performance of their packs. Well, get rid of the wiring and you can do both.

The battery industry is still quite young. There’s no kind of single accepted way of building a battery. There are still arguments about: do you make it out of cylindrical cells or pouch cells or prismatics? And there’s no indication that’s going to be decided anytime soon. And we can work with all of the formats, so we work with the customers on the evaluation of the engineering devices to go: “This is how you might consider packaging it.”

NITIN DAHAD

OK, so you’ve got it on show in Stuttgart and then you have a product launch next year?

JOEL SYLVESTER

Product launch next year. Automotive qualification takes a long time. So we’re going to have products with customers next year and then the real kind of volume qualified products are going to be in 2023.

NITIN DAHAD

Joel, good luck and thank you.

JOEL SYLVESTER

Thank you very much.

NITIN DAHAD

So that brings us to the end of this episode. That was Embedded Edge with Nitin, and I’m Nitin Dahad. Thanks for listening and see you next time.

Embedded Edge with Nitin is brought to you by AspenCore Media. The host is Nitin Dahad and the producer is James Ede.