CORENEXT
SCIENTIFIC PUBLICATIONS
Welcome to the Scientific Publications section of COREnext, a dynamic hub for the scientific literature stemming from our European project. As pioneers in research and innovation, we delve into the realms of 5G, 6G, cyber security, and trustworthiness, producing a diverse array of publications. Here, you’ll find a comprehensive collection of our scientific papers, conference papers, and book chapters. Explore our publications to gain insights into the latest advancements and discoveries shaping the digital landscape. Join us as we share knowledge and drive progress towards a safer and more connected future!
A 0.3 THz Transmitter in 90-nm BiCMOS Technology for Energy-Efficient High Data Rate Communication
In this work, an energy-efficient, high-data-rate transmitter at 0.3 THz is presented. The design is implemented in Infineon’s 90-nm SiGe BiCMOS technology. The 3-dB RF bandwidth spans 254–303 GHz, and the 3-dB IF bandwidth is 32 GHz. The transmitter is evaluated and supports OOK data rates up to 32 Gbps, and QPSK at 40 Gbps using an IF carrier. The DC power consumption is 285 mW.
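The abstract gives the DC power and the data rates but no energy-per-bit figure. A rough one follows from dividing the two. This minimal sketch assumes the full 285 mW is attributable to the data stream at each rate, which the abstract does not state.

```python
# Energy per bit = DC power / data rate.
# Assumption: the whole 285 mW DC power is charged to the data stream.

def energy_per_bit_pj(power_w: float, rate_bps: float) -> float:
    """Return energy per bit in picojoules."""
    return power_w / rate_bps * 1e12

DC_POWER_W = 0.285  # from the abstract
for label, rate_bps in [("OOK, 32 Gbps", 32e9), ("QPSK, 40 Gbps", 40e9)]:
    print(f"{label}: {energy_per_bit_pj(DC_POWER_W, rate_bps):.2f} pJ/bit")
```

Under this assumption the transmitter sits at roughly 9 pJ/bit for OOK and about 7 pJ/bit for QPSK.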
Distrusting cores by separating computation from isolation
In this paper, we propose the untrusted core isolation model to protect critical computation on trusted cores from untrusted and potentially buggy cores. We survey how current architectural building blocks such as MMUs fall short of this goal and derive requirements for untrusted core isolation. To demonstrate its feasibility, we discuss changes to commodity platforms and show how research systems such as M3 fulfill these requirements. We evaluate the security benefits through a qualitative comparison of current architectures in both industry and academia, and we study the costs through a quantitative comparison of the most promising approaches on off-the-shelf and FPGA-based platforms.
An Energy-Efficient 56-Gb/s D-band TX-to-RX Link using CMOS ICs and Transmitarray Antennas
This paper presents an energy-efficient system with a transmitter and a receiver comprising multi-channel integrated circuits in 45-nm CMOS technology, in-package antenna feeds, and high-directivity planar lenses. Data rates up to 56 Gb/s are demonstrated over a 1-m point-to-point link using a fully digital communication system. The active circuits integrate multi-LO generators that implement a channel-aggregation architecture, which provides a large RF bandwidth from a significantly narrower baseband interface bandwidth. The energy consumption of the overall RF system is only 33 pJ/bit.
Ultra-Broadband Frequency Multiplier (x8) Chain in 90-nm SiGe BiCMOS Technology at H-band
This work presents an H-band (220–325 GHz) frequency octupler realized in a 90-nm SiGe BiCMOS process. The 3-dB bandwidth spans 234–305 GHz, corresponding to a fractional bandwidth of 26.3 %. The multiplier provides conversion gain across 230–310 GHz. The peak output power is −0.5 dBm for an input power of −5 dBm. The DC power consumption is 122 mW. This type of circuit is suitable for future communication and radar systems.
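The quoted fractional bandwidth follows from the 3-dB band edges if it is normalized to the arithmetic center frequency. This is the common definition, but it is an assumption here since the abstract does not state which one is used:

```latex
\mathrm{FBW} = \frac{f_\mathrm{H} - f_\mathrm{L}}{(f_\mathrm{H} + f_\mathrm{L})/2}
             = \frac{305 - 234}{(305 + 234)/2}
             = \frac{71}{269.5} \approx 26.3\,\%
```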
TCDM Burst Access: Breaking the Bandwidth Barrier in Shared-L1 RVV Clusters Beyond 1000 FPUs
As the computing demand and memory footprint of deep learning applications grow, clusters of cores sharing a local (L1) multi-banked memory are widely used as key building blocks in large-scale architectures. As the cluster’s core count increases, a flat all-to-all interconnect between cores and L1 memory banks becomes a physical implementation bottleneck, and hierarchical network topologies are required. However, hierarchical, multi-level intra-cluster networks are subject to internal contention, which may lead to significant performance degradation, especially for SIMD or vector cores, whose memory accesses are bursty. We present the TCDM Burst Access architecture, a software-transparent burst transaction support that improves bandwidth utilization in clusters with many vector cores tightly coupled to a multi-banked L1 data memory. In our solution, a Burst Manager dispatches burst requests to the L1 memory banks, and multiple 32-bit words from burst responses are retired in parallel on channels with a parametric data width.
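To illustrate why a single burst can be served by several banks at once, the following sketch maps the consecutive 32-bit words of one burst onto banks. It assumes plain word-level interleaving (bank = word index mod number of banks), and `split_burst` is a hypothetical helper, not the Burst Manager's actual logic.

```python
from collections import defaultdict

WORD_BYTES = 4  # 32-bit words, as in the abstract

def split_burst(base_addr: int, n_words: int, n_banks: int) -> dict[int, list[int]]:
    """Return {bank: [row, ...]} for every word of a burst.

    Assumption: word-level interleaving, i.e. bank = word_index % n_banks
    and row = word_index // n_banks.
    """
    if base_addr % WORD_BYTES:
        raise ValueError("burst must start on a word boundary")
    per_bank: dict[int, list[int]] = defaultdict(list)
    first_word = base_addr // WORD_BYTES
    for k in range(n_words):
        word = first_word + k
        per_bank[word % n_banks].append(word // n_banks)
    return dict(per_bank)

# An 8-word burst at byte address 0x40 with 16 banks touches 8 distinct banks.
print(split_burst(0x40, 8, 16))
```

As long as a burst is no longer than the number of banks, every word lands in a different bank, so the per-bank requests can in principle be issued in the same cycle; longer bursts queue a second row on the first banks.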
A Dynamic Allocation Scheme for Adaptive Shared-Memory Mapping on Kilo-core RV Clusters for Attention-Based Model Deployment
Attention-based models demand flexible hardware to manage diverse kernels with varying arithmetic intensities and memory access patterns. Large clusters with shared L1 memory, a common architectural pattern, struggle to fully utilize their processing elements (PEs) when scaled up, due to reduced throughput in the hierarchical PE-to-L1 intra-cluster interconnect. This paper presents the Dynamic Allocation Scheme (DAS), a runtime-programmable address remapping hardware unit coupled with a unified memory allocator, designed to minimize data access contention of PEs on the multi-banked L1. We evaluated DAS on an aggressively scaled-up 1024-PE RISC-V cluster with a Non-Uniform Memory Access (NUMA) PE-to-L1 interconnect to demonstrate its potential for improving data locality in large parallel machine learning workloads. For a Vision Transformer (ViT)-L/16 model, each encoder layer executes in 5.67 ms, achieving a 1.94× speedup over the fixed word-level interleaved baseline with a PE utilization of 0.81. Implemented in 12 nm FinFET technology, DAS incurs <0.1% area overhead.
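The locality argument can be sketched by comparing the fixed word-level interleaving baseline with an address remap that keeps a memory region inside the banks close to the PEs using it. This is not the DAS hardware: the bank count, group size, region size and the grouping rule below are all illustrative assumptions.

```python
N_BANKS = 16        # illustrative, not the 1024-PE cluster's bank count
GROUP_SIZE = 4      # banks local to one hypothetical tile
REGION_WORDS = 64   # words per remapped region (hypothetical)

def bank_interleaved(word: int) -> int:
    """Baseline: consecutive words go to consecutive banks across all banks."""
    return word % N_BANKS

def bank_grouped(word: int) -> int:
    """Hypothetical remap: each REGION_WORDS-word region is interleaved only
    across one group of GROUP_SIZE banks, selected by the region index."""
    n_groups = N_BANKS // GROUP_SIZE
    group = (word // REGION_WORDS) % n_groups
    return group * GROUP_SIZE + word % GROUP_SIZE

def local_fraction(words, bank_fn, local_banks) -> float:
    """Fraction of accesses that hit banks in the accessing PE's local group."""
    words = list(words)
    return sum(bank_fn(w) in local_banks for w in words) / len(words)

# A PE in tile 0 (local banks 0..3) works on the first 64-word region.
tile0_banks = set(range(GROUP_SIZE))
print(local_fraction(range(REGION_WORDS), bank_interleaved, tile0_banks))  # → 0.25
print(local_fraction(range(REGION_WORDS), bank_grouped, tile0_banks))      # → 1.0
```

With full interleaving only one in four accesses stays local, while the grouped mapping keeps all of them local; a runtime-programmable remap lets software pick whichever layout suits each kernel.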
D-band Transmitter Achieving 57.6 Gb/s and 30 dBm EIRP Based on Channel Aggregation 45-nm ICs and a Low-Profile Flat Lens Antenna
We present two high-efficiency baseband-to-D-band transmitting systems based on a novel channel-aggregation architecture. Each comprises an integrated circuit for channel aggregation at intermediate frequency and a two-channel 45-nm CMOS transmitter flip-chipped onto an antenna-in-package, which illuminates a high-gain flat lens antenna in standard printed circuit board technology. The two signals emitted by the active antenna module, in adjacent bands, are aggregated over the air and collimated by the flat lens. Two transmitters are realized and compared: one uses a standard transmitarray antenna and the other a folded transmitarray antenna that is three times thinner. They achieve similar performance in the operating band (139.3–156.6 GHz), with a measured peak effective isotropic radiated power of 30 dBm. A wireless point-to-point link is demonstrated with the proposed transmitting system and a commercial receiver. A data rate of 57.6 Gb/s is measured at 1 m using 8 channels and a 16-QAM scheme, with an energy efficiency of 27.4 pJ/b.
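The link figures can be broken down per channel. This sketch assumes the 57.6 Gb/s aggregate is split evenly across the 8 channels and counts raw bits with no coding or framing overhead, and that the 27.4 pJ/b efficiency refers to the full aggregate rate; none of these assumptions is stated in the abstract.

```python
import math

TOTAL_RATE_BPS = 57.6e9               # aggregate data rate from the abstract
N_CHANNELS = 8                        # number of aggregated channels
BITS_PER_SYMBOL = int(math.log2(16))  # 16-QAM carries 4 bits per symbol
ENERGY_PER_BIT_J = 27.4e-12           # 27.4 pJ/b from the abstract

# Assumption: even split, raw bits only.
per_channel_bps = TOTAL_RATE_BPS / N_CHANNELS
per_channel_baud = per_channel_bps / BITS_PER_SYMBOL
# Assumption: the efficiency figure applies to the full aggregate rate.
implied_power_w = ENERGY_PER_BIT_J * TOTAL_RATE_BPS

print(f"{per_channel_bps / 1e9:.1f} Gb/s and {per_channel_baud / 1e9:.1f} GBd per channel")
print(f"implied power ≈ {implied_power_w:.2f} W")
```

Under these assumptions each channel carries 7.2 Gb/s at 1.8 GBd, which is the point of channel aggregation: a modest per-channel symbol rate still yields a large total rate.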
Development of a Universal FPGA-Based Coprocessor for 5G NR and WLAN LDPC Coding
Low-density parity-check (LDPC) codes are widely used in modern communication systems due to their near-capacity error correction performance. This paper presents a practical FPGA implementation of a universal hardware coprocessor for LDPC encoding and decoding, focusing on the system-level architecture, achievable data rate, latency measurements, and hardware resource utilization. The LDPC coding is realized with the Xilinx hardware macros available in Xilinx RFSoC FPGAs. We explore various design simplifications, including core combining, memory management, and data scheduling, to achieve high throughput while keeping implementation complexity minimal. The proposed architecture is implemented on an FPGA platform and equipped with 10 Gb/s Ethernet interfaces, demonstrating real-time decoding and improved performance compared with software-based approaches. Experimental results validate the design and show its applicability to high-speed communication systems. This work can serve as a reference for engineers and researchers aiming to deploy LDPC decoding in FPGA-based environments by reusing the existing Intellectual Property (IP) that is freely available in Xilinx SoCs.
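Whatever the hardware, every LDPC decoder ultimately targets a word c that satisfies the parity checks H·c = 0 over GF(2). The toy sketch below shows only that check. The small Hamming(7,4) parity-check matrix is a stand-in chosen for readability; it is not a 5G NR or WLAN LDPC matrix, and the single-error lookup is a Hamming property, not the iterative decoding that LDPC decoders use.

```python
# Toy parity-check matrix (Hamming(7,4)); real 5G NR / WLAN LDPC matrices are
# far larger and sparse, but codeword membership is tested the same way.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def syndrome(H, c):
    """Compute H·c over GF(2), one bit per parity check."""
    return [sum(h * x for h, x in zip(row, c)) % 2 for row in H]

def is_codeword(H, c):
    """A word is a codeword iff every parity check is satisfied."""
    return not any(syndrome(H, c))

def error_position(syn):
    """Hamming-specific: the syndrome spells the 1-based position of a single error."""
    return sum(bit << i for i, bit in enumerate(syn))

codeword = [1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 1, 0, 1, 0, 0]  # bit 5 flipped
print(is_codeword(H, codeword), syndrome(H, received), error_position(syndrome(H, received)))
```

For LDPC codes the same syndrome test serves as the early-termination criterion of iterative decoding.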
Fast End-to-End Simulation and Exploration of Many-RISCV-Core Baseband Transceivers for Software-Defined Radio-Access Networks
The fast-rising demand for wireless bandwidth [1] requires rapid evolution of high-performance baseband processing infrastructure. Programmable many-core processors for software-defined radio (SDR) have emerged as high-performance baseband processing engines, offering the flexibility required to capture evolving wireless standards and technologies [2]–[4]. This trend must be supported by a design framework that enables functional validation and end-to-end performance analysis of SDR hardware within realistic radio environment models. We propose a simulator based on static binary translation, augmented with a fast, approximate timing model of the hardware and coupled to wireless channel models, to simulate the most performance-critical physical-layer functions implemented in software on a cluster of 1024 RISC-V cores customized for SDR. Our framework simulates the detection of a 5G OFDM symbol on a server-class processor in 9.5 s to 3 min on a single thread, depending on the input MIMO size, which is three orders of magnitude faster than RTL simulation. The simulation parallelizes easily to 128 threads, with a 73–121× speedup over a single thread.
Separate but Together: Integrating Remote Attestation into TLS
Confidential computing based on Trusted Execution Environments (TEEs) allows software to run on remote servers without trusting the administrator. Remote attestation offers verifiable proof of the software stack and hardware elements comprising the TEE. However, setting up a secure channel to such a TEE requires a security guarantee that the channel actually terminates inside the TEE. TLS is an existing protocol for secure channel establishment, and in its most common use on the Web, it uses a key pair to assert the server identity encoded in a certificate. Various approaches have been proposed to integrate remote attestation into TLS. Unfortunately, they all have shortcomings. In this paper, we present a protocol that combines the existing certificate-based assurances of TLS with remote attestation-based assurances in such a way that they can be deployed independently and can fail independently. We design these two assurances to be additive without relying on each other, a property that existing approaches have not considered.
Analog Crosstalk Cancellation for High Data Rate Communication Links
In this work, a design of an analog I/Q crosstalk compensation circuit in 130 nm SiGe BiCMOS is proposed. The circuit consists of four baseband variable gain amplifiers based on Gilbert cells. As a proof of concept, a 20 Gbps QPSK input signal with high bidirectional crosstalk (20% and 30%), corresponding to an error vector magnitude (EVM) of 25.1% and a signal-to-noise ratio (SNR) of 12.0 dB, was improved to EVM = 16.1% and SNR = 15.9 dB at the output. Unidirectional crosstalk of up to 50% was also investigated; at 50%, the EVM improved from 25.9% to 18.6% and the SNR from 11.7 dB to 14.6 dB.
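The EVM and SNR pairs quoted in this abstract agree, to within rounding, with the common approximation SNR ≈ 1/EVM², i.e. SNR in dB ≈ −20·log10(EVM_rms). That relation assumes EVM is normalized to the RMS signal power and that residual impairments behave like additive noise; the abstract does not state its normalization, so treat this as a consistency check rather than the authors' method.

```python
import math

def snr_db_from_evm(evm_rms: float) -> float:
    """Approximate SNR in dB from RMS EVM (as a fraction), assuming SNR ≈ 1/EVM²."""
    return -20 * math.log10(evm_rms)

# EVM values quoted in the abstract, as fractions
for evm in (0.251, 0.161, 0.259, 0.186):
    print(f"EVM {evm:.1%} -> SNR ≈ {snr_db_from_evm(evm):.1f} dB")
```

The four results round to 12.0, 15.9, 11.7 and 14.6 dB, matching the reported SNR values.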
D-Band Channel Modeling in Data Centers and Industrial Environments
This paper presents D-band channel modelling for short-range line-of-sight communications in data centre and industrial environments. Using a multipath estimation framework with 0.1° angular resolution, we extract precise large- and small-scale channel characteristics. In data centres, inter-rack links show a high path loss exponent (PLE 2.39) and low delay spread (1.32 ns), with cable obstructions raising the PLE to 3.24 but reducing delay spread to 0.59 ns. Industrial machine-to-machine links exhibit a lower PLE (1.67), with moderate delay and angular spreads (4.35 ns, 16.46°) and a K-factor of 8.11 dB. Results reveal strong clustering in data centres but weaker effects in industrial settings. These findings support D-band channel standardisation by highlighting distinct propagation features across environments.
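A path loss exponent such as the values above is typically obtained by fitting the log-distance model PL(d) = PL0 + 10·n·log10(d/d0) to measured path loss. The sketch below does an ordinary least-squares fit. The data are synthetic, generated with an assumed PL0 of 75 dB, n = 2.39 and 1 dB Gaussian shadowing; they are not the paper's measurements, and the paper's own estimation framework may differ.

```python
import math
import random

def fit_ple(distances_m, path_loss_db, d0_m=1.0):
    """Least-squares fit of PL(d) = PL0 + 10*n*log10(d/d0); returns (PL0_dB, n)."""
    xs = [10 * math.log10(d / d0_m) for d in distances_m]
    n_pts = len(xs)
    mean_x = sum(xs) / n_pts
    mean_y = sum(path_loss_db) / n_pts
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, path_loss_db))
    n = sxy / sxx
    return mean_y - n * mean_x, n

# Synthetic (not measured) data: PL0 = 75 dB, n = 2.39, 1 dB shadowing.
rng = random.Random(0)
d = [0.5 + 0.25 * k for k in range(20)]
pl = [75 + 10 * 2.39 * math.log10(x) + rng.gauss(0, 1.0) for x in d]
pl0_est, n_est = fit_ple(d, pl)
print(f"PL0 ≈ {pl0_est:.1f} dB, PLE ≈ {n_est:.2f}")
```

Fitting the slope against 10·log10(d/d0) makes the regression coefficient the exponent n directly; in practice d0 is a reference distance chosen by the measurement campaign.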
A 66Gbps/5.5W RISC-V Many-Core Cluster for 5G+ Software-Defined Radio Uplinks
With the scale-up of 5G and beyond, base stations face rising computing demands under tight latency and power constraints. We present a many-core cluster design featuring 1024 streamlined RISC-V cores with floating-point extensions and 4 MiB of shared memory. It supports software-defined processing of the 5G physical uplink shared channel (PUSCH), achieving throughput up to 302 Gbps, around 10× higher than state-of-the-art processors. The cluster delivers competitive energy efficiency (2–41 Gbps/W), completing end-to-end PUSCH processing in 1.7 ms at under 6 W and reaching 12 Gbps/W.
Conflict Management in Vector Register Files
In vector processors, instructions operate on vectors held in a vector register file, which must serve multiple concurrent accesses. High utilization leads to access conflicts, which degrade performance. Using a software model, we explore the impact and characteristics of these conflicts and three methods to manage them: avoidance, resolution, and mitigation. For avoidance, we examine static bank layouts and propose a dynamic one that addresses their limitations by assigning each new register a unique starting bank. For resolution, we compare arbitration algorithms and optimize round-robin arbitration for mixed-width arithmetic by prioritizing wide operands. For mitigation, we study operand queues of varying depths. Our solutions aim to improve the area efficiency of vector processors by allowing shallower operand queues or fewer banks, with a performance impact of 10% or less. These insights can also apply to other shared memory systems.
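The avoidance idea can be sketched with a simple bank model. Assume (hypothetically) that element i of a register whose first element sits in bank s is stored in bank (s + i) mod B, and that both source operands are read element by element in lockstep. Two operands then collide on every element exactly when they share a starting bank. A static layout ties the starting bank to the register number, so v1 and v9 collide with 8 banks; the hypothetical allocator below instead hands each newly written register the least-used starting bank. This is an illustration, not the paper's software model.

```python
from collections import Counter

N_BANKS = 8  # illustrative bank count (assumption)

def bank_of(start_bank: int, elem: int) -> int:
    """Bank holding element `elem` of a register whose element 0 is in `start_bank`."""
    return (start_bank + elem) % N_BANKS

def static_start(reg: int) -> int:
    """Static layout: the starting bank is fixed by the architectural register number."""
    return reg % N_BANKS

class DynamicStartAllocator:
    """Hypothetical dynamic layout: each newly written register gets the starting
    bank used by the fewest live registers (unique while <= N_BANKS are live)."""

    def __init__(self) -> None:
        self.start: dict[int, int] = {}  # live register -> starting bank

    def allocate(self, reg: int) -> int:
        usage = Counter(b for r, b in self.start.items() if r != reg)
        bank = min(range(N_BANKS), key=lambda b: usage[b])
        self.start[reg] = bank
        return bank

    def release(self, reg: int) -> None:
        self.start.pop(reg, None)

def conflicts(start_a: int, start_b: int, vl: int) -> int:
    """Cycles in which element i of both source operands sits in the same bank."""
    return sum(bank_of(start_a, i) == bank_of(start_b, i) for i in range(vl))

# v1 and v9 collide on every element under the static layout with 8 banks ...
print(conflicts(static_start(1), static_start(9), vl=16))  # → 16
# ... but receive distinct starting banks under the dynamic layout.
alloc = DynamicStartAllocator()
print(conflicts(alloc.allocate(1), alloc.allocate(9), vl=16))  # → 0
```

Real register files also see conflicts from writes and from operands of different element widths, which is where the arbitration and operand-queue techniques in the abstract come in.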
An Architecture for Shrinking the TCB of TEEs on Heterogeneous Systems
Trusted Execution Environments (TEEs) enable secure code execution on machines that are not fully controlled by the user who runs the code. However, existing TEE solutions do not provide unified support for systems with heterogeneous core architectures or accelerators. Furthermore, their implementation is complex and requires the user to trust (typically closed) firmware in addition to the TEE hardware. We propose a heterogeneous TEE architecture with minimal hardware support to reduce the trust in firmware, as well as a minimal Root-of-Trust that enables features such as remote attestation for such TEEs.
Towards adaptive RISC-V based systems for non-terrestrial sub-THz communication
The upcoming 6G communication standard promises unrivaled bandwidth, connectivity, and coverage, and will likely span from the most remote places, through densely populated areas, up into low Earth orbit. Implementing this vision, however, poses considerable challenges to the underlying processing hardware, and advanced solutions are needed to meet these requirements, especially in space. These challenges include the need for significant technological advances, critical demands in terms of performance, reliability, and adaptability, and the trustworthiness of devices, to name only a few. This paper presents our joint efforts to address these needs and enable open-source, adaptive, and fault-tolerant processing systems for 6G communication in low Earth orbit.
A novel sparse-connected architecture for multi-user mmWave communication
This paper proposes a sparse analog beamforming (ABF) architecture for multiuser mmWave downlink systems, where each RF chain is connected to a subset of antennas. Beam alignment and RF chain-antenna associations are jointly optimised using a binary association matrix (BAM), with an iterative algorithm addressing the complexity. Simulations show the approach outperforms fully and partially connected architectures, improving SINR and bit error rate while reducing the reliance on precise beam allocation.
Broadband Sub-THz Dielectric Waveguides Characterization
This paper presents the sub-THz characterisation of various plastic fibres for high-speed mmWave communications using cost-effective, broadband waveguide transitions. Transitions from rectangular to circular and circular to plastic waveguides were designed and tested at D-band (110–170 GHz) and H-band (220–330 GHz). Results show insertion losses as low as 1–2 dB and fibre losses between 4 and 20 dB/m, depending on geometry and frequency, with strong alignment between simulations and measurements.
A Transmitter/Receiver Link for High Data Rate Polymer Microwave Fiber Communication at Y-band
A Y-band (170–260 GHz) ultra-high-data-rate transmitter (Tx) and receiver (Rx) are designed and fabricated in a commercial 130 nm silicon germanium (SiGe) BiCMOS process. The link demonstrates data rates up to 30 Gbps over a one-meter polymer microwave fiber (PMF), using a carrier of 237 GHz. This is the first PMF link above 200 GHz to reach a distance of one meter.
D-Band Channel Modelling by 3D Ray Tracing for Joint Communications and Sensing
This paper presents 3D geometrical channel modelling experiments in the D-band and assesses the feasibility of developing joint communications and sensing (JCAS) applications in this spectrum. We propose a novel, flexible 3D ray tracer for deterministic channel modelling in the D-band. Its output is benchmarked against existing measurements with quantified differences: the received power deviation is within 2 dB, the delay deviation within 1 ns, and the angle deviation within 4°. Statistics of the multipath components simulated by the ray tracer are also investigated under different ray tracing configurations, revealing a non-linear dependence on the scatterer settings, and the output of the ray tracer can be exploited for sensing applications.
A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR
We introduce an open-source architecture for next-generation Radio-Access Network baseband processing: 1024 latency-tolerant 32-bit RISC-V cores share 4 MiB of L1 memory via an ultra-low-latency interconnect (7–11 cycles), and a modular Direct Memory Access engine provides an efficient link to a high-bandwidth memory such as HBM2E (98% of peak bandwidth at 910 GBps). The system achieves leading-edge energy efficiency at sub-ms latency in key 6G baseband processing kernels: Fast Fourier Transform (93 GOPS/W), Beamforming (125 GOPS/W), Channel Estimation (96 GOPS/W), and Linear System Inversion (61 GOPS/W), with only 9% data movement overhead.
Sensitivity Analysis of mmWave Multiuser MIMO with Imperfect Analog Beamforming State Information
This paper analytically examines the impact of imperfect beamforming (BF) information on analog-beamformed multiuser downlink MIMO systems. It derives approximations for average SINR and symbol error probability (SEP) under various BF error distributions and validates them through simulations for both partially and fully connected architectures. Results show that sub-degree BF accuracy, especially at the transmitter, is crucial, particularly as system load and antenna count increase. The study highlights the significant alignment challenges facing future mmWave systems using analog BF.
Twisting Effects on X-Shaped Millimeter-Wave Plastic Waveguides
This paper investigates the impact of twisting on hybrid-mode propagation in X-shaped plastic waveguides, which offer a lightweight, low-cost alternative for high-speed data transmission. Unlike prior studies that focus on bending, this work addresses twisting both theoretically and experimentally. Results show that in both twisted and twisted-bent configurations, the X-shaped design maintains a stable polarisation direction, highlighting its robustness for applications such as data centres and autonomous vehicles.
High-Performance Polymer Microwave Fiber Coupler in eWLB Package for Sub-THz Communication
In this paper, a compact and efficient transceiver integrated circuit (IC) to polymer microwave fiber (PMF) coupler realized in an embedded wafer level ball grid array (eWLB) package is presented for the first time. The proposed solution uses a Vivaldi antenna realized using the redistribution layer of eWLB. The system operates around 140 GHz and achieves a coupling loss of only 4 dB.
The Evolution of Mobile Network Operations: A Comprehensive Analysis of Open RAN Adoption
This paper examines the transformative potential of open RAN (O-RAN) technology for Mobile Network Operators (MNOs) aiming to modernise their infrastructure in response to increasing data demands. It provides a thorough overview of the current state of open RAN research, deployments, and technologies, followed by an analysis of the decision-making roadmap for adoption, covering network design, vendor selection, and implementation strategies. The paper also explores key components, functional splits, and accelerator options, offering practical guidance for MNOs. It concludes by discussing the modular nature of O-RAN and the complexity of its design phase, highlighting both challenges and proposed solutions.
TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios
In this paper, the authors address the increasing demands of 5G and future RAN workloads by presenting TeraPool-SDR, a highly efficient processing cluster designed for Software-Defined Radio (SDR). The cluster features 1024 processing elements and a fast memory system, achieving high energy efficiency across key 5G tasks while consuming less than 10 W. (DOI: https://doi.org/10.1145/3649476.365873)
LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation
In this paper, the authors address the issue of polling in shared-memory manycore systems, which leads to contention and inefficiencies. They propose LRwait and SCwait synchronization methods, along with the scalable Colibri implementation, to reduce polling by allowing cores to sleep while waiting. This approach results in a 6.5x improvement in throughput and a 7.1x increase in energy efficiency on a 256-core RISC-V platform, with only a 6% area overhead. (DOI: https://doi.org/10.48550/arXiv.2401.09359)
Towards Disaggregation-Native Data Streaming between Devices, 3rd Workshop on Heterogeneous Composable and Disaggregated Systems (HCDS)
This paper explores the ongoing trend of disaggregation in datacenters, which aims to increase flexibility by connecting pools of CPUs, accelerators, and memory using interconnect technologies like CXL.
(DOI: https://doi.org/10.48550/arXiv.2406.09421)
Core-Local Reasoning and Predictable Cross-Core Communication with M3
This paper improves the real-time capabilities of the M³ architecture while maintaining its strong security properties. It addresses the balance between performance and security by enabling core-local reasoning and predictable cross-core communication. (DOI: https://doi.org/10.1109/RTAS61025.2024.00024)
Towards Modular Trusted Execution Environments
In this workshop paper, the authors propose a modular TEE design. They apply it to the M3 hardware/software co-design platform and demonstrate how TEE support can be made a first-class feature at the system-architecture level.
(DOI: https://doi.org/10.1145/3578359.3593037)
Circularly Polarized Sub-THz Antenna Design for Distributed Deployment
In this paper, the authors propose an antenna-in-package concept for the single-layer substrate of low-cost embedded wafer level ball grid array (eWLB) packages. (DOI: https://doi.org/10.23919/EuCAP60739.2024.10501515)
Distributed Radar Network with Polymer Microwave Fiber (PMF) Based Synchronization
This conference paper presents advancements on distributed radar networks, which provide numerous advantages such as increased angular resolution and improved signal-to-noise ratio. (DOI: https://doi.org/10.1109/WiSNeT59910.2024.10438574)
MinPool: A 16-core NUMA-L1 Memory RISC-V Processor Cluster for Always-on Image Processing in 65nm CMOS
This paper presents MinPool, a low-power image processor for always-on functions implemented in TSMC’s 65 nm technology and based on a tailored MemPool architecture. (DOI: https://doi.org/10.1109/ICECS58634.2023.10382925)
An 80 Gbps QAM-16 PMF Link Using a 130 nm SiGe BiCMOS Process
In this work a D-band (110 GHz – 170 GHz) polymer microwave fiber (PMF) link for high data rate communication is presented.
(DOI: https://doi.org/10.1109/IMS37964.2023.10188207)
A Beyond 100-Gbps Polymer Microwave Fiber Communication Link at D-band
In this work, a D-band (110-170 GHz) ultra high data rate link is presented and characterized.
(DOI: https://doi.org/10.1109/TCSI.2023.3262725)
Disruptive TRX design for D-band
In today’s connected world, the demand for mobile communications and instant access to information, anytime and anywhere, has drastically changed the electronics landscape, both consumer and industrial. This book provides an overview of the latest research results in RF and digital SOI technology development for 5G and 6G, device and substrate characterization, packaging technology, and the realization of full systems including power amplifiers, linearization techniques, beamforming transceivers, access points, and radar detection. (Electronic ISBN:9788770040730)
Software-Defined CPU Modes
CPUs contain a compute instruction set, which regular applications use. This paper explores whether CPU modes could be defined entirely by software. The authors show how such a design would function and explore the advantages it enables. They argue that pushing all existing modes under a common design umbrella would enforce a cleaner structure and give more control over exposed functionality. At the same time, the flexibility of software-defined modes enables interesting new use cases.
(DOI: https://doi.org/10.1145/3593856.3595894)
Dual Vector Load for Improved Pipelining in Vector Processors
Vector processors execute instructions that manipulate vectors of data items using time-division multiplexing (TDM). In this paper, the researchers propose a dual vector load: A parallel or interleaved load of the two input vectors. Their investigation finds that compute-bound and some memory-bound applications profit from this feature when the memory and compute bandwidths are sufficiently high. A speedup of up to 33 % is possible in the ideal case.
(DOI: https://doi.org/10.1109/COOLCHIPS57690.2023.10121996)
