Oscar Castañeda Fernández


Last Name

Castañeda Fernández

First Name

Oscar

Organisational unit

09695 - Studer, Christoph / Studer, Christoph

Search Results

Publications 1 - 10 of 38
  • Castañeda Fernández, Oscar; Jacobsson, Sven; Durisi, Giuseppe; et al. (2018)
    IEEE International Symposium on Circuits and Systems (ISCAS). Proceedings, 27–30 May 2018, Florence, Italy
    Fifth-generation (5G) cellular systems will build on massive multi-user (MU) multiple-input multiple-output (MIMO) technology to attain high spectral efficiency. However, having hundreds of antennas and radio-frequency (RF) chains at the base station (BS) entails prohibitively high hardware costs and power consumption. This paper proposes a novel nonlinear precoding algorithm for the massive MU-MIMO downlink in which each RF chain contains an 8-phase (3-bit) constant-modulus transmitter, enabling the use of low-cost and power-efficient analog hardware. We present a high-throughput VLSI architecture and show implementation results on a Xilinx Virtex-7 FPGA. Compared to a recently-reported nonlinear precoder for BS designs that use two 1-bit digital-to-analog converters per RF chain, our design enables up to 3.75 dB transmit power reduction at no more than a 2.7x increase in FPGA resources.
  • Castañeda Fernández, Oscar; Benini, Luca; Studer, Christoph (2022)
    ESSCIRC 2022 - IEEE 48th European Solid State Circuits Conference (ESSCIRC)
    We present PULPO, a floating-point baseband-processing accelerator for massive multi-user multiple-input multiple-output (MU-MIMO) basestations (BSs). PULPO accelerates matrix-vector products, not only with a matrix but also with its Hermitian, as well as affine transforms and nonlinear projections used in iterative algorithms that outclass traditional linear methods in various applications. PULPO is integrated in a system-on-chip (SoC) with a tight integration to the system's data memory, facilitating data exchange and co-operation with 8 RISC-V cores. The fabricated accelerator achieves efficiency comparable to that of recently-proposed fixed-point baseband processors, while eliminating the burdens associated with fixed-point design, thus simplifying massive MU-MIMO BS development.
  • Castañeda Fernández, Oscar; Bobbett, Maria; Gallyas-Sanhueza, Alexandra; et al. (2019)
    2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
    Processing in memory (PIM) moves computation into memories with the goal of improving throughput and energy-efficiency compared to traditional von Neumann-based architectures. Most existing PIM architectures are either general-purpose but only support atomistic operations, or are specialized to accelerate a single task. We propose the Parallel Processor in Associative Content-addressable memory (PPAC), a novel in-memory accelerator that supports a range of matrix-vector-product (MVP)-like operations that find use in traditional and emerging applications. PPAC is, for example, able to accelerate low-precision neural networks, exact/approximate hash lookups, cryptography, and forward error correction. The fully-digital nature of PPAC enables its implementation with standard-cell-based CMOS, which facilitates automated design and portability among technology nodes. To demonstrate the efficacy of PPAC, we provide post-layout implementation results in 28 nm CMOS for different array sizes. A comparison with recent digital and mixed-signal PIM accelerators reveals that PPAC is competitive in terms of throughput and energy-efficiency, while accelerating a wide range of applications and simplifying development.
  • Castañeda Fernández, Oscar; Goldstein, Tom; Studer, Christoph (2018)
    IEEE Transactions on Circuits and Systems I: Regular Papers
    Channel estimation errors have a critical impact on the reliability of wireless communication systems. While virtually all existing wireless receivers separate channel estimation from data detection, it is well known that joint channel estimation and data detection (JED) significantly outperforms conventional methods at the cost of high computational complexity. In this paper, we propose a novel JED algorithm and corresponding VLSI designs for large single-input multiple-output (SIMO) wireless systems that use constant-modulus constellations. The proposed algorithm is referred to as PRojection Onto conveX hull (PrOX) and relies on biconvex relaxation (BCR), which enables us to efficiently compute an approximate solution of the maximum-likelihood JED problem. Since BCR solves a biconvex problem via alternating optimization, we provide a theoretical convergence analysis for PrOX. We design a scalable, high-throughput VLSI architecture that uses a linear array of processing elements to minimize hardware complexity. We develop corresponding field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) designs, and we demonstrate that PrOX significantly outperforms the only other existing JED design in terms of throughput, hardware-efficiency, and energy-efficiency.
  • Marti, Gian; Castañeda Fernández, Oscar; Studer, Christoph (2021)
    IEEE Open Journal of Circuits and Systems
    Millimeter-wave (mmWave) massive multi-user multiple-input multiple-output (MU-MIMO) promises unprecedented data rates for next-generation wireless systems. To be practically viable, mmWave massive MU-MIMO basestations (BSs) must rely on low-resolution data converters, which leaves them vulnerable to jammer interference. This paper proposes beam-slicing, a method that mitigates the impact of a permanently transmitting jammer during uplink transmission for BSs equipped with low-resolution analog-to-digital converters (ADCs). Beam-slicing is a localized analog spatial transform that focuses the jammer energy onto few ADCs, so that the transmitted data can be recovered based on the outputs of the interference-free ADCs. We demonstrate the efficacy of beam-slicing in combination with two digital jammer-mitigating data detectors: SNIPS and CHOPS. Soft-Nulling of Interferers with Partitions in Space (SNIPS) combines beam-slicing with a soft-nulling data detector that exploits knowledge of the ADC contamination; projeCtion onto ortHOgonal complement with Partitions in Space (CHOPS) combines beam-slicing with a linear projection that removes all signal components co-linear to an estimate of the jammer channel. Our results show that beam-slicing enables SNIPS and CHOPS to successfully serve 65% of the user equipments (UEs) for scenarios in which their antenna-domain counterparts that lack beam-slicing are only able to serve 2% of the UEs.
  • Bucheli, Florian; Castañeda Fernández, Oscar; Marti, Gian; et al. (2024)
    2024 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits)
    We present the first multi-user (MU) multiple-input multiple-output (MIMO) receiver ASIC that mitigates jamming attacks. The ASIC implements a recent nonlinear algorithm that performs joint jammer mitigation (via spatial filtering) and data detection (using a box prior on the data symbols). Our design supports 8 user equipments (UEs) and 32 basestation (BS) antennas, QPSK and 16-QAM with soft-outputs, and enables the mitigation of single-antenna barrage jammers and smart jammers. The fabricated 22 nm FD-SOI ASIC includes preprocessing, has a core area of 3.78 mm², achieves a throughput of 267 Mb/s while consuming 583 mW, and is the only existing design that enables reliable data detection under jamming attacks.
  • Castañeda Fernández, Oscar; Goldstein, Tom; Studer, Christoph (2017)
    2017 IEEE International Symposium on Circuits and Systems (ISCAS)
    Joint channel estimation and data detection (JED) enables near-optimal error-rate performance in realistic wireless communication systems that suffer from channel estimation errors. In this paper, we propose a new JED algorithm and a corresponding FPGA design for large single-input multiple-output (SIMO) wireless systems that use constant-modulus constellations. Our algorithm, referred to as PrOX (short for PRojection Onto conveX hull), relies on biconvex relaxation (BCR) in order to efficiently compute an approximate solution of the maximum-likelihood JED problem that exhibits prohibitive complexity. PrOX is a simple and hardware-friendly algorithm that achieves near-optimal error-rate performance for a wide range of system configurations. To demonstrate the efficacy of PrOX, we develop a scalable VLSI architecture and present reference implementation results on a Xilinx Virtex-7 FPGA. Compared to a recently-reported reference JED design, PrOX achieves 3x higher throughput, 20x better hardware-efficiency (in terms of throughput per look-up table), and 8x improved energy-efficiency.
  • Castañeda Fernández, Oscar; Goldstein, Tom; Studer, Christoph (2016)
    IEEE Transactions on Circuits and Systems I: Regular Papers
    Practical data detectors for future wireless systems with hundreds of antennas at the base station must achieve high throughput and low error rate at low complexity. Since the complexity of maximum-likelihood (ML) data detection is prohibitive for such large wireless systems, approximate methods are necessary. In this paper, we propose a novel data detection algorithm referred to as Triangular Approximate SEmidefinite Relaxation (TASER), which is suitable for two application scenarios: i) coherent data detection in large multi-user multiple-input multiple-output (MU-MIMO) wireless systems and ii) joint channel estimation and data detection in large single-input multiple-output (SIMO) wireless systems. For both scenarios, we show that TASER achieves near-ML error-rate performance at low complexity by relaxing the associated ML-detection problems into a semidefinite program, which we solve approximately using a preconditioned forward-backward splitting procedure. Since the resulting problem is non-convex, we provide convergence guarantees for our algorithm. To demonstrate the efficacy of TASER in practice, we design a systolic architecture that enables our algorithm to achieve high throughput at low hardware complexity, and we develop reference field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) designs for various antenna configurations.
  • Castañeda Fernández, Oscar; Jacobsson, Sven; Durisi, Giuseppe; et al. (2019)
    2019 53rd Asilomar Conference on Signals, Systems, and Computers
    Power consumption of multi-user (MU) precoding is a major concern in all-digital massive MU multiple-input multiple-output (MIMO) base-stations with hundreds of antenna elements operating at millimeter-wave (mmWave) frequencies. We propose to replace part of the linear Wiener filter (WF) precoding matrix by a finite-alphabet WF precoding (FAWP) matrix, which enables the use of low-precision hardware that consumes low power and area. To minimize the performance loss of our approach, we present methods that efficiently compute FAWP matrices that best mimic the WF precoder. Our results show that FAWP matrices approach infinite-precision error-rate and error-vector magnitude performance with only 3-bit precoding weights, even when operating in realistic mmWave channels. Hence, FAWP is a promising approach to substantially reduce power consumption and silicon area in all-digital mmWave massive MU-MIMO systems.
  • Castañeda Fernández, Oscar; Studer, Christoph; Jeon, Charles (2019)
    IEEE Solid-State Circuits Letters
    This letter presents a novel data detector application-specific integrated circuit (ASIC) for massive multiuser multiple-input multiple-output (MU-MIMO) wireless systems. The ASIC implements a modified version of the large-MIMO approximate message passing algorithm (LAMA), which achieves near-optimal error-rate performance (i) under realistic channel conditions and (ii) for systems with as many users as base-station (BS) antennas. The hardware architecture supports 32 users transmitting up to 256-QAM simultaneously and in the same frequency band, and provides soft-input soft-output capabilities for iterative detection and decoding. The fabricated 28 nm CMOS ASIC occupies 0.37 mm², achieves a throughput of 354 Mb/s, consumes 151 mW, and improves the SNR by more than 11 dB compared to existing data detectors in systems with 32 BS antennas and 32 users for realistic wireless channels. In addition, the ASIC achieves 4x higher throughput per area than a recently proposed message-passing detector.