Hyunwoo Oh

Hello there! I’m Hyunwoo Oh.

I’m a PhD student in Computer Science at the University of California, Irvine, supervised by Prof. Mohsen Imani at BIASLab.

My research focuses on turning emerging AI models that demand massive computational resources (e.g., multimodal models, ViTs, GNNs) into more affordable and efficient solutions. I specialize in architecture-level hardware-software co-design and implementation, often extending into higher- and lower-level topics such as AI model optimization and novel circuit/system designs like processing-in-memory (PIM). I love venturing beyond my core research areas and embracing challenges.

Previously, I was a Junior Engineer at Hanwha Systems, a leading Korean defense electronics company, where I designed SoC FPGA-based image processors, worked on RTOS development, and optimized compute kernels for heterogeneous SoCs, primarily for infrared image processing.

I earned my M.S. in Electronic Engineering from Seoul National University of Science and Technology in 2023, advised by Prof. Seung Eun Lee. My Master’s research included:

Designing flexible architectures that incorporate posit, a novel standard for real-number arithmetic, into general-purpose processors.
Integrating domain-specific hardware for parallel processing into conventional general-purpose processors.

Most of my work was implemented using FPGAs, and some projects were fabricated into ASICs.

selected publications

2025

DAC Oral
iTask: Task-Oriented Object Detection in Resource-Constrained Environments

SungHeon Jeong, Hamza Errahmouni Barkam, Hyunwoo Oh, Hanning Chen, Tamoghno Das, Zhen Ye, and Mohsen Imani

ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, Jun 2025, pp. 1–7

Accepted and to appear

Task-oriented object detection is increasingly essential for intelligent sensing applications, enabling AI systems to operate autonomously in complex, real-world environments such as autonomous driving, healthcare, and industrial automation. Conventional models often struggle with generalization, requiring vast datasets to accurately detect objects within diverse contexts. In this work, we introduce iTask, a task-oriented object detection framework that leverages large language models (LLMs) to generalize efficiently from limited samples by generating an abstract knowledge graph. This graph encapsulates essential task attributes, allowing iTask to identify objects based on high-level characteristics rather than extensive data, making it possible to adapt to complex mission requirements with minimal samples. iTask addresses the high computational cost and resource limitations of vision-language models by offering two model configurations: a distilled, task-specific vision transformer optimized for high accuracy on defined tasks, and a quantized version of the model for broader applicability across multiple tasks. Additionally, we designed a hardware acceleration circuit to support real-time processing, which is essential for edge devices that require low latency and efficient task execution. Our evaluations show that the task-specific configuration achieves 15% higher accuracy than the quantized configuration in specific scenarios, while the quantized model provides robust multi-task performance. The hardware-accelerated iTask system achieves a 3.5× speedup and a 40% reduction in energy consumption compared to GPU-based implementations. These results demonstrate that iTask’s dual-configuration approach and situational adaptability offer a scalable solution for task-specific object detection, providing robust and efficient performance in resource-constrained environments.
@inproceedings{jeong_itask_2025, address = {San Francisco, CA, USA}, title = {{iTask: Task-Oriented Object Detection in Resource-Constrained Environments}}, booktitle = {{ACM/IEEE Design Automation Conference (DAC)}}, author = {Jeong, SungHeon and Barkam, Hamza Errahmouni and Oh, Hyunwoo and Chen, Hanning and Das, Tamoghno and Ye, Zhen and Imani, Mohsen}, month = jun, year = {2025}, pages = {1--7}, }
FCCM Poster
A Multimodal AI Acceleration with Dynamic Pruning and Run-time Configuration

Hyun Woo Oh, Hanning Chen, Sanggeon Yun, Yang Ni, Behnam Khaleghi, Fei Wen, and Mohsen Imani

IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), Fayetteville, AR, USA, May 2025

The computational diversity of multimodal AI workloads—spanning vision transformers (ViTs), graph neural networks (GNNs), CNNs, and transformer-based NLP—poses a fundamental challenge to embedded acceleration platforms. We propose a fully integrated FPGA-based acceleration framework that addresses this heterogeneity via compile-time and runtime configurability. Our system introduces a reconfigurable processing unit (RPU) capable of executing dense and sparse matrix operations (DDMM, SpMM, SDDMM), a scalable top-k pruning engine for ViTs, and a domain-specific compiler for hardware-software co-design. The architecture supports real-time configuration without reloading bitstreams, enabling unified deployment across tasks. Implementations on Xilinx U50 and ZCU104 demonstrate up to 22.57× and 6.86× latency reductions versus RTX 4090 and Jetson Orin Nano, respectively, validating the design’s efficiency for real-time, resource-limited environments.
@inproceedings{oh_multimodal_2025, address = {Fayetteville, AR, USA}, title = {{A Multimodal AI Acceleration with Dynamic Pruning and Run-time Configuration}}, booktitle = {{IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM)}}, author = {Oh, Hyun Woo and Chen, Hanning and Yun, Sanggeon and Ni, Yang and Khaleghi, Behnam and Wen, Fei and Imani, Mohsen}, month = may, year = {2025}, }
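For readers unfamiliar with the three matrix kernels named in the abstract above, the following Python sketch gives their reference semantics on tiny matrices, plus a generic top-k selection as one plausible reading of "top-k pruning" for ViT tokens. It is a plain software model under stated assumptions, not the paper's RPU or compiler: sparse matrices are stored as (row, col, value) COO triplets, SDDMM is taken as mask ⊙ (A · Bᵀ), and all function names (`ddmm`, `spmm`, `sddmm`, `topk_keep`) are hypothetical.

```python
from typing import List, Tuple

Dense = List[List[float]]
COO = List[Tuple[int, int, float]]  # assumed sparse format: (row, col, value) triplets


def ddmm(a: Dense, b: Dense) -> Dense:
    """Dense-dense matmul: C = A @ B, with A of shape M x K and B of shape K x N."""
    k_dim, n = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(k_dim)) for j in range(n)]
            for i in range(len(a))]


def spmm(a: COO, m: int, b: Dense) -> Dense:
    """Sparse-dense matmul: C = A @ B, visiting only the nonzeros of the M x K matrix A."""
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i, k, v in a:
        for j in range(n):
            c[i][j] += v * b[k][j]
    return c


def sddmm(mask: COO, a: Dense, b: Dense) -> COO:
    """Sampled dense-dense matmul: mask * (A @ B^T), computed only at the mask's nonzeros.

    A is M x K and B is N x K, so entry (i, j) is the dot product of row i of A
    and row j of B, scaled by the mask value.
    """
    return [(i, j, v * sum(x * y for x, y in zip(a[i], b[j]))) for i, j, v in mask]


def topk_keep(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring tokens, returned in their original order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:k])
```

In a GNN layer, for example, SDDMM typically scores only the existing edges of an adjacency mask and SpMM then aggregates neighbor features, so neither step needs a dense N x N intermediate; that is the kind of heterogeneity a single reconfigurable unit has to cover.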

2024

RTCSA Oral
A Compact Real-Time Thermal Imaging System Based on Heterogeneous System-on-Chip

Hyun Woo Oh, Cheol-Ho Choi, Jeong Woo Cha, Hyunmin Choi, Jung-Ho Shin, and Joon Hwan Han

IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Sokcho, Korea, Aug 2024, pp. 97–107

Acceptance rate = 38.10% (16 out of 42)

This paper presents a real-time embedded thermal imaging system architecture for compact, energy-efficient, high-quality imaging using a heterogeneous system-on-chip and uncooled infrared focal plane arrays (IRFPAs). In contrast to previous systems that relied on separate devices for complex image processing, our system integrates image processing support for a robust sensor-to-surveillance pipeline. We organized the image processing architecture into two algorithm stacks: a non-uniformity correction stack to mitigate the distinctive noise vulnerability of uncooled IRFPAs, and an image enhancement stack that includes contrast enhancement and frame-level temporal noise filters. We optimized the algorithms for domain-specific factors, including asymmetric multiprocessing (AMP), cache organization, single instruction multiple data (SIMD) instructions, and very long instruction word (VLIW) architectures. The implementation on a TI TDA3x SoC demonstrates that our system can process 640×480, 60 frames per second (FPS) video at a peak core load of 57.5% while the entire system consumes less than 2.2 W, indicating the feasibility of processing 1280×1024, 30 FPS video from state-of-the-art IRFPAs.
@inproceedings{oh_compact_2024, address = {Sokcho, Korea}, title = {A {Compact} {Real}-{Time} {Thermal} {Imaging} {System} {Based} on {Heterogeneous} {System}-on-{Chip}}, isbn = {979-8-3503-8795-7}, doi = {10.1109/RTCSA62462.2024.00023}, booktitle = {{IEEE} {International} {Conference} on {Embedded} and {Real}-{Time} {Computing} {Systems} and {Applications} ({RTCSA})}, publisher = {IEEE}, author = {Oh, Hyun Woo and Choi, Cheol-Ho and Cha, Jeong Woo and Choi, Hyunmin and Shin, Jung-Ho and Han, Joon Hwan}, month = aug, year = {2024}, pages = {97--107}, }
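Non-uniformity correction (NUC) for uncooled IRFPAs is commonly built on a two-point calibration: each pixel gets its own gain and offset, estimated from frames of two uniform reference sources at different temperatures. The Python sketch below shows that textbook form, plus a first-order recursive filter as one simple example of a frame-level temporal noise filter. The abstract does not say which NUC or temporal filter the system uses, so treat both as illustrative assumptions; the function names and the flattened-list frame layout are hypothetical.

```python
from typing import List, Optional, Tuple

Frame = List[float]  # assumed layout: one frame flattened row-major (H*W pixels)


def two_point_nuc(cold: Frame, hot: Frame) -> Tuple[List[float], List[float]]:
    """Per-pixel gain and offset from two uniform reference frames.

    `cold` and `hot` are raw frames captured while the sensor views uniform
    sources at two temperatures. After correction, every pixel reproduces the
    array-mean response at both reference points.
    """
    n = len(cold)
    cold_mean = sum(cold) / n
    hot_mean = sum(hot) / n
    gains: List[float] = []
    offsets: List[float] = []
    for c, h in zip(cold, hot):
        g = (hot_mean - cold_mean) / (h - c)  # equalize per-pixel responsivity
        gains.append(g)
        offsets.append(cold_mean - g * c)     # equalize per-pixel offset
    return gains, offsets


def apply_nuc(raw: Frame, gains: List[float], offsets: List[float]) -> Frame:
    """Apply the per-pixel linear correction: out = gain * raw + offset."""
    return [g * x + o for x, g, o in zip(raw, gains, offsets)]


def temporal_filter(prev: Optional[Frame], cur: Frame, alpha: float = 0.25) -> Frame:
    """First-order recursive (IIR) filter: out = alpha * cur + (1 - alpha) * prev."""
    if prev is None:
        return list(cur)
    return [alpha * c + (1 - alpha) * p for p, c in zip(prev, cur)]
```

If each pixel responds linearly as a_i·T + b_i with different a_i and b_i, calibrating at two temperatures makes a uniform scene come out flat at any temperature, as long as that linear model holds; real detectors drift and are nonlinear, which is why practical NUC stacks usually add further stages.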
TCAS-II
DL-Sort: A Hybrid Approach to Scalable Hardware-Accelerated Fully-Streaming Sorting

Hyun Woo Oh, Joungmin Park, and Seung Eun Lee

IEEE Transactions on Circuits and Systems II: Express Briefs, May 2024, pp. 2549–2553

Invited from IEEE ISCAS 2024 (16 out of 1497 = 1.07%).

Designing a high-performance hardware sorter for resource-constrained systems is challenging due to physical limitations and the need to balance streaming bandwidth with memory throughput. This paper introduces a novel, scalable hardware sorter architecture with fully-streaming support and an accompanying RTL generator to provide versatile, energy-efficient hardware acceleration. Our solution employs a dual-layer architecture consisting of a parallel one-way linear insertion sorter (OLIS) for bandwidth optimization and a cyclic bitonic merge network (CBMN) for a compact, high-throughput implementation. Furthermore, we developed an RTL generator in Chisel to enable agile implementation of the scalable architecture. Experimental results targeting the Xilinx XVU37P-FSVH2892-2L-E FPGA show that, compared to the state-of-the-art streaming sorter, our design achieves up to a 126.26% increase in throughput and up to a 68.46% decrease in latency, with at most a 132.94% increase in LUTs and up to a 79.84% decrease in flip-flops.
@article{oh_dl-sort_2024, title = {{DL-Sort}: {A} {Hybrid} {Approach} {to} {Scalable} {Hardware}-{Accelerated} {Fully}-{Streaming} {Sorting}}, volume = {71}, number = {5}, issn = {1549-7747}, url = {https://ieeexplore.ieee.org/document/10472626}, doi = {10.1109/TCSII.2024.3377255}, journal = {IEEE Transactions on Circuits and Systems II: Express Briefs}, author = {Oh, Hyun Woo and Park, Joungmin and Lee, Seung Eun}, month = may, year = {2024}, pages = {2549--2553}, }
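The two-layer idea in the DL-Sort abstract can be modeled in software: an insertion stage turns the input stream into short sorted runs, and a bitonic merge network combines runs pairwise. The Python sketch below is only a behavioral model under stated assumptions: the run length (`depth`) and the number of runs are powers of two, the insertion sorter is modeled sequentially rather than as a parallel chain of compare-and-shift cells, and the cyclic reuse of merge hardware, true streaming, and the Chisel generator are not modeled. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def insertion_sorter(stream: Iterable[int], depth: int) -> Iterator[List[int]]:
    """Model of a linear insertion sorter: a chain of `depth` cells kept sorted
    as each element arrives; a full chain is emitted as one sorted run."""
    cells: List[int] = []
    for x in stream:
        pos = 0
        while pos < len(cells) and cells[pos] <= x:  # hardware does these compares in parallel
            pos += 1
        cells.insert(pos, x)
        if len(cells) == depth:
            yield cells
            cells = []
    if cells:
        yield cells


def bitonic_merge(seq: List[int]) -> List[int]:
    """Sort a bitonic sequence whose length is a power of two (half-cleaner recursion)."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    return bitonic_merge(lo) + bitonic_merge(hi)


def is_pow2(x: int) -> bool:
    return x > 0 and (x & (x - 1)) == 0


def hybrid_sort(data: List[int], depth: int) -> List[int]:
    """Insertion-sorted runs of length `depth`, then pairwise bitonic merges."""
    if not data:
        return []
    runs = list(insertion_sorter(data, depth))
    if not (is_pow2(depth) and is_pow2(len(runs)) and len(data) == depth * len(runs)):
        raise ValueError("len(data) must be depth * 2**m with depth a power of two")
    while len(runs) > 1:
        # An ascending run followed by a reversed ascending run is bitonic.
        runs = [bitonic_merge(a + b[::-1]) for a, b in zip(runs[0::2], runs[1::2])]
    return runs[0]
```

Splitting the work this way mirrors the trade-off the abstract describes: the insertion stage absorbs input bandwidth element by element, while the merge network provides the log-depth comparator structure needed to combine longer runs.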

2023

ISLPED Oral
RF2P: A Lightweight RISC Processor Optimized for Rapid Migration from IEEE-754 to Posit

Hyun Woo Oh, Seongmo An, Won Sik Jeong, and Seung Eun Lee

ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), Vienna, Austria, Aug 2023, pp. 1–6

Acceptance rate = 24.14% (35 out of 145)

This paper presents a lightweight processor and evaluation platform for migrating from IEEE-754 to posit arithmetic, with an optimized posit arithmetic unit (PAU) that supports existing floating-point instructions. The PAU features a reconfigurable divider architecture for diverse operating conditions and lightweight square root logic. The platform includes a posit-optimized compiler, a divider generator, a JTAG environment builder, and a programmable logic controller. The experimental results demonstrate successful execution of legacy IEEE-754 code with a small additional workload and up to a 60.09× performance improvement through hardware acceleration. Additionally, the PAU and divider consume 11.00% and 57.87% fewer LUTs, respectively, than the best prior works.
@inproceedings{oh_rf2p_2023, address = {Vienna, Austria}, title = {{RF2P}: {A} {Lightweight} {RISC} {Processor} {Optimized} for {Rapid} {Migration} from {IEEE}-754 to {Posit}}, isbn = {979-8-3503-1175-4}, url = {https://ieeexplore.ieee.org/document/10244582/}, doi = {10.1109/ISLPED58423.2023.10244582}, booktitle = {{ACM}/{IEEE} {International} {Symposium} on {Low} {Power} {Electronics} and {Design} ({ISLPED})}, publisher = {IEEE}, author = {Oh, Hyun Woo and An, Seongmo and Jeong, Won Sik and Lee, Seung Eun}, month = aug, year = {2023}, pages = {1--6}, }
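For readers new to posit arithmetic, the following Python sketch decodes an n-bit posit<n, es> bit pattern into an exact rational value. The fields are a sign bit, a run-length-encoded regime k, up to es exponent bits, and a fraction with an implicit leading 1, giving (-1)^s · useed^k · 2^e · (1 + f) with useed = 2^(2^es). It follows the general posit encoding and is not the PAU from the paper; the function name `decode_posit` and the defaults (n = 8, es = 0) are illustrative choices.

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, n: int = 8, es: int = 0) -> Optional[Fraction]:
    """Decode an n-bit posit<n, es> bit pattern into an exact Fraction.

    Returns None for NaR (Not a Real), the pattern 1 followed by zeros.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits are stored in two's complement

    # Regime: a run of identical bits after the sign, ended by the opposite bit.
    idx = n - 2
    first = (bits >> idx) & 1
    run = 0
    while idx >= 0 and ((bits >> idx) & 1) == first:
        run += 1
        idx -= 1
    k = run - 1 if first else -run
    idx -= 1  # skip the terminating bit (it may lie past the end of the word)

    # Exponent: up to es bits; bits cut off by the end of the word count as zeros.
    e = 0
    for _ in range(es):
        e <<= 1
        if idx >= 0:
            e |= (bits >> idx) & 1
            idx -= 1

    # Fraction: whatever bits remain, with an implicit leading 1.
    nf = max(idx + 1, 0)
    f = Fraction(bits & ((1 << nf) - 1), 1 << nf)

    scale = k * (1 << es) + e  # useed^k * 2^e, where useed = 2^(2^es)
    value = (1 + f) * Fraction(2) ** scale
    return -value if sign else value
```

For example, `decode_posit(0b01010000)` returns `Fraction(3, 2)` for posit<8, 0>. The variable-length regime is what makes posit hardware differ from IEEE-754 decoding: field boundaries depend on the data, so a decoder needs a leading-run detector before the exponent and fraction can be extracted.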