In the autonomous driving industry, there is a growing trend to employ long-wave infrared (LWIR)-based uncooled thermal-imaging cameras, which can robustly collect data even in extreme environments. Consequently, both industry and academia are actively researching contrast-enhancement techniques to improve the image quality of LWIR-based thermal-imaging cameras. However, most published results only showcase experiments on mass-produced products that already incorporate contrast-enhancement techniques. Put differently, there is a lack of experimental data on contrast enhancement after the non-uniformity correction (NUC) and temperature compensation (TC) processes that generate the images seen in final products. To bridge this gap, we propose a histogram equalization (HE)-based contrast-enhancement method that incorporates a region-based clipping technique, and we present experimental results on images obtained after applying the NUC and TC processes. We conducted both visual and quantitative performance evaluations on these images. The visual evaluation confirmed that the proposed method improves image clarity and contrast ratio compared with conventional HE-based methods, even in challenging driving scenarios such as tunnels. In the quantitative evaluation, the proposed method ranked in the upper-middle range for both image quality and processing speed metrics. Therefore, our proposed method proves effective for the essential contrast-enhancement process in LWIR-based uncooled thermal-imaging cameras intended for autonomous driving platforms.
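As a point of reference for the clipping idea above, the following is a minimal sketch of clip-limited histogram equalization for an 8-bit image; it is a generic illustration with an assumed global clip limit, not the paper's region-based clipping method.

```c
/* Minimal sketch of clip-limited histogram equalization for an 8-bit image.
 * Generic illustration only; the paper's region-based clipping is not reproduced. */
#include <stdint.h>
#include <stddef.h>

void clipped_he(uint8_t *img, size_t n, uint32_t clip_limit)
{
    uint32_t hist[256] = {0};
    for (size_t i = 0; i < n; i++)
        hist[img[i]]++;

    /* Clip each bin and collect the excess counts. */
    uint32_t excess = 0;
    for (int b = 0; b < 256; b++) {
        if (hist[b] > clip_limit) {
            excess += hist[b] - clip_limit;
            hist[b] = clip_limit;
        }
    }
    /* Redistribute the excess evenly to flatten the mapping. */
    uint32_t share = excess / 256;
    for (int b = 0; b < 256; b++)
        hist[b] += share;

    /* Build the cumulative distribution and remap pixels. */
    uint32_t cdf[256], cum = 0;
    for (int b = 0; b < 256; b++) {
        cum += hist[b];
        cdf[b] = cum;
    }
    for (size_t i = 0; i < n; i++)
        img[i] = (uint8_t)((cdf[img[i]] * 255ULL) / cdf[255]);
}
```

The clip limit bounds how strongly any single gray level can stretch the mapping, which is what keeps HE-based methods from over-amplifying noise in flat thermal scenes.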
This paper presents a real-time embedded thermal imaging system architecture for compact, energy-efficient, high-quality imaging that utilizes a heterogeneous system-on-chip and uncooled infrared focal plane arrays (IRFPAs). In contrast to previous systems that relied on separate devices for complex image processing, our system provides integrated image-processing support for robust sensor-to-surveillance operation. We organized the image-processing architecture into two algorithm stacks: a non-uniformity correction stack that mitigates the distinctive noise vulnerability of uncooled IRFPAs, and an image enhancement stack that includes contrast enhancement and frame-level temporal noise filters. We optimized the algorithms for domain-specific factors, including asymmetric multiprocessing (AMP), cache organization, single instruction multiple data (SIMD) instructions, and very long instruction word (VLIW) architectures. The implementation on a TI TDA3x SoC demonstrates that our system can process 640×480, 60 frames per second (FPS) video at a peak core load of 57.5% while consuming less than 2.2 W for the entire system, indicating the capability to process 1280×1024, 30 FPS video from state-of-the-art IRFPAs.
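As one plausible illustration of a frame-level temporal noise filter in the enhancement stack (the abstract does not specify the exact filter, so the recursive averaging form below is only an assumption), a fixed-point sketch:

```c
/* Hypothetical fixed-point recursive temporal filter:
 * prev = prev + (cur - prev) / 2^shift, i.e. an exponential moving average.
 * Illustrative only; the actual filter in the enhancement stack is not detailed. */
#include <stdint.h>
#include <stddef.h>

void temporal_filter(uint16_t *prev, const uint16_t *cur, size_t n, unsigned shift)
{
    for (size_t i = 0; i < n; i++) {
        int32_t diff = (int32_t)cur[i] - (int32_t)prev[i];
        prev[i] = (uint16_t)((int32_t)prev[i] + diff / (1 << shift));
    }
}
```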
Designing a high-performance hardware sorter for resource-constrained systems is challenging due to physical limitations and the need to balance streaming bandwidth with memory throughput. This paper introduces a novel, scalable hardware sorter architecture with fully streaming support and an accompanying RTL generator to provide versatile, energy-efficient hardware acceleration. Our solution employs a dual-layer architecture consisting of a parallel one-way linear insertion sorter (OLIS) for bandwidth optimization and a cyclic bitonic merge network (CBMN) for a compact, high-throughput implementation. Furthermore, we developed the RTL generator in Chisel to enable agile implementation of the scalable architecture. Experimental results targeting the Xilinx XVU37P-FSVH2892-2L-E FPGA show that our design achieves up to a 126.26% increase in throughput and a 68.46% decrease in latency, with an increase of no more than 132.94% in LUT usage and a reduction of up to 79.84% in flip-flop usage, compared to state-of-the-art streaming sorters.
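For readers unfamiliar with the merge step that the CBMN implements, below is a short software model of a bitonic merge of two ascending runs. The hardware presumably reuses such compare-exchange stages cyclically rather than unrolling them, so this is a behavioral reference only, not the RTL.

```c
/* Software model of a bitonic merge network: merges two ascending runs of
 * length n (n a power of two) held in buf[0..n-1] and buf[n..2n-1] into one
 * ascending run of length 2n. Behavioral reference only. */
#include <stdint.h>

static void compare_exchange(uint32_t *a, uint32_t *b)
{
    if (*a > *b) { uint32_t t = *a; *a = *b; *b = t; }
}

void bitonic_merge(uint32_t *buf, int n)   /* buf holds 2n elements */
{
    /* Reverse the second run so the whole buffer becomes a bitonic sequence. */
    for (int i = 0; i < n / 2; i++) {
        uint32_t t = buf[n + i];
        buf[n + i] = buf[2 * n - 1 - i];
        buf[2 * n - 1 - i] = t;
    }
    /* log2(2n) compare-exchange stages with halving stride. */
    for (int stride = n; stride >= 1; stride /= 2)
        for (int i = 0; i < 2 * n; i++)
            if ((i & stride) == 0)
                compare_exchange(&buf[i], &buf[i + stride]);
}
```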
Embedded stereo vision systems based on traditional approaches often require a disparity refinement process to enhance image quality. Weighted median filter (WMF)-based processors are commonly employed for their excellent refinement performance. However, when implemented on a field-programmable gate array (FPGA), WMF-based processors face a trade-off between hardware resource utilization and refinement performance. To address this trade-off, we previously proposed a disparity refinement processor based on the hybrid max-median filter (HMMF). However, our earlier work did not guarantee flawless operation in large occluded and texture-less regions, particularly in areas with numerous holes. To overcome this limitation, we propose a cell-based disparity refinement processor that extends our previous HMMF-based design. To evaluate its refinement performance, we conducted experiments on four publicly available stereo datasets. The proposed processor outperforms conventional processors on the KITTI 2012 and 2015 stereo benchmark datasets, and it also exhibits superior refinement performance on the Cityscapes and StereoDriving datasets. Furthermore, in terms of hardware resource utilization, the proposed processor requires fewer resources than conventional processors when implemented on an FPGA. Therefore, our proposed disparity refinement processor is well suited for the disparity refinement process in stereo vision systems that require cost-effectiveness and high performance.
This paper presents an integrated image processor architecture designed for real-time interfacing and processing of high-resolution thermal video obtained from an uncooled infrared focal plane array (IRFPA), utilizing a modern system-on-chip field-programmable gate array (SoC FPGA). Our processor provides a one-chip solution in which non-uniformity correction (NUC) algorithms and contrast enhancement methods (CEM) are performed seamlessly. We employed NUC algorithms that utilize multiple coefficients to ensure robust image quality, free from ghosting effects and blurring. These algorithms include polynomial modeling-based thermal drift compensation (TDC), two-point correction (TPC), and run-time discrete flat field correction (FFC). To address the memory bottlenecks originating from the parallel execution of NUC algorithms in real time, we designed accelerators and parallel caching modules for pixel-wise algorithms based on a multi-parameter polynomial expression. Furthermore, we designed a specialized accelerator architecture to minimize the interruption time caused by run-time FFC. The implementation on the XC7Z020CLG400 SoC FPGA with the QuantumRed VR thermal module demonstrates that our image processing module achieves a throughput of 60 frames per second (FPS) when processing 14-bit 640×480 resolution infrared video acquired from an uncooled IRFPA.
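To make the pixel-wise NUC step concrete, here is a minimal software sketch of two-point correction; the fixed-point coefficient format and clamping range are illustrative assumptions, not the paper's implementation details.

```c
/* Minimal sketch of pixel-wise two-point correction (TPC): each raw pixel is
 * scaled by a per-pixel gain and shifted by a per-pixel offset, both computed
 * offline from two uniform reference scenes. The Q12 gain format and 14-bit
 * clamp are assumptions for illustration. */
#include <stdint.h>
#include <stddef.h>

#define GAIN_Q 12   /* assumed Q12 fixed-point gain */

void two_point_correction(const uint16_t *raw, uint16_t *out,
                          const uint16_t *gain, const int16_t *offset, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t v = ((int32_t)raw[i] * gain[i]) >> GAIN_Q;
        v += offset[i];
        if (v < 0) v = 0;
        if (v > 0x3FFF) v = 0x3FFF;   /* clamp to the 14-bit pixel range */
        out[i] = (uint16_t)v;
    }
}
```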
In embedded stereo vision systems based on semi-global matching, the matching accuracy of the initial disparity map can be degraded by various factors. To solve this problem, weighted median-based disparity refinement hardware architectures are used to improve the matching accuracy. However, conventional hardware architectures face a trade-off between hardware resource utilization and refinement performance when implemented on a field-programmable gate array (FPGA). Therefore, in this paper, we propose a hybrid max-median filter and its hardware architecture to improve refinement performance while reducing hardware resource utilization. To evaluate the refinement performance, we used two public stereo datasets. Across various window sizes on the KITTI 2012 and 2015 stereo benchmark datasets, the proposed hardware architecture showed better matching accuracy than the conventional hardware architectures. In terms of hardware resource utilization, when implemented on an FPGA, the proposed architecture has low requirements for all types of hardware resources. That is, the proposed hardware architecture overcomes the trade-off between hardware resource utilization and refinement performance.
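For context, the conventional baseline that this work and the cell-based extension above compare against is a window median over the disparity map; a minimal software model is sketched below. The hybrid max-median filter itself is not detailed in these abstracts, so only the baseline is shown, with an assumed 3×3 window.

```c
/* Software model of a plain window median filter over an 8-bit disparity map,
 * the conventional refinement baseline. Illustrative only. */
#include <stdint.h>
#include <string.h>

#define WIN 3   /* assumed 3x3 window */

static uint8_t window_median(uint8_t *v, int n)
{
    /* Insertion-sort the small window, then pick the middle element. */
    for (int i = 1; i < n; i++) {
        uint8_t key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; j--; }
        v[j + 1] = key;
    }
    return v[n / 2];
}

void median_refine(const uint8_t *disp, uint8_t *out, int w, int h)
{
    memcpy(out, disp, (size_t)w * h);   /* border pixels pass through */
    for (int y = WIN / 2; y < h - WIN / 2; y++)
        for (int x = WIN / 2; x < w - WIN / 2; x++) {
            uint8_t win[WIN * WIN];
            int n = 0;
            for (int dy = -WIN / 2; dy <= WIN / 2; dy++)
                for (int dx = -WIN / 2; dx <= WIN / 2; dx++)
                    win[n++] = disp[(y + dy) * w + (x + dx)];
            out[y * w + x] = window_median(win, n);
        }
}
```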
This paper presents a lightweight processor and evaluation platform for migrating from IEEE-754 to posit arithmetic, with an optimized posit arithmetic unit (PAU) supporting existing floating-point instructions. The PAU features a reconfigurable divider architecture for diverse operating conditions and lightweight square-root logic. The platform includes a posit-optimized compiler, a divider generator, a JTAG environment builder, and a programmable logic controller. The experimental results demonstrate the successful execution of legacy IEEE-754 code with a small additional workload and up to a 60.09-fold performance improvement through hardware acceleration. Additionally, the PAU and divider consume 11.00% and 57.87% fewer LUTs, respectively, compared to the best prior works.
Edge computing is becoming increasingly popular in artificial intelligence (AI) application development due to the benefits of local execution. One widely used approach to overcoming hardware limitations in edge computing is heterogeneous computing, which combines a general-purpose processor (GPP) with a domain-specific AI processor. However, this approach can be inefficient due to the communication overhead resulting from the complex communication protocol. To avoid this overhead, the concept of an application-specific instruction set processor based on a customizable instruction set architecture (ISA) has emerged. By integrating the AI processor into the processor core, on-chip communication replaces the complex communication protocol. Furthermore, a custom instruction set extension (ISE) reduces the number of instructions needed to execute AI applications. In this paper, we propose a uniprocessor system architecture for lightweight AI systems. First, we define a custom ISE to integrate the AI processor and the GPP into a single processor, minimizing communication overhead. Next, we design the processor based on the integrated core architecture, comprising the base core and the AI core, and implement it on an FPGA. Finally, we evaluate the proposed architecture through simulation and implementation of the processor. The results show that the designed processor consumes 6.62% more lookup tables and 74% fewer flip-flops while achieving up to 193.88 times higher throughput and 52.75 times higher energy efficiency compared to the previous system.
Recent advances in semiconductor technology have led ongoing applications to adopt complex techniques based on neural networks. In line with this trend, the concept of optimizing real-number arithmetic has been raised. In this paper, we evaluate the performance of the novel number system named posit on neural networks by analyzing the execution of approximate exponential functions, which are fundamental to several activation functions, with posit32 and float32. To implement the functions with posit arithmetic, we designed a software posit library consisting of basic arithmetic operations and conversion operations from/to C standard data types. The results show that posit arithmetic reduces the average relative error rate by up to 87.12% on the exponential function.
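To illustrate the kind of kernel evaluated, below is a minimal truncated-Taylor-series exponential in float32; in the paper the same computation is also carried out through the software posit library in posit32 (that interface is assumed and not reproduced here).

```c
/* Minimal approximate exponential via a truncated Taylor series, the kind of
 * kernel underlying several activation functions. Shown in float32 only. */
#include <stdio.h>

static float approx_exp(float x, int terms)
{
    float sum = 1.0f, term = 1.0f;
    for (int k = 1; k < terms; k++) {
        term *= x / (float)k;   /* accumulate x^k / k! incrementally */
        sum += term;
    }
    return sum;
}

int main(void)
{
    printf("approx_exp(1.0) = %f\n", approx_exp(1.0f, 10));  /* ~2.718282 */
    return 0;
}
```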
Recently, studies have been conducted on intelligent transportation systems (ITS) that provide safety and convenience to humans. Systems composing an ITS typically adopt cloud-computing architectures built on high-performance general-purpose processors or graphics processing units. However, an architecture that relies only on cloud computing requires high network bandwidth and consumes considerable power. Therefore, applying edge computing to ITS is essential for solving these problems. In this paper, we propose an edge artificial intelligence (AI) device-based ITS. Edge AI, which is applicable to various systems in ITS, has been applied to license plate recognition. We implemented the edge AI on a field-programmable gate array (FPGA). The accuracy of the edge AI for license plate recognition was 0.94. Finally, we synthesized the edge AI logic with Magnachip/Hynix 180 nm CMOS technology, and the power consumption measured with the Synopsys Design Compiler tool was 482.583 mW.
An intelligent transportation system (ITS) is a future system that combines various technologies to provide safety and convenience to humans. To implement ITS, previous systems applied an architecture containing a large number of data centers with high-performance general-purpose processors and graphics processing units to collect vehicle information. However, this architecture not only requires high network bandwidth but also decreases power efficiency and weakens security. In this paper, we propose an ITS based on an edge AI device that solves the problems of the existing architecture. We applied the edge AI device, which is applicable to various systems in ITS, to license plate recognition, and the highest accuracy was 0.94. We implemented the edge AI device on a field-programmable gate array (FPGA) and verified the feasibility of the entire system with the proposed edge AI device.
As the amount of data in automotive systems increases, a dedicated communication controller for in-vehicle networks is required. This paper proposes a local interconnect network (LIN) controller for resource-constrained devices. The designed LIN controller efficiently reduces the workload of target devices by processing the LIN frame header, data response, and protocol errors. To demonstrate the feasibility of the design, a Cortex-M0 is employed as the main processor and connected to the LIN controller. We implemented a LIN node by programming the processor, and the functionality of the LIN controller was verified with a LIN frame analyzer and a hardware scope. In addition, we analyzed the effect of communication loads on the processor and evaluated the benefits of the LIN controller.
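Two of the frame-processing steps such a controller offloads from the host can be sketched as software reference models: protected-identifier (PID) parity generation and the classic checksum over the data response, both per the LIN 2.x specification. Whether the designed controller computes them exactly this way is not stated, so this is only an illustration.

```c
/* Software reference for two LIN frame-processing steps (LIN 2.x):
 * protected-identifier parity and the classic data checksum. */
#include <stdint.h>
#include <stddef.h>

/* Build the PID from a 6-bit frame ID:
 * P0 = ID0^ID1^ID2^ID4, P1 = !(ID1^ID3^ID4^ID5). */
uint8_t lin_pid(uint8_t id)
{
    uint8_t p0 = ((id >> 0) ^ (id >> 1) ^ (id >> 2) ^ (id >> 4)) & 1u;
    uint8_t p1 = (~((id >> 1) ^ (id >> 3) ^ (id >> 4) ^ (id >> 5))) & 1u;
    return (uint8_t)((id & 0x3Fu) | (p0 << 6) | (p1 << 7));
}

/* Classic checksum: inverted 8-bit sum with carry wrap-around over the data bytes. */
uint8_t lin_classic_checksum(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += data[i];
        if (sum > 0xFF)
            sum -= 0xFF;        /* fold the carry back into the low byte */
    }
    return (uint8_t)(~sum);
}
```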
Recent advances in artificial intelligence (AI) technology encourage the adoption of AI systems for various applications. In most deployments, AI-based computing systems adopt an architecture in which the central server processes most of the data. This characteristic makes the system consume a large amount of network bandwidth and can cause security issues. To overcome these issues, a new AI model called federated learning was presented. Federated learning adopts an architecture in which clients handle data training and transmit only the trained result to the central server. Because training on the client abstracts and reduces the original data, the system operates with reduced network resources and reinforced data security. A system with federated learning supports a variety of client systems. To build an AI system with resource-limited client systems, composing the client system with multiple embedded AI processors is a valid approach. To realize a system with this architecture, a controller that arbitrates and utilizes the AI processors becomes essential. In this paper, we propose an embedded AI system for federated learning that can be composed flexibly with AI cores depending on the application. To realize the proposed system, we designed a controller for multiple AI cores and implemented it on a field-programmable gate array (FPGA). The operation of the designed controller was verified through image and speech applications, and its performance was verified through a simulator.
Artificial intelligence algorithms need an external computing device such as a graphics processing unit (GPU) due to their computational complexity. To run artificial intelligence algorithms on an embedded device, many studies have proposed lightweight artificial intelligence algorithms and artificial intelligence accelerators. In this paper, we propose the ASimOV framework, which optimizes artificial intelligence algorithms and generates Verilog hardware description language (HDL) code for executing them on a field-programmable gate array (FPGA). To verify ASimOV, we explore the performance space of k-NN algorithms and generate Verilog HDL code to demonstrate a k-NN accelerator on an FPGA. Our contribution is to provide the artificial intelligence algorithm as an end-to-end pipeline, ensure that it is optimized for a specific dataset through simulation, and generate an artificial intelligence accelerator at the end.
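As an illustration of the kernel that the generated accelerator targets, a minimal k-NN classification routine is sketched below; the feature width, L1 distance metric, and label count are assumptions for the example, not parameters reported in the abstract.

```c
/* Minimal k-NN classification kernel: find the k nearest training vectors by
 * L1 distance and take a majority vote. Sizes are illustrative assumptions. */
#include <stdint.h>
#include <limits.h>

#define FEAT 16      /* assumed feature vector length */
#define CLASSES 10   /* assumed number of labels; labels[] values must be < CLASSES */

int knn_classify(const uint8_t train[][FEAT], const uint8_t *labels, int n_train,
                 const uint8_t *query, int k)
{
    int votes[CLASSES] = {0};
    uint8_t used[1024] = {0};            /* assumes n_train <= 1024 */

    for (int nearest = 0; nearest < k; nearest++) {
        int best = -1;
        unsigned best_dist = UINT_MAX;
        for (int i = 0; i < n_train; i++) {
            if (used[i]) continue;
            unsigned dist = 0;
            for (int f = 0; f < FEAT; f++) {
                int d = (int)train[i][f] - (int)query[f];
                dist += (unsigned)(d < 0 ? -d : d);   /* L1 distance */
            }
            if (dist < best_dist) { best_dist = dist; best = i; }
        }
        if (best < 0) break;             /* fewer than k training samples */
        used[best] = 1;
        votes[labels[best]]++;
    }

    int best_class = 0;
    for (int c = 1; c < CLASSES; c++)
        if (votes[c] > votes[best_class]) best_class = c;
    return best_class;
}
```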
Recently, advances in technology have enabled embedded systems to be adopted for a variety of applications. Some of these applications require real-time 2D graphics processing under tight design constraints such as low power consumption and a small area. To satisfy such conditions, including a dedicated 2D graphics accelerator in the embedded system is an effective method. This approach reduces the workload of the processor in the embedded system by exploiting the accelerator, which assists the system in performing 2D graphics processing in real time. Therefore, a variety of applications that require 2D graphics processing can be implemented with an embedded processor. In this paper, we present a 2D graphics accelerator for tiny embedded systems. The accelerator includes an optimized line-drawing operation based on Bresenham's algorithm. The optimized operation enables the accelerator to handle various kinds of 2D graphics processing and to perform line drawing instead of the system processor. Moreover, the accelerator distributes the workload of the processor core by removing the need for the core to access the frame buffer memory. We measured the performance of the accelerator by implementing the processor, including the accelerator, on a field-programmable gate array (FPGA), and ascertained the feasibility of realization by synthesizing it with a 180 nm CMOS process.
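As a software reference for the accelerated operation, a standard integer-only Bresenham line-drawing routine is sketched below; the frame-buffer layout and pixel interface are illustrative assumptions rather than the accelerator's actual register map.

```c
/* Integer-only Bresenham line drawing (all-octant variant); the accelerator
 * performs this step instead of the core. Frame-buffer layout is assumed. */
#include <stdint.h>
#include <stdlib.h>

#define FB_WIDTH 320   /* assumed frame-buffer width */

static void put_pixel(uint8_t *fb, int x, int y, uint8_t color)
{
    fb[y * FB_WIDTH + x] = color;
}

void draw_line(uint8_t *fb, int x0, int y0, int x1, int y1, uint8_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                       /* error term, no divides or floats */

    for (;;) {
        put_pixel(fb, x0, y0, color);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}
```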
The development of the mobile industry brings demand for high-performance embedded systems to meet the requirements of user-centered applications. Because of limited memory resources, employing compressed data is efficient for an embedded system. However, the workload of data decompression imposes a severe bottleneck on the embedded processor. One way to alleviate this bottleneck is to integrate a hardware accelerator alongside the processor, constructing a system-on-chip (SoC) for the embedded system. In this paper, we propose a lossless decompression accelerator for an embedded processor, which supports LZ77 decompression and static Huffman decoding for the inflate algorithm. The accelerator was implemented on a field-programmable gate array (FPGA) to verify its functional suitability and fabricated in a Samsung 65 nm complementary metal-oxide-semiconductor (CMOS) process. The performance of the accelerator was evaluated with the Canterbury corpus benchmark, achieving a throughput of up to 20.7 MB/s at a 50 MHz system clock frequency.
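The core of inflate after Huffman decoding is LZ77 back-reference expansion, sketched below as a software reference; the static Huffman decoder that produces the (length, distance) pairs is omitted, and the byte-level interface is an assumption for illustration.

```c
/* LZ77 back-reference expansion at the heart of inflate: each (length, distance)
 * pair copies bytes from the already-decompressed output. */
#include <stddef.h>
#include <stdint.h>

size_t lz77_copy(uint8_t *out, size_t out_pos, unsigned length, unsigned distance)
{
    /* Byte-by-byte copy so overlapping references (distance < length) repeat
     * the most recent bytes, as the DEFLATE format requires. */
    for (unsigned i = 0; i < length; i++) {
        out[out_pos] = out[out_pos - distance];
        out_pos++;
    }
    return out_pos;
}
```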
Recently, as interest in smart parking systems has increased, various methods for detecting parking occupancy are under study. In this paper, we present a vision-based parking occupancy detection system with an embedded AI processor. By employing a fisheye-lens camera, multiple parking slot states are identified by one device. We measured the recognition rate of the AI processor in the proposed system and determined the optimized configuration with a software simulator. The highest recognition rate, 94.48%, was measured in the configuration with 64 training samples of 256 bytes each.
In this paper, we propose a 32-bit processor for embedded systems. To provide a small area and low-power operation, we adopt the MIPS instruction set architecture (ISA) for our processor. The processor consists of five pipeline stages to reduce the critical path. To resolve data hazards between pipeline stages, we design a data forwarding unit and a stall unit with optimized bubble insertion. The processor is implemented on a field-programmable gate array (FPGA), and we verify its functionality and measure its performance using the Dhrystone benchmark. The Dhrystone MIPS (DMIPS) score is 27.71 at 50 MHz.
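For readers unfamiliar with the hazard-resolution logic mentioned above, the textbook EX-stage forwarding decision can be modeled in software as follows; the signal names are illustrative, and the processor's actual RTL is not reproduced here.

```c
/* Software model of the EX-stage forwarding decision in a classic five-stage
 * pipeline: a source operand is taken from the EX/MEM or MEM/WB result when
 * that stage is about to write the same register. Register $0 is hardwired to
 * zero in MIPS, hence the rd != 0 checks. */
#include <stdint.h>

typedef enum { FWD_NONE = 0, FWD_EX_MEM = 1, FWD_MEM_WB = 2 } fwd_t;

fwd_t forward_select(uint8_t id_ex_rs,
                     uint8_t ex_mem_rd, int ex_mem_regwrite,
                     uint8_t mem_wb_rd, int mem_wb_regwrite)
{
    if (ex_mem_regwrite && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs)
        return FWD_EX_MEM;          /* the newest result takes priority */
    if (mem_wb_regwrite && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs)
        return FWD_MEM_WB;
    return FWD_NONE;
}
```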