ECCV 2022
2022
Top Cited
NASA ISS
Scene Text Recognition with Permuted Autoregressive Sequence Models (PARSeq)
Darwin Bautista, Rowel Atienza
We propose PARSeq, a permuted autoregressive sequence model for scene text recognition. PARSeq learns an ensemble of internal AR language models with shared weights via Permutation Language Modeling, unifying context-free non-AR inference, context-aware AR inference, and iterative refinement with bidirectional context. It achieves state-of-the-art results: 91.9% accuracy on standard benchmarks, and 96.0% when trained on real data. Deployed by NASA on the Astrobee robot aboard the International Space Station.
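The core idea of Permutation Language Modeling is to train one decoder under many decoding orders. A minimal sketch of sampling such orders (the function name and details are illustrative, not the authors' code; PARSeq-style training typically keeps the canonical left-to-right order and its reverse among the sampled permutations):

```python
import random

def sample_permutations(seq_len, k, seed=None):
    """Sample K decoding orders over token positions for PLM training.

    The forward (left-to-right) order and its reverse are always included;
    the remaining orders are random permutations. Assumes k is small
    relative to seq_len! (the number of distinct permutations).
    """
    rng = random.Random(seed)
    forward = list(range(seq_len))
    perms = [forward, forward[::-1]]
    while len(perms) < k:
        p = forward[:]
        rng.shuffle(p)
        if p not in perms:  # avoid duplicate orders
            perms.append(p)
    return perms[:k]
```

At training time, each sampled order defines a different attention mask for the same shared-weight decoder, which is what yields the "ensemble of internal AR language models" with no extra parameters.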
ICDAR 2021
2021
Top Cited
Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Rowel Atienza
We propose ViTSTR, a single-stage STR model built on a compute- and parameter-efficient Vision Transformer (ViT). Our small ViTSTR achieves 82.6% accuracy (84.2% with augmentation) at a 2.4× speedup over TRBA, using only 43.4% of its parameters and 42.2% of its FLOPS. With data augmentation, base ViTSTR outperforms TRBA with 85.2% accuracy at 2.3× the speed. Integrated into Intel OpenVINO and PaddlePaddle.
ICCVW 2021
2021
Top Cited
Data Augmentation for Scene Text Recognition (STRAug)
Rowel Atienza
We introduce STRAug, 36 image augmentation functions designed for Scene Text Recognition (STR). Each function mimics text image properties found in natural scenes, caused by camera sensors, or induced by signal processing. Applied with RandAugment, STRAug increases STR model accuracy by up to 2.10% on Rosetta, with consistent gains across CRNN, RARE, TRBA, and other models.
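A RandAugment-style policy over a pool of augmentation functions can be sketched as follows. The three toy ops stand in for STRAug's real image functions (which operate on PIL images, not pixel lists); the sampling logic is the part being illustrated:

```python
import random

# Hypothetical stand-ins for STRAug's 36 image ops, which the library
# groups into categories such as warp, geometry, blur, and noise.
def invert(img):
    return [255 - p for p in img]

def brighten(img):
    return [min(255, p + 40) for p in img]

def darken(img):
    return [max(0, p - 40) for p in img]

def rand_augment(img, ops, n=2, seed=None):
    """RandAugment-style policy: apply N distinct randomly chosen ops in sequence."""
    rng = random.Random(seed)
    for op in rng.sample(ops, n):
        img = op(img)
    return img
```

Each training image thus sees a different random composition of transforms, which is what drives the accuracy gains across model families.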
CVPRW 2021
2021
Top Cited
GOO: A Dataset for Gaze Object Prediction in Retail Environments
Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza, Richard Guinto
We introduce GOO, a novel dataset that annotates not just the pixel being gazed at but the full boundaries of the gaze object in retail environments. GOO fills a gap in gaze-related datasets by providing object-level annotation, enabling more advanced gaze object prediction research.
WACV 2022
2022
Top Cited
Improving Model Generalization by Agreement of Learned Representations from Data Augmentation (AGMax)
Rowel Atienza
We propose AGMax, a training method that improves model generalization by maximizing the agreement of learned representations from differently augmented views of the same image. The approach combines agreement-based objectives with standard supervised training to achieve better generalization on downstream tasks.
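One way to picture an agreement-based objective is as a similarity term between the representations of two augmented views, subtracted from the supervised loss. This is a minimal sketch of that idea, not the paper's exact objective:

```python
import math

def agreement(z1, z2):
    """Cosine similarity between two representation vectors of the
    same image under different augmentations."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return dot / (n1 * n2)

def total_loss(ce_loss, z1, z2, weight=1.0):
    """Combined objective: standard cross-entropy minus weighted agreement,
    so minimizing the loss maximizes agreement between views."""
    return ce_loss - weight * agreement(z1, z2)
```

Maximizing agreement pushes the network toward augmentation-invariant features, which is the intuition behind the improved generalization.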
CVPRW 2019
2019
Top Cited
A Conditional Generative Adversarial Network for Rendering Point Clouds (pc2pix)
Rowel Atienza
We propose pc2pix, a conditional GAN that directly renders a point cloud given azimuth and elevation camera viewpoint angles. pc2pix renders point clouds with higher class similarity to ground truth than surface reconstruction, while being significantly faster, more robust to noise, and able to operate on fewer points.
ICASSP 2023
2023
Top Cited
EfficientSpeech: An On-Device Text-to-Speech Model
Rowel Atienza
We introduce EfficientSpeech, a lightweight text-to-speech model designed for on-device inference. The model achieves competitive speech quality while dramatically reducing compute and memory requirements, making real-time TTS practical on edge devices with limited resources.
arXiv 2018
2018
Top Cited
Fast Disparity Estimation using Dense Networks (DenseMapNet)
Rowel Atienza
We present DenseMapNet, a compact CNN for stereo disparity estimation inspired by dense networks. With only 290k parameters, it runs at over 30 Hz on full-resolution color stereo images, with accuracy comparable to significantly larger CNN-based methods.
ICASSP 2022
2022
Top Cited
Depth Pruning with Auxiliary Networks for TinyML
Josen Daniel De Leon, Rowel Atienza
We propose a depth pruning method that uses an auxiliary network as the new head of the pruned model. It achieves a 93% parameter reduction on the MLPerf Tiny Visual Wake Words task at only a 0.65% accuracy cost. After quantization on a Cortex-M0, the pruned network gains 1% accuracy while reducing model size by 4.7× and latency by 1.6×.
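The structure of the method can be sketched abstractly: truncate the backbone at some depth and attach a small auxiliary network as the new classification head. Layers are modeled here as plain callables; this is an illustration of the idea, not the paper's implementation:

```python
def depth_prune(layers, keep, aux_head):
    """Keep only the first `keep` backbone layers and attach an
    auxiliary head as the new classifier for the pruned model."""
    return layers[:keep] + [aux_head]

def forward(model, x):
    """Run input through a list of layer callables in order."""
    for layer in model:
        x = layer(x)
    return x
```

The auxiliary head would then be fine-tuned so the shallow backbone's intermediate features are mapped directly to class predictions, recovering most of the accuracy lost by removing the deeper layers.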
ICCVW 2025
2025
Best of ICCV
Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents
Janika Deborah Gajo, Gerarld Paul Merales, Jerome Escarcha, Rowel Atienza et al.
We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. It features more than 250 interactive grocery items across three store configurations. Selected as one of the "Best of ICCV" from over 12,000 global submissions, it is the first comprehensive retail store simulation environment for embodied AI agent training.
arXiv 2025
2025
Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention (RCA)
Drandreb Earl Juanico, Rowel Atienza, Jeffrey Kenneth Go
We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. Evaluated on Open Vocabulary Referring Object Detection (OV-RefOD), RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to +26.6%.
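The reweighting step can be illustrated on a vector of attention weights: clip extreme values toward a mid-percentile band and renormalize, so mid-level tokens gain relative influence. The thresholding scheme below is illustrative; the paper's exact reweighting may differ:

```python
def reverse_contrast(weights, low=0.2, high=0.8):
    """RCA-style sketch: suppress extreme attention weights by clipping
    them to the [low, high] percentile band, then renormalize so the
    weights again sum to one."""
    s = sorted(weights)
    lo = s[int(low * (len(s) - 1))]
    hi = s[int(high * (len(s) - 1))]
    clipped = [min(max(w, lo), hi) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]
```

Because no parameters change, this kind of reweighting can be applied to the final attention layer of a pretrained vision-language transformer at inference time, which is what makes RCA a retraining-free plug-in.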
ICIP 2023
2023
Recent
Scene Text Recognition Models Explainability Using Local Features
Mark Vincent Ty, Rowel Atienza
We investigate the explainability of scene text recognition models using local feature attribution methods. The study sheds light on which image regions and features drive predictions in state-of-the-art STR models, providing interpretability tools for the research community.
TENCON 2023
2023
Recent
Fast Data Augmentation for Scene Text Recognition Using CUDA
David Angelo Piscasio, Rowel Atienza
We propose FastSTRAug, a CUDA-based library of 36 augmentation functions designed for STR. It is significantly faster than its CPU-based counterpart, STRAug, while maintaining the same augmentation diversity and quality gains for scene text recognition models.
ECCVW 2020
2020
Top Cited
Next-Best View Policy for 3D Reconstruction
Daryl Peralta, Joel Casimiro, Aldrin Michael Nilles, Justine Aletta Aguilar, Rowel Atienza, Rhandley Cajote
ICMI 2003
2003
Foundational
Intuitive Human-Robot Interaction Through Active 3D Gaze Tracking
Rowel Atienza, Alexander Zelinsky