Professor · Researcher · Author

Rowel O. Atienza

Professor at the University of the Philippines Diliman's Electrical and Electronics Engineering Institute. PhD in Robotics from the Australian National University. Inventor of ViTSTR and PARSeq — deployed by NASA on the International Space Station.

  • 1,700+ Citations
  • h-index: 18
  • i10-index: 28
  • Top 2% Scientists Worldwide
Rowel O. Atienza
Professor, EEEI
University of the Philippines Diliman
Computer Vision · Robotics · AI / Deep Learning · Speech Processing · Scene Text Recognition
★ Top 2% Scientists Worldwide 2025

Background & Research

AI, Computer Vision, Robotics and beyond

Research Interests

  • Computer Vision & Scene Text Recognition
  • Robotics & Human-Robot Interaction
  • Speech Synthesis & Signal Processing
  • Embodied AI & Autonomous Agents
  • Deep Learning & Model Efficiency
  • Point Cloud & 3D Vision

Education

  • PhD in Robotics — Australian National University, 2008
  • MEng — National University of Singapore, 1997

Affiliations

  • Professor, EEEI — University of the Philippines Diliman
  • Ubiquitous Computing Laboratory
  • AI Graduate Program, UP Diliman

Highlights

Inventor of ViTSTR and PARSeq, state-of-the-art scene text recognition models integrated into Intel OpenVINO and PaddlePaddle, and deployed by NASA on the Astrobee robot aboard the International Space Station.


Recognized in the Stanford/Elsevier Top 2% Scientists Worldwide ranking (2025). Publishes and reviews at top venues: ECCV, ICRA, ICASSP, ICDAR, CVPR.

Notable Achievements

  • Best of ICCV 2025 — Sari Sandbox
  • NASA ISS deployment — PARSeq on Astrobee
  • Intel OpenVINO integration — ViTSTR
  • PaddlePaddle integration — PARSeq
  • Author of the Packt bestseller Advanced Deep Learning with TensorFlow 2 and Keras

Latest Work

Most recent papers — 2025 & 2026

ICCVW 2025 · Best of ICCV
Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents
Janika Deborah Gajo, Gerarld Paul Merales, Jerome Escarcha, Brenden Ashley Molina, Gian Nartea, Emmanuel Maminta, Juan Carlos Roldan, Rowel Atienza
A high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. Features over 250 interactive grocery items across three store configurations, controlled via an API. Supports both VR for human interaction and a VLM-powered embodied agent. Introduces SariBench, a dataset of annotated human demonstrations across varied task difficulties. Selected as one of the "Best of ICCV" from over 12,000 global submissions — the first comprehensive retail store simulation for embodied AI agent training.
CVPRW 2026
A Survey of Spatial Memory Representations for Efficient Robot Navigation
Ma. Madecheen S. Pangaliman, Steven S. Sison, Erwin P. Quilloy, Rowel Atienza
A comprehensive survey of spatial memory efficiency for vision-based robot navigation, examining 88 references spanning 52 systems from 1989–2025 — from occupancy grids to neural implicit representations. Introduces α = Mpeak/Mmap, the ratio of peak runtime memory to saved map size, exposing the gap between published map sizes and actual deployment cost. Profiling on an NVIDIA A100 GPU reveals α spans two orders of magnitude within neural methods alone (2.3 for Point-SLAM to 215 for NICE-SLAM). Proposes a standardized evaluation protocol and an α-aware budgeting algorithm for assessing deployment feasibility on embedded platforms (8–16 GB, <30 W). Accepted at the Women in Computer Vision (WiCV) Workshop at CVPR 2026.
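The α metric above reduces to simple arithmetic: multiply a map's stored size by its α to project peak runtime memory, then compare against the device budget. A minimal sketch, where the map sizes and the 8 GB budget are illustrative assumptions (only the α values 2.3 and 215 come from the abstract):

```python
def alpha(peak_runtime_mb: float, map_size_mb: float) -> float:
    """alpha = M_peak / M_map: ratio of peak runtime memory to saved map size."""
    return peak_runtime_mb / map_size_mb

def fits_budget(map_size_mb: float, a: float, budget_mb: float) -> bool:
    """alpha-aware feasibility check: project peak memory from the stored
    map size and compare against the embedded platform's memory budget."""
    return map_size_mb * a <= budget_mb

# Hypothetical 100 MB map on an 8 GB embedded platform, using the
# alpha values quoted for Point-SLAM (2.3) and NICE-SLAM (215):
point_slam_ok = fits_budget(map_size_mb=100.0, a=2.3, budget_mb=8 * 1024)   # True
nice_slam_ok = fits_budget(map_size_mb=100.0, a=215.0, budget_mb=8 * 1024)  # False
```

The two-orders-of-magnitude spread in α is exactly why published map sizes alone understate deployment cost.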
arXiv 2025
Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention (RCA)
Drandreb Earl Juanico, Rowel Atienza, Jeffrey Kenneth Go
A plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. Evaluated on Open Vocabulary Referring Object Detection (OV-RefOD), RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to +26.6%.
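The core reweighting idea can be sketched in a few lines: clip extreme attention values toward mid-range quantiles so subdued but relevant tokens gain relative weight after renormalization. This is a toy illustration of the reverse-contrast principle, not the paper's implementation; the quantile thresholds are assumptions.

```python
import numpy as np

def reverse_contrast_attention(attn: np.ndarray, low_q: float = 0.25,
                               high_q: float = 0.75) -> np.ndarray:
    """Toy sketch: suppress extreme attention weights and lift the smallest
    by clipping to mid-range quantiles, then renormalize each row so
    mid-level tokens carry relatively more weight."""
    lo, hi = np.quantile(attn, [low_q, high_q])
    w = np.clip(attn, lo, hi)  # damp peaks, raise troughs
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
a = rng.random((1, 8))
a /= a.sum(axis=-1, keepdims=True)  # a normalized attention row
out = reverse_contrast_attention(a)
```

Because it only reweights final-layer attention, a method like this plugs into a frozen VLM without any retraining, matching the paper's plug-in framing.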

Selected Publications

Prioritizing top-cited works and recent 2025–2026 papers · Citation counts from Google Scholar

ICIP 2023 · Recent
Scene Text Recognition Models Explainability Using Local Features
Mark Vincent Ty, Rowel Atienza
We investigate the explainability of scene text recognition models using local feature attribution methods. The study sheds light on which image regions and features drive predictions in state-of-the-art STR models, providing interpretability tools for the research community.
TENCON 2023 · Recent
Fast Data Augmentation for Scene Text Recognition Using CUDA
David Angelo Piscasio, Rowel Atienza
We propose FastSTRAug, a CUDA-based library of 36 augmentation functions designed for STR, significantly faster than its CPU-based counterpart STRAug while maintaining the same augmentation diversity and quality improvements for scene text recognition models.
ICMI 2003 · Foundational
Intuitive Human-Robot Interaction Through Active 3D Gaze Tracking
Rowel Atienza, Alexander Zelinsky
View All Publications on Google Scholar ↗

Authored Books

Practical guides to advanced deep learning

📗

Advanced Deep Learning with TensorFlow 2 and Keras (2nd Ed.)

Updated for TensorFlow 2 with new chapters on object detection (SSD), semantic segmentation (FCN, PSPNet), and unsupervised learning using mutual information. Autoencoders, GANs, VAEs, Deep RL.

Packt Publishing · 2020
📘

Advanced Deep Learning with Keras (1st Ed.)

A comprehensive guide to advanced deep learning techniques including Autoencoders, GANs, VAEs, and Deep Reinforcement Learning that drive today's most impressive AI results.

Packt Publishing · 2018

Get in Touch

Research collaborations, speaking, consulting