ECCV 2022
2022
Top Cited
NASA ISS
Scene Text Recognition with Permuted Autoregressive Sequence Models (PARSeq)
Darwin Bautista, Rowel Atienza
We propose PARSeq, a permuted autoregressive sequence model for scene text recognition. PARSeq learns an ensemble of internal AR language models with shared weights via Permutation Language Modeling, unifying context-free non-AR inference, context-aware AR inference, and iterative refinement with bidirectional context. It achieves state-of-the-art results: 91.9% accuracy on standard benchmarks, and 96.0% when trained on real data. Deployed by NASA on the Astrobee robot aboard the International Space Station.
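The core idea of Permutation Language Modeling is to train one decoder under many decoding orders. A minimal sketch of sampling such orders (the function name and details are illustrative, not the authors' code; PARSeq-style training typically keeps the canonical left-to-right order and its reverse among the sampled permutations):

```python
import random

def sample_permutations(seq_len, k, seed=None):
    """Sample K decoding orders over token positions for PLM training.

    The forward (left-to-right) order and its reverse are always included;
    the remaining orders are random permutations. Assumes k is small
    relative to seq_len! (the number of distinct permutations).
    """
    rng = random.Random(seed)
    forward = list(range(seq_len))
    perms = [forward, forward[::-1]]
    while len(perms) < k:
        p = forward[:]
        rng.shuffle(p)
        if p not in perms:  # avoid duplicate orders
            perms.append(p)
    return perms[:k]
```

At training time, each sampled order defines a different attention mask for the same shared-weight decoder, which is what yields the "ensemble of internal AR language models" with no extra parameters.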
ICDAR 2021
2021
Top Cited
Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
Rowel Atienza
We propose ViTSTR, a single-stage STR model built on a compute- and parameter-efficient Vision Transformer (ViT). Our small ViTSTR achieves 82.6% accuracy (84.2% with augmentation) at a 2.4× speedup over TRBA, using only 43.4% of its parameters and 42.2% of its FLOPS. With data augmentation, base ViTSTR outperforms TRBA with 85.2% accuracy at 2.3× the speed. Integrated into Intel OpenVINO and PaddlePaddle.
ICCVW 2021
2021
Top Cited
Data Augmentation for Scene Text Recognition (STRAug)
Rowel Atienza
We introduce STRAug, 36 image augmentation functions designed for Scene Text Recognition (STR). Each function mimics text image properties found in natural scenes, caused by camera sensors, or induced by signal processing. Applied with RandAugment, STRAug increases STR model accuracy by up to 2.10% on Rosetta, with consistent gains across CRNN, RARE, TRBA, and other models.
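A RandAugment-style policy over a pool of augmentation functions can be sketched as follows. The three toy ops stand in for STRAug's real image functions (which operate on PIL images, not pixel lists); the sampling logic is the part being illustrated:

```python
import random

# Hypothetical stand-ins for STRAug's 36 image ops, which the library
# groups into categories such as warp, geometry, blur, and noise.
def invert(img):
    return [255 - p for p in img]

def brighten(img):
    return [min(255, p + 40) for p in img]

def darken(img):
    return [max(0, p - 40) for p in img]

def rand_augment(img, ops, n=2, seed=None):
    """RandAugment-style policy: apply N distinct randomly chosen ops in sequence."""
    rng = random.Random(seed)
    for op in rng.sample(ops, n):
        img = op(img)
    return img
```

Each training image thus sees a different random composition of transforms, which is what drives the accuracy gains across model families.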
CVPRW 2021
2021
Top Cited
GOO: A Dataset for Gaze Object Prediction in Retail Environments
Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza, Richard Guinto
We introduce GOO, a novel dataset that annotates not just the pixel being gazed at but the full boundaries of the gaze object in retail environments. GOO fills a gap in gaze-related datasets by providing object-level annotation, enabling more advanced gaze object prediction research.
WACV 2022
2022
Top Cited
Improving Model Generalization by Agreement of Learned Representations from Data Augmentation (AGMax)
Rowel Atienza
We propose AGMax, a training method that improves model generalization by maximizing the agreement of learned representations from differently augmented views of the same image. The approach combines agreement-based objectives with standard supervised training to achieve better generalization on downstream tasks.
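One way to picture an agreement-based objective is as a similarity term between the representations of two augmented views, subtracted from the supervised loss. This is a minimal sketch of that idea, not the paper's exact objective:

```python
import math

def agreement(z1, z2):
    """Cosine similarity between two representation vectors of the
    same image under different augmentations."""
    dot = sum(a * b for a, b in zip(z1, z2))
    n1 = math.sqrt(sum(a * a for a in z1))
    n2 = math.sqrt(sum(b * b for b in z2))
    return dot / (n1 * n2)

def total_loss(ce_loss, z1, z2, weight=1.0):
    """Combined objective: standard cross-entropy minus weighted agreement,
    so minimizing the loss maximizes agreement between views."""
    return ce_loss - weight * agreement(z1, z2)
```

Maximizing agreement pushes the network toward augmentation-invariant features, which is the intuition behind the improved generalization.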
CVPRW 2019
2019
Top Cited
A Conditional Generative Adversarial Network for Rendering Point Clouds (pc2pix)
Rowel Atienza
We propose pc2pix, a conditional GAN that directly renders a point cloud given azimuth and elevation camera viewpoint angles. pc2pix renders point clouds with higher class similarity to ground truth than surface reconstruction, while being significantly faster, more robust to noise, and able to operate on fewer points.
ICASSP 2023
2023
Top Cited
EfficientSpeech: An On-Device Text-to-Speech Model
Rowel Atienza
We introduce EfficientSpeech, a lightweight text-to-speech model designed for on-device inference. The model achieves competitive speech quality while dramatically reducing compute and memory requirements, making real-time TTS practical on edge devices with limited resources.
arXiv 2018
2018
Top Cited
Fast Disparity Estimation using Dense Networks (DenseMapNet)
Rowel Atienza
We present DenseMapNet, a compact CNN for stereo disparity estimation inspired by dense networks. With only 290k parameters, it runs at over 30 Hz on full-resolution color stereo images, with accuracy comparable to significantly larger CNN-based methods.
ICASSP 2022
2022
Top Cited
Depth Pruning with Auxiliary Networks for TinyML
Josen Daniel De Leon, Rowel Atienza
We propose a depth pruning method that uses an auxiliary network as the new head of the pruned model. It achieves a 93% parameter reduction on the MLPerf Tiny Visual Wake Words task at only a 0.65% accuracy cost. After quantization on a Cortex-M0, the pruned network gains 1% accuracy while reducing model size by 4.7× and latency by 1.6×.
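The structure of the method can be sketched abstractly: truncate the backbone at some depth and attach a small auxiliary network as the new classification head. Layers are modeled here as plain callables; this is an illustration of the idea, not the paper's implementation:

```python
def depth_prune(layers, keep, aux_head):
    """Keep only the first `keep` backbone layers and attach an
    auxiliary head as the new classifier for the pruned model."""
    return layers[:keep] + [aux_head]

def forward(model, x):
    """Run input through a list of layer callables in order."""
    for layer in model:
        x = layer(x)
    return x
```

The auxiliary head would then be fine-tuned so the shallow backbone's intermediate features are mapped directly to class predictions, recovering most of the accuracy lost by removing the deeper layers.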
ICCVW 2025
2025
Best of ICCV
Sari Sandbox: A Virtual Retail Store Environment for Embodied AI Agents
Janika Deborah Gajo, Gerarld Paul Merales, Jerome Escarcha, Rowel Atienza et al.
We present Sari Sandbox, a high-fidelity, photorealistic 3D retail store simulation for benchmarking embodied agents against human performance in shopping tasks. It features more than 250 interactive grocery items across three store configurations. Selected as one of the "Best of ICCV" from over 12,000 global submissions, it is the first comprehensive retail store simulation environment for embodied AI agent training.
arXiv 2025
2025
Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention (RCA)
Drandreb Earl Juanico, Rowel Atienza, Jeffrey Kenneth Go
We propose Reverse Contrast Attention (RCA), a plug-in method that enhances object localization in vision-language transformers without retraining. RCA reweights final-layer attention by suppressing extremes and amplifying mid-level activations to let semantically relevant but subdued tokens guide predictions. Evaluated on Open Vocabulary Referring Object Detection (OV-RefOD), RCA improves FitAP in 11 out of 15 open-source VLMs, with gains up to +26.6%.
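The reweighting step can be illustrated on a vector of attention weights: clip extreme values toward a mid-percentile band and renormalize, so mid-level tokens gain relative influence. The thresholding scheme below is illustrative; the paper's exact reweighting may differ:

```python
def reverse_contrast(weights, low=0.2, high=0.8):
    """RCA-style sketch: suppress extreme attention weights by clipping
    them to the [low, high] percentile band, then renormalize so the
    weights again sum to one."""
    s = sorted(weights)
    lo = s[int(low * (len(s) - 1))]
    hi = s[int(high * (len(s) - 1))]
    clipped = [min(max(w, lo), hi) for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]
```

Because no parameters change, this kind of reweighting can be applied to the final attention layer of a pretrained vision-language transformer at inference time, which is what makes RCA a retraining-free plug-in.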
ICIP 2023
2023
Recent
Scene Text Recognition Models Explainability Using Local Features
Mark Vincent Ty, Rowel Atienza
We investigate the explainability of scene text recognition models using local feature attribution methods. The study sheds light on which image regions and features drive predictions in state-of-the-art STR models, providing interpretability tools for the research community.
TENCON 2023
2023
Recent
Fast Data Augmentation for Scene Text Recognition Using CUDA
David Angelo Piscasio, Rowel Atienza
We propose FastSTRAug, a CUDA-based library of 36 augmentation functions designed for STR. It is significantly faster than its CPU-based counterpart, STRAug, while maintaining the same augmentation diversity and quality gains for scene text recognition models.
ECCVW 2020
2020
Top Cited
Next-Best View Policy for 3D Reconstruction
Daryl Peralta, Joel Casimiro, Aldrin Michael Nilles, Justine Aletta Aguilar, Rowel Atienza, Rhandley Cajote
ICMI 2003
2003
Foundational
Intuitive Human-Robot Interaction Through Active 3D Gaze Tracking
Rowel Atienza, Alexander Zelinsky