Matthew Gwilliam
I am a fifth-year Ph.D. student in the Department of Computer Science at the University of
Maryland (UMD), advised by Professor Abhinav Shrivastava. I
study computer vision.
I completed my B.S. in Computer Science at Brigham Young University in 2019. During
undergrad I worked part-time at Qualtrics, and after graduation I
worked there full-time before starting my Ph.D. in 2020.
While at BYU, I was fortunate to work with Ryan Farrell, who
helped me grow as a researcher and a person and ultimately decide to pursue my
graduate degree.
During my Ph.D., I have enjoyed opportunities to intern at Amazon, SRI, and NVIDIA.
Email / CV / Google Scholar
Research
I am interested in computer vision models that learn without labels.
More specifically, I am interested in methods that can learn universal image
representations in an unsupervised manner.
Currently, that work focuses on models based on diffusion and implicit neural
representations (INRs), with an emphasis on INRs.
I work on the sorts of tasks these models are useful for: video
retrieval, compression, and generation; image classification, clustering, and more.
Do Text-free Diffusion Models Learn Discriminative Visual Representations?
Matthew Gwilliam*,
Soumik Mukhopadhyay*,
Yosuke Yamaguchi✝,
Vatsal Agarwal✝,
Namitha Padmanabhan,
Archana Swaminathan,
Tianyi Zhou,
Abhinav Shrivastava
European Conference on Computer Vision (ECCV), 2024
Project Page | Paper
Explore diffusion models as unified unsupervised image representation learners
for many recognition tasks. Propose DifFormer and DifFeed, novel mechanisms for fusing diffusion features
for image classification.
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R Maiya*,
Anubhav Gupta*,
Matthew Gwilliam,
Max Ehrlich,
Abhinav Shrivastava
European Conference on Computer Vision (ECCV), 2024
Project Page | Paper
Develop implicit neural video models that perform well not only for compression, but also for retrieval, chat, and more.
Explaining the Implicit Neural Canvas (XINC): Connecting Pixels to Neurons by Tracing their Contributions
Namitha Padmanabhan*,
Matthew Gwilliam*,
Pulkit Kumar,
Shishira Maiya,
Max Ehrlich,
Abhinav Shrivastava
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Project Page | Paper | Code
XINC dissects Implicit Neural Representation (INR) models to understand how neurons represent images and videos and to
reveal the inner workings of INRs.
Elusive Images: Beyond Coarse Analysis for Fine-Grained Recognition
Connor Anderson,
Matthew Gwilliam,
Evelyn Gaskin,
Ryan Farrell
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024
Paper
Identify error types and image difficulty among state-of-the-art models for fine-grained visual categorization.
A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval
Matthew Gwilliam,
Michael Cogswell,
Meng Ye,
Karan Sikka,
Abhinav Shrivastava,
Ajay Divakaran
Under Review
Project Page | Paper
Propose an alternative to paragraphs for long video retrieval: the 10k Words problem, in which every video should
be matchable with any valid description. To this end, we generate many descriptions for every video (for 3 long video datasets),
evaluate accordingly, and introduce novel finetuning for better performance.
Diffusion Models Beat GANs on Image Classification
Matthew Gwilliam*,
Soumik Mukhopadhyay*,
Vatsal Agarwal,
Namitha Padmanabhan,
Archana Swaminathan,
Tianyi Zhou,
Abhinav Shrivastava
Preprint
Project Page | Paper
Show the potential of diffusion models as unified unsupervised image representation learners.
HNeRV: A Hybrid Neural Representation for Videos
Hao Chen,
Matthew Gwilliam,
Ser-Nam Lim,
Abhinav Shrivastava
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
Project Page | Paper | Code
Combine the strengths of implicit (NeRV) and explicit (autoencoder)
representation to create a hybrid neural
representation for video with good properties for representation, compression,
and editing.
CNeRV: Content-adaptive Neural Representation for Visual Data
Hao Chen,
Matthew Gwilliam,
Bo He,
Ser-Nam Lim,
Abhinav Shrivastava
British Machine Vision Conference (BMVC),
2022 (ORAL)
Project Page | Paper
Make implicit video representation networks generalize to unseen data by
swapping time embedding for content-aware embedding
that is computed as a unique summary of each frame.
Beyond Supervised vs. Unsupervised: Representative Benchmarking and Analysis of Image Representation Learning
Matthew Gwilliam,
Abhinav Shrivastava
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
Project Page | Paper | Code
Examine, compare, and contrast popular unsupervised image representation learning methods,
showing that there are significant differences based on the specific algorithm used,
and that "supervised vs. unsupervised" comparisons which neglect these differences
tend to over-generalize.
Rethinking Common Assumptions to Mitigate Racial Bias in Face Recognition Datasets
Matthew Gwilliam,
Srinidhi Hegde,
Lade Tinubu,
Alex Hanson
IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2021
Paper | Code
Reveal the role of data in racial bias for facial recognition systems, and the flaws
underlying the assumption that balanced data results in fair performance.
Fair Comparison: Quantifying Variance in Results for Fine-grained Visual Categorization
Matthew Gwilliam,
Adam Teuscher,
Connor Anderson,
Ryan Farrell
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021
Paper
Uncover the large, often-ignored variance in FGVC systems across training runs,
both at the dataset level and, more notably,
in the classification performance of individual classes.
Intelligent Image Collection: Building the Optimal Dataset
Matthew Gwilliam,
Ryan Farrell
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2020
Paper
Propose smart practices to optimize image curation,
such that classification accuracy is maximized
for a given constrained dataset size.