Tyler L. Hayes Personal Website

About Me

Hello! I recently defended my PhD in the Chester F. Carlson Center for Imaging Science at the Rochester Institute of Technology (RIT) in Rochester, NY. At RIT, I worked in the Machine and Neuromorphic Perception Laboratory (a.k.a. kLab) under the direction of my advisor, Dr. Christopher Kanan. My current research interests include lifelong machine learning, computer vision, and computational mathematics. I am a current board member of the ContinualAI non-profit organization. I spent Summer 2021 as a Research Intern at Facebook AI Research (FAIR) working with Dr. Arthur Szlam and Dr. Ludovic Denoyer. Previously, I earned a BS in Applied Mathematics from RIT in 2014 and an MS in Applied and Computational Mathematics from RIT in 2017. I have an Erdős number of 3!


Jun 2022: NEW! Served on a panel and gave an invited talk at the CLVISION Workshop at CVPR 2022 on Real-World Applications of Continual Learning!

May 2022: Our paper "Online Continual Learning for Embedded Devices" was accepted for poster presentation at CoLLAs 2022!

Mar 2022: Joined the board of the ContinualAI non-profit organization!

Mar 2022: Successfully defended my doctoral dissertation!

Dec 2021: Our paper "Disentangling Transfer and Interference in Multi-Domain Learning" was accepted to the AAAI 2022 Workshop on Practical Deep Learning in the Wild!

Oct 2021: Our paper "Self-Supervised Training Enhances Online Continual Learning" was accepted for poster presentation at BMVC 2021! (36.2% acceptance rate)

Jun 2021: Gave an invited talk at the ContinualAI Reading Group on our paper "Replay in Deep Learning: Current Approaches and Missing Biological Elements"!

May 2021: Our paper "Replay in Deep Learning: Current Approaches and Missing Biological Elements" was accepted for publication in the MIT Press journal Neural Computation!

May 2021: Started working as a Research Intern at Facebook AI Research (FAIR)!

Apr 2021: Our paper "Selective Replay Enhances Learning in Online Continual Analogical Reasoning" was accepted for oral presentation at the CVPR 2021 Workshop on Continual Learning! An extended abstract of the paper was also accepted to the CVPR 2021 Workshop on Women in Computer Vision (WiCV)!

Nov 2020: Gave an invited talk at the ContinualAI Meetup on Benchmarks and Evaluation for Continual Learning!

Sep 2020: Our paper "Are Open Set Classification Methods Effective on Large-Scale Datasets?" was published in PLoS ONE!

Aug 2020: Successfully defended my dissertation proposal and advanced to candidacy.

Jul 2020: Our paper "RODEO: Replay for Online Object Detection" was accepted for poster presentation at BMVC 2020! (29.1% acceptance rate)

Jul 2020: Our paper "Improved Robustness to Open Set Inputs via Tempered Mixup" was accepted to the ECCV 2020 Workshop on Adversarial Robustness in the Real World!

Jul 2020: Our paper "REMIND Your Neural Network to Prevent Catastrophic Forgetting" was accepted for poster presentation at ECCV 2020! (27.1% acceptance rate)

Jun 2020: Our paper on Deep SLDA won the Best Paper Award at the CVPR 2020 Workshop on Continual Learning!

May 2020: Gave an invited talk at the ContinualAI Meetup on Continual Learning with Sequential Streaming Data!

May 2020: Won a travel grant to attend the CVPR Workshop on Women in Computer Vision (WiCV)!

Apr 2020: My research journey was featured in an RIT News article!

Apr 2020: Our extended abstract "REMIND Your Neural Network to Prevent Catastrophic Forgetting" was accepted to the CVPR 2020 Workshop on Women in Computer Vision (WiCV)!

Apr 2020: Our paper "Stream-51: Streaming Classification and Novelty Detection from Videos" was accepted for poster presentation at the CVPR 2020 Workshop on Continual Learning!

Apr 2020: Our paper "Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis" was accepted for oral presentation at the CVPR 2020 Workshop on Continual Learning!

Apr 2019: Gave an invited talk at the RIT CHAI AI Seminar Series on "Memory Efficient Experience Replay for Mitigating Catastrophic Forgetting."

Jan 2019: Our paper "Memory Efficient Experience Replay for Streaming Learning" was accepted for poster presentation at ICRA 2019! (44.0% acceptance rate)

Feb 2018: Our paper "Compassionately Conservative Balanced Cuts for Image Segmentation" was accepted for poster presentation at CVPR 2018! (29.6% acceptance rate)

Nov 2017: Our paper "Measuring Catastrophic Forgetting in Neural Networks" was accepted for a spotlight presentation at AAAI 2018! (24.6% acceptance rate)

Jun 2017: Passed my PhD Qualification Exam.

Jun 2017: Started working as a Research Intern in the Navy Center for Applied Research in Artificial Intelligence at the US Naval Research Laboratory (NRL).

Mar 2017: Successfully defended my MS thesis.


CoLLAs 2022: Online Continual Learning for Embedded Devices

Tyler L. Hayes & Christopher Kanan

arXiv 2022

Real-time on-device continual learning is needed for new applications such as home robots, user personalization on smartphones, and augmented/virtual reality headsets. However, this setting poses unique challenges: embedded devices have limited memory and compute capacity and conventional machine learning models suffer from catastrophic forgetting when updated on non-stationary data streams. While several online continual learning models have been developed, their effectiveness for embedded applications has not been rigorously studied. In this paper, we first identify criteria that online continual learners must meet to effectively perform real-time, on-device learning. We then study the efficacy of several online continual learning methods when used with mobile neural networks. We measure their performance, memory usage, compute requirements, and ability to generalize to out-of-domain inputs.

arXiv 2022: Can I see an Example? Active Learning the Long Tail of Attributes and Relations

Tyler L. Hayes, Maximilian Nickel, Christopher Kanan, Ludovic Denoyer, Arthur Szlam

arXiv 2022

There has been significant progress in creating machine learning models that identify objects in scenes along with their associated attributes and relationships; however, there is a large gap between the best models and human capabilities. One of the major reasons for this gap is the difficulty in collecting sufficient amounts of annotated relations and attributes for training these systems. While some attributes and relations are abundant, the distribution in the natural world and existing datasets is long tailed. In this paper, we address this problem by introducing a novel incremental active learning framework that asks for attributes and relations in visual scenes. While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories. Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.

AAAIW 2022: Disentangling Transfer and Interference in Multi-Domain Learning

Yipeng Zhang, Tyler L. Hayes, Christopher Kanan

arXiv 2021

Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, have not been adequately studied. Learning multiple domains could be beneficial or these domains could interfere with each other given limited network capacity. In this work, we decipher the conditions where interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer and set up experimental protocols. We further examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer. We demonstrate our findings on the CIFAR-100, MiniPlaces, and Tiny-ImageNet datasets.

Neural Computation 2021: Replay in Deep Learning: Current Approaches and Missing Biological Elements

Tyler L. Hayes, Giri P. Krishnan, Maxim Bazhenov, Hava T. Siegelmann, Terrence J. Sejnowski, Christopher Kanan

arXiv 2021

Replay is the reactivation of one or more neural patterns, which are similar to the activation patterns experienced during past waking experiences. Replay was first observed in biological neural networks during sleep, and it is now thought to play a critical role in memory formation, retrieval, and consolidation. Replay-like mechanisms have been incorporated into deep artificial neural networks that learn over time to avoid catastrophic forgetting of previous knowledge. Replay algorithms have been successfully used in a wide range of deep learning methods within supervised, unsupervised, and reinforcement learning paradigms. In this paper, we provide the first comprehensive comparison between replay in the mammalian brain and replay in artificial neural networks. We identify multiple aspects of biological replay that are missing in deep learning systems and hypothesize how they could be utilized to improve artificial neural networks.

BMVC 2021: Self-Supervised Training Enhances Online Continual Learning

Jhair Gallardo, Tyler L. Hayes, Christopher Kanan

arXiv 2021

In continual learning, a system must incrementally learn from a non-stationary data stream without catastrophic forgetting. Recently, multiple methods have been devised for incrementally learning classes on large-scale image classification tasks, such as ImageNet. State-of-the-art continual learning methods use an initial supervised pre-training phase, in which the first 10% - 50% of the classes in a dataset are used to learn representations in an offline manner before continual learning of new classes begins. We hypothesize that self-supervised pre-training could yield features that generalize better than supervised learning, especially when the number of samples used for pre-training is small. We test this hypothesis using the self-supervised MoCo-V2 and SwAV algorithms. On ImageNet, we find that both outperform supervised pre-training considerably for online continual learning, and the gains are larger when fewer samples are available. Our findings are consistent across three continual learning algorithms. Our best system achieves a 14.95% relative increase in top-1 accuracy on class incremental ImageNet over the prior state of the art for online continual learning.

CVPRW 2021: Selective Replay Enhances Learning in Online Continual Analogical Reasoning

Tyler L. Hayes & Christopher Kanan

arXiv 2021

In continual learning, a system learns from non-stationary data streams or batches without catastrophic forgetting. While this problem has been heavily studied in supervised image classification and reinforcement learning, continual learning in neural networks designed for abstract reasoning has not yet been studied. Here, we study continual learning of analogical reasoning. Analogical reasoning tests such as Raven's Progressive Matrices (RPMs) are commonly used to measure non-verbal abstract reasoning in humans, and recently offline neural networks for the RPM problem have been proposed. In this paper, we establish experimental baselines, protocols, and forward and backward transfer metrics to evaluate continual learners on RPMs. We employ experience replay to mitigate catastrophic forgetting. Prior work using replay for image classification tasks has found that selectively choosing the samples to replay offers little, if any, benefit over random selection. In contrast, we find that selective replay can significantly outperform random selection for the RPM task.

PLoS ONE 2020: Are Open Set Classification Methods Effective on Large-Scale Datasets?

Ryne Roady, Tyler L. Hayes, Ronald Kemker, Ayesha Gonzales, Christopher Kanan

arXiv 2019

Supervised classification methods often assume the train and test data distributions are the same and that all classes in the test set are present in the training set. However, deployed classifiers often require the ability to recognize inputs from outside the training set as unknowns. This problem has been studied under multiple paradigms including out-of-distribution detection and open set recognition. For convolutional neural networks, there have been two major approaches: 1) inference methods to separate knowns from unknowns and 2) feature space regularization strategies to improve model robustness to novel inputs. Up to this point, there has been little attention to exploring the relationship between the two approaches and directly comparing performance on large-scale datasets that have more than a few dozen categories. Using the ImageNet ILSVRC-2012 large-scale classification dataset, we identify novel combinations of regularization and specialized inference methods that perform best across multiple open set classification problems of increasing difficulty level. We find that input perturbation and temperature scaling yield significantly better performance on large-scale datasets than other inference methods tested, regardless of the feature space regularization strategy. Conversely, we find that improving performance with advanced regularization schemes during training yields better performance when baseline inference techniques are used; however, when advanced inference methods are used to detect open set classes, the utility of these cumbersome training paradigms is less evident.
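
As a rough illustration of the temperature scaling mentioned in this abstract, the sketch below divides the logits by a temperature before the softmax and treats an input as unknown when the top probability falls below a threshold. The function names, the temperature, and the threshold are assumptions chosen for the example, not settings from the paper, and input perturbation is omitted entirely.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def classify_or_reject(logits, temperature=2.0, threshold=0.6):
    """Return the predicted class index, or None if the input looks unknown.

    The temperature and threshold defaults are hypothetical; in practice
    they would be tuned on held-out data.
    """
    probs = softmax_with_temperature(logits, temperature)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None

confident = classify_or_reject([8.0, 1.0, 0.0])   # one logit dominates
ambiguous = classify_or_reject([1.0, 0.9, 1.1])   # nearly uniform scores
```

Here `confident` is a class index while `ambiguous` is `None`, because no class reaches the threshold when the scaled scores are nearly uniform.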

BMVC 2020: RODEO: Replay for Online Object Detection

Manoj Acharya, Tyler L. Hayes, Christopher Kanan

BMVC 2020

Humans can incrementally learn to do new visual detection tasks, which is a huge challenge for today's computer vision systems. Incrementally trained deep learning models lack backwards transfer to previously seen classes and suffer from a phenomenon known as catastrophic forgetting. In this paper, we pioneer online streaming learning for object detection, where an agent must learn examples one at a time with severe memory and computational constraints. In object detection, a system must output all bounding boxes for an image with the correct label. Unlike earlier work, the system described in this paper can learn how to do this task in an online manner with new classes being introduced over time. We achieve this capability by using a novel memory replay mechanism that replays entire scenes in an efficient manner. We achieve state-of-the-art results on both the PASCAL VOC 2007 and MS COCO datasets.

ECCVW 2020: Improved Robustness to Open Set Inputs via Tempered Mixup

Ryne Roady, Tyler L. Hayes, Christopher Kanan

ECCVW 2020

Supervised classification methods often assume that evaluation data is drawn from the same distribution as training data and that all classes are present for training. However, real-world classifiers must handle inputs that are far from the training distribution including samples from unknown classes. Open set robustness refers to the ability to properly label samples from previously unseen categories as novel and avoid high-confidence, incorrect predictions. Existing approaches have focused on either novel inference methods, unique training architectures, or supplementing the training data with additional background samples. Here, we propose a simple regularization technique easily applied to existing CNN architectures that improves open set robustness without a background dataset. Our method achieves state-of-the-art results on open set classification baselines and easily scales to large-scale open set classification problems.

ECCV 2020: REMIND Your Neural Network to Prevent Catastrophic Forgetting

Tyler L. Hayes*, Kushal Kafle*, Robik Shrestha*, Manoj Acharya, Christopher Kanan

* denotes equal contribution.

arXiv 2019

People learn throughout life. However, incrementally updating conventional neural networks leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the brain consolidates memory. Replay involves fine-tuning a network on a mixture of new and old instances. While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images. Here, we propose REMIND, a brain-inspired approach that enables efficient replay with compressed representations. REMIND is trained in an online manner, meaning it learns one example at a time, which is closer to how humans learn. Under the same constraints, REMIND outperforms other methods for incremental class learning on the ImageNet ILSVRC-2012 dataset. We probe REMIND's robustness to data ordering schemes known to induce catastrophic forgetting. We demonstrate REMIND's generality by pioneering online learning for Visual Question Answering (VQA).

CVPRW 2020: Lifelong Machine Learning with Deep Streaming Linear Discriminant Analysis

Best Paper Award at the CVPR 2020 Workshop on Continual Learning in Computer Vision

Tyler L. Hayes & Christopher Kanan

arXiv 2019

When an agent acquires new information, ideally it would immediately be capable of using that information to understand its environment. This is not possible using conventional deep neural networks, which suffer from catastrophic forgetting when they are incrementally updated, with new knowledge overwriting established representations. A variety of approaches have been developed that attempt to mitigate catastrophic forgetting in the incremental batch learning scenario, where a model learns from a series of large collections of labeled samples. However, in this setting, inference is only possible after a batch has been accumulated, which prohibits many applications. An alternative paradigm is online learning in a single pass through the training dataset on a resource constrained budget, which is known as streaming learning. Streaming learning has been much less studied in the deep learning community. In streaming learning, an agent learns instances one-by-one and can be tested at any time, rather than only after learning a large batch. Here, we revisit streaming linear discriminant analysis, which has been widely used in the data mining research community. By combining streaming linear discriminant analysis with deep learning, we are able to outperform both incremental batch learning and streaming learning algorithms on both ImageNet ILSVRC-2012 and CORe50, a dataset that involves learning to classify from temporally ordered samples.
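
The streaming setting described in this abstract (learn one instance at a time, be testable at any moment) can be sketched with a heavily simplified classifier. The version below keeps one running mean per class and a pooled diagonal variance, updated with Welford's method. This is an assumption made for brevity and is not the paper's method: it works on raw feature vectors rather than deep features, and linear discriminant analysis in general uses a full shared covariance matrix. The class and method names are hypothetical.

```python
class StreamingDiagonalLDA:
    """Streaming classifier: running class means plus pooled diagonal variance."""

    def __init__(self, num_features, eps=1e-6):
        self.num_features = num_features
        self.eps = eps                  # keeps the variance strictly positive
        self.means = {}                 # label -> running mean vector
        self.counts = {}                # label -> number of samples seen
        self.m2 = [0.0] * num_features  # pooled within-class squared deviations
        self.total = 0

    def fit_one(self, x, label):
        """Update the model with a single labeled example (Welford's update)."""
        if label not in self.means:
            self.means[label] = [0.0] * self.num_features
            self.counts[label] = 0
        self.counts[label] += 1
        self.total += 1
        mean = self.means[label]
        n = self.counts[label]
        for i, xi in enumerate(x):
            delta = xi - mean[i]
            mean[i] += delta / n
            self.m2[i] += delta * (xi - mean[i])

    def predict(self, x):
        """Return the label with the nearest mean under the shared variance.

        Requires at least one call to fit_one beforehand.
        """
        denom = max(self.total - len(self.means), 1)
        var = [m / denom + self.eps for m in self.m2]

        def distance(label):
            mean = self.means[label]
            return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

        return min(self.means, key=distance)

model = StreamingDiagonalLDA(num_features=2)
stream = [([0.0, 0.0], "a"), ([1.0, 0.0], "a"),
          ([10.0, 10.0], "b"), ([11.0, 10.0], "b")]
for features, label in stream:
    model.fit_one(features, label)  # no batches; predict() is usable after each step
```

Each update touches only the statistics of one class, so memory grows with the number of classes and features rather than with the length of the stream.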

CVPRW 2020: Stream-51: Streaming Classification and Novelty Detection from Videos

Ryne Roady*, Tyler L. Hayes*, Hitesh Vaidya, Christopher Kanan

* denotes equal contribution.

arXiv 2019

Deep neural networks are popular for visual perception tasks such as image classification and object detection. Once trained and deployed in a real-time environment, these models struggle to identify novel inputs not initially represented in the training distribution. Further, they cannot be easily updated on new information or they will catastrophically forget previously learned knowledge. While there has been much interest in developing models capable of overcoming forgetting, most research has focused on incrementally learning from common image classification datasets broken up into large batches. Online streaming learning is a more realistic paradigm where a model must learn one sample at a time from temporally correlated data streams. Although there are a few datasets designed specifically for this protocol, most have limitations such as few classes or poor image quality. In this work, we introduce Stream-51, a new dataset for streaming classification consisting of temporally correlated images from 51 distinct object categories and additional evaluation classes outside of the training distribution to test novelty recognition. We establish unique evaluation protocols, experimental metrics, and baselines for our dataset in the streaming paradigm.

arXiv 2020: Do We Need Fully Connected Output Layers in Convolutional Networks?

Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

arXiv 2019

Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification. While this design has been successful, for datasets with a large number of categories, the fully connected layers often account for a large percentage of the network's parameters. For applications with memory constraints, such as mobile devices and embedded platforms, this is not ideal. Recently, a family of architectures that involve replacing the learned fully connected output layer with a fixed layer has been proposed as a way to achieve better efficiency. In this paper we examine this idea further and demonstrate that fixed classifiers offer no additional benefit compared to simply removing the output layer along with its parameters. We further demonstrate that the typical approach of having a fully connected final output layer is inefficient in terms of parameter count. We are able to achieve comparable performance to a traditionally learned fully connected classification output layer on the ImageNet-1K, CIFAR-100, Stanford Cars-196, and Oxford Flowers-102 datasets, while not having a fully connected output layer at all.

ICRA 2019: Memory Efficient Experience Replay for Streaming Learning

Tyler L. Hayes, Nathan D. Cahill, Christopher Kanan

ICRA 2019

In supervised machine learning, an agent is typically trained once and then deployed. While this works well for static settings, robots often operate in changing environments and must quickly learn new things from data streams. In this paradigm, known as streaming learning, a learner is trained online, in a single pass, from a data stream that cannot be assumed to be independent and identically distributed (iid). Streaming learning will cause conventional deep neural networks (DNNs) to fail for two reasons: 1) they need multiple passes through the entire dataset; and 2) non-iid data will cause catastrophic forgetting. An old fix to both of these issues is rehearsal. To learn a new example, rehearsal mixes it with previous examples, and then this mixture is used to update the DNN. Full rehearsal is slow and memory intensive because it stores all previously observed examples, and its effectiveness for preventing catastrophic forgetting has not been studied in modern DNNs. Here, we describe the ExStream algorithm for memory efficient rehearsal and compare it to alternatives. We find that full rehearsal can eliminate catastrophic forgetting in a variety of streaming learning settings, with ExStream performing well using far less memory and computation.
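
The rehearsal loop described in this abstract can be sketched as follows: each new example is mixed with stored examples, the mixture updates the model, and the new example is then stored. With `capacity=None` the buffer keeps everything, which corresponds to full rehearsal. The bounded variant uses reservoir sampling purely as a stand-in; it is not the ExStream algorithm, and all names here (`RehearsalBuffer`, `stream_train`, `update_fn`) are hypothetical.

```python
import random

class RehearsalBuffer:
    """Stores past examples; capacity=None keeps everything (full rehearsal)."""

    def __init__(self, capacity=None, seed=0):
        self.capacity = capacity
        self.examples = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        """Store an example, using reservoir sampling once the buffer is full."""
        self.seen += 1
        if self.capacity is None or len(self.examples) < self.capacity:
            self.examples.append(example)
            return
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.examples[slot] = example

    def mixture(self, new_example, num_old):
        """Return the new example mixed with up to num_old stored examples."""
        k = min(num_old, len(self.examples))
        batch = self.rng.sample(self.examples, k) + [new_example]
        self.rng.shuffle(batch)
        return batch

def stream_train(stream, update_fn, buffer, num_old=4):
    """Single pass over the stream: rehearse, update the model, then store."""
    for example in stream:
        update_fn(buffer.mixture(example, num_old))
        buffer.add(example)

# Usage: record the mixtures a model would be updated on.
batches = []
full = RehearsalBuffer()  # full rehearsal: memory grows with the stream
stream_train(range(10), batches.append, full, num_old=3)
```

Full rehearsal makes the memory cost explicit: `full.examples` ends up holding every item in the stream, which is the overhead a memory-efficient method aims to avoid.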

CVPRW 2018: New Metrics and Experimental Paradigms for Continual Learning

Tyler L. Hayes, Ronald Kemker, Nathan D. Cahill, Christopher Kanan

CVPRW 2018

In order for a robotic agent to learn successfully in an uncontrolled environment, it must be able to immediately alter its behavior. Deep neural networks are the dominant approach for classification tasks in computer vision, but typical algorithms and architectures are incapable of immediately learning new tasks without catastrophically forgetting previously acquired knowledge. There has been renewed interest in solving this problem, but there are limitations to existing solutions, including poor performance compared to offline models, large memory footprints, and slow learning. In this abstract, we formalize the continual learning paradigm and propose new benchmarks for assessing continual learning agents.

CVPR 2018: Compassionately Conservative Balanced Cuts for Image Segmentation

Nathan D. Cahill, Tyler L. Hayes, Renee T. Meinhold, John F. Hamilton

CVPR 2018

The Normalized Cut (NCut) objective function, widely used in data clustering and image segmentation, quantifies the cost of graph partitioning in a way that biases clusters or segments that are balanced towards having lower values than unbalanced partitionings. However, this bias is so strong that it avoids any singleton partitions, even when vertices are very weakly connected to the rest of the graph. Motivated by the Buhler-Hein family of balanced cut costs, we propose the family of Compassionately Conservative Balanced (CCB) Cut costs, which are indexed by a parameter that can be used to strike a compromise between the desire to avoid too many singleton partitions and the notion that all partitions should be balanced. We show that CCB-Cut minimization can be relaxed into an orthogonally constrained ℓ_τ-minimization problem that coincides with the problem of computing Piecewise Flat Embeddings (PFE) for one particular index value, and we present an algorithm for solving the relaxed problem by iteratively minimizing a sequence of reweighted Rayleigh quotients (IRRQ). Using images from the BSDS500 database, we show that image segmentation based on CCB-Cut minimization provides better accuracy with respect to ground truth and greater variability in region size than NCut-based image segmentation.
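
The bias against singleton partitions can be seen with the standard two-way Normalized Cut, NCut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol is the sum of weighted degrees in a part. The sketch below evaluates it on a small hypothetical graph made up for this example; it does not implement the CCB-Cut costs themselves.

```python
def build_graph(edges):
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    weights = {}
    for u, v, w in edges:
        weights.setdefault(u, {})[v] = w
        weights.setdefault(v, {})[u] = w
    return weights

def ncut(weights, part_a):
    """Two-way Normalized Cut: cut/vol(A) + cut/vol(B)."""
    a = set(part_a)
    b = set(weights) - a
    cut = sum(w for u in a for v, w in weights[u].items() if v in b)
    vol_a = sum(sum(weights[u].values()) for u in a)
    vol_b = sum(sum(weights[u].values()) for u in b)
    return cut / vol_a + cut / vol_b

# Hypothetical graph: two triangles joined by a 0.1 bridge, plus vertex 6
# attached to vertex 0 by a much weaker 0.01 edge.
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1), (0, 6, 0.01)]
graph = build_graph(edges)
singleton = ncut(graph, {6})           # isolate the weakly connected vertex
balanced = ncut(graph, {0, 1, 2, 6})   # split along the bridge instead
```

For a singleton the cut equals its own volume, so its cost is always above 1. The balanced split scores far lower here even though it severs an edge ten times stronger than the one holding vertex 6.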

AAAI 2018: Measuring Catastrophic Forgetting in Neural Networks

Ronald Kemker, Angelina Abitino, Marc McClure, Tyler L. Hayes, Christopher Kanan

AAAI 2018

Deep neural networks are used in many state-of-the-art systems for machine perception. Once a network is trained to do a specific task, e.g., bird classification, it cannot easily be trained to do new tasks, e.g., incrementally learning to recognize additional bird species or learning an entirely different task such as flower recognition. When new tasks are added, typical deep neural networks are prone to catastrophically forgetting previous tasks. Networks that are capable of assimilating new information incrementally, much like how humans form new memories over time, will be more efficient than retraining the model from scratch each time a new task needs to be learned. There have been multiple attempts to develop schemes that mitigate catastrophic forgetting, but these methods have not been directly compared, the tests used to evaluate them vary considerably, and these methods have only been evaluated on small-scale problems (e.g., MNIST). In this paper, we introduce new metrics and benchmarks for directly comparing five different mechanisms designed to mitigate catastrophic forgetting in neural networks: regularization, ensembling, rehearsal, dual-memory, and sparse-coding. Our experiments on real-world images and sounds show that the mechanism(s) that are critical for optimal performance vary based on the incremental training paradigm and type of data being used, but they all demonstrate that the catastrophic forgetting problem has yet to be solved.


Peer-Reviewed Papers


Conference Papers