Advanced neural networks and their interpretation

Fan, Feng-Lei
Electronic thesis
Biomedical engineering
Deep learning has recently achieved major successes in many applications, including natural language processing, computer vision, and medical imaging. In these areas, deep learning can outperform or compete with humans. It is widely recognized that machine learning, especially deep learning, represents a paradigm shift in many fields. However, unresolved challenges remain. On the one hand, although existing models have demonstrated favorable results, there is, methodologically speaking, no one-size-fits-all approach; more powerful and tailored deep learning models are always in demand to fit the characteristics of complicated real-world applications. On the other hand, although deep learning performs well, it is difficult to explain its underlying mechanism and understand its behavior, and its success is not well underpinned by effective theory. This lack of interpretability has become a primary obstacle to the widespread translation and further development of deep learning techniques. In this dissertation, we address these two challenges by prototyping advanced models and developing their interpretation from four key aspects: 1) the aggregation function, 2) the nonlinear activation function, 3) a sparse network topology, and 4) the width-depth relationship. Specifically, over the past years, major efforts have been dedicated to architectural innovations in neural networks, leading to many advanced models as backbones. Although deep learning is inspired by the bio-computation of the nervous system, current deep learning systems fall short of reflecting neuronal diversity. We propose quadratic neurons to address this neuronal diversity problem, replacing the inner product (a linear operation) with a quadratic counterpart whose nonlinearity enhances the expressive power of the neuron. We then build a quadratic autoencoder for low-dose CT tasks.
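To make the idea concrete, the following is a minimal sketch of one common quadratic-neuron form, in which the single inner product of a conventional neuron is replaced by a product of two linear terms plus a power term; the parameter names (`wr`, `wg`, `wb`) are illustrative, not the dissertation's notation.

```python
import numpy as np

def quadratic_neuron(x, wr, br, wg, bg, wb, c):
    """Quadratic pre-activation: a product of two linear terms plus a
    quadratic power term, instead of a single inner product x @ w + b."""
    return (x @ wr + br) * (x @ wg + bg) + (x * x) @ wb + c

# Setting wg = 0, bg = 1, wb = 0 recovers an ordinary linear
# pre-activation x @ wr + br, so conventional neurons are a special case.
x = np.array([1.0, 2.0])
out = quadratic_neuron(x, np.array([0.5, -0.3]), 0.1,
                       np.zeros(2), 1.0, np.zeros(2), 0.0)
# here out equals x @ [0.5, -0.3] + 0.1 = 0.0
```

The extra terms give each neuron an intrinsically nonlinear decision surface even before the activation function is applied.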
Finally, we theoretically establish the approximation ability of quadratic networks and interpret a quadratic network as a fuzzy logic gate system by regarding each quadratic neuron as a fuzzy logic gate. Furthermore, we propose replacing the ReLU activation with the soft-thresholding function for signal processing tasks. With this modification, we construct a so-called soft autoencoder (Soft-AE), in which soft-thresholding activations are used in the encoder and linear activations in the decoder. A Soft-AE can be mapped onto a traditional cascaded wavelet shrinkage algorithm and can consequently be interpreted as a learned cascaded wavelet shrinkage system. Our denoising experiments demonstrate that Soft-AE is not only interpretable but also offers competitive performance relative to its counterparts. Furthermore, we propose a generalized linear unit (GenLU) to make an autoencoder more adaptive in nonlinearly filtering images and data, such as in the simultaneous denoising and deblurring task. Next, we turn to shortcut connections, which are well established in successful network architectures: they feed the outputs of earlier layers as additional inputs to later layers and have produced excellent results. Despite the extraordinary effectiveness of shortcuts, important questions remain about their underlying mechanism and associated functionalities. For example, why are shortcuts powerful? Why do shortcuts generalize well? To address these questions, we propose a sparse shortcut topology for deep learning, in contrast to the densely connected topology. With the help of theoretical deep learning, the expressibility and generalizability of the proposed topology are mathematically justified; the proposed topology is therefore more accountable than counterparts designed by heuristics. We corroborate our theoretical analyses with comprehensive experiments on well-known benchmarks.
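The soft-thresholding nonlinearity mentioned above is the standard shrinkage operator from wavelet denoising; a minimal sketch (the threshold name `lam` is illustrative) shows why it suits signal processing better than ReLU, which discards all negative components.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding (wavelet shrinkage): shrink each value toward
    zero by lam and zero out anything smaller in magnitude than lam.
    Unlike ReLU, it is odd-symmetric, so negative signal components
    survive with their sign intact."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

coeffs = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
shrunk = soft_threshold(coeffs, 1.0)
# small coefficients (likely noise) are zeroed; large ones are shrunk by 1.0
```

Using this operator as the encoder activation is what lets the learned autoencoder be read as a cascaded wavelet shrinkage system.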
Lastly, while classic studies proved that wide networks allow universal approximation, recent research and the successes of deep learning demonstrate the power of deep networks. Motivated by this symmetry, we investigate whether the design of artificial neural networks should have a directional preference, and what the mechanism of interaction between the width and depth of a network is. Inspired by De Morgan's law, we address this fundamental question by establishing a quasi-equivalence between the width and depth of ReLU networks in two respects. First, we formulate two transforms that map an arbitrary ReLU network to a wide network and to a deep network, respectively, for either regression or classification, so that essentially the same capability as the original network is implemented. Second, we replace the mainstream artificial neuron type with a quadratic counterpart and utilize the factorization and continued-fraction representations of the same polynomial function to construct a wide network and a deep network, respectively. Based on our findings, a deep network has a wide equivalent, and vice versa, subject to an arbitrarily small error.
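The width-depth idea can be illustrated in miniature with an ordinary polynomial (this toy uses factored vs nested evaluation, not the dissertation's exact quadratic-network and continued-fraction construction): the factored form combines independent terms side by side, like a wide layer, while the nested form feeds each intermediate result into the next step, like a stack of layers, yet both compute the same function.

```python
def p_factored(x):
    # "Wide" view: independent linear factors combined in one product,
    # analogous to parallel units whose outputs are multiplied.
    return (x - 1.0) * (x - 2.0) * (x - 3.0)

def p_nested(x):
    # "Deep" view: Horner-style nesting of the expanded polynomial
    # x^3 - 6x^2 + 11x - 6, each step feeding the next, layer by layer.
    return ((x - 6.0) * x + 11.0) * x - 6.0

# Both representations agree everywhere on the real line.
diffs = [p_factored(t) - p_nested(t) for t in (0.0, 0.5, 2.5, 4.0)]
```

The quasi-equivalence result generalizes this picture: a function realized by a wide network admits a deep realization of essentially the same capability, and vice versa, up to an arbitrarily small error.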
August 2021
School of Engineering
Rensselaer Polytechnic Institute, Troy, NY