Show simple item record

dc.rights.licenseRestricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributorBennett, Kristin P.
dc.contributorKramer, Peter Roland, 1971-
dc.contributorXu, Yangyang
dc.contributorBequette, B. Wayne
dc.contributor.authorVargas, Andrés
dc.date.accessioned2021-11-03T09:14:43Z
dc.date.available2021-11-03T09:14:43Z
dc.date.created2020-06-15T15:34:10Z
dc.date.issued2019-08
dc.identifier.urihttps://hdl.handle.net/20.500.13015/2475
dc.descriptionAugust 2019
dc.descriptionSchool of Science
dc.description.abstractResults on the PI control simulated data indicate that our framework defeats two anomaly detection benchmarks by a large margin if the theoretical shape signatures are known a priori. Though they will not be known in practice, this establishes a very good theoretical performance limit that can be approached by improving the shape signature estimation algorithm. When the shape signatures are not known, our framework defeats the benchmarks when sensor noise is low, and is more robust to variance in the underlying control system. The results also establish that the framework is robust to the size of the initial data set used to learn the non-anomalous probability measure, and that it is better at detecting anomalies occurring in sensors than those that occur in the parameters of the underlying feedback control system. We also achieve similar but slightly weaker results with a variation of our framework that incorporates slightly less grey-box domain knowledge. This reinforces the foundational assumption of our work, namely, that algorithmic performance can always be improved by injecting more domain knowledge into the procedure.
dc.description.abstractIn addition, we simulate a different type of manufacturing system that reflects a real-life "stirred tank heater" scenario. In this case, we apply a digital PI controller, which is more realistic than the continuous one used in previous simulations because it accounts for the fact that control systems do not operate instantaneously; that is, control action is delayed. Results of these simulations indicate that the anomaly score is able to capture anomalies in both the underlying process flow and sensor noise.
dc.description.abstractOur grey-box anomaly detection framework addresses the first three challenges of anomaly detection: anomaly concept ambiguity, class imbalance, and anomaly isolation. Anomaly concept ambiguity is addressed through a novel anomaly scoring function. An anomaly scoring function is a mathematical function that takes observations as input and outputs a real-valued quantity, called the anomaly score, that measures the anomalousness of the input. The anomaly scoring function defines the anomaly concept and must be carefully constructed to be appropriate, both mathematically and for the domain at hand. We define our anomaly score in a fashion that is both mathematically and domain appropriate. It is mathematically appropriate because it is defined to be the probability of the input observation occurring under a probability measure that defines non-anomalous data. This probability measure is learned by using an initial set of data that is assumed to be composed of non-anomalies. Our anomaly scoring function is domain-appropriate because it is constructed as a linear combination of a term that encodes sensor anomalousness and a term that encodes process anomalousness. Sensor anomalousness occurs when sensors are behaving differently from expected and process anomalousness occurs when the underlying data generating process is behaving differently from expected. Class imbalance is resolved by framing the anomaly detection task as a hypothesis test.
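The scoring idea described above can be illustrated with a minimal Python sketch. It is not the thesis's scoring function. It assumes a one-dimensional Gaussian null model for each term, uses a two-sided tail probability as the "probability under the non-anomalous measure", and combines the two terms with a made-up weight. The names fit_null, tail_probability, anomaly_score, and is_anomaly, and the default alpha, are all hypothetical. Because a weighted sum of tail probabilities is not itself a calibrated p-value, alpha acts here only as a threshold.

```python
import math
import statistics

def fit_null(baseline):
    """Learn a Gaussian null model (mean, std) from data assumed non-anomalous."""
    return statistics.fmean(baseline), statistics.stdev(baseline)

def tail_probability(x, null):
    """Two-sided probability of a value at least as extreme as x under the null."""
    mean, std = null
    z = abs(x - mean) / std
    return math.erfc(z / math.sqrt(2.0))

def anomaly_score(sensor_x, process_x, sensor_null, process_null, weight=0.5):
    """Hypothetical linear combination of a sensor term and a process term."""
    return (weight * tail_probability(sensor_x, sensor_null)
            + (1.0 - weight) * tail_probability(process_x, process_null))

def is_anomaly(score, alpha=0.01):
    """Reject the null hypothesis (non-anomalous) when the score falls below alpha."""
    return score < alpha
```

In this sketch a low score means the observation is improbable under the learned null model, so only non-anomalous data is needed to fit it and no anomaly class has to be modelled.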
dc.description.abstractWe present shape signatures, which are non-trivial summary statistics for each unique combination of recipe, tool, sensor, step, and wafer. Shape signatures are based on the fact that manufacturing data almost always exhibits damped oscillations due to the underlying feedback control mechanism. Damped oscillations are physically described by the damped linearly driven oscillator equation. Shape signatures are defined to be the parameters of this equation that fit the data in each (recipe, tool, sensor, step, wafer) 5-tuple. Hence, shape signatures contain deep information about the underlying feedback control system. We then incorporate shape signatures into a novel anomaly detection framework. This makes our anomaly detection framework a grey-box model.
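As a rough illustration of fitting oscillator parameters to a trace, the Python sketch below estimates a damping rate and an angular frequency from a sampled damped cosine. It uses a Prony-style least-squares fit of a two-term linear recurrence, which is a swapped-in technique and not necessarily the estimation algorithm used in the thesis. The function names, the assumption that the steady-state level is known, and the noise-free test signal are all hypothetical.

```python
import math

def damped_trace(n, dt, gamma, omega, amplitude=1.0, phase=0.0, level=0.0):
    """Samples of level + A*exp(-gamma*t)*cos(omega*t + phase)."""
    return [level + amplitude * math.exp(-gamma * k * dt)
            * math.cos(omega * k * dt + phase) for k in range(n)]

def fit_shape_signature(trace, dt, level=0.0):
    """Estimate (gamma, omega) by least squares on x[k+1] = a*x[k] + b*x[k-1]."""
    x = [v - level for v in trace]
    idx = range(1, len(x) - 1)
    # Normal equations for the two-term linear recurrence.
    s11 = sum(x[k] * x[k] for k in idx)
    s12 = sum(x[k] * x[k - 1] for k in idx)
    s22 = sum(x[k - 1] * x[k - 1] for k in idx)
    r1 = sum(x[k + 1] * x[k] for k in idx)
    r2 = sum(x[k + 1] * x[k - 1] for k in idx)
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (r2 * s11 - r1 * s12) / det
    # For a damped cosine: b = -exp(-2*gamma*dt), a = 2*exp(-gamma*dt)*cos(omega*dt).
    decay = math.sqrt(-b)
    gamma = -math.log(decay) / dt
    omega = math.acos(a / (2.0 * decay)) / dt
    return gamma, omega
```

With noisy data the fitted coefficients can fall outside the range where the square root and arccosine are defined, so a practical estimator would need guarding or a more robust fitting procedure.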
dc.description.abstractFeedback control theory is utilized in modern industrial manufacturing facilities to maintain optimal manufacturing conditions. Our work is primarily motivated by the semiconductor manufacturing industry. Semiconductors are electrically conductive materials, notably silicon, that form the building blocks of modern electronic devices. They are used to produce wafers, which are tiny discs produced by machines, or tools, in manufacturing facilities. These wafers are then used in electronic circuits that make up technological devices: from smartphones to TVs to car radios. Each manufacturing facility may contain hundreds of tools. Each of these tools applies a recipe, composed of multiple steps, to raw materials in order to produce wafers. Many recipes and hundreds of steps may be performed by each tool on hundreds of thousands of wafers on a daily basis. During each step, in each tool, for each recipe, for each wafer, hundreds of sensors collect data. Anomaly detection and isolation can be applied to this data to identify sensors, steps, and tools that are behaving strangely.
dc.description.abstractThis thesis presents a grey-box statistical anomaly detection and isolation framework that is applicable to general batch manufacturing settings. Anomaly detection strives to identify observations in data that deviate from other observations. Anomaly isolation is the act of determining influential factors of an anomaly once it has been detected. Grey-box models combine partial theoretical structure of the underlying data-generating process with a data-driven algorithm. Batch manufacturing is an industrial manufacturing paradigm that converts raw materials into an end product by processing the raw materials in groups, or lots. Depending on the type of manufacturing, the end product can be anything from toys, to glass, to semiconductors.
dc.description.abstractWe dig deeper into the grey-box by mathematically deriving the damped linearly driven oscillator equation from a set of equations that describes a proportional-integral (PI) control system. PI control systems are employed ubiquitously in the manufacturing industries due to their effectiveness, interpretability, robustness, and ease of tuning. We leverage these derivations to simulate batch manufacturing data, including anomalies. Because anomalies can be injected manually, the simulated data contains the crucial anomaly labels that are missing from many anomaly detection data sets. We leverage the simulations to evaluate the performance of our anomaly detection framework.
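The simulation idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's simulator: the process is a hypothetical first-order system, tau * dy/dt = -y + u, the controller is a continuous PI law integrated with explicit Euler steps, and the injected anomaly is a constant sensor bias. The function name and all parameter values are made up. Differentiating the closed loop gives a second-order equation in y, which is why the response is a damped oscillation toward the setpoint.

```python
def simulate_pi_batch(n_steps=400, dt=0.05, setpoint=1.0, kp=2.0, ki=4.0,
                      tau=1.0, bias_start=None, bias=0.0):
    """Euler simulation of a first-order process under PI control.

    Process: tau * dy/dt = -y + u. Controller: u = kp*e + ki*integral(e),
    with e = setpoint - measurement. A sensor bias injected from step
    bias_start onward plays the role of a labelled anomaly.
    """
    y, integral = 0.0, 0.0
    readings, labels = [], []
    for k in range(n_steps):
        anomalous = bias_start is not None and k >= bias_start
        measurement = y + (bias if anomalous else 0.0)
        error = setpoint - measurement
        integral += error * dt
        u = kp * error + ki * integral
        y += dt * (-y + u) / tau
        readings.append(measurement)
        labels.append(anomalous)
    return readings, labels
```

Calling simulate_pi_batch() gives a clean trace, while simulate_pi_batch(bias_start=200, bias=0.2) gives a trace whose last 200 samples are labelled anomalous, so a detector can be scored against known ground truth.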
dc.description.abstractRecent approaches to anomaly detection have leveraged big data and deep learning to achieve high accuracy. However, these methods do not usually attempt to model the underlying data generating process. If more attention is given to incorporating the data generating process into the anomaly detection mechanism, then once anomalies are detected, it is easier to isolate the root cause of the anomaly and to determine what (if any) corrective measures should be taken. As of late, there has been a call in the engineering and data mining literature for the integration of data-driven and model-based approaches for anomaly detection and isolation. This work provides a response to this call with a grey-box anomaly detection framework based on the principles of feedback control theory.
dc.description.abstractAnomaly detection is notoriously difficult due to four main challenges. First of all, the meaning, or concept, of an anomaly is ambiguous, and it changes depending on the context or domain. Secondly, anomalies are by definition rare, and non-anomalous data points usually vastly outnumber anomalous data points. An anomaly detection algorithm must account for this imbalance between the anomaly and non-anomaly classes. Thirdly, it is usually necessary to perform anomaly isolation in addition to anomaly detection. Thus, any useful anomaly detection algorithm should provide isolation capabilities. Lastly, many anomaly detection data sets do not contain information on which observations are anomalous and which observations are not. That is, anomalies are unlabeled. Lack of labels is a serious problem, since it makes it difficult to evaluate whether or not an anomaly detection algorithm will actually work in practice. Our work addresses each of these challenges.
dc.description.abstractHence, anomalies and non-anomalies are no longer considered as pertaining to two different classes. Instead, anomalies are simply thought of as observations that don't belong to the distribution specified by the null hypothesis. Anomaly isolation is resolved by deconstructing the gradient of the anomaly score into its constituent components. Large elements of the gradient can then serve as heuristics for underlying process pathology. As mentioned previously, the two main components of the anomaly score encode sensor anomalousness and process anomalousness. The gradient can be further deconstructed into subcomponents that indicate the sensors or process parameters that led to the anomaly. Hence, we can isolate the contributing factors, and in practice, knowledge of the isolated contributions can be used to determine the appropriate action in the wake of anomalies.
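Gradient-based isolation can be illustrated with a short Python sketch. It is an assumption-laden stand-in, not the thesis's procedure: the score here is a hypothetical weighted sum of squared standardized deviations that grows with deviation, the gradient is taken by central differences, and the input names, nominal values, scales, and weights are all invented for the example.

```python
def numerical_gradient(score, x, h=1e-6):
    """Central-difference gradient of score at point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((score(up) - score(down)) / (2.0 * h))
    return grad

def isolate(score, x, names):
    """Rank input names by gradient magnitude, largest first."""
    grad = numerical_gradient(score, x)
    return sorted(zip(names, grad), key=lambda pair: abs(pair[1]), reverse=True)

# Hypothetical score: first two inputs form the sensor term, last two the
# process term. All numbers are made up for illustration.
NOMINAL = [10.0, 5.0, 0.7, 1.3]
SCALE = [0.2, 0.1, 0.05, 0.1]
WEIGHTS = [0.5, 0.5, 0.5, 0.5]

def example_score(x):
    return sum(w * ((xi - m) / s) ** 2
               for w, xi, m, s in zip(WEIGHTS, x, NOMINAL, SCALE))

ranking = isolate(example_score, [10.05, 5.0, 0.9, 1.3],
                  ["sensor_a", "sensor_b", "damping", "frequency"])
```

The first entry of ranking names the input with the largest gradient element, which in this toy setup is the process parameter that was moved furthest from its nominal value relative to its scale.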
dc.language.isoENG
dc.publisherRensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartofRensselaer Theses and Dissertations Online Collection
dc.subjectMathematics
dc.titleAnomaly detection for batch manufacturing via greybox feedback control models
dc.typeElectronic thesis
dc.typeThesis
dc.digitool.pid179895
dc.digitool.pid179896
dc.digitool.pid179897
dc.rights.holderThis electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degreePhD
dc.relation.departmentDept. of Mathematical Sciences

