
dc.rights.license: Restricted to current Rensselaer faculty, staff and students. Access inquiries may be directed to the Rensselaer Libraries.
dc.contributor: Fox, Peter A.
dc.contributor: Hendler, James A.
dc.contributor: Bennett, Kristin P.
dc.contributor: Kuruzovich, Jason N.
dc.contributor.author: Li, Congrui
dc.date.accessioned: 2021-11-03T09:09:42Z
dc.date.available: 2021-11-03T09:09:42Z
dc.date.created: 2019-09-18T22:37:27Z
dc.date.issued: 2019-05
dc.identifier.uri: https://hdl.handle.net/20.500.13015/2395
dc.description: May 2019
dc.description: School of Science
dc.description.abstract: Many challenges arise when dealing with natural language text sequence data directly at the document level. The sequence-to-sequence (seq2seq) model is an ideal tool for this task. A basic sequence-to-sequence model consists of two recurrent networks: an encoder that processes the input and a decoder that generates the output. To give the decoder more direct access to the input, researchers introduced an attention mechanism that lets the decoder peek into the input at every decoding step. To better capture long-term dependencies, more sophisticated neuron cell structures, such as the Long Short-Term Memory and the Gated Recurrent Unit, were also developed. Neural machine translation was the first testbed for seq2seq models, with great success, followed by chatbot applications in various domains.
dc.description.abstract: This thesis introduces three innovative case studies using variants of the seq2seq model, each focusing on a different stage of the model's training process.
dc.description.abstract: The first case study focuses on the stage before the training of the seq2seq model. We introduce a generative Chinese-language chatbot trained on data at a finer level of granularity: characters rather than words. Based on the evaluation of A/B testing results by multiple human evaluators, we conclude that the character-level model maintains the performance of the word-level benchmark.
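To illustrate the word-level versus character-level distinction in the first case study, here is a minimal sketch (not from the thesis) of the two tokenization schemes for Chinese text. The jieba segmenter is a stand-in assumption; the thesis does not specify its tooling.

```python
# Minimal sketch: word-level vs. character-level tokenization of Chinese text.
# jieba is used only as a stand-in segmenter; any Chinese word segmenter would do.
import jieba

text = "今天天气很好"  # "The weather is nice today"

word_tokens = jieba.lcut(text)  # word-level: the segmenter decides token boundaries
char_tokens = list(text)        # character-level: every character is a token

print(word_tokens)  # e.g. ['今天', '天气', '很', '好']
print(char_tokens)  # ['今', '天', '天', '气', '很', '好']
```

Character-level tokenization avoids segmentation errors and shrinks the vocabulary, at the cost of longer input sequences.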
dc.description.abstract: The second case study focuses on the stage during the training of the seq2seq model. We introduce an unsupervised information retrieval (IR) model using a sequence autoencoder that is competitive with multiple existing techniques, including Jaccard similarity, bag-of-words cosine similarity, and tf-idf cosine similarity, as well as recent neural network approaches such as Doc2Vec and Skip-Thoughts.
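For reference, two of the baselines named above can be sketched in a few lines. The toy corpus and the scikit-learn usage here are illustrative assumptions, not the thesis's experimental setup.

```python
# Sketch of two baseline similarity measures: Jaccard similarity on token sets
# and tf-idf cosine similarity (via scikit-learn). The corpus is toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "a dog sat on the log", "stocks rallied today"]

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

vec = TfidfVectorizer()
tfidf = vec.fit_transform(docs)            # one tf-idf row per document
sims = cosine_similarity(tfidf[0], tfidf)  # document 0 as query against all docs

print(jaccard(docs[0], docs[1]))  # token-set overlap
print(sims)                       # tf-idf cosine scores
```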
dc.description.abstract: The third case study focuses on the stage after the training of the seq2seq model. We explore mergers and acquisitions in the domain of business analytics. We further demonstrate the effectiveness of the IR model from the previous case study for measuring business proximity, and also investigate the capability of the IR model's output as pre-trained input for a downstream supervised task: predicting acquisitions. For this supervised task, we compare model variants with two different types of input and three different types of network structure.
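The downstream setup described above can be sketched as follows, assuming (hypothetically) that the IR model's encoder outputs are exported as fixed-length document vectors. The random features and labels below are placeholders, not the thesis's data or results.

```python
# Sketch of the downstream task: frozen document embeddings from a pre-trained
# encoder serve as features for a binary acquisition classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))   # placeholder for 1000 pre-trained embeddings
y = rng.integers(0, 2, size=1000)  # placeholder label: 1 = acquisition occurred

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # held-out accuracy
```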
dc.description.abstract: Sophisticated data preprocessing techniques are carried out for each experiment to improve the quality of the training data. Bidirectional seq2seq models with GRU cells and Luong attention are used for all tasks.
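A minimal PyTorch sketch of one decoding step with a bidirectional GRU encoder and Luong attention follows. The "general" scoring variant, the layer sizes, and the toy inputs are assumptions for illustration, not the thesis's implementation.

```python
# Sketch (not the author's code): one decoding step of a bidirectional GRU
# encoder with Luong "general" attention, in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden, vocab, batch, src_len = 64, 1000, 2, 7  # assumed toy dimensions

embed = nn.Embedding(vocab, hidden)
encoder = nn.GRU(hidden, hidden, bidirectional=True, batch_first=True)
proj = nn.Linear(2 * hidden, hidden)        # fold the two directions back to `hidden`
decoder = nn.GRUCell(hidden, hidden)
w_a = nn.Linear(hidden, hidden, bias=False) # Luong "general" score: h_t^T W_a h_s
w_c = nn.Linear(2 * hidden, hidden)         # combine context and decoder state

src = torch.randint(0, vocab, (batch, src_len))
enc_out, _ = encoder(embed(src))  # (batch, src_len, 2*hidden)
enc_out = proj(enc_out)           # (batch, src_len, hidden)

h_t = torch.zeros(batch, hidden)                   # decoder hidden state
y_t = embed(torch.zeros(batch, dtype=torch.long))  # previous output token
h_t = decoder(y_t, h_t)

# Luong attention: score every encoder state, normalize, form the context
# vector, then combine it with the decoder state.
scores = torch.bmm(enc_out, w_a(h_t).unsqueeze(2)).squeeze(2)  # (batch, src_len)
alpha = F.softmax(scores, dim=1)                               # attention weights
context = torch.bmm(alpha.unsqueeze(1), enc_out).squeeze(1)    # (batch, hidden)
h_tilde = torch.tanh(w_c(torch.cat([context, h_t], dim=1)))    # attentional state
```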
dc.description.abstract: In conclusion, research is conducted before, during, and after the training of the seq2seq model, and improvements or discoveries are made in each case study, so that natural language text sequence data can be encoded more effectively at the document level to obtain responses, answers, or trends from various training corpora.
dc.language.iso: ENG
dc.publisher: Rensselaer Polytechnic Institute, Troy, NY
dc.relation.ispartof: Rensselaer Theses and Dissertations Online Collection
dc.subject: Computer science
dc.title: Training and architecting sequence to sequence language models for applications in varied domains
dc.type: Electronic thesis
dc.type: Thesis
dc.digitool.pid: 179649
dc.digitool.pid: 179651
dc.digitool.pid: 179653
dc.rights.holder: This electronic version is a licensed copy owned by Rensselaer Polytechnic Institute, Troy, NY. Copyright of original work retained by author.
dc.description.degree: PhD
dc.relation.department: Dept. of Computer Science

