DSpace@RPI

DSpace@RPI is a repository of Rensselaer Polytechnic Institute's theses and dissertations that are available in digital format, largely from 2006 to the present, along with other selected resources.

Recent Submissions

  • Item
    Effective Data Distillation for Tabular Datasets
    (AAAI, 2024-02-24) Kang, Inwon; Ram, Parikshit; Zhou, Yi; Samulowitz, Horst; Seneviratne, Oshani
    Data distillation is a technique for reducing a large dataset into a smaller dataset. The smaller dataset can then be used to train a model that performs comparably to a model trained on the full dataset. Past works have examined this approach for image datasets, focusing on neural networks as target models. However, tabular datasets pose new challenges not seen in images. A sample in a tabular dataset is a one-dimensional vector, unlike the two- (or three-) dimensional pixel grid of images, and non-neural-network (non-NN) models such as XGBoost can often outperform neural network (NN) based models. Our contribution in this work is two-fold: 1) We show that data distillation methods from images do not translate directly to tabular data; 2) We propose a new distillation method that consistently outperforms the baseline for multiple different models, including non-NN models such as XGBoost.
  • Item
    Deciphering Crypto Twitter
    (ACM, 2024-05-01) Kang, Inwon; Ahmed Mridul, Maruf; Sanders, Abraham; Ma, Yao; Munasinghe, Thilanka; Gupta, Aparna; Seneviratne, Oshani
    Cryptocurrency is a fast-moving space, with a continuous influx of new projects every year. However, an increasing number of incidents in the space, such as hacks and security breaches, threaten the growth of the community and the development of technology. This dynamic and often tumultuous landscape is vividly mirrored and shaped by discussions within “Crypto Twitter,” a key digital arena where investors, enthusiasts, and skeptics converge, revealing real-time sentiments and trends through social media interactions. We present our analysis on a Twitter dataset collected during a formative period of the cryptocurrency landscape. We collected 40 million tweets using keywords related to cryptocurrency and performed a nuanced analysis that involved grouping the tweets by semantic similarity and constructing a tweet and user network. We used sentence-level embeddings and autoencoders to create K-means clusters of tweets. We identified six groups of tweets and their topics to examine different cryptocurrency-related interests and the change in sentiment over time. For example, we identified different groups of tweets demonstrating coordinated behavior in the market or expressing distrust in centralized cryptocurrency exchanges. Moreover, we discovered sentiment indicators that point to real-life incidents in the crypto world, such as the FTX incident of November 2022. We also constructed and analyzed different networks of tweets and users in our dataset by considering the reply and quote relationships and analyzed the largest components of each network. Our networks reveal a structure of bot activity in Crypto Twitter and suggest that they can be detected and handled using a network-based approach. Our work sheds light on the potential of social media signals to detect and understand crypto events, benefiting investors, regulators, and curious observers alike, as well as the potential for bot detection in Crypto Twitter using a network-based approach.
  • Item
    The use of creative analogies in a complex problem situation
    (Springer, 2014-08-01) Damaskinos, Melanie; Lutsevich, Alexander; Dörner, Dietrich; Schmid, Ute; Güss, C. Dominik
  • Item
    Reducing the Cognitive Load of Visual Analytics of Networks Using Concentrically Arranged Multi-surface Projections Focusing Immersive Real-time Exploration
    (2018-06-01) Ameres, Eric
    The analysis of “Big Data” stretches traditional visualization to its breaking point. This is especially true of highly interconnected relational data that pervades the field. Seemingly in response to that, Visual Analytics (VA) and the graphical visualization of data in general are often used only with the goal of simplification and presentation rather than as a tool for rich study and discovery. To maximize the use of visualization as an effective tool for generating new knowledge from complex data, we must understand and address issues of design based on human sensing, perception and overall cognitive processing, especially with regard to learning. The Campfire and the visualization paradigm I have developed based on its form (Concentrically Arranged Multi-surface Projections Focusing Immersive Real-time Exploration, aka “CAMPFIRE”) are novel, and provide a form and affordances that inspire new methods for the exploration and structured visualization of data that are immersive and visuo-spatially rich. However, novelty is not a measure of effectiveness. Instructional media, instructional methods and their associated cognitive tasks such as evaluating a graph, chart or other visual information carry loading costs of different types that need to be mitigated and moderated by informed design. The proposal was that through the application of Cognitive Load Theory and by specifically designing with careful attention to the effect of split attention and increasing the use of spatiality as a distinct modality, it is possible to reduce cognitive load for certain types of visual analytics tasks. It should be possible to promote the efficacy and engagement of old and new methods of visual analytics by re-imagining their use of space and form, and by applying these theories and practices with that in mind. This is especially true for the radial visualization technique that relies heavily on form, and for display paradigms that can potentially instill a sense of spatial 3-dimensionality in the user. This thesis demonstrates and tests the Campfire style of visualization on a simulated network (graph) visualization type task (e.g., visually inspecting and comparing nodes in a connected graph). It shows that it is possible to better engage the user through heightened dimensionality vs. traditional flat display by creating affordances that offload certain types of spatial processing (rotation and translation) from the user back into the visualization system. This thesis also provides design-based analysis of a variety of cases to give insight into “best practices” and design recommendations with regard to Campfire style visual analytics. It also demonstrates the connections and parallels that this method has to other traditional visualization and information and statistical graphic theory and practice.
  • Item
    Evolving a rapid prototyping environment for visually and analytically exploring large-scale Linked Open Data
    (2011-12-01) Downie, Mark; Kaiser, Paul; Enloe, Dylan; Fox, Peter; Hendler, James A.; Ameres, Eric; Goebel, Johannes
    The lack of development environments for interdisciplinary research conducted on large-scale datasets hampers research at every stage. Projects incur large startup costs as disparate infrastructure is assembled; experimentation slows when software components and environment are mismatched for specific research tasks; and findings are disseminated in forms that are hard to examine, learn from, and reuse. Behind these problems is a common cause: the lack of good tools. When large, heterogeneous and distributed data is added to the equation, further frustration, at the least, ensues. As a result, using existing platforms, the programmers of 21st century interactive visualizations are reduced to working in the same fashion with the same tools as 20th century database programmers. Our contribution is to bring the tools of digital artists to bear on the aforementioned data analysis and visualization challenges. Here we report on the current state of progress in adapting Field for large-scale, web-based scientific data analysis and visualization with an emphasis on Linked Open Data [1] and especially the current data hosted by RPI [2].
