Tools

  • TGCL. TGCL is a spaced repetition framework designed to enhance the training efficacy of GNNs and to shed light on their learning dynamics. It schedules examples for training, optimizing their timing and order based on the evolving difficulty of the training data, which it estimates through a combination of multiview graph and text complexity formalisms. TGCL can effectively tailor the curriculum to the unique learning dynamics of each model and can learn curricula that are transferable across different GNN models and datasets.
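
A minimal sketch of the spaced repetition scheduling idea, assuming review intervals double for samples the model currently handles well and reset for hard ones; the class, threshold, and loss signal below are illustrative, not TGCL's actual API:

```python
import numpy as np

class SpacedRepetitionScheduler:
    """Illustrative scheduler: well-learned samples are reviewed less
    often, while hard samples recur sooner."""

    def __init__(self, n_samples, max_delay=8):
        self.next_review = np.zeros(n_samples, dtype=int)  # epoch of next review
        self.delay = np.ones(n_samples, dtype=int)         # current review interval
        self.max_delay = max_delay

    def due(self, epoch):
        # Indices of samples scheduled for training at this epoch.
        return np.where(self.next_review <= epoch)[0]

    def update(self, epoch, indices, losses, threshold=0.5):
        # Double the interval for well-learned samples (low loss),
        # reset it for samples the model still struggles with.
        for i, loss in zip(indices, losses):
            self.delay[i] = min(self.delay[i] * 2, self.max_delay) if loss < threshold else 1
            self.next_review[i] = epoch + self.delay[i]
```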

  • Ling-CL. The linguistic curriculum learning algorithm has three features: (a) estimating the importance of linguistic indices using a data-driven approach, (b) applying a “linguistic curriculum” to enhance the model's performance from a linguistic perspective, and (c) identifying the core set of linguistic indices needed to learn a task. The tool also evaluates a model's ability to handle different linguistic indices.
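
One simple, data-driven way to estimate index importance is to correlate each linguistic index with the model's per-sample loss; the estimator below is an illustrative stand-in, not necessarily the one Ling-CL uses:

```python
import numpy as np

def index_importance(index_values, losses):
    """Score each linguistic index by the absolute Pearson correlation
    between its per-sample values and the model's per-sample loss.

    index_values: (n_samples, n_indices) linguistic index matrix
    losses:       (n_samples,) current per-sample loss
    """
    z = (index_values - index_values.mean(0)) / (index_values.std(0) + 1e-8)
    zl = (losses - losses.mean()) / (losses.std() + 1e-8)
    return np.abs((z * zl[:, None]).mean(0))  # higher => index tracks difficulty
```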

  • MCCL. Multiview Competence-based Curriculum Learning (MCCL) is a curriculum learning framework for GNNs that builds on graph complexity formalisms (as difficulty criteria) and model competence during training. Its scheduling scheme derives curricula that account for several views of sample difficulty at once, pacing the training data according to the model's current competence.
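
A sketch of competence-based pacing with multiple difficulty views, assuming the square-root competence function common in competence-based curriculum learning and max-aggregation across views; both are illustrative choices rather than MCCL's exact formulation:

```python
import numpy as np

def competence(t, T, c0=0.1):
    # Fraction of the difficulty-sorted data available at training step t of T.
    return min(1.0, np.sqrt(t * (1 - c0 ** 2) / T + c0 ** 2))

def eligible(difficulties, t, T):
    """difficulties: (n_samples, n_views) array, each view scaled to [0, 1].
    A sample is admitted once its aggregated difficulty falls within the
    model's current competence."""
    agg = difficulties.max(axis=1)  # hardest view dominates (one possible choice)
    return np.where(agg <= competence(t, T))[0]
```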

  • HuCurl. A framework for automatically finding an optimal curriculum for a given dataset and model.

  • GTNN. Graph Text Neural Network (GTNN) is a generic, trend-aware curriculum learning approach that effectively integrates textual and structural information in text graphs for relation extraction between entities, which are treated as node pairs in graphs. The model extends existing curriculum learning approaches by incorporating sample-level loss trends to better discriminate easier from harder samples and schedule them for training.
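
A sketch of the loss-trend signal, assuming the trend is taken as the least-squares slope of each sample's recent losses; the window size and function name are illustrative:

```python
import numpy as np

def loss_trend(loss_history, window=5):
    """Least-squares slope of each sample's recent losses.

    loss_history: (n_samples, n_epochs) per-sample loss per epoch.
    A clearly negative slope suggests the sample is being learned;
    a flat or rising slope marks it as currently hard.
    """
    recent = loss_history[:, -window:]
    x = np.arange(window) - (window - 1) / 2.0  # centered time steps
    return (recent * x).sum(axis=1) / (x ** 2).sum()
```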

  • AMNN. The Attentive Multiview Neural Network (AMNN) is a framework that combines different views (representations) of the same input through effective data fusion and attention strategies for ranking purposes. We developed the model to find the diseases most likely to match the clinical descriptions of patients, using data from the Undiagnosed Diseases Network.
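
A minimal sketch of attention-based view fusion in PyTorch; the module below shows the general pattern (score each view embedding, softmax over views, weighted sum) rather than AMNN's actual architecture:

```python
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    """Fuse several view embeddings of the same input with learned attention."""

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each view embedding

    def forward(self, views):
        # views: (batch, n_views, dim), one embedding per view of the input
        weights = torch.softmax(self.score(views).squeeze(-1), dim=-1)
        return (weights.unsqueeze(-1) * views).sum(dim=1)  # (batch, dim)
```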

  • RbF Curriculum Learner. This tool demonstrates spaced repetition for training any neural network. RbF is inspired by broad evidence in psychology showing that the human ability to retain information improves with repeated exposure and decays exponentially with the delay since last exposure. Accordingly, training instances are repeatedly presented to the network on a schedule determined by a spaced repetition algorithm, which shortens or lengthens the review interval of each instance with respect to a few indicators.
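
The scheduling intuition can be sketched as follows, assuming an exponential forgetting curve in which recall decays with delay, and more slowly for well-learned items; the parametrization and threshold are illustrative, not RbF's exact model:

```python
import math

def recall_probability(delay, strength):
    # Exponential forgetting curve: recall decays with time since the last
    # review, and more slowly for well-learned ("stronger") instances.
    return math.exp(-delay / strength)

def review_interval(strength, threshold=0.7):
    # Longest delay for which estimated recall stays above the threshold;
    # the instance is re-presented to the network after this many steps.
    return strength * math.log(1.0 / threshold)
```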

  • Leitner system. An implementation of the Leitner system for simultaneously training a given neural network and identifying spurious instances (those with wrong labels) in its input dataset.
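
A minimal sketch of the Leitner scheme, assuming the textbook setup in which queue k is reviewed every 2**k epochs; the miss-count heuristic for flagging spurious instances is illustrative, not necessarily this implementation's rule:

```python
class LeitnerQueues:
    """Textbook Leitner scheme: queue k is reviewed every 2**k epochs."""

    def __init__(self, n_samples, n_queues=5):
        self.queue = [0] * n_samples   # every sample starts in the first queue
        self.misses = [0] * n_samples  # times the network got a sample wrong
        self.n_queues = n_queues

    def due(self, epoch):
        return [i for i, q in enumerate(self.queue) if epoch % (2 ** q) == 0]

    def update(self, i, correct):
        if correct:
            self.queue[i] = min(self.queue[i] + 1, self.n_queues - 1)
        else:
            self.misses[i] += 1
            self.queue[i] = 0  # demote to the first queue on a miss

    def suspicious(self, min_misses=3):
        # Samples the network repeatedly fails to retain; label-noise candidates.
        return [i for i, m in enumerate(self.misses) if m >= min_misses]
```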

  • Twitter Crawler. A crawler for searching Twitter and collecting tweets and user networks.

Datasets

  • Cancer Type Dataset: This dataset was developed to obtain population-level statistics of cancer patients. It contains 3.8k Reddit posts, each annotated by at least three annotators for relevance to specific cancer types. If you use this dataset, please cite (Elgaar et al., 2023).

  • Alcohol Risk Dataset: This dataset was developed to obtain population-level statistics of alcohol use reports on social media. It contains more than 9k tweets, each annotated by at least three workers for report of first-person alcohol use, intensity of drinking (light vs. heavy), context of drinking (social vs. individual), and time of drinking (past, present, or future). If you use this dataset, please cite (Amiri et al., 2018).

  • Churn Dataset: This dataset contains labeled tweets about three telco brands: Verizon, AT&T, and T-Mobile. Tweets are labeled as churny or non-churny, where churny tweets indicate a high risk of canceling the brand's service. Labels were obtained through crowdsourcing, with each tweet labeled by at least three annotators; Fleiss' kappa is 0.62, indicating substantial inter-annotator agreement. In addition, 1,073 T-Mobile instances were independently labeled by one of our team members; Cohen's kappa between these labels and the aggregated judgments of the three annotators is 0.93, indicating almost perfect agreement. If you use this dataset, please cite (Amiri et al., 2015).