Software
NoticIA An LLM fine-tuning and evaluation library for the NoticIA dataset. The dataset consists of 850 Spanish news articles featuring prominent clickbait headlines, each paired with a high-quality, single-sentence generative summary written by humans.
Clickbait Fighter An AI model, used daily by Spanish users, that generates one-sentence summaries of sensational and clickbait news articles. I crafted the training dataset by hand and trained the model on 8 A100 GPUs. The demo runs on the OmegaAI cloud using vLLM and Ray. User feedback is used to continuously improve the model.
GoLLIE We present GoLLIE, a Large Language Model trained to follow annotation guidelines. GoLLIE outperforms previous approaches on zero-shot Information Extraction and allows the user to run inference with annotation schemas defined on the fly. Unlike previous approaches, GoLLIE is able to follow detailed definitions and does not rely solely on the knowledge already encoded in the LLM.
T-Projection T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. The code is built on top of 🤗HuggingFace's Transformers and Accelerate libraries.
Sequence Labeling with LLMs Sequence Labelling with LLMs is a library for performing Sequence Labelling with Large Language Models (LLMs) as a Text2Text constrained generation task. The code is built on top of 🤗HuggingFace's Transformers and Accelerate libraries.
LM Contamination Index The LM Contamination Index is a manually created database of contamination evidence for LMs.
Easy-Translate Easy-Translate is a script for translating large text files with a SINGLE COMMAND. Easy-Translate is designed to be as easy as possible for beginners and as seamless and customizable as possible for advanced users.
Easy Label Projection Easy Label Projection is a library that lets you easily project labels from one dataset onto another. You can automatically generate datasets for languages for which you do not have any labelled data using mGiza, FastAlign, SimAlign or AWESOME.
Context-enriched multilingual named entity recognition using knowledge bases. An NER framework that (1) identifies possible entity candidates by analyzing the input sentence structure, (2) links each candidate to an existing, up-to-date knowledge base if possible, and (3) performs fine-grained classification using the input sentence plus the information about the entity retrieved from the KB.
MetaVec A monolingual and cross-lingual meta-embedding generation and evaluation framework. |
Self Driving Car in Video Games A supervised deep neural network that learns how to drive in video games. The main objective of this project is to build a model that can drive in Grand Theft Auto V. Given a waypoint, the model is expected to reach the destination as fast as possible while avoiding other cars, pedestrians and obstacles. The model is trained on human-labelled data: we record the game screen and the key inputs of humans while they play, and this data is used to train the model.
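To make the label projection idea behind the Easy Label Projection and T-Projection entries above more concrete, here is a minimal sketch of projecting BIO labels through word alignments. It assumes alignments in the `i-j` (source index, target index) Pharaoh text format that aligners such as FastAlign emit. The function names `parse_alignment` and `project_labels` are hypothetical, and the span heuristic (cover everything from the first to the last aligned target token) is just one simple choice; it is not taken from either library.

```python
from typing import List, Tuple


def parse_alignment(pharaoh: str) -> List[Tuple[int, int]]:
    """Parse 'i-j' pairs (source index, target index), e.g. '0-0 1-2'."""
    pairs = []
    for item in pharaoh.split():
        src, tgt = item.split("-")
        pairs.append((int(src), int(tgt)))
    return pairs


def project_labels(
    source_labels: List[str],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[str]:
    """Project BIO labels from source tokens onto target tokens (illustrative heuristic)."""
    # 1) Recover entity spans (start, end_exclusive, type) from the source BIO tags.
    spans = []
    start, etype = None, None
    for i, label in enumerate(source_labels + ["O"]):  # sentinel closes a trailing span
        if start is not None and (
            label == "O" or label.startswith("B-") or label[2:] != etype
        ):
            spans.append((start, i, etype))
            start, etype = None, None
        if label != "O" and start is None:
            start, etype = i, label[2:]

    # 2) For each span, label the target tokens between the first and last aligned index.
    target_labels = ["O"] * target_len
    for s, e, span_type in spans:
        targets = [t for (src, t) in alignment if s <= src < e and 0 <= t < target_len]
        if not targets:
            continue  # assumption: an entity with no aligned target token is dropped
        lo, hi = min(targets), max(targets)
        target_labels[lo] = f"B-{span_type}"
        for t in range(lo + 1, hi + 1):
            target_labels[t] = f"I-{span_type}"
    return target_labels


# English: The White House is in Washington
# Spanish: La  Casa  Blanca está en Washington
source_labels = ["O", "B-LOC", "I-LOC", "O", "O", "B-LOC"]
alignment = parse_alignment("0-0 1-2 2-1 3-3 4-4 5-5")
projected = project_labels(source_labels, alignment, target_len=6)
print(projected)
```

Working on whole spans rather than token by token keeps the projected entity contiguous even when the word order changes between languages, as with "White House" and "Casa Blanca" here.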