This README describes the experiment notebooks in this repository.
- QA_extraction_generation.ipynb: uses Hugging Face's transformers package to experiment with extractive and generative question answering over the ROSA docs (a minimal pipeline sketch follows the list below).
- haystack-qa.ipynb explores the Haystack framework for question answering, both extractive and generative (with two variants of the latter: a local seq2seq model and one based on the OpenAI API).
- create-validation-dataset.ipynb: reads the FAQ questionnaire from the ROSA workshop documents and creates a fine-tuning or validation dataset for text generation models.
- langchain-openai.ipynb explores the LangChain framework, which helps build applications with large language models for various tasks, including question answering.
- langchain-pipeline-rosa.ipynb illustrates the same LangChain workflow of prompt engineering for ROSA question answering, but using a Hugging Face model running locally instead of accessing a remote model via an API.
- langchain-api-client.ipynb defines a simple wrapper around a model exposed by the text-generation-webui API and uses LangChain to prompt-engineer it for ROSA question answering.
- rosa-demo-open-ai-wfaq.ipynb is a general demo of a Question Answering (QA) workflow for ROSA documentation, including vector embeddings for document search and generative answers orchestrated with LangChain; prompt engineering is also covered. This demo uses the OpenAI API for embeddings and language modeling (a retrieval-plus-generation sketch appears after this list).
- gpu-footprint.ipynb computes GPU memory requirements for models with different numbers of parameters (a back-of-the-envelope version is sketched after this list).
- flan-t5-3B-general-tasks: fine-tunes the FLAN-T5 model for sentiment analysis and text summarization tasks.
- flan-t5-3B-RosaQA: fine-tunes the FLAN-T5 model for question answering with the ROSA service documentation, then compares the quality of the model's answers before and after fine-tuning.
- QA_evaluation_metrics_demo.ipynb explores evaluation metrics for LLMs on QA tasks. It covers QA-specific metrics (a token-level F1 sketch follows this list) as well as metrics related to model complexity and human evaluation.
- langchain-evaluation.ipynb explores how specific criteria can be used to evaluate model outputs, focusing on detailed examination within the LangChain framework.
- retriever-evaluation.ipynb explores evaluation metrics for document retrieval techniques.
- Model Serving with Ray provides a demo of how to serve an LLM on OpenShift using a multi-GPU Ray cluster (a minimal Ray Serve sketch appears after this list).
- Argilla 101 shows how to connect to and use the Argilla instance on the cluster.
- "Collect application feedback using Argilla shows how to create a feedback dataset and involve annotators for your project.