
Commit: adjust branch trigger
Magdalena Kuhn committed Apr 4, 2024
1 parent 1ca5634 commit c9d4557
Showing 2 changed files with 52 additions and 45 deletions.
18 changes: 12 additions & 6 deletions .github/workflows/fetch_contributors.yaml
@@ -1,9 +1,8 @@
name: Update Contributors List

on:
-  push:
-    branches:
-      - NOREF_adjust_structure
  workflow_dispatch:
+  push: main

jobs:
  update-contributors:
@@ -21,11 +20,18 @@ jobs:
      - name: Update README with contributors
        run: python src/fetch_contributors.py
      - name: Commit and push if changed
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          git config --global user.email "[email protected]"
          git config --global user.name "GitHub Action"
          PR_BRANCH="auto-pr-${GITHUB_SHA}"
          git checkout -b $PR_BRANCH
          git add README.md
          git commit -m "Update contributors list" || exit 0
-         git push
+         git push origin $PR_BRANCH
          echo "PR_BRANCH=${PR_BRANCH}" >> $GITHUB_ENV
      - name: Create Pull Request
        id: create_pr
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          gh pr create --title "NOREF_update_list_of_contributors" --body "Fetch latest contributors"
79 changes: 40 additions & 39 deletions README.md
@@ -8,100 +8,101 @@
</div>
<br/>
<p align="center">
-Cost reduction tools and techniques for LLM based systems <br> <br>
-<img src="images/Screenshot%202024-04-04%20at%2007.41.00.png" alt="Alt text" title="Expectation vs. Reality">
+<img src="images/Screenshot%202024-04-04%20at%2007.41.00.png" alt="Alt text" title="Expectation vs. Reality"> <br>
+Cost reduction tools and techniques for LLM-based systems
</p>


:point_right: Let’s make sure that your LLM application doesn’t burn a hole in your pocket. <br>
:point_right: Let’s instead make sure your LLM application generates a positive ROI for you, your company and your users.

-# Tools & frameworks to reduce costs
+# Techniques to reduce costs

-## 1) :blue_book: Model family and type
+## 1) :blue_book: Choose model family and type
Selecting a suitable model, or a combination of models, lays the foundation for building cost-sensible LLM applications.
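
As a quick sanity check before committing to a model family, you can run the same small eval set against every candidate and compare accuracy against cost. A minimal sketch in Python, where `query_model`, the model names, and the eval set are placeholder assumptions:

```python
# Minimal sketch: score candidate models on a tiny eval set before committing.
# query_model, the model names, and the eval questions are placeholders.
def query_model(model: str, prompt: str) -> str:
    """Placeholder for whatever client each candidate model needs."""
    return "4"  # stubbed answer so the sketch runs as-is

CANDIDATES = ["small-model", "large-model"]
EVAL_SET = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]

for model in CANDIDATES:
    hits = sum(expected in query_model(model, q) for q, expected in EVAL_SET)
    print(f"{model}: {hits}/{len(EVAL_SET)} correct")
```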

-### In-depth papers that explain underlying concepts
+### Papers
* Naveed, Humza, et al. ["A comprehensive overview of large language models."](https://arxiv.org/abs/2307.06435?utm) arXiv preprint arXiv:2307.06435 (2023).
* Minaee, Shervin, et al. ["Large Language Models: A Survey."](https://arxiv.org/abs/2402.06196) arXiv preprint arXiv:2402.06196 (2024).
* :speaking_head: call-for-contributions :speaking_head:
-### Tools & frameworks that help with selecting the correct model
-* Hugging face open leaderboard
-### Hands-on blog posts & courses with step by step guide
+### Tools & frameworks
+* [MTEB (Massive Text Embedding Benchmark) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) by Huggingface
+* [Models](https://huggingface.co/models) by Huggingface
+### Blog posts & courses
* [How to Evaluate, Compare, and Optimize LLM Systems](https://wandb.ai/ayush-thakur/llm-eval-sweep/reports/How-to-Evaluate-Compare-and-Optimize-LLM-Systems--Vmlldzo0NzgyMTQz?utm)
* :speaking_head: call-for-contributions :speaking_head:

-## 2) :blue_book: Model size
+## 2) :blue_book: Reducing model size
After choosing a suitable model family, you should consider models with fewer parameters and other techniques that reduce model size.
-* Selection of model parameter size
+* Model parameter size
* Quantization of models (see the sketch after this list)
* A higher degree of model customization (e.g. through RAG or fine-tuning) can achieve the same performance as a bigger model
* Distillation
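
For illustration, a minimal sketch of loading a model in 4-bit quantization with Hugging Face transformers and bitsandbytes; the model name is only an example, and exact arguments may vary across library versions:

```python
# Minimal sketch: load a causal LM with 4-bit quantized weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-v0.1"  # example model, swap in your own

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights in 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # requires the accelerate package
)
```

Quantized weights cut memory, and therefore hosting cost, substantially, usually at a small quality penalty worth benchmarking for your use case.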

-### In-depth papers that explain underlying concepts
+### Papers
* :speaking_head: call-for-contributions :speaking_head:

-### Tools & frameworks that help reducing model size
+### Tools & frameworks
* [LoRA](https://huggingface.co/docs/diffusers/training/lora#lora) and [QLoRA](https://medium.com/@dillipprasad60/qlora-explained-a-deep-dive-into-parametric-efficient-fine-tuning-in-large-language-models-llms-c1a4794b1766) make training large models more efficient
* :speaking_head: call-for-contributions :speaking_head:

-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
-## 3) :blue_book: Open source vs. proprietary models
+## 3) :blue_book: Use open source models
Consider self-hosting models instead of using proprietary models if you have capable developers in house. Still, keep an eye on the Total Cost of Ownership when benchmarking managed LLMs against setting everything up on your own.
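
As one example of the self-hosted route, a minimal sketch that calls a locally running Ollama server over its HTTP API; it assumes Ollama is installed, serving on its default port, and that the named model has been pulled:

```python
# Minimal sketch: query a self-hosted model via the local Ollama HTTP API.
import json
import urllib.request

def generate_local(prompt: str, model: str = "llama3") -> str:
    """Send a prompt to a local Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",  # Ollama's default endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(generate_local("In one sentence: why can self-hosting reduce LLM costs?"))
```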

-### In-depth papers that explain underlying concepts
+### Papers
* :speaking_head: call-for-contributions :speaking_head:
-### Tools & frameworks that help with self-hosting
-* Huggingface
-* LocalAI
-* Ollama
-* vLLM
+### Tools & frameworks
+* [LocalAI](https://github.com/mudler/LocalAI)
+* [Ollama](https://github.com/ollama/ollama)
+* [vLLM](https://github.com/vllm-project/vllm)
+* [llama.cpp](https://github.com/ggerganov/llama.cpp)
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
-## 4) :blue_book: Input/Output tokens
-A key cost driver is the amount of input token (user prompt + context) and output token you allow for your LLM. Different techniques to reduce the amount of tokens help in saving costs.
+## 4) :blue_book: Reduce input/output tokens
+A key cost driver is the number of input tokens (user prompt + context) and output tokens you allow for your LLM. Different techniques for reducing the token count help save costs (a minimal token-budget sketch follows the list below).
* Compression
* Summarization
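
A minimal token-budget sketch using tiktoken; the encoding name and budget are illustrative assumptions, and real prompt compression (e.g. LLMLingua, listed below) is smarter than plain truncation:

```python
# Minimal sketch: cap the token count of retrieved context before sending it.
import tiktoken

def truncate_to_budget(text: str, max_tokens: int = 500) -> str:
    """Encode, cut to the budget, decode; crude, but it caps input cost."""
    enc = tiktoken.get_encoding("cl100k_base")  # example encoding
    tokens = enc.encode(text)
    return enc.decode(tokens[:max_tokens])

long_context = "Some very long retrieved document... " * 200
print(len(truncate_to_budget(long_context).split()))
```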

-### In-depth papers that explain underlying concepts
+### Papers
* :speaking_head: call-for-contributions :speaking_head:
-### Tools & frameworks that help with reducing tokens
+### Tools & frameworks
* [LLMLingua](https://github.com/microsoft/LLMLingua) by Microsoft for input prompt compression
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
## 5) :blue_book: Prompt and model routing
-Add automatic checks to route all incoming user prompts to a suitable model. Follow Least-Model-Principle, which means to by default use the simplest possible logic or LM to answer a users question and only route to more complex LMs if necessary (aka. "LLM Cascading"). This can result to answering certain questions with a predefined response, using SLMs for simple questions and LLMs for complex questions.
+Add automatic checks to route incoming user prompts to a suitable model. Follow the Least-Model Principle: by default, use the simplest possible logic or LM to answer a user's question, and only route to more complex LMs if necessary (a.k.a. "LLM cascading").
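
A minimal sketch of such a cascade in native Python; the tier names and the word-count heuristic are placeholder assumptions, and in practice an intent classifier routes more reliably:

```python
# Minimal sketch: route each prompt to the cheapest tier that can handle it.
CANNED_ANSWERS = {"what are your opening hours?": "We are open 9-5, Mon-Fri."}

def route(prompt: str) -> str:
    """Return the cheapest tier that can plausibly answer the prompt."""
    normalized = prompt.strip().lower()
    if normalized in CANNED_ANSWERS:
        return "canned"          # free: predefined response
    if len(prompt.split()) < 20:
        return "small-model"     # cheap SLM for short, simple questions
    return "large-model"         # expensive LLM only when needed

print(route("What are your opening hours?"))  # -> canned
print(route("Summarize the attached contract and list all liability clauses " * 3))  # -> large-model
```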

-### Tools & frameworks that help with routing
+### Tools & frameworks
* Native implementation in Python of your custom logic
-* **Nemo guardrails** to detect and Route based on intent
+* [Nemo guardrails](https://github.com/NVIDIA/NeMo-Guardrails) to detect and route based on intent
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
## 6) :blue_book: Caching
-If your users tend to send very similar prompts to your LLM system, you can reduce costs by using different cachin techniques:
+If your users tend to send very similar prompts to your LLM system, you can reduce costs by using different caching techniques (see the sketch below).
* :speaking_head: call-for-contributions :speaking_head:
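
A minimal sketch of exact-match caching in native Python; `llm_call` is a placeholder for your actual model client, and semantic caching (matching similar rather than identical prompts) goes further:

```python
# Minimal sketch: answer repeated prompts from a cache instead of the API.
import hashlib

_cache: dict[str, str] = {}

def llm_call(prompt: str) -> str:
    """Placeholder for a real, billed model call."""
    return f"response to: {prompt}"

def cached_llm_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:          # only pay for prompts we haven't seen
        _cache[key] = llm_call(prompt)
    return _cache[key]

print(cached_llm_call("What is RAG?"))
print(cached_llm_call("what is RAG? "))  # served from cache, no API cost
```
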
-### In-depth papers that explain underlying concepts
+### Papers
* :speaking_head: call-for-contributions :speaking_head:
-### Tools & frameworks that help with caching
+### Tools & frameworks
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
## 7) :blue_book: Rate limiting
Make sure a single customer cannot hammer your LLM and send your bill skyrocketing. Track the number of prompts per user per month and either enforce a hard limit on the maximum number of prompts or slow down responses once a user hits the limit. In addition, detect unnatural or sudden spikes in user requests (similar to DDoS attacks, users or competitors can harm your business by sending tons of requests to your model).
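
A minimal sliding-window limiter sketch in native Python; the window size and request limit are illustrative assumptions:

```python
# Minimal sketch: allow at most MAX_REQUESTS per user per WINDOW_SECONDS.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_REQUESTS = 20

_requests: dict[str, deque] = defaultdict(deque)

def allow_request(user_id: str) -> bool:
    """Reject calls once a user exhausts their budget for the current window."""
    now = time.time()
    q = _requests[user_id]
    while q and now - q[0] > WINDOW_SECONDS:  # drop timestamps outside window
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False                          # hard limit hit
    q.append(now)
    return True

print(allow_request("alice"))  # True until alice hits 20 requests/minute
```
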
-### Tools & frameworks that help with rate limiting:
+### Tools & frameworks
* Simple tracking logic can be implemented in native Python
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
## 8) :blue_book: Cost tracking
"You can't improve what you don't measure" --> Make sure to know where your costs are coming from. Is it super active users? Is it a premium model? etc.
-### Tools & frameworks that help with cost tracking
+### Tools & frameworks
* Simple tracking logic can be implemented in native Python
* :speaking_head: call-for-contributions :speaking_head:
-### Hands-on blog posts & courses with step by step guide
+### Blog posts & courses
* :speaking_head: call-for-contributions :speaking_head:
## 9) :blue_book: During development time
* Make sure not to send endless API calls to your LLM during development and manual testing (see the sketch below).
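
A minimal sketch of stubbing model calls during development; the environment flag and function names are assumptions, not a fixed convention:

```python
# Minimal sketch: return free canned answers in dev, only hit the paid API in prod.
import os

def real_llm_call(prompt: str) -> str:
    raise NotImplementedError("wire up your real provider client here")

def llm_call(prompt: str) -> str:
    """Dev mode short-circuits the call so manual testing costs nothing."""
    if os.getenv("APP_ENV", "dev") == "dev":
        return f"[stubbed response for: {prompt[:40]}]"
    return real_llm_call(prompt)

print(llm_call("test prompt"))  # free and instant while developing
```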
