Future steps for future fellows
A brief rundown of potential ideas to explore for this project.
If you are interested in working on any of the tasks below (or LLMs/AI in general), I recommend reaching out to Matthew Artz.
Fine-tuning
Currently, prompt engineering is the only method used to tune the model's behavior; no actual fine-tuning has been done. Here are some approaches I'd look into:
- Breaking down the screenshot into smaller chunks/tokens
- Training the model with Mural screenshots
- OCR (Optical Character Recognition) tool integration (see the sketch after this list)
- Confidence scores
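As a rough sketch of the OCR idea (assuming Tesseract via pytesseract as the OCR tool; the file name below is a placeholder), the extracted text could be folded into the prompt so that small sticky-note text reaches the model even when the vision encoder can't resolve it:

```python
# Rough sketch, assuming pytesseract (Tesseract) as the OCR tool.
# "mural_screenshot.png" is a placeholder file name.
import pytesseract
from PIL import Image

def build_prompt_with_ocr(image_path: str) -> str:
    # Pull the raw text out of the screenshot so tiny text is available to the
    # model even if the vision encoder cannot read it from the image.
    extracted_text = pytesseract.image_to_string(Image.open(image_path))
    return (
        "Summarize this Mural board screenshot. "
        "The following text was extracted from the image by OCR and may "
        "contain errors:\n" + extracted_text
    )

# The resulting prompt would be sent to the model alongside the image itself.
print(build_prompt_with_ocr("mural_screenshot.png"))
```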
Issue with small models
This initial prototype uses LLaVa:13b, which is only 8 GB. I have attempted using LLaVa:34b, which is 20 GB, but it runs incredibly slowly (to the point of being unusable) on the hardware of our CMS MacBooks. I believe a larger model like LLaVa:34b would significantly improve the LLM's performance: the 8 GB model can get a general idea of the Mural screenshot, but it seems to struggle with small text, which then causes it to hallucinate and make up information.
I think a larger model would be able to analyze larger images that have more detail or smaller text, though this would require further testing.
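Assuming the prototype calls a local Ollama server through the official Python client (and using a placeholder file name), comparing the two model sizes is just a change to the model tag once the hardware can handle it:

```python
# Minimal sketch, assuming the prototype talks to a local Ollama server via the
# official `ollama` Python client. "mural_screenshot.png" is a placeholder path.
import ollama

def summarize(image_path: str, model: str) -> str:
    response = ollama.chat(
        model=model,  # e.g. "llava:13b" or "llava:34b"
        messages=[{
            "role": "user",
            "content": "Summarize the contents of this Mural board screenshot.",
            "images": [image_path],
        }],
    )
    return response["message"]["content"]

# Compare the small and large model on the same screenshot.
for tag in ("llava:13b", "llava:34b"):
    print(tag, "->", summarize("mural_screenshot.png", tag))
```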
This then brings me to my next suggestion:
AWS Cloud Compute (Cloud GPUs)
CMS has access to AWS cloud GPUs, which would offer significantly greater computational power and allow us to run much larger models.
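If Ollama were hosted on an AWS GPU instance, the prototype could point its client at that remote endpoint instead of the local machine. A minimal sketch, where the hostname and model tag are placeholders and the Ollama Python client is assumed:

```python
# Minimal sketch: point the Ollama client at a remote GPU instance instead of
# localhost. The host URL below is a placeholder, not a real endpoint.
import ollama

client = ollama.Client(host="http://your-aws-gpu-instance:11434")

response = client.chat(
    model="llava:34b",  # a larger model becomes feasible on cloud GPUs
    messages=[{
        "role": "user",
        "content": "Summarize the contents of this Mural board screenshot.",
        "images": ["mural_screenshot.png"],
    }],
)
print(response["message"]["content"])
```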