I do. I have a PC I set up for it with two video cards. I use ollama and a few different models depending on what I'm working on. I plan on fine tuning one at some point but I'm trying to get the data I want to feed it tagged well.
Awesome - do you mind sharing some general details about your setup? Do you need to run the cards in an SLI-type setup? When the models use RAM, is it the video card memory or actual RAM?
Like I said in another post, I need to skill up, lol
I run Arch, and in this case completely headless. It's running an i7-12700, an RTX 3060, and a Tesla P40. I don't have a lot of memory in it because I just tossed in what I had lying around, which is 32GB. The M.2 in there isn't huge either, but I have what is essentially a NAS where I store anything I don't want to lose, so space isn't really a big deal.
Ollama was smart about it and I didn't need to do any kind of special SLI setup or physically link the cards. It just uses them both, filling the memory on the first card (in my case the Tesla) and then using the second card as needed. I've forwarded a port via ssh so that I can use novelcrafter on my laptop or desktop and have it access the local LLM.
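For anyone curious, the forward is nothing fancy. Something like the line below should do it, assuming ollama is listening on its default port 11434, with user@llm-box standing in for the actual login and hostname:

    ssh -N -L 11434:localhost:11434 user@llm-box

Then novelcrafter (or anything else) on the laptop just talks to http://localhost:11434 as if the model were running locally.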
I'm hoping to get some fine-tuning done soon, since what I want to use it for is quite specific. That reminds me that I came across a model on huggingface I need to see if I can get running well in ollama locally. I use Gemini and Claude (both online) a lot, though Claude is limited in queries. I don't like the censorship and moralizing on them, and it's quite funny to see how the behavior changes from update to update.
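Going back to that huggingface model: if it comes as a GGUF file, getting it into ollama should just be a short Modelfile pointing at the download, roughly like this (the filename and model name are placeholders):

    # Modelfile
    FROM ./downloaded-model.Q4_K_M.gguf

    ollama create my-model -f Modelfile
    ollama run my-model

Haven't actually tried that particular one yet, so no promises on how well it runs.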
I'm hoping to get a completely uncensored one up and running, but tagging my data is brutal. I've been trying to script that out using ML, but I'm no coder, so it's slow going. I'm having to pick up Python as I go.
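Not my actual script, but a rough sketch of one way to do that kind of auto-tagging with the transformers library's zero-shot classifier; the model choice, labels, and threshold here are just placeholders to tune:

    # Rough sketch: auto-tag chunks of text with a zero-shot classifier,
    # then review the suggested tags by hand.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    # Placeholder labels; swap in whatever tag set the data actually needs.
    candidate_tags = ["dialogue", "action", "worldbuilding", "romance"]

    def tag_chunk(text, threshold=0.5):
        result = classifier(text, candidate_labels=candidate_tags, multi_label=True)
        # Keep every label the model scores above the threshold.
        return [label for label, score in zip(result["labels"], result["scores"])
                if score >= threshold]

    if __name__ == "__main__":
        sample = "She drew her sword and stepped into the torchlight."
        print(tag_chunk(sample))

The idea is just to get a first pass of tags on everything so the hand-checking goes faster.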
That's awesome, thanks for the details! Are you at liberty to say what you use it for? Again, just curious. I'm totally new to all of it but want to learn more.