An offline, ChatGPT-like assistant integrated into Visual Studio (GPU edition).
EdgeLlama runs your LLaMA-family models (Alpaca, Vicuna, and CodeLlama) directly on your PC, with no internet connection required.
Your data never leaves your PC, making it safe to use within your organization.
This GPU edition lets edge devices with a supported GPU run inference on the GPU instead of the CPU, so the PC remains responsive while EdgeLlama is generating a response.
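
For illustration, this is roughly how llama.cpp offloads model layers to the GPU; exact function names and signatures vary between llama.cpp releases, so treat this as a sketch rather than EdgeLlama's actual code.

    #include "llama.h"

    int main() {
        llama_backend_init(false);  // initialize the llama.cpp backend (NUMA off)

        // Ask llama.cpp to offload model layers to the GPU. With cuBLAS builds,
        // layers kept on the GPU free the CPU for other work.
        llama_model_params mparams = llama_model_default_params();
        mparams.n_gpu_layers = 35;  // number of layers to offload; 0 = CPU only

        llama_model * model =
            llama_load_model_from_file("codellama-13b.Q2_K.gguf", mparams);
        if (model == nullptr) {
            return 1;  // model failed to load
        }

        // ... create a context and run inference ...

        llama_free_model(model);
        llama_backend_free();
        return 0;
    }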
[BACKGROUND]
This is a Visual Studio Professional port of llama.cpp, a project that aims to bring LLMs to edge devices.
This version of EdgeLlama runs on Intel-based CPUs with AVX2 support, or on NVIDIA GPUs via cuBLAS on CUDA.
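
To verify AVX2 support before installing the CPU build, you can query CPUID. A minimal MSVC sketch (not part of EdgeLlama): AVX2 is reported in CPUID leaf 7, subleaf 0, EBX bit 5.

    #include <intrin.h>
    #include <cstdio>

    int main() {
        int info[4];                    // EAX, EBX, ECX, EDX
        __cpuidex(info, 7, 0);          // CPUID leaf 7, subleaf 0
        bool avx2 = (info[1] & (1 << 5)) != 0;  // EBX bit 5 = AVX2
        printf("AVX2 supported: %s\n", avx2 ? "yes" : "no");
        return 0;
    }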
[INSTALLATION]
Download a prebuilt llama.cpp GGUF model from the internet, for example: codellama-13b.Q2_K.gguf (the Q2_K suffix indicates 2-bit K-quantization, a smaller but lower-fidelity variant). A sketch for sanity-checking the downloaded file appears at the end of this section.
Copy the model into any folder.
Install the EdgeLlamaNet.vsix file.
For the first launch, run Visual Studio as administrator so you can select the newly downloaded model.
EdgeLlama is found under View > Other Windows > EdgeLlama.
Type your question in the text field provided and click the "Ask" button.
Subsequent launches of Visual Studio do not require administrative access.
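
To sanity-check that the file downloaded in the first step is a real GGUF model (and not, say, an HTML error page or a truncated download), note that GGUF files begin with the four ASCII bytes "GGUF". A small standalone C++ check, for illustration:

    #include <cstdio>
    #include <cstring>

    int main(int argc, char ** argv) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
            return 1;
        }
        FILE * f = fopen(argv[1], "rb");
        if (!f) {
            perror("fopen");
            return 1;
        }
        char magic[4] = {0};
        size_t n = fread(magic, 1, 4, f);  // GGUF files start with "GGUF"
        fclose(f);
        bool ok = (n == 4) && (memcmp(magic, "GGUF", 4) == 0);
        printf("%s: %s\n", argv[1], ok ? "valid GGUF header" : "not a GGUF file");
        return ok ? 0 : 1;
    }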
[CUDA INSTALLATION]
The CUDA Toolkit is required for cuBLAS (the CUDA Basic Linear Algebra Subprograms library), which llama.cpp uses for GPU acceleration.
During installation, select "Custom install" and uncheck Nsight VSE, Nsight Systems, Nsight Compute, and Visual Studio Integration before proceeding. If this step is missed and the installation fails, you will need to uninstall the CUDA Toolkit before attempting to reinstall it.
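
After installing the toolkit, you can confirm that CUDA can see your GPU using the CUDA runtime API (compile with nvcc). This is a generic check, not an EdgeLlama utility; keep in mind that the layers you offload must fit within the reported VRAM.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess || count == 0) {
            fprintf(stderr, "No CUDA devices found: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // VRAM matters: offloaded model layers must fit in GPU memory
            printf("Device %d: %s, %.1f GiB VRAM\n",
                   i, prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
        }
        return 0;
    }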