1. Click the model picker and choose "Manage Models...".
2. Select the "Hugging Face" provider.
3. Provide your Hugging Face token. You can create one on your settings page; it only needs the `inference.serverless` permission.
4. Choose the models you want to add to the model picker. 🥳
Each model entry also offers a `fastest` and a `cheapest` mode: `fastest` selects the provider with the highest throughput, while `cheapest` selects the provider with the lowest price per output token.
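To make the two policies concrete, here is a small illustrative sketch (the provider names and numbers are hypothetical, not real benchmark data) of how `fastest` and `cheapest` pick among providers serving the same model:

```python
# Hypothetical provider entries for one model (illustrative values only).
providers = [
    {"name": "groq", "throughput_tok_s": 750, "price_per_output_tok": 0.79e-6},
    {"name": "together", "throughput_tok_s": 400, "price_per_output_tok": 0.60e-6},
    {"name": "novita", "throughput_tok_s": 300, "price_per_output_tok": 0.39e-6},
]

# "fastest": the provider with the highest throughput (tokens/second).
fastest = max(providers, key=lambda p: p["throughput_tok_s"])

# "cheapest": the provider with the lowest price per output token.
cheapest = min(providers, key=lambda p: p["price_per_output_tok"])

print(fastest["name"], cheapest["name"])  # → groq novita
```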
- Single API to switch between multiple providers: Cerebras, Cohere, Fireworks AI, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, SambaNova, Together AI, and more. See the full list of partners in the Inference Providers docs.
- Built for high availability (across providers) and low latency.
- Transparent pricing: what the provider charges is what you pay.
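For reference, here is a minimal sketch of the kind of request that goes through the single API, assuming the OpenAI-compatible chat-completions endpoint of the Inference Providers router; the endpoint URL and model id are illustrative placeholders, so check the Inference Providers docs for current values:

```python
import json
import os

# Assumed OpenAI-compatible router endpoint (see Inference Providers docs).
API_URL = "https://router.huggingface.co/v1/chat/completions"
token = os.environ.get("HF_TOKEN", "hf_xxx")  # token with inference.serverless permission

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello!"}],
}

# Actually sending the request needs network access and a valid token, e.g.:
#   import urllib.request
#   req = urllib.request.Request(API_URL, json.dumps(payload).encode(), headers)
#   print(urllib.request.urlopen(req).read().decode())
print(json.dumps(payload, indent=2))
```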
💡 The free Hugging Face user tier gives you a small amount of monthly inference credits to experiment with. Upgrade to Hugging Face PRO or Enterprise for $2 in monthly credits plus pay-as-you-go access across all providers!
Requirements
- VS Code 1.104.0 or higher.
- A Hugging Face access token with `inference.serverless` permissions.
🛠️ Development
```shell
git clone https://github.com/huggingface/huggingface-vscode-chat
cd huggingface-vscode-chat
npm install
npm run compile
```