v1.75.8-stable - Team Member Rate Limits
Deploy this version
- Docker

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:v1.75.8-stable
```

- Pip

```shell
pip install litellm==1.75.8
```
Key Highlights
- Team Member Rate Limits - Individual rate limiting for team members, with JWT authentication support.
- Performance Improvements - New experimental HTTP handler flag for a 100+ RPS improvement on OpenAI calls.
- GPT-5 Model Family Support - Full support for OpenAI's GPT-5 models, including the `reasoning_effort` parameter and Azure OpenAI integration.
- Azure AI Flux Image Generation - Support for Azure AI's Flux image generation models.
 
Team Member Rate Limits
This release adds support for setting rate limits on individual members (including machine users) within a team. Teams can now give each agent its own rate limits, so heavy-traffic agents don't impact other agents or human users.
Agents can authenticate with LiteLLM using JWT and the same team role as human users, while still having per-agent rate limits enforced.
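As a minimal sketch of what this looks like in practice, assuming LiteLLM's existing `/team/member_add` route: the `tpm_limit` / `rpm_limit` field names below are illustrative and may differ from the shipped API (see PR #13601 for the actual implementation).

```shell
# Hypothetical sketch: add a machine user to a team with its own rate limits.
# The tpm_limit / rpm_limit field names are assumptions, not confirmed by these notes.
curl -X POST 'http://localhost:4000/team/member_add' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "team_id": "team-123",
    "member": {"user_id": "agent-1", "role": "user"},
    "tpm_limit": 10000,
    "rpm_limit": 10
  }'
```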
New Models / Updated Models
New Model Support
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features | 
|---|---|---|---|---|---|
| Azure AI | azure_ai/FLUX-1.1-pro | - | - | $40/image | Image generation | 
| Azure AI | azure_ai/FLUX.1-Kontext-pro | - | - | $40/image | Image generation | 
| Vertex AI | vertex_ai/deepseek-ai/deepseek-r1-0528-maas | 65k | $1.35 | $5.40 | Chat completions + reasoning |
| OpenRouter | openrouter/deepseek/deepseek-chat-v3-0324 | 65k | $0.14 | $0.28 | Chat completions | 
Features
- OpenAI
    - Added `reasoning_effort` parameter support for the GPT-5 model family (see the example after this list) - PR #13475, Get Started
    - Support for the `reasoning` parameter in the Responses API - PR #13475, Get Started
- Azure OpenAI
    - GPT-5 support with `max_tokens` and the `reasoning` parameter - PR #13510, Get Started
- AWS Bedrock
    - Streaming support for the Bedrock gpt-oss model family - PR #13346, Get Started
    - `/messages` endpoint compatibility with `bedrock/converse/<model>` (example after this list) - PR #13627
    - Cache point support for assistant and tool messages - PR #13640
 
- Azure AI
    - New Azure AI Flux image generation provider (example after this list) - PR #13592, Get Started
    - Fixed Content-Type header for image generation - PR #13584
 
- CometAPI
    - New provider support with chat completions and streaming - PR #13458
 
- SambaNova
    - Added embedding model support - PR #13308, Get Started
 
- hosted_vllm
    - Added `reasoning_effort` parameter support - PR #13620, Get Started
 
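Below are minimal sketches of calling these new features through the LiteLLM proxy. Endpoint paths follow LiteLLM's OpenAI-compatible proxy routes; keys, hosts, and any model ids not listed in the table above are illustrative assumptions.

First, a chat completion using the GPT-5 `reasoning_effort` parameter (the `gpt-5` model id and `"minimal"` effort value are assumptions):

```shell
curl -X POST 'http://localhost:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-5",
    "reasoning_effort": "minimal",
    "messages": [{"role": "user", "content": "Summarize this release in one line."}]
  }'
```

Next, an Anthropic-style `/v1/messages` call routed through Bedrock's Converse API (the model id after `bedrock/converse/` is a hypothetical placeholder):

```shell
curl -X POST 'http://localhost:4000/v1/messages' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "bedrock/converse/anthropic.claude-3-5-sonnet-20240620-v1:0",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Finally, image generation with the new Azure AI Flux provider (model id taken from the pricing table above; the prompt is illustrative):

```shell
curl -X POST 'http://localhost:4000/v1/images/generations' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "azure_ai/FLUX-1.1-pro",
    "prompt": "a watercolor map of the internet"
  }'
```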
Bugs
- OCI
    - Fixed streaming issues - PR #13437
- Ollama
    - Fixed GPT-OSS streaming with `thinking` field - PR #13375
- VolcEngine
    - Fixed thinking disabled parameter handling - PR #13598
- Streaming
    - Consistent `finish_reason` chunk indexing - PR #13560
 
 
LLM API Endpoints
Bugs
- Real-time API
    - Fixed endpoint for no intent scenarios - PR #13476
- Responses API
    - Fixed `stream=True` + `background=True` with the Responses API - PR #13654
 
MCP Gateway
Features
- Access Control & Configuration
    - Enhanced MCPServerManager with access groups and description support - PR #13549

Bugs
- Authentication
    - Fixed MCP gateway key authentication - PR #13630
 
Management Endpoints / UI
Features
- Team Management
    - Team Member Rate Limits implementation - PR #13601
    - JWT authentication support for team member rate limits - PR #13601
    - Show team member TPM/RPM limits in the UI - PR #13662
    - Allow editing team member RPM/TPM limits - PR #13669
    - Allow unsetting TPM and RPM in Teams Settings - PR #13430
    - Team Member Permissions Page access column changes - PR #13145
- Credentials
    - Added CredentialDeleteModal component and integration with CredentialsPanel - PR #13550
- Admin & Permissions
    - Allow routes for admin viewer - PR #13588
 
Bugs
- SCIM Integration
    - Fixed SCIM Team Memberships metadata handling - PR #13553
- Authentication
    - Fixed incorrect key info endpoint - PR #13633
 
Logging / Guardrail Integrations
Features
- MLflow
    - Updated MLflow logger usage span attributes - PR #13561
 
Performance / Loadbalancing / Reliability improvements
Features
- HTTP Performance
    - New `EXPERIMENTAL_OPENAI_BASE_LLM_HTTP_HANDLER` flag for a 100+ RPS improvement on OpenAI calls (see the example after this list) - PR #13625
 
- Database Monitoring
    - Added DB metrics to Prometheus - PR #13626
- Error Handling
    - Added safe divide-by-zero protection to prevent crashes - PR #13624
 
 
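A sketch of opting in to the experimental handler when starting the proxy: the flag name comes from PR #13625, while `True` as the enabling value is an assumption.

```shell
# Illustrative: enable the experimental OpenAI HTTP handler on the standard
# deploy command from this release. "True" as the value is an assumption.
docker run \
  -e EXPERIMENTAL_OPENAI_BASE_LLM_HTTP_HANDLER=True \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:v1.75.8-stable
```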
Bugs
- Dependencies
    - Updated boto3 to 1.36.0 and aioboto3 to 13.4.0 - PR #13665
 
 
General Proxy Improvements
Features
- Database
    - Removed redundant `use_prisma_migrate` flag - now the default behavior - PR #13555
 
New Contributors
- @TensorNull made their first contribution in PR #13458
- @MajorD00m made their first contribution in PR #13577
- @VerunicaM made their first contribution in PR #13584
- @huangyafei made their first contribution in PR #13607
- @TomeHirata made their first contribution in PR #13561
- @willfinnigan made their first contribution in PR #13659
- @dcbark01 made their first contribution in PR #13633
- @javacruft made their first contribution in PR #13631
 

