
How to build a large-model integration platform similar to One API


As AI developers, we often need to call several different large language models, and in the face of differing API specifications and access methods, integration work quickly becomes cumbersome. Building a unified large-model integration platform can greatly simplify this process.

This article discusses how to implement a large-model integration platform compatible with the OpenAI API specification, focusing on the implementation of the two core endpoints: /v1/models and /v1/chat/completions.

Architectural design

First, we need to design a clear architecture to unify different large model APIs into a standard interface:

                         ┌─────────────────────────────┐
                         │ Unified Interface Layer     │
                         │ (OpenAI compatible)         │
                         └──────────────┬──────────────┘
                                        │
                         ┌──────────────▼──────────────┐
                         │ Routing & load balancing    │
                         └──────────────┬──────────────┘
                                        │
            ┌───────────────────────────┼───────────────────────────┐
            │                           │                           │
┌───────────▼───────────┐   ┌───────────▼───────────┐   ┌───────────▼───────────┐
│    Model Adapter A    │   │    Model Adapter B    │   │    Model Adapter C    │
│   (OpenAI adapter)    │   │   (Claude adapter)    │   │ (local model adapter) │
└───────────┬───────────┘   └───────────┬───────────┘   └───────────┬───────────┘
            │                           │                           │
┌───────────▼───────────┐   ┌───────────▼───────────┐   ┌───────────▼───────────┐
│      OpenAI API       │   │      Claude API       │   │     Local models      │
└───────────────────────┘   └───────────────────────┘   └───────────────────────┘

Core component implementation

We use Python and the Flask framework to implement the key parts of this platform.
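
The code below relies on only a handful of packages. A minimal requirements.txt, inferred from the imports used throughout this article (version pins omitted; the async extra is needed because the views are declared with async def), might look like this:

 # requirements.txt (minimal, inferred from the imports below)
 flask[async]
 flask-cors
 aiohttp
 python-dotenv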

1. Project structure

one_api/
 ├── app.py                      # Main application entry
 ├── config.py                   # Configuration file
 ├── models/                     # Model related
 │   ├── __init__.py
 │   ├── registry.py             # Model registry
 │   └── adapters/               # Model adapters
 │       ├── __init__.py
 │       ├── base.py             # Base adapter interface
 │       ├── openai_adapter.py   # OpenAI adapter
 │       ├── claude_adapter.py   # Claude adapter
 │       └── local_adapter.py    # Local model adapter
 ├── api/                        # API routes
 │   ├── __init__.py
 │   ├── models.py               # /v1/models implementation
 │   └── chat.py                 # /v1/chat/completions implementation
 └── utils/                      # Utility functions
     ├── __init__.py
     ├── auth.py                 # Authentication
     └── rate_limit.py           # Rate limiting

2. Basic adapter interface

First, define a base adapter interface that all model adapters must implement:

# models/adapters/base.py
 from abc import ABC, abstractmethod
 from typing import Dict, List, Any, Optional

 class BaseModelAdapter(ABC):
     """Base class for all model adapters"""
    
     @abstractmethod
     def list_models(self) -> List[Dict[str, Any]]:
         """Returns to the list of models supported by this adapter"""
         pass
        
     @abstractmethod
     async def generate_completion(self,
                                   model: str,
                                   messages: List[Dict[str, str]],
                                   temperature: Optional[float] = None,
                                   top_p: Optional[float] = None,
                                   max_tokens: Optional[int] = None,
                                   stream: bool = False,
                                   **kwargs) -> Dict[str, Any]:
         """Generate chat completion result""""
         pass
    
     @abstractmethod
     def get_model_info(self, model_id: str) -> Dict[str, Any]:
         """Get details of a specific model"""
         pass

Explanation

  • The adapter interface is defined with an abstract base class (ABC), ensuring that every subclass implements the required methods
  • list_models returns the list of models supported by the adapter
  • generate_completion is the core method; it calls the underlying AI model to generate a response and is asynchronous to improve performance
  • get_model_info returns the details of a specific model, which is convenient for front-end display and selection
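
For reference, each entry returned by list_models is expected to follow the OpenAI model object format, plus the provider field that the adapters below attach; the values here are purely illustrative:

 # Illustrative list_models() entry (OpenAI model object format plus our "provider" field)
 {
     "id": "gpt-4o",           # model identifier used in chat requests
     "object": "model",
     "created": 1715367049,    # creation timestamp (illustrative value)
     "owned_by": "openai",
     "provider": "openai"      # added by the adapter for front-end display
 }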

3. Model Registration

Create a central registry to manage all available models and corresponding adapters:

# models/registry.py
 from typing import Dict, List, Any, Optional
 from .adapters.base import BaseModelAdapter
 import logging

 logger = logging.getLogger(__name__)

 class ModelRegistry:
     """Central model registry that manages all model adapters and routing logic"""

     def __init__(self):
         # Adapter mapping {adapter_name: adapter_instance}
         self.adapters: Dict[str, BaseModelAdapter] = {}
         # Model mapping {model_id: adapter_name}
         self.model_mapping: Dict[str, str] = {}

     def register_adapter(self, name: str, adapter: BaseModelAdapter) -> None:
         """Register a new model adapter"""
         if name in self.adapters:
             logger.warning(f"Adapter '{name}' already exists and will be overwritten")

         self.adapters[name] = adapter

         # Register all models supported by this adapter
         for model_info in adapter.list_models():
             model_id = model_info["id"]
             self.model_mapping[model_id] = name
             logger.info(f"Registered model: {model_id} -> {name}")

     def get_adapter_for_model(self, model_id: str) -> Optional[BaseModelAdapter]:
         """Get the corresponding adapter based on the model ID"""
         adapter_name = self.model_mapping.get(model_id)
         if not adapter_name:
             return None
         return self.adapters.get(adapter_name)

     def list_all_models(self) -> List[Dict[str, Any]]:
         """List all registered models"""
         all_models = []
         for adapter in self.adapters.values():
             all_models.extend(adapter.list_models())
         return all_models

     async def generate_completion(self, model_id: str, **kwargs) -> Dict[str, Any]:
         """Generate a completion with the specified model"""
         adapter = self.get_adapter_for_model(model_id)
         if not adapter:
             raise ValueError(f"Adapter for model '{model_id}' was not found")

         return await adapter.generate_completion(model=model_id, **kwargs)

Explanation

  • ModelRegistry is the central manager responsible for maintaining all model adapters and the routing map
  • register_adapter registers a new adapter and automatically records all models the adapter supports
  • The model_mapping dictionary stores the model ID -> adapter name mapping for quick lookup
  • get_adapter_for_model returns the adapter instance corresponding to a model ID
  • list_all_models aggregates the model lists of all adapters for the /v1/models endpoint
  • generate_completion is the core routing logic, forwarding requests to the correct adapter for processing
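
A minimal usage sketch (inside an async context, assuming the OpenAIAdapter implemented in the next section):

 # Registry usage sketch (illustrative)
 registry = ModelRegistry()
 registry.register_adapter("openai", OpenAIAdapter(api_key="sk-..."))

 adapter = registry.get_adapter_for_model("gpt-4o")   # -> the OpenAI adapter, or None
 result = await registry.generate_completion(
     model_id="gpt-4o",
     messages=[{"role": "user", "content": "Hello"}],
 )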

4. OpenAI Adapter Example

The following is an example adapter implementation for the OpenAI API:

# models/adapters/openai_adapter.py
 import aiohttp
 from typing import Dict, List, Any, Optional
 from .base import BaseModelAdapter
 import os
 import logging

 logger = logging.getLogger(__name__)

 class OpenAIAdapter(BaseModelAdapter):
     """OpenAI API Adapter"""
    
     def __init__(self, api_key: str, base_url: str = "https://api.openai.com"):
         self.api_key = api_key
         self.base_url = base_url
         self._models_cache = None
        
     async def _request(self, method: str, endpoint: str, **kwargs) -> Dict[str, Any]:
         """Send a request to OpenAI API"""
         headers = {
             "Authorization": f"Bearer {self.api_key}",
             "Content-Type": "application/json"
         }
        
         url = f"{self.base_url}{endpoint}"
        
         async with aiohttp.ClientSession() as session:
             async with session.request(
                 method,
                 url,
                 headers=headers,
                 **kwargs
             ) as response:
                 if response.status != 200:
                     error_text = await response.text()
                     raise Exception(f"OpenAI API Error ({response.status}): {error_text}")

                 return await response.json()
    
     async def _fetch_models(self) -> List[Dict[str, Any]]:
         """Get model list from OpenAI"""
         response = await self._request("GET", "/v1/models")
         return response["data"]
        
     def list_models(self) -> List[Dict[str, Any]]:
         """Returns the list of models supported by OpenAI"""
         if self._models_cache is None:
             # In a real implementation this should be fetched asynchronously; simplified here
             import asyncio
             self._models_cache = asyncio.run(self._fetch_models())
            
         # Add additional platform-specific information
         for model in self._models_cache:
             model["provider"] = "openai"
            
         return self._models_cache
    
     async def generate_completion(self,
                                  model: str,
                                  messages: List[Dict[str, str]],
                                  temperature: Optional[float] = None,
                                  top_p: Optional[float] = None,
                                  max_tokens: Optional[int] = None,
                                  stream: bool = False,
                                  **kwargs) -> Dict[str, Any]:
         """Call OpenAI API to generate chat complete""""
         payload = {
             "model": model,
             "messages": messages,
             "stream": stream
         }
        
         # Add optional parameters
         if temperature is not None:
             payload["temperature"] = temperature
         if top_p is not None:
             payload["top_p"] = top_p
         if max_tokens is not None:
             payload["max_tokens"] = max_tokens
            
         # Add other passed parameters
         for key, value in kwargs.items():
             if key not in payload and value is not None:
                 payload[key] = value
                
         # Call OpenAI API
         response = await self._request(
             "POST",
             "/v1/chat/completions",
             json=payload
         )
        
         # Make sure the response format is consistent with our standards
         return self._standardize_response(response)
    
     def _standardize_response(self, response: Dict[str, Any]) -> Dict[str, Any]:
         """Convert OpenAI's response to standard format"""
         # OpenAI already uses standard format, so return directly
         return response
    
     def get_model_info(self, model_id: str) -> Dict[str, Any]:
         """Get details of a specific model"""
         models = self.list_models()
         for model in models:
             if model["id"] == model_id:
                 return model
         raise ValueError(f"Model '{model_id}' does not exist")

Explanation

  • Implements the BaseModelAdapter interface as a concrete adapter dedicated to handling OpenAI API calls
  • Uses aiohttp for asynchronous HTTP requests to improve concurrency
  • The private _request method encapsulates the HTTP request logic, handling authentication and error cases
  • _fetch_models retrieves the model list from OpenAI; in practice the result is cached
  • list_models implements the base-class interface and adds provider information so the front end can distinguish providers
  • generate_completion is the core method; it builds the request parameters and calls OpenAI's chat/completions API
  • _standardize_response normalizes the response into a unified format for subsequent processing
  • get_model_info looks up details by model ID
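
Note that the adapter above always returns the parsed JSON body, while the /v1/chat/completions route below expects an async iterator of chunks when stream=True. A minimal sketch of a streaming variant (an assumption, not part of the original adapter; the registry would need to dispatch to it when streaming is requested) could look like this:

 # Streaming variant sketch for OpenAIAdapter (illustrative)
 import json

 async def generate_completion_stream(self, payload: dict):
     """Yield chat completion chunks parsed from OpenAI's SSE stream."""
     headers = {"Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"}
     async with aiohttp.ClientSession() as session:
         async with session.post(f"{self.base_url}/v1/chat/completions",
                                 headers=headers, json=payload) as response:
             async for raw_line in response.content:        # iterate the SSE lines
                 line = raw_line.decode("utf-8").strip()
                 if not line.startswith("data: "):
                     continue
                 data = line[len("data: "):]
                 if data == "[DONE]":
                     break
                 yield json.loads(data)                      # one chunk dict per line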

5. API routing implementation

Now let's implement the API endpoints that comply with the OpenAI specification:

# api/models.py
 from flask import Blueprint, jsonify
 from ..models.registry import ModelRegistry

 models_bp = Blueprint('models', __name__)

 def init_routes(registry: ModelRegistry):
     """Initialize Model API Routing"""
    
     @models_bp.route('/v1/models', methods=['GET'])
     async def list_models():
         """List all available models (OpenAI compatible endpoints)"""
         models = registry.list_all_models()
        
         # Return according to OpenAI API format
         return jsonify({
             "object": "list",
             "data": models
         })
    
     @models_bp.route('/v1/models/<model_id>', methods=['GET'])
     async def get_model(model_id):
         """Get specific model details (OpenAI compatible endpoint)"""
         adapter = registry.get_adapter_for_model(model_id)
         if not adapter:
             return jsonify({
                 "error": {
                     "message": f"Model '{model_id}' does not exist",
                     "type": "invalid_request_error",
                     "code": "model_not_found"
                 }
             }), 404
            
         model_info = adapter.get_model_info(model_id)
         return jsonify(model_info)

Explanation

  • Routes are organized with Flask Blueprints for modular management
  • The init_routes function receives the model registry instance, implementing dependency injection
  • The /v1/models endpoint is fully compatible with the OpenAI API specification and returns all registered models
  • The /v1/models/<model_id> endpoint returns detailed information about the specified model
  • Error cases return the standard OpenAI error format to ensure client compatibility
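
For a quick check, the models endpoint can be queried directly once the service is assembled in section 6 (the host and port below match the defaults used in app.py):

 # Querying /v1/models (illustrative)
 import requests

 resp = requests.get("http://localhost:8000/v1/models")
 print(resp.json())   # {"object": "list", "data": [...]}

Next, implement the chat completions endpoint in the same blueprint style:
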
# api/chat.py
 from flask import Blueprint, request, jsonify, Response, stream_with_context
 import json
 import asyncio
 from ..models.registry import ModelRegistry
 from ..utils.auth import verify_api_key
 from ..utils.rate_limit import check_rate_limit
 import logging

 logger = logging.getLogger(__name__)

 chat_bp = Blueprint('chat', __name__)

 def init_routes(registry: ModelRegistry):
     """Initialize chat to complete API routing""""
    
     @chat_bp.route('/v1/chat/completions', methods=['POST'])
     @verify_api_key
     @check_rate_limit
     async def create_chat_completion():
         """Create a chat complete (OpenAI compatible endpoint)"""
         try:
             # parse request data
             data = request.get_json()
             model = data.get("model")
            
             if not model:
                 return jsonify({
                     "error": {
                         "message": "The 'model' parameter must be specified",
                         "type": "invalid_request_error",
                     }
                 }), 400
            
             adapter = registry.get_adapter_for_model(model)
             if not adapter:
                 return jsonify({
                     "error": {
                         "message": f"Model '{model}' does not exist or is not available",
                         "type": "invalid_request_error",
                         "code": "model_not_found"
                     }
                 }), 404
            
             # Extract parameters
             messages = data.get("messages", [])
             temperature = data.get("temperature")
             top_p = data.get("top_p")
             max_tokens = data.get("max_tokens")
             stream = data.get("stream", False)
            
             # Other parameters
             kwargs = {k: v for k, v in data.items() if k not in
                      ["model", "messages", "temperature", "top_p", "max_tokens", "stream"]}
            
             # Streaming output processing
             if stream:
                 async def generate():
                     kwargs["stream"] = True
                     response_iterator = await registry.generate_completion(
                         model_id=model,
                         messages=messages,
                         temperature=temperature,
                         top_p=top_p,
                         max_tokens=max_tokens,
                         **kwargs
                     )
                    
                     # Assume that response_iterator is an asynchronous iterator
                     async for chunk in response_iterator:
                         yield f"data: {(chunk)}\n\n"
                    
                     # End stream
                     yield "data: [DONE]\n\n"
                
                 return Response(
                     stream_with_context(generate()),
                     content_type='text/event-stream'
                 )
            
             # Non-stream output
             response = await registry.generate_completion(
                 model_id=model,
                 messages=messages,
                 temperature=temperature,
                 top_p=top_p,
                 max_tokens=max_tokens,
                 **kwargs
             )
            
             return jsonify(response)
            
         except Exception as e:
             ("Error processing chat/completions request")
             return jsonify({
                 "error": {
                     "message": str(e),
                     "type": "server_error",
                 }
             }), 500

Explanation

  • Implements the /v1/chat/completions endpoint, the core feature of the OpenAI API
  • Uses the @verify_api_key and @check_rate_limit decorators to handle authentication and rate limiting (a sketch of both follows this list)
  • Extracts the model ID, message content, generation parameters and other information from the request
  • Supports both streaming output and regular output modes
    • Streaming mode uses stream_with_context and the SSE (Server-Sent Events) format
    • Regular mode directly returns the complete JSON response
  • The exception-handling mechanism ensures a friendly error message is returned even when errors occur
  • Dynamic parameter handling allows extra parameters to be passed through to the underlying model
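
Since the two decorators are imported but not shown, here is one possible minimal implementation (an assumed sketch; only the names and configuration fields match the rest of the article, and a production setup would more likely use Redis or middleware instead of in-process state):

 # utils/auth.py (sketch)
 from functools import wraps
 from flask import request, jsonify
 from ..config import Config

 def verify_api_key(f):
     """Check the Bearer token in the Authorization header against configured keys."""
     @wraps(f)
     async def wrapper(*args, **kwargs):
         auth = request.headers.get("Authorization", "")
         key = auth[7:] if auth.startswith("Bearer ") else ""
         if key not in Config.API_KEYS:
             return jsonify({"error": {"message": "Invalid API key",
                                       "type": "authentication_error"}}), 401
         return await f(*args, **kwargs)
     return wrapper

 # utils/rate_limit.py (sketch)
 import time
 from collections import defaultdict, deque
 from functools import wraps
 from flask import request, jsonify
 from ..config import Config

 _history = defaultdict(deque)   # api_key -> timestamps of recent requests

 def check_rate_limit(f):
     """Simple in-memory sliding-window limiter (per API key, per minute)."""
     @wraps(f)
     async def wrapper(*args, **kwargs):
         if Config.RATE_LIMIT_ENABLED:
             key = request.headers.get("Authorization", "anonymous")
             window = _history[key]
             now = time.time()
             while window and now - window[0] > 60:
                 window.popleft()
             if len(window) >= Config.RATE_LIMIT_REQUESTS:
                 return jsonify({"error": {"message": "Rate limit exceeded",
                                           "type": "rate_limit_error"}}), 429
             window.append(now)
         return await f(*args, **kwargs)
     return wrapper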

6. Main application entry

Finally, we integrate all the components into the main application:

# app.py
 from flask import Flask
 from flask_cors import CORS
 from .config import Config
 from .models.registry import ModelRegistry
 from .models.adapters.openai_adapter import OpenAIAdapter
 from .models.adapters.claude_adapter import ClaudeAdapter   # assumed implemented
 from .models.adapters.local_adapter import LocalModelAdapter   # assumed implemented
 from .api import models, chat
 import logging
 import os

 def create_app():
     """Create and configure Flask app""""
     #Configuration log
     (
         level=,
         format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
     )
    
     # Create Flask app
     app = Flask(__name__)
     CORS(app) # Enable cross-domain support
    
     # Load configuration
     app.config.from_object(Config)
    
     # Create a model registry
     registry = ModelRegistry()
    
     # Register a model adapter
     # OpenAI Adapter
     if Config.OPENAI_API_KEY:
         openai_adapter = OpenAIAdapter(
             api_key=Config.OPENAI_API_KEY,
             base_url=Config.OPENAI_BASE_URL
         )
         registry.register_adapter("openai", openai_adapter)
        
     # Claude Adapter
     if Config.CLAUDE_API_KEY:
         claude_adapter = ClaudeAdapter(
             api_key=Config.CLAUDE_API_KEY
         )
         registry.register_adapter("claude", claude_adapter)
        
     # Local model adapter
     if Config.LOCAL_MODELS_ENABLED:
         local_adapter = LocalModelAdapter(
             models_dir=Config.LOCAL_MODELS_DIR
         )
         registry.register_adapter("local", local_adapter)
    
     # Initialize API routing
     models.init_routes(registry)
     chat.init_routes(registry)
    
     # Register a blueprint
     app.register_blueprint(models.models_bp)
     app.register_blueprint(chat.chat_bp)
    
     @app.route('/health', methods=['GET'])
     def health_check():
         """Health check endpoint"""
         return {"status": "healthy"}
    
     return app

 if __name__ == "__main__":
     app = create_app()
     app.run(
         host=os.environ.get("HOST", "0.0.0.0"),
         port=int(os.environ.get("PORT", "8000")),
         debug=os.environ.get("DEBUG", "False").lower() == "true"
     )

Explanation

  • Creates the Flask application with the factory pattern for easier testing and scaling
  • Configures the logging system for debugging and troubleshooting
  • Enables CORS (cross-origin resource sharing) to support cross-origin calls from front ends
  • Loads settings from the configuration file instead of hardcoding them
  • Creates and initializes the model registry and dynamically registers the different model adapters according to the configuration
  • Adapters are registered conditionally; only adapters with the corresponding API key configured are enabled
  • Initializes the API routes and injects the model registry into each route handler
  • Adds a health-check endpoint so monitoring systems can detect service status
  • Reads server startup parameters from environment variables to improve deployment flexibility

7. Configuration file

# config.py
 import os
 from dotenv import load_dotenv

 # Load environment variables
 load_dotenv()

 class Config:
     """Application configuration"""
     # API keys accepted by this platform
     API_KEYS = os.getenv("API_KEYS", "").split(",")

     # OpenAI configuration
     OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
     OPENAI_BASE_URL = os.getenv("OPENAI_BASE_URL", "https://api.openai.com")

     # Claude configuration
     CLAUDE_API_KEY = os.getenv("CLAUDE_API_KEY")

     # Local model configuration
     LOCAL_MODELS_ENABLED = os.getenv("LOCAL_MODELS_ENABLED", "False").lower() == "true"
     LOCAL_MODELS_DIR = os.getenv("LOCAL_MODELS_DIR", "./models")

     # Rate limit configuration
     RATE_LIMIT_ENABLED = os.getenv("RATE_LIMIT_ENABLED", "True").lower() == "true"
     RATE_LIMIT_REQUESTS = int(os.getenv("RATE_LIMIT_REQUESTS", "100"))  # requests per minute

Explanation

  • Uses python-dotenv to load environment variables from a .env file, keeping development and deployment configuration separate
  • Provides default values so the application can start normally even when an environment variable is not set
  • Multiple API keys can be configured via environment variables to support access for different users
  • OpenAI's base URL is configurable, supporting OpenAI-compatible alternative API services
  • Local model support can be toggled per environment via an environment-variable switch
  • The rate-limiting feature is likewise configurable to prevent API abuse
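
A sample .env file for local development might look like the following (all values are placeholders; only the variable names match config.py):

 # .env (sample values)
 API_KEYS=sk-local-key-1,sk-local-key-2
 OPENAI_API_KEY=sk-your-openai-key
 OPENAI_BASE_URL=https://api.openai.com
 CLAUDE_API_KEY=your-claude-key
 LOCAL_MODELS_ENABLED=false
 LOCAL_MODELS_DIR=./models
 RATE_LIMIT_ENABLED=true
 RATE_LIMIT_REQUESTS=100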

Analysis of key technical points

1. Adapter pattern

We use the adapter pattern to unify the interface differences between the various large-model APIs. Each adapter is responsible for converting a specific vendor's API into our standard interface, which makes adding new models simple: just implement the corresponding adapter (see the sketch below).
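
As a concrete illustration, a condensed sketch of the ClaudeAdapter that app.py assumes is shown below. It maps Anthropic's Messages API onto the OpenAI-style interface; the endpoint, headers and field names follow Anthropic's public API documentation, the model ID is a placeholder, and error handling, caching and streaming are omitted:

 # models/adapters/claude_adapter.py (condensed sketch)
 import time
 import aiohttp
 from typing import Dict, List, Any
 from .base import BaseModelAdapter

 class ClaudeAdapter(BaseModelAdapter):
     """Maps Anthropic's Messages API onto the OpenAI-style adapter interface."""

     def __init__(self, api_key: str, base_url: str = "https://api.anthropic.com"):
         self.api_key = api_key
         self.base_url = base_url
         # A static list keeps the sketch short; a real adapter could query the API.
         self._models = [{"id": "claude-3-5-sonnet-latest", "object": "model",
                          "created": 0, "owned_by": "anthropic", "provider": "claude"}]

     def list_models(self) -> List[Dict[str, Any]]:
         return self._models

     def get_model_info(self, model_id: str) -> Dict[str, Any]:
         for model in self._models:
             if model["id"] == model_id:
                 return model
         raise ValueError(f"Model '{model_id}' does not exist")

     async def generate_completion(self, model, messages, temperature=None, top_p=None,
                                   max_tokens=None, stream=False, **kwargs) -> Dict[str, Any]:
         # Anthropic keeps the system prompt outside the messages list.
         system = "\n".join(m["content"] for m in messages if m["role"] == "system")
         payload = {
             "model": model,
             "messages": [m for m in messages if m["role"] != "system"],
             "max_tokens": max_tokens or 1024,   # required by the Messages API
         }
         if system:
             payload["system"] = system
         if temperature is not None:
             payload["temperature"] = temperature
         if top_p is not None:
             payload["top_p"] = top_p

         headers = {"x-api-key": self.api_key,
                    "anthropic-version": "2023-06-01",
                    "content-type": "application/json"}
         async with aiohttp.ClientSession() as session:
             async with session.post(f"{self.base_url}/v1/messages",
                                     headers=headers, json=payload) as resp:
                 if resp.status != 200:
                     error_text = await resp.text()
                     raise Exception(f"Claude API Error ({resp.status}): {error_text}")
                 data = await resp.json()

         # Convert the Anthropic response into the OpenAI chat completion shape.
         text = "".join(block.get("text", "") for block in data.get("content", []))
         return {
             "id": data.get("id"),
             "object": "chat.completion",
             "created": int(time.time()),
             "model": model,
             "choices": [{"index": 0,
                          "message": {"role": "assistant", "content": text},
                          "finish_reason": data.get("stop_reason")}],
             "usage": {"prompt_tokens": data.get("usage", {}).get("input_tokens"),
                       "completion_tokens": data.get("usage", {}).get("output_tokens")},
         }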

2. Asynchronous processing

By using async/await, we can handle concurrent requests efficiently; for scenarios such as streaming output, asynchronous processing is particularly important.

3. Unified model representation

We unify the representation of the model, ensuring that model capabilities and attributes can be expressed consistently between different adapters, which helps users to switch smoothly between different models.

4. Central registry

ModelRegistry, as the central component, manages all model adapters and provides a unified calling interface. It is responsible for core logic such as model routing and adapter selection.

Extensions and advanced features

After implementing the basic functionality, the following advanced features can be considered:

1. Load balancing and failover

# Add a load-balancing method to ModelRegistry
 def select_adapter_with_load_balancing(self, model_group: str) -> BaseModelAdapter:
     """Select an adapter based on current load"""
     adapters = self.model_groups.get(model_group, [])
     if not adapters:
         raise ValueError(f"Model group '{model_group}' not found")

     # Select the optimal adapter based on metrics such as latency and success rate
     # (simplified; assumes each adapter instance carries a .name set at registration)
     return min(adapters, key=lambda a: self.adapter_metrics[a.name]["latency"])

Explanation

  • In high-availability scenarios, multiple adapter instances can be configured for the same model (possibly pointing to different regions or different providers)
  • By collecting metrics such as latency and success rate, the platform can dynamically select the currently optimal adapter
  • When an adapter has problems, the system can automatically switch to a backup adapter to achieve failover (see the sketch below)
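
A minimal failover sketch (an assumption built on the same model_groups structure; it simply tries each adapter in the group until one succeeds):

 # Failover sketch inside ModelRegistry (illustrative)
 async def generate_completion_with_failover(self, model_group: str, **kwargs):
     """Try the adapters in a group one by one until a call succeeds."""
     last_error = None
     for adapter in self.model_groups.get(model_group, []):
         try:
             return await adapter.generate_completion(**kwargs)
         except Exception as e:
             last_error = e
             logger.warning(f"Adapter failed, trying the next one: {e}")
     raise last_error or ValueError(f"No adapter available for group '{model_group}'")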

2. Cache layer

Adding a cache layer for common requests reduces how often the backend APIs are called, lowering cost and improving response speed.
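
One simple way to do this is to key an in-memory cache on the full request payload (a sketch under the assumption that reusing responses for identical requests is acceptable; a production setup would more likely use Redis and should skip streaming requests):

 # Response cache sketch (illustrative)
 import hashlib
 import json
 import time

 _cache: dict = {}     # cache_key -> (timestamp, response)
 CACHE_TTL = 300       # seconds

 def cache_key(model: str, messages, **params) -> str:
     """Build a stable hash of the request payload."""
     raw = json.dumps({"model": model, "messages": messages, **params}, sort_keys=True)
     return hashlib.sha256(raw.encode()).hexdigest()

 async def cached_completion(registry, model, messages, **params):
     key = cache_key(model, messages, **params)
     hit = _cache.get(key)
     if hit and time.time() - hit[0] < CACHE_TTL:
         return hit[1]                                  # serve from cache
     response = await registry.generate_completion(model_id=model,
                                                   messages=messages, **params)
     _cache[key] = (time.time(), response)
     return response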

Summary

By building such a large-model integration platform, we can greatly simplify multi-model application development. Developers only need to call the unified OpenAI-compatible interface (as in the example below), and the platform automatically handles all underlying details, including API differences, authentication and routing.
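
For example, a client talking to the platform looks no different from one talking to OpenAI directly, apart from the base URL and key (the host, port and key below are placeholders matching the defaults used in this article):

 # Calling the platform's OpenAI-compatible endpoint (illustrative)
 import requests

 resp = requests.post(
     "http://localhost:8000/v1/chat/completions",
     headers={"Authorization": "Bearer sk-local-key-1"},
     json={"model": "gpt-4o",
           "messages": [{"role": "user", "content": "Hello"}]},
 )
 print(resp.json()["choices"][0]["message"]["content"])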

This architecture is not only suitable for simple calling scenarios; it can also serve as infrastructure for building more complex AI applications, such as dynamically selecting the model best suited to a specific task, or orchestrating collaboration between models.

I hope that the technical ideas and code examples provided in this article can help you build your own large model integration platform and provide more flexible and powerful infrastructure support for AI application development.

Written at the end

If you are interested in the technical details and source-code implementation behind this article, follow my WeChat official account 【Song Ge Ai Automation】. Every week I publish an in-depth technical article there, analyzing the implementation principles of practical tools from a source-code perspective.

Review of the last issue: Small-model tool-calling capability activation: prompt engineering practice with Qwen2.5 0.5B as an example