1. Purpose of this article
This article shows how to write an agent in C#, without any framework, that calls Alibaba's Qianwen large model to complete assigned tasks, and in the process builds a prototype of an extensible agent framework.
2. Intended readers
This article assumes readers are already familiar with some basic concepts, such as AI, function calling, and C# programming.
3. Main technologies
The following techniques are the main ones used in writing the agent:
- Using the C# WebClient to communicate with the Qianwen API.
- Using the model's function call capability to invoke external functions (a sketch of both points follows this list).
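As a minimal sketch of both points, assuming DashScope's OpenAI-compatible chat endpoint and the qwen-plus model name (adjust the endpoint, model, and API key handling to your own account), the request below attaches a single hypothetical getWeather tool alongside the user's message. HttpClient works just as well if you prefer it over WebClient.

```csharp
using System;
using System.Net;
using System.Text;

class QianwenClient
{
    // Assumed endpoint: DashScope's OpenAI-compatible chat completions API.
    const string Endpoint = "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions";

    public static string Chat(string apiKey, string requestJson)
    {
        using (var client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            client.Headers[HttpRequestHeader.Authorization] = "Bearer " + apiKey;
            client.Encoding = Encoding.UTF8;
            // UploadString sends a POST with the JSON body and returns the response text.
            return client.UploadString(Endpoint, requestJson);
        }
    }

    static void Main()
    {
        // A request that carries one illustrative tool (getWeather) alongside the user's message.
        string requestJson = @"{
  ""model"": ""qwen-plus"",
  ""messages"": [{ ""role"": ""user"", ""content"": ""What is the weather in Beijing tomorrow?"" }],
  ""tools"": [{
    ""type"": ""function"",
    ""function"": {
      ""name"": ""getWeather"",
      ""description"": ""Query the weather forecast for a city"",
      ""parameters"": {
        ""type"": ""object"",
        ""properties"": { ""city"": { ""type"": ""string"" }, ""date"": { ""type"": ""string"" } },
        ""required"": [""city""]
      }
    }
  }]
}";
        Console.WriteLine(Chat(Environment.GetEnvironmentVariable("DASHSCOPE_API_KEY"), requestJson));
    }
}
```

If the model decides the tool is needed, the response contains a tool call with the function name and arguments; the agent executes the function and sends the result back in a follow-up message.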
4. Problems solved
I ran into some difficulties while achieving the goals above. I record them here, along with my own notes, for anyone who follows.
- Too many tools. A large model's ability to call external functions is based on its built-in function call capability, which later evolved into the now-popular MCP. In either form the underlying idea is the same: along with the user's input, you pass the model a set of tools that it can choose to call when needed (strictly speaking, the model does not call anything itself; it tells the agent which function should be called). An agent focused on a specific domain may not need many tools to get its job done, but a general-purpose agent meant to help with any possible task needs a very large number of them. Imagine an assistant that not only helps you at work but also handles everyday matters: checking the weather, booking flights, setting reminders, ordering takeout, planning trips, and so on. The number of tools behind such a comprehensive agent is hard to imagine. If all of them are passed to the model on every request, the payload is large, the response is slow, and the bill grows quickly, because the model is billed by tokens. I designed a few tricks to reduce the number of tools submitted with each request:
- Keyword method: monitor the content of the user's request, and if it contains specified keywords, attach the corresponding tools. For example, when the user mentions "weather" or "temperature", submit the weather query tools along with the request (see the first sketch after this list).
- Grouping controller method: divide all tools into a few major categories, each containing a number of concrete tools, and write one tool for each category. When the model selects tools, it first selects the category tool; the implementation of that category tool then dynamically loads the concrete tools under the category. For example, we can make "booking" a category that includes "book a flight", "book a hotel", "order takeout", and so on. When the model determines that the user's intent is booking, it first calls the category tool, whose implementation loads the subcategory tools and then calls the actual tool the user needs, such as "book a flight". The advantage of this method is that it avoids sending non-essential subcategory tools; the disadvantage is that it requires a second submission to locate the specific subcategory tool. I will optimize this later to minimize how often the second submission is needed (see the second sketch after this list).
- The model can only handle one task at a time: when using function calls, I found that the model returns only one function call per request (if I am wrong about this, please correct me). For example, the user asks: "Check tomorrow's weather in Beijing for me, then book a flight from Shenzhen to Beijing for tomorrow." The user has actually given the model two tasks: the first is to check the weather, the second is to book the flight. Both can be completed through function calls, but in a single request the model cannot return two calls; it returns only one, such as getWeather (the weather query). For this scenario I designed a task decomposition tool that tells the model: if there are multiple tasks, decompose the request into the smallest units that can be executed separately. The implementation of the task decomposition function can then resubmit each individual task, wait for all of them to return, organize the results, and send them to the user (see the third sketch after this list).
- Logical processing: the model itself can reason, and turning that ability into a way to solve practical problems is the key to our task. When the task decomposition method above is used to handle complex requests, there may be dependencies between tasks. For example, the user asks: "If the weather in Beijing is good tomorrow, book me a flight from Shenzhen to Beijing." Clearly, executing "book a flight" depends on the result of "query the weather in Beijing". By combining task decomposition with task dependencies (as in the third sketch below), the model's reasoning ability can solve this type of problem.
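First, a minimal sketch of the keyword method. The keywords and tool names below are illustrative assumptions, not the agent's real ones; only the tool groups whose keywords appear in the user's message are attached to the request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class KeywordToolSelector
{
    // Map trigger keywords to the tool names they unlock (illustrative entries).
    static readonly Dictionary<string[], string[]> ToolGroups = new Dictionary<string[], string[]>
    {
        { new[] { "weather", "temperature" }, new[] { "getWeather" } },
        { new[] { "flight", "air ticket" },   new[] { "bookFlight" } },
    };

    // Attach a tool group only when the user's message mentions one of its keywords.
    public static List<string> SelectTools(string userMessage)
    {
        return ToolGroups
            .Where(g => g.Key.Any(k => userMessage.IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0))
            .SelectMany(g => g.Value)
            .ToList();
    }
}
```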
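Second, a sketch of the grouping controller method. The category names and sub-tools are illustrative: only the coarse-grained category tools are sent on the first request, and when the model picks a category, the agent expands it into the concrete tools and submits a second request with just that group attached.

```csharp
using System.Collections.Generic;

class GroupingController
{
    // Only these coarse-grained "category" tools are sent on the first request.
    public static readonly string[] CategoryTools = { "booking", "query", "reminder" };

    // Concrete tools are loaded lazily once the model selects a category (illustrative mapping).
    static readonly Dictionary<string, string[]> SubTools = new Dictionary<string, string[]>
    {
        { "booking",  new[] { "bookFlight", "bookHotel", "orderTakeout" } },
        { "query",    new[] { "getWeather", "queryDatabase" } },
        { "reminder", new[] { "createReminder" } },
    };

    // Called when the first response asks to invoke a category tool:
    // the returned sub-tools are attached to the second submission.
    public static string[] ExpandCategory(string categoryName)
    {
        return SubTools.TryGetValue(categoryName, out var tools) ? tools : new string[0];
    }
}
```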
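Third, a sketch of task decomposition with dependencies. The SubTask shape and the runner below are assumptions for illustration, not the article's actual classes: the model's decomposeTasks call would produce a list of single-purpose subtasks, each optionally pointing at the task whose result it depends on, and the agent resubmits them one by one before collecting the results for the user.

```csharp
using System.Collections.Generic;

class SubTask
{
    public int Id;
    public string Description;   // e.g. "check the weather in Beijing tomorrow"
    public int? DependsOn;       // Id of the task whose result this one needs, if any
}

class TaskRunner
{
    // Placeholder: in the real agent this re-enters the normal function-call loop;
    // the dependency result is appended so the model can decide what to do with it
    // (e.g. skip the booking if the weather is bad).
    static string RunSingleTask(string description, string dependencyResult)
    {
        string prompt = dependencyResult == null
            ? description
            : description + "\nContext: " + dependencyResult;
        return "[result of: " + prompt + "]";
    }

    // Tasks are assumed to arrive in dependency order.
    public static List<string> RunAll(List<SubTask> tasks)
    {
        var results = new Dictionary<int, string>();
        foreach (var t in tasks)
        {
            string dep = t.DependsOn.HasValue && results.ContainsKey(t.DependsOn.Value)
                ? results[t.DependsOn.Value]
                : null;
            results[t.Id] = RunSingleTask(t.Description, dep);
        }
        return new List<string>(results.Values);
    }
}
```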
5. Functions completed so far
- Statistical analysis of internal data through natural language, with graphical reports. For example: "Find the three people who work the most overtime." The model generates the corresponding SQL query from the user's description, the agent executes the query, and the result is submitted back to the model to answer the user's question. There is one prerequisite: the structure of the internal system's data tables must be sent to the model (via the system prompt). See the sketch after this list.
- Personal work assistant: for example, reminders, notifications, meeting minutes, and so on.
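A minimal sketch of the natural-language statistics flow described above. The table schema, prompts, and the two stub methods are illustrative assumptions; the real agent would call Qianwen (as in the WebClient sketch in section 3) and run the SQL against the internal database.

```csharp
class NlToSqlFlow
{
    // The table schema goes into the system prompt so the model can write valid SQL.
    // Table and column names below are illustrative.
    const string SqlSystemPrompt =
        "You translate questions into SQL for this table:\n" +
        "overtime_records(emp_name TEXT, overtime_hours REAL, work_date DATE)\n" +
        "Return only the SQL statement.";

    // Placeholder for the actual Qianwen call.
    static string AskModel(string systemPrompt, string userMessage)
    {
        return "[model reply]";
    }

    // Placeholder: run the SQL against the internal database and serialize the rows.
    static string ExecuteSql(string sql)
    {
        return "[query rows as JSON]";
    }

    public static string Answer(string question)
    {
        // Step 1: have the model generate the SQL from the user's question.
        string sql = AskModel(SqlSystemPrompt, question);
        // Step 2: execute the query against the internal system.
        string rows = ExecuteSql(sql);
        // Step 3: let the model phrase the final answer from the query result.
        return AskModel("Answer the user's question using the query result provided.",
                        question + "\n" + rows);
    }
}
```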
6. Code description: I originally planned to package the relevant code for download, but after finishing the article I found that attachments cannot be inserted. The code itself is not difficult. It contains calls to some of our own class libraries, so it cannot be run directly, but those calls do not affect reading and understanding; if you replace them (there are not many, and they are easy to modify), the code will run.
Since it cannot be attached here, I will take the time to explain the main parts of the code separately and eventually put it on GitHub for download.