MCP and LLM Invocation Details

Summary
This post covers how MCP and LLMs cooperate, and explains how an LLM uses MCP servers to extend the boundaries of its own capabilities.

After building a few demo MCP servers in Go and then studying the MCP architecture and protocol in depth, my understanding of MCP improved a lot. Still, two key questions from the development process remained unanswered:

  • What is the call flow among the Cline plugin, my MCP server, and the LLM?
  • At what point does the LLM decide which MCP servers to use?

Note that I use the Cline plugin for VS Code, so Cline serves as the example here, but the client could just as well be Cursor, Cherry Studio, or any other MCP client.

The official MCP documentation offers an explanation of this flow:

  1. The client sends your question to Claude
  2. Claude analyzes the available tools and decides which one(s) to use
  3. The client executes the selected tool(s) through the MCP server
  4. The tool's execution results are sent back to Claude
  5. Claude combines the results into an answer
  6. The answer is finally displayed to the user

From this explanation, the interaction between the LLM and MCP servers happens in two steps:

  1. The LLM decides which MCP servers (tools) to use
  2. The client executes the chosen MCP server, and the LLM post-processes the execution result (a condensed sketch of this round trip follows the list)
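
Condensed into chat messages, the round trip looks roughly like this. The tool name, arguments, and result below are made up purely for illustration; the official client example later in this post shows the real implementation:

    # Round 1: the client sends the tool descriptions together with the user's question.
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant with access to these tools:\n"
                       "<structured tool descriptions>...",
        },
        {"role": "user", "content": "What's the weather in Beijing?"},
    ]
    # The LLM replies with a JSON tool call instead of prose, e.g.:
    #   {"tool": "get_weather", "arguments": {"city": "Beijing"}}

    # Round 2: the client (not the model) executes the tool via the MCP server and
    # feeds the result back, so the LLM can turn it into the final answer.
    messages += [
        {"role": "assistant", "content": '{"tool": "get_weather", "arguments": {"city": "Beijing"}}'},
        {"role": "system", "content": "Tool execution result: sunny, 25°C"},
    ]
    # A second LLM call on `messages` now produces a natural-language answer for the user.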

So an MCP server is actively selected and invoked by the LLM. But how exactly does the model determine which tools to use? The official Python client example from MCP provides the answer:

    async def start(self) -> None:
        """Main chat session handler."""
        try:
            for server in self.servers:
                try:
                    await server.initialize()
                except Exception as e:
                    logging.error(f"Failed to initialize server: {e}")
                    await self.cleanup_servers()
                    return

            all_tools = []
            for server in self.servers:
                tools = await server.list_tools()
                all_tools.extend(tools)

            tools_description = "\n".join([tool.format_for_llm() for tool in all_tools])

            system_message = (
                "You are a helpful assistant with access to these tools:\n\n"
                f"{tools_description}\n"
                "Choose the appropriate tool based on the user's question. "
                "If no tool is needed, reply directly.\n\n"
                "IMPORTANT: When you need to use a tool, you must ONLY respond with "
                "the exact JSON object format below, nothing else:\n"
                "{\n"
                '    "tool": "tool-name",\n'
                '    "arguments": {\n'
                '        "argument-name": "value"\n'
                "    }\n"
                "}\n\n"
                "After receiving a tool's response:\n"
                "1. Transform the raw data into a natural, conversational response\n"
                "2. Keep responses concise but informative\n"
                "3. Focus on the most relevant information\n"
                "4. Use appropriate context from the user's question\n"
                "5. Avoid simply repeating the raw data\n\n"
                "Please use only the tools that are explicitly defined above."
            )

            messages = [{"role": "system", "content": system_message}]

            while True:
                try:
                    # Keep the user's original casing; only the exit check is case-insensitive.
                    user_input = input("You: ").strip()
                    if user_input.lower() in ["quit", "exit"]:
                        logging.info("\nExiting...")
                        break

                    messages.append({"role": "user", "content": user_input})

                    llm_response = self.llm_client.get_response(messages)
                    logging.info("\nAssistant: %s", llm_response)

                    # If the reply was a JSON tool call, execute it via the matching
                    # MCP server; otherwise the reply is returned unchanged.
                    result = await self.process_llm_response(llm_response)

                    if result != llm_response:
                        messages.append({"role": "assistant", "content": llm_response})
                        # Feed the tool's output back so the model can turn the raw
                        # result into the final natural-language answer.
                        messages.append({"role": "system", "content": result})

                        final_response = self.llm_client.get_response(messages)
                        logging.info("\nFinal response: %s", final_response)
                        messages.append(
                            {"role": "assistant", "content": final_response}
                        )
                    else:
                        messages.append({"role": "assistant", "content": llm_response})

                except KeyboardInterrupt:
                    logging.info("\nExiting...")
                    break

        finally:
            await self.cleanup_servers()

    ...  # other code omitted

As the code shows, before interacting with the LLM, the structured descriptions of all tools are gathered into tools_description and embedded in system_message; the system message and the user message are then sent to the model together. After analyzing the user's request, the model decides whether a tool call is needed:

  • No tool needed: the model simply replies in natural language.
  • Tool needed: the model outputs a tool-call request in the structured JSON format (see the example below).
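
What the model bases this decision on is nothing more than the text in system_message. For a hypothetical get_weather tool, the block contributed by format_for_llm would look roughly like this (indicative only; the exact layout is defined by Tool.format_for_llm in the example):

    Tool: get_weather
    Description: Query the current weather for a given city
    Arguments:
    - city: name of the city (required)

That text, plus the user's question, is all the model has when choosing between the two cases above.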

When the reply contains such a structured JSON tool-call request, the client invokes the corresponding tool based on that JSON. Both behaviors are implemented in get_response and process_llm_response, and the logic is quite simple:

    def get_response(self, messages: list[dict[str, str]]) -> str:
        """Get a response from the LLM.

        Args:
            messages: A list of message dictionaries.

        Returns:
            The LLM's response as a string.

        Raises:
            httpx.RequestError: If the request to the LLM fails.
        """
        url = "https://api.groq.com/openai/v1/chat/completions"

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}",
        }
        payload = {
            "messages": messages,
            "model": "llama-3.2-90b-vision-preview",
            "temperature": 0.7,
            "max_tokens": 4096,
            "top_p": 1,
            "stream": False,
            "stop": None,
        }

        try:
            with httpx.Client() as client:
                response = client.post(url, headers=headers, json=payload)
                response.raise_for_status()
                data = response.json()
                return data["choices"][0]["message"]["content"]

        # Catch httpx.HTTPError so that both RequestError (network failures) and
        # HTTPStatusError (raised by raise_for_status above) land here.
        except httpx.HTTPError as e:
            error_message = f"Error getting LLM response: {str(e)}"
            logging.error(error_message)

            if isinstance(e, httpx.HTTPStatusError):
                status_code = e.response.status_code
                logging.error(f"Status code: {status_code}")
                logging.error(f"Response details: {e.response.text}")

            return (
                f"I encountered an error: {error_message}. "
                "Please try again or rephrase your request."
            )

    # ... some code omitted ...

    async def process_llm_response(self, llm_response: str) -> str:
        """Process the LLM response and execute tools if needed.

        Args:
            llm_response: The response from the LLM.

        Returns:
            The result of tool execution or the original response.
        """
        import json

        try:
            tool_call = json.loads(llm_response)
            if "tool" in tool_call and "arguments" in tool_call:
                logging.info(f"Executing tool: {tool_call['tool']}")
                logging.info(f"With arguments: {tool_call['arguments']}")

                for server in self.servers:
                    tools = await server.list_tools()
                    if any(tool.name == tool_call["tool"] for tool in tools):
                        try:
                            result = await server.execute_tool(
                                tool_call["tool"], tool_call["arguments"]
                            )

                            if isinstance(result, dict) and "progress" in result:
                                progress = result["progress"]
                                total = result["total"]
                                percentage = (progress / total) * 100
                                logging.info(
                                    f"Progress: {progress}/{total} "
                                    f"({percentage:.1f}%)"
                                )

                            return f"Tool execution result: {result}"
                        except Exception as e:
                            error_msg = f"Error executing tool: {str(e)}"
                            logging.error(error_msg)
                            return error_msg

                return f"No server found with tool: {tool_call['tool']}"
            return llm_response
        except json.JSONDecodeError:
            return llm_response
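
In the example, server.list_tools() and server.execute_tool() are essentially thin wrappers around ClientSession from the official mcp Python SDK. A minimal sketch of the underlying calls (the server command, tool name, and arguments here are placeholders):

    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Launch the MCP server as a subprocess and talk to it over stdio.
        params = StdioServerParameters(command="python", args=["my_mcp_server.py"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()

                # What list_tools() wraps: fetch the tool list used to build the prompt.
                tools = await session.list_tools()
                print([tool.name for tool in tools.tools])

                # What execute_tool() wraps: run one tool with its arguments.
                result = await session.call_tool("get_weather", arguments={"city": "Beijing"})
                print(result)

    asyncio.run(main())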

At this point, the call flow between MCP servers and the LLM should be clear.

Based on this implementation, a few conclusions can be drawn:

  • MCP is a product of the evolution of prompt engineering: the model decides which tools to use through prompt engineering, i.e., by being given structured descriptions of all available tools.
  • Tool documentation is critical. The model understands and selects tools purely from their textual descriptions, so carefully written tool names, docstrings, and parameter descriptions make a real difference.
  • Because tool selection is prompt-based, virtually any model can be used with MCP.

References

  1. modelcontextprotocol/python-sdk
  2. How to build an MCP server with Go