feat: Refactor AI agent tool syntax to XML format for improved robustness and parsing

This commit is contained in:
Slipstream 2025-05-31 19:18:04 -06:00
parent e75aa2d234
commit 113ce20b2f
Signed by: slipstream
GPG Key ID: 13E498CE010AC6FD
2 changed files with 249 additions and 94 deletions

View File

@ -10,6 +10,7 @@ import datetime # For snapshot naming
import random # For snapshot naming
from typing import Dict, Any, List, Optional, Tuple
from collections import defaultdict # Added for agent_shell_sessions
import xml.etree.ElementTree as ET
# Google Generative AI Imports (using Vertex AI backend)
from google import genai
@ -64,61 +65,77 @@ COMMIT_AUTHOR = "AI Coding Agent Cog <me@slipstreamm.dev>"
AGENT_SYSTEM_PROMPT = """You are an expert AI Coding Agent. Your primary function is to assist the user (bot owner) by directly modifying the codebase of this Discord bot project or performing related tasks. This bot uses discord.py. Cogs placed in the 'cogs' folder are automatically loaded by the bot's main script, so you typically do not need to modify `main.py` to load new cogs you create in that directory. You operate by understanding user requests and then generating specific inline "tool calls" in your responses when you need to interact with the file system, execute commands, or search the web.
**Inline Tool Call Syntax:**
When you need to use a tool, your response should *only* contain the tool call block, formatted exactly as specified below. The system will parse this, execute the tool, and then feed the output back to you in a subsequent message prefixed with "ToolResponse:".
IMPORTANT: Do NOT wrap your tool calls in markdown code blocks (e.g., ```tool ... ``` or ```json ... ```). Output the raw tool syntax directly, starting with the tool name (e.g., `ReadFile:`).
**XML Tool Call Syntax:**
When you need to use a tool, your response should *only* contain the XML block representing the tool call, formatted exactly as specified below. The system will parse this XML, execute the tool, and then feed the output back to you in a subsequent message prefixed with "ToolResponse:".
IMPORTANT: Do NOT wrap your XML tool calls in markdown code blocks (e.g., ```xml ... ``` or ``` ... ```). Output the raw XML directly, starting with the root tool tag (e.g., `<ReadFile>`).
1. **ReadFile:** Reads the content of a specified file.
```tool
ReadFile:
path: <string: path_to_file>
```xml
<ReadFile>
<path>path/to/file.ext</path>
</ReadFile>
```
(System will provide file content or error in ToolResponse)
2. **WriteFile:** Writes content to a specified file, overwriting if it exists, creating if it doesn't.
```tool
WriteFile:
path: <string: path_to_file>
content: |
<string: multi-line_file_content>
```xml
<WriteFile>
<path>path/to/file.ext</path>
<content><![CDATA[
Your multi-line file content here.
Special characters like < & > are fine.
]]></content>
</WriteFile>
```
(System will confirm success or report error in ToolResponse)
3. **ApplyDiff:** Applies a diff/patch to a file. Use standard unidiff format for the diff_block.
```tool
ApplyDiff:
path: <string: path_to_file>
diff_block: |
<string: multi-line_diff_content>
```xml
<ApplyDiff>
<path>path/to/file.ext</path>
<diff_block><![CDATA[
--- a/original_file.py
+++ b/modified_file.py
@@ -1,3 +1,4 @@
line 1
-line 2 old
+line 2 new
+line 3 added
]]></diff_block>
</ApplyDiff>
```
(System will confirm success or report error in ToolResponse)
4. **ExecuteCommand:** Executes a shell command.
```tool
ExecuteCommand:
command: <string: shell_command_to_execute>
```xml
<ExecuteCommand>
<command>your shell command here</command>
</ExecuteCommand>
```
(System will provide stdout/stderr or error in ToolResponse)
5. **ListFiles:** Lists files and directories at a given path.
```tool
ListFiles:
path: <string: path_to_search>
recursive: <boolean: true_or_false (optional, default: false)>
```xml
<ListFiles>
<path>path/to/search</path>
<recursive>true</recursive> <!-- boolean: "true" or "false". If tag is absent or value is not "true", it defaults to false. -->
</ListFiles>
```
(System will provide file list or error in ToolResponse)
6. **WebSearch:** Searches the web for information.
```tool
WebSearch:
query: <string: search_query>
```xml
<WebSearch>
<query>your search query</query>
</WebSearch>
```
(System will provide search results or error in ToolResponse)
7. **TaskComplete:** Signals that the current multi-step task is considered complete by the AI.
```tool
TaskComplete:
message: <string: A brief summary of what was accomplished or the final status.>
```xml
<TaskComplete>
<message>A brief summary of what was accomplished or the final status.</message>
</TaskComplete>
```
(System will acknowledge and stop the current interaction loop.)
@ -452,63 +469,55 @@ class AICodeAgentCog(commands.Cog):
if len(history) > max_history_items:
self.agent_conversations[user_id] = history[-max_history_items:]
async def _parse_and_execute_tool_call(self, ctx: commands.Context, ai_response_text: str) -> Optional[str]:
async def _parse_and_execute_tool_call(self, ctx: commands.Context, ai_response_text: str) -> Tuple[str, Optional[str]]:
"""
Parses AI response for an inline tool call, executes it, and returns the tool's output string.
Parses AI response for an XML tool call, executes it, and returns the tool's output string.
Returns a tuple: (status: str, data: Optional[str]).
Status can be "TOOL_OUTPUT", "TASK_COMPLETE", "NO_TOOL".
Data is the tool output string, completion message, or original AI text.
"""
tool_executed = False # Flag to indicate if any tool was matched and attempted
try:
clean_ai_response_text = ai_response_text.strip()
# Remove potential markdown ```xml ... ``` wrapper
if clean_ai_response_text.startswith("```"):
# More robustly remove potential ```xml ... ``` or just ``` ... ```
clean_ai_response_text = re.sub(r"^```(?:xml)?\s*\n?", "", clean_ai_response_text, flags=re.MULTILINE)
clean_ai_response_text = re.sub(r"\n?```$", "", clean_ai_response_text, flags=re.MULTILINE)
clean_ai_response_text = clean_ai_response_text.strip()
# --- TaskComplete ---
# Check for TaskComplete first as it's a terminal operation for the loop.
task_complete_match = re.search(r"TaskComplete:\s*message:\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
if task_complete_match:
tool_executed = True
# Ensure we capture the message correctly, even if it's multi-line, up to the end of the tool block or string.
# The (.*) should be greedy enough for simple cases.
completion_message = task_complete_match.group(1).strip()
return "TASK_COMPLETE", completion_message
if not clean_ai_response_text or not clean_ai_response_text.startswith("<") or not clean_ai_response_text.endswith(">"):
return "NO_TOOL", ai_response_text
# --- ReadFile ---
if not tool_executed:
# Path can contain spaces, so .+? is good. Ensure it doesn't grab parts of other tools if text is messy.
read_file_match = re.search(r"ReadFile:\s*path:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.MULTILINE)
if read_file_match:
tool_executed = True
file_path = read_file_match.group(1).strip()
root = ET.fromstring(clean_ai_response_text)
tool_name = root.tag
parameters = {child.tag: child.text for child in root}
if tool_name == "ReadFile":
file_path = parameters.get("path")
if not file_path:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nReadFile: Missing 'path' parameter."
tool_output = await self._execute_tool_read_file(file_path)
return "TOOL_OUTPUT", f"ToolResponse: ReadFile\nPath: {file_path}\n---\n{tool_output}"
# --- WriteFile ---
if not tool_executed:
# Content can be multi-line and extensive. DOTALL is crucial.
# Capture path, then content separately.
write_file_match = re.search(r"WriteFile:\s*path:\s*(.+?)\s*content:\s*\|?\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
if write_file_match:
tool_executed = True
file_path = write_file_match.group(1).strip()
content = write_file_match.group(2).strip()
elif tool_name == "WriteFile":
file_path = parameters.get("path")
content = parameters.get("content") # CDATA content will be in .text
if file_path is None or content is None:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nWriteFile: Missing 'path' or 'content' parameter."
snapshot_branch = await self._create_programmatic_snapshot()
if not snapshot_branch:
return "TOOL_OUTPUT", "ToolResponse: SystemError\n---\nFailed to create project snapshot. WriteFile operation aborted."
else:
# The notification about snapshot creation is now in the system prompt's description of the workflow.
# We can still send a message to Discord for owner visibility if desired.
await ctx.send(f"AICodeAgent: [Info] Created snapshot: {snapshot_branch} before writing to {file_path}")
tool_output = await self._execute_tool_write_file(file_path, content)
return "TOOL_OUTPUT", f"ToolResponse: WriteFile\nPath: {file_path}\n---\n{tool_output}"
# --- ApplyDiff ---
if not tool_executed:
# Diff block can be multi-line.
apply_diff_match = re.search(r"ApplyDiff:\s*path:\s*(.+?)\s*diff_block:\s*\|?\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
if apply_diff_match:
tool_executed = True
file_path = apply_diff_match.group(1).strip()
diff_block = apply_diff_match.group(2).strip()
elif tool_name == "ApplyDiff":
file_path = parameters.get("path")
diff_block = parameters.get("diff_block") # CDATA content
if file_path is None or diff_block is None:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nApplyDiff: Missing 'path' or 'diff_block' parameter."
snapshot_branch = await self._create_programmatic_snapshot()
if not snapshot_branch:
@ -518,40 +527,48 @@ class AICodeAgentCog(commands.Cog):
tool_output = await self._execute_tool_apply_diff(file_path, diff_block)
return "TOOL_OUTPUT", f"ToolResponse: ApplyDiff\nPath: {file_path}\n---\n{tool_output}"
# --- ExecuteCommand ---
if not tool_executed:
# Command can have spaces.
exec_command_match = re.search(r"ExecuteCommand:\s*command:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.DOTALL)
if exec_command_match:
tool_executed = True
command_str = exec_command_match.group(1).strip()
user_id = ctx.author.id # Get user_id from context
elif tool_name == "ExecuteCommand":
command_str = parameters.get("command")
if not command_str:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nExecuteCommand: Missing 'command' parameter."
user_id = ctx.author.id
tool_output = await self._execute_tool_execute_command(command_str, user_id)
return "TOOL_OUTPUT", f"ToolResponse: ExecuteCommand\nCommand: {command_str}\n---\n{tool_output}"
# --- ListFiles ---
if not tool_executed:
# Path and optional recursive flag.
list_files_match = re.search(r"ListFiles:\s*path:\s*(.+?)(?:\s*recursive:\s*(true|false))?(?:\n|$)", ai_response_text, re.IGNORECASE | re.MULTILINE)
if list_files_match:
tool_executed = True
file_path = list_files_match.group(1).strip()
recursive_str = list_files_match.group(2) # This might be None
recursive = recursive_str.lower() == 'true' if recursive_str else False
elif tool_name == "ListFiles":
file_path = parameters.get("path")
recursive_str = parameters.get("recursive") # Will be None if tag is missing
recursive = recursive_str.lower() == 'true' if recursive_str else False # Handles None or empty string safely
if not file_path:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nListFiles: Missing 'path' parameter."
tool_output = await self._execute_tool_list_files(file_path, recursive)
return "TOOL_OUTPUT", f"ToolResponse: ListFiles\nPath: {file_path}\nRecursive: {recursive}\n---\n{tool_output}"
# --- WebSearch ---
if not tool_executed:
# Query can have spaces.
web_search_match = re.search(r"WebSearch:\s*query:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.DOTALL)
if web_search_match:
tool_executed = True
query_str = web_search_match.group(1).strip()
elif tool_name == "WebSearch":
query_str = parameters.get("query")
if not query_str:
return "TOOL_OUTPUT", "ToolResponse: Error\n---\nWebSearch: Missing 'query' parameter."
tool_output = await self._execute_tool_web_search(query_str)
return "TOOL_OUTPUT", f"ToolResponse: WebSearch\nQuery: {query_str}\n---\n{tool_output}"
return "NO_TOOL", ai_response_text # No tool call found, return original AI text
elif tool_name == "TaskComplete":
message = parameters.get("message", "Task marked as complete by AI.") # Default if message tag is missing or empty
return "TASK_COMPLETE", message if message is not None else "Task marked as complete by AI."
else:
# Unknown tool name found in XML
return "TOOL_OUTPUT", f"ToolResponse: Error\n---\nUnknown tool: {tool_name} in XML: {clean_ai_response_text[:200]}"
except ET.ParseError:
# Not valid XML
# print(f"AICodeAgentCog: XML ParseError for response: {ai_response_text[:200]}") # Debugging
return "NO_TOOL", ai_response_text
except Exception as e: # Catch any other unexpected errors during parsing/dispatch
print(f"AICodeAgentCog: Unexpected error in _parse_and_execute_tool_call: {type(e).__name__} - {e} for response {ai_response_text[:200]}")
# import traceback
# traceback.print_exc() # For more detailed debugging if needed
return "TOOL_OUTPUT", f"ToolResponse: SystemError\n---\nError processing tool call: {type(e).__name__} - {e}"
# --- Tool Execution Methods ---
# (Implementations for _execute_tool_... methods remain the same)

View File

@ -0,0 +1,138 @@
# Project: Refactor AI Agent Tool Syntax to XML
**Date:** 2025-05-31
**Goal:** To refactor the AI agent's tool calling mechanism from a custom regex-parsed format to a standardized XML-based format. This aims to improve robustness, ease of AI generation, parsing simplicity, extensibility, and standardization.
**1. Current State:**
The AI agent in `cogs/ai_code_agent_cog.py` uses a custom, line-based syntax for tool calls, defined in `AGENT_SYSTEM_PROMPT` and parsed using regular expressions in `_parse_and_execute_tool_call`.
**2. Proposed Change: XML-Based Tool Calls**
The AI will be instructed to output *only* an XML block when calling a tool. The root element of the XML block will be the tool's name.
**2.1. New XML Tool Call Syntax Definitions:**
* **ReadFile:**
```xml
<ReadFile>
<path>path/to/file.ext</path>
</ReadFile>
```
* **WriteFile:**
```xml
<WriteFile>
<path>path/to/file.ext</path>
<content><![CDATA[
Your multi-line file content here.
Special characters like < & > are fine.
]]></content>
</WriteFile>
```
*(Using `CDATA` for `content` and `diff_block` is recommended to handle multi-line strings and special XML characters gracefully.)*
* **ApplyDiff:**
```xml
<ApplyDiff>
<path>path/to/file.ext</path>
<diff_block><![CDATA[
--- a/original_file.py
+++ b/modified_file.py
@@ -1,3 +1,4 @@
line 1
-line 2 old
+line 2 new
+line 3 added
]]></diff_block>
</ApplyDiff>
```
* **ExecuteCommand:**
```xml
<ExecuteCommand>
<command>your shell command here</command>
</ExecuteCommand>
```
* **ListFiles:**
```xml
<ListFiles>
<path>path/to/search</path>
<recursive>true</recursive> <!-- boolean: "true" or "false", defaults to false if tag omitted or value is not "true" -->
</ListFiles>
```
* **WebSearch:**
```xml
<WebSearch>
<query>your search query</query>
</WebSearch>
```
* **TaskComplete:**
```xml
<TaskComplete>
<message>A brief summary of what was accomplished or the final status.</message>
</TaskComplete>
```
**3. Parsing Logic in `_parse_and_execute_tool_call` (Python):**
* The method will attempt to parse the `ai_response_text` as XML using `xml.etree.ElementTree`.
* If parsing fails (e.g., `ET.ParseError`), it's considered "NO_TOOL" or an error.
* If successful, the root tag name will determine the `tool_name`.
* Child elements of the root will provide the `parameters`.
* `.text` attribute of child elements will give their values.
* For `recursive` in `ListFiles`, the string "true" (case-insensitive) will be converted to a Python boolean `True`, otherwise `False`.
* `CDATA` sections will be handled transparently by the XML parser for `content` and `diff_block`.
**4. Changes to `AGENT_SYSTEM_PROMPT`:**
The system prompt in `cogs/ai_code_agent_cog.py` will be updated to:
* Remove the old tool syntax definitions.
* Clearly state that tool calls must be XML, with the tool name as the root tag.
* Provide the exact XML examples shown above for each tool.
* Emphasize that the AI's response should *only* be the XML block when a tool is invoked, with no surrounding text or markdown.
**5. Implementation Steps:**
* **Phase 1: Update System Prompt & Define Structures**
* Modify `AGENT_SYSTEM_PROMPT` in `cogs/ai_code_agent_cog.py` with the new XML tool definitions and instructions.
* **Phase 2: Refactor Parsing Logic**
* Rewrite the `_parse_and_execute_tool_call` method in `cogs/ai_code_agent_cog.py` to:
* Import `xml.etree.ElementTree as ET`.
* Use a `try-except ET.ParseError` block to catch invalid XML.
* If XML is valid, get the root element's tag as `tool_name`.
* Extract parameters from child elements.
* Handle boolean conversion for `ListFiles/recursive`.
* Call the respective `_execute_tool_...` methods.
* **Phase 3: Testing**
* Thoroughly test each tool call with various inputs, including edge cases and malformed XML (to ensure robust error handling).
**6. Visual Flow (Mermaid Diagram):**
```mermaid
graph TD
A[AI Response Text] --> B{Attempt to Parse as XML};
B -- Valid XML --> C{Get Root Tag (Tool Name)};
C --> D{Extract Parameters from Child Tags};
D --> E{Switch on Tool Name};
E -- ReadFile --> F[Call _execute_tool_read_file];
E -- WriteFile --> G[Call _execute_tool_write_file];
E -- ApplyDiff --> H[Call _execute_tool_apply_diff];
E -- ExecuteCommand --> I[Call _execute_tool_execute_command];
E -- ListFiles --> J[Call _execute_tool_list_files];
E -- WebSearch --> K[Call _execute_tool_web_search];
E -- TaskComplete --> L[Process TaskComplete];
E -- Unknown Tool --> M[Handle Unknown Tool / Error];
B -- Invalid XML / Not a Tool --> N[Return "NO_TOOL" status];
F --> O[Format ToolResponse];
G --> O;
H --> O;
I --> O;
J --> O;
K --> O;
L --> P[End Interaction];
M --> P;
N --> P;
O --> Q[Add to History, Continue Loop];