feat: Refactor AI agent tool syntax to XML format for improved robustness and parsing

2025-05-31 19:18:04 -06:00 · 113ce20b2f
commit 113ce20b2f
parent e75aa2d234
2 changed files with 249 additions and 94 deletions
--- a/cogs/ai_code_agent_cog.py
+++ b/cogs/ai_code_agent_cog.py
@@ -10,6 +10,7 @@ import datetime # For snapshot naming
 import random # For snapshot naming
 from typing import Dict, Any, List, Optional, Tuple
 from collections import defaultdict # Added for agent_shell_sessions
+import xml.etree.ElementTree as ET

 # Google Generative AI Imports (using Vertex AI backend)
 from google import genai
@@ -64,61 +65,77 @@ COMMIT_AUTHOR = "AI Coding Agent Cog <me@slipstreamm.dev>"

 AGENT_SYSTEM_PROMPT = """You are an expert AI Coding Agent. Your primary function is to assist the user (bot owner) by directly modifying the codebase of this Discord bot project or performing related tasks. This bot uses discord.py. Cogs placed in the 'cogs' folder are automatically loaded by the bot's main script, so you typically do not need to modify `main.py` to load new cogs you create in that directory. You operate by understanding user requests and then generating specific inline "tool calls" in your responses when you need to interact with the file system, execute commands, or search the web.

-**Inline Tool Call Syntax:**
-When you need to use a tool, your response should *only* contain the tool call block, formatted exactly as specified below. The system will parse this, execute the tool, and then feed the output back to you in a subsequent message prefixed with "ToolResponse:".
-IMPORTANT: Do NOT wrap your tool calls in markdown code blocks (e.g., ```tool ... ``` or ```json ... ```). Output the raw tool syntax directly, starting with the tool name (e.g., `ReadFile:`).
+**XML Tool Call Syntax:**
+When you need to use a tool, your response should *only* contain the XML block representing the tool call, formatted exactly as specified below. The system will parse this XML, execute the tool, and then feed the output back to you in a subsequent message prefixed with "ToolResponse:".
+IMPORTANT: Do NOT wrap your XML tool calls in markdown code blocks (e.g., ```xml ... ``` or ``` ... ```). Output the raw XML directly, starting with the root tool tag (e.g., `<ReadFile>`).

 1.  **ReadFile:** Reads the content of a specified file.
-    ```tool
-    ReadFile:
-    path: <string: path_to_file>
+    ```xml
+    <ReadFile>
+      <path>path/to/file.ext</path>
+    </ReadFile>
    ```
    (System will provide file content or error in ToolResponse)

 2.  **WriteFile:** Writes content to a specified file, overwriting if it exists, creating if it doesn't.
-    ```tool
-    WriteFile:
-    path: <string: path_to_file>
-    content: |
-      <string: multi-line_file_content>
+    ```xml
+    <WriteFile>
+      <path>path/to/file.ext</path>
+      <content><![CDATA[
+    Your multi-line file content here.
+    Special characters like < & > are fine.
+      ]]></content>
+    </WriteFile>
    ```
    (System will confirm success or report error in ToolResponse)

 3.  **ApplyDiff:** Applies a diff/patch to a file. Use standard unidiff format for the diff_block.
-    ```tool
-    ApplyDiff:
-    path: <string: path_to_file>
-    diff_block: |
-      <string: multi-line_diff_content>
+    ```xml
+    <ApplyDiff>
+      <path>path/to/file.ext</path>
+      <diff_block><![CDATA[
+    --- a/original_file.py
+    +++ b/modified_file.py
+    @@ -1,3 +1,4 @@
+     line 1
+    -line 2 old
+    +line 2 new
+    +line 3 added
+      ]]></diff_block>
+    </ApplyDiff>
    ```
    (System will confirm success or report error in ToolResponse)

 4.  **ExecuteCommand:** Executes a shell command.
-    ```tool
-    ExecuteCommand:
-    command: <string: shell_command_to_execute>
+    ```xml
+    <ExecuteCommand>
+      <command>your shell command here</command>
+    </ExecuteCommand>
    ```
    (System will provide stdout/stderr or error in ToolResponse)

 5.  **ListFiles:** Lists files and directories at a given path.
-    ```tool
-    ListFiles:
-    path: <string: path_to_search>
-    recursive: <boolean: true_or_false (optional, default: false)>
+    ```xml
+    <ListFiles>
+      <path>path/to/search</path>
+      <recursive>true</recursive> <!-- boolean: "true" or "false". If tag is absent or value is not "true", it defaults to false. -->
+    </ListFiles>
    ```
    (System will provide file list or error in ToolResponse)

 6.  **WebSearch:** Searches the web for information.
-    ```tool
-    WebSearch:
-    query: <string: search_query>
+    ```xml
+    <WebSearch>
+      <query>your search query</query>
+    </WebSearch>
    ```
    (System will provide search results or error in ToolResponse)

 7.  **TaskComplete:** Signals that the current multi-step task is considered complete by the AI.
-    ```tool
-    TaskComplete:
-    message: <string: A brief summary of what was accomplished or the final status.>
+    ```xml
+    <TaskComplete>
+      <message>A brief summary of what was accomplished or the final status.</message>
+    </TaskComplete>
    ```
    (System will acknowledge and stop the current interaction loop.)

@@ -452,63 +469,55 @@ class AICodeAgentCog(commands.Cog):
        if len(history) > max_history_items:
            self.agent_conversations[user_id] = history[-max_history_items:]

-    async def _parse_and_execute_tool_call(self, ctx: commands.Context, ai_response_text: str) -> Optional[str]:
+    async def _parse_and_execute_tool_call(self, ctx: commands.Context, ai_response_text: str) -> Tuple[str, Optional[str]]:
        """
-        Parses AI response for an inline tool call, executes it, and returns the tool's output string.
+        Parses the AI response for an XML tool call and executes it.
        Returns a tuple: (status: str, data: Optional[str]).
        Status can be "TOOL_OUTPUT", "TASK_COMPLETE", "NO_TOOL".
        Data is the tool output string, completion message, or original AI text.
        """
-        tool_executed = False # Flag to indicate if any tool was matched and attempted
+        try:
+            clean_ai_response_text = ai_response_text.strip()
+            # Remove potential markdown ```xml ... ``` wrapper
+            if clean_ai_response_text.startswith("```"):
+                # More robustly remove potential ```xml ... ``` or just ``` ... ```
+                clean_ai_response_text = re.sub(r"^```(?:xml)?\s*\n?", "", clean_ai_response_text)
+                clean_ai_response_text = re.sub(r"\n?```$", "", clean_ai_response_text)
+                clean_ai_response_text = clean_ai_response_text.strip()

-        # --- TaskComplete ---
-        # Check for TaskComplete first as it's a terminal operation for the loop.
-        task_complete_match = re.search(r"TaskComplete:\s*message:\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
-        if task_complete_match:
-            tool_executed = True
-            # Ensure we capture the message correctly, even if it's multi-line, up to the end of the tool block or string.
-            # The (.*) should be greedy enough for simple cases.
-            completion_message = task_complete_match.group(1).strip()
-            return "TASK_COMPLETE", completion_message
+            if not clean_ai_response_text or not clean_ai_response_text.startswith("<") or not clean_ai_response_text.endswith(">"):
+                return "NO_TOOL", ai_response_text

-        # --- ReadFile ---
-        if not tool_executed:
-            # Path can contain spaces, so .+? is good. Ensure it doesn't grab parts of other tools if text is messy.
-            read_file_match = re.search(r"ReadFile:\s*path:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.MULTILINE)
-            if read_file_match:
-                tool_executed = True
-                file_path = read_file_match.group(1).strip()
+            root = ET.fromstring(clean_ai_response_text)
+            tool_name = root.tag
+            parameters = {child.tag: child.text for child in root}
+
+            if tool_name == "ReadFile":
+                file_path = parameters.get("path")
+                if not file_path:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nReadFile: Missing 'path' parameter."
                tool_output = await self._execute_tool_read_file(file_path)
                return "TOOL_OUTPUT", f"ToolResponse: ReadFile\nPath: {file_path}\n---\n{tool_output}"

-        # --- WriteFile ---
-        if not tool_executed:
-            # Content can be multi-line and extensive. DOTALL is crucial.
-            # Capture path, then content separately.
-            write_file_match = re.search(r"WriteFile:\s*path:\s*(.+?)\s*content:\s*\|?\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
-            if write_file_match:
-                tool_executed = True
-                file_path = write_file_match.group(1).strip()
-                content = write_file_match.group(2).strip()
+            elif tool_name == "WriteFile":
+                file_path = parameters.get("path")
+                content = parameters.get("content") # CDATA content will be in .text
+                if file_path is None or content is None:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nWriteFile: Missing 'path' or 'content' parameter."
                
                snapshot_branch = await self._create_programmatic_snapshot()
                if not snapshot_branch:
                    return "TOOL_OUTPUT", "ToolResponse: SystemError\n---\nFailed to create project snapshot. WriteFile operation aborted."
                else:
-                    # The notification about snapshot creation is now in the system prompt's description of the workflow.
-                    # We can still send a message to Discord for owner visibility if desired.
                    await ctx.send(f"AICodeAgent: [Info] Created snapshot: {snapshot_branch} before writing to {file_path}")
                    tool_output = await self._execute_tool_write_file(file_path, content)
                    return "TOOL_OUTPUT", f"ToolResponse: WriteFile\nPath: {file_path}\n---\n{tool_output}"

-        # --- ApplyDiff ---
-        if not tool_executed:
-            # Diff block can be multi-line.
-            apply_diff_match = re.search(r"ApplyDiff:\s*path:\s*(.+?)\s*diff_block:\s*\|?\s*(.*)", ai_response_text, re.IGNORECASE | re.DOTALL)
-            if apply_diff_match:
-                tool_executed = True
-                file_path = apply_diff_match.group(1).strip()
-                diff_block = apply_diff_match.group(2).strip()
+            elif tool_name == "ApplyDiff":
+                file_path = parameters.get("path")
+                diff_block = parameters.get("diff_block") # CDATA content
+                if file_path is None or diff_block is None:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nApplyDiff: Missing 'path' or 'diff_block' parameter."

                snapshot_branch = await self._create_programmatic_snapshot()
                if not snapshot_branch:
@@ -518,40 +527,48 @@ class AICodeAgentCog(commands.Cog):
                    tool_output = await self._execute_tool_apply_diff(file_path, diff_block)
                    return "TOOL_OUTPUT", f"ToolResponse: ApplyDiff\nPath: {file_path}\n---\n{tool_output}"

-        # --- ExecuteCommand ---
-        if not tool_executed:
-            # Command can have spaces.
-            exec_command_match = re.search(r"ExecuteCommand:\s*command:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.DOTALL)
-            if exec_command_match:
-                tool_executed = True
-                command_str = exec_command_match.group(1).strip()
-                user_id = ctx.author.id # Get user_id from context
+            elif tool_name == "ExecuteCommand":
+                command_str = parameters.get("command")
+                if not command_str:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nExecuteCommand: Missing 'command' parameter."
+                user_id = ctx.author.id
                tool_output = await self._execute_tool_execute_command(command_str, user_id)
                return "TOOL_OUTPUT", f"ToolResponse: ExecuteCommand\nCommand: {command_str}\n---\n{tool_output}"

-        # --- ListFiles ---
-        if not tool_executed:
-            # Path and optional recursive flag.
-            list_files_match = re.search(r"ListFiles:\s*path:\s*(.+?)(?:\s*recursive:\s*(true|false))?(?:\n|$)", ai_response_text, re.IGNORECASE | re.MULTILINE)
-            if list_files_match:
-                tool_executed = True
-                file_path = list_files_match.group(1).strip()
-                recursive_str = list_files_match.group(2) # This might be None
-                recursive = recursive_str.lower() == 'true' if recursive_str else False
+            elif tool_name == "ListFiles":
+                file_path = parameters.get("path")
+                recursive_str = parameters.get("recursive") # Will be None if tag is missing
+                recursive = recursive_str.lower() == 'true' if recursive_str else False # Handles None or empty string safely
+                if not file_path:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nListFiles: Missing 'path' parameter."
                tool_output = await self._execute_tool_list_files(file_path, recursive)
                return "TOOL_OUTPUT", f"ToolResponse: ListFiles\nPath: {file_path}\nRecursive: {recursive}\n---\n{tool_output}"

-        # --- WebSearch ---
-        if not tool_executed:
-            # Query can have spaces.
-            web_search_match = re.search(r"WebSearch:\s*query:\s*(.+?)(?:\n|$)", ai_response_text, re.IGNORECASE | re.DOTALL)
-            if web_search_match:
-                tool_executed = True
-                query_str = web_search_match.group(1).strip()
+            elif tool_name == "WebSearch":
+                query_str = parameters.get("query")
+                if not query_str:
+                    return "TOOL_OUTPUT", "ToolResponse: Error\n---\nWebSearch: Missing 'query' parameter."
                tool_output = await self._execute_tool_web_search(query_str)
                return "TOOL_OUTPUT", f"ToolResponse: WebSearch\nQuery: {query_str}\n---\n{tool_output}"
            
-        return "NO_TOOL", ai_response_text # No tool call found, return original AI text
+            elif tool_name == "TaskComplete":
+                message = parameters.get("message", "Task marked as complete by AI.") # Default if message tag is missing or empty
+                return "TASK_COMPLETE", message if message is not None else "Task marked as complete by AI."
+
+
+            else:
+                # Unknown tool name found in XML
+                return "TOOL_OUTPUT", f"ToolResponse: Error\n---\nUnknown tool: {tool_name} in XML: {clean_ai_response_text[:200]}"
+
+        except ET.ParseError:
+            # Not valid XML
+            # print(f"AICodeAgentCog: XML ParseError for response: {ai_response_text[:200]}") # Debugging
+            return "NO_TOOL", ai_response_text
+        except Exception as e: # Catch any other unexpected errors during parsing/dispatch
+            print(f"AICodeAgentCog: Unexpected error in _parse_and_execute_tool_call: {type(e).__name__} - {e} for response {ai_response_text[:200]}")
+            # import traceback
+            # traceback.print_exc() # For more detailed debugging if needed
+            return "TOOL_OUTPUT", f"ToolResponse: SystemError\n---\nError processing tool call: {type(e).__name__} - {e}"

    # --- Tool Execution Methods ---
    # (Implementations for _execute_tool_... methods remain the same)
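
On the fence-stripping step in `_parse_and_execute_tool_call`: the patterns are safest when anchored to the whole response rather than to individual lines, because under `re.MULTILINE` both `^` and `$` also match at inner line boundaries and could strip fence markers that belong to a CDATA payload (for example when the agent writes a Markdown file). The following is a minimal sketch of that idea, not the cog's code; `strip_markdown_fence` and `FENCE` are hypothetical names introduced for illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this example fence-safe


def strip_markdown_fence(text: str) -> str:
    """Remove one wrapping Markdown fence (optionally tagged xml) from text."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # \A and \Z match only at the very start and end of the string,
        # so fence lines inside the payload are never touched.
        cleaned = re.sub(rf"\A{FENCE}(?:xml)?[ \t]*\n?", "", cleaned)
        cleaned = re.sub(rf"\n?{FENCE}\Z", "", cleaned)
    return cleaned.strip()


# A tool call whose CDATA payload itself contains a fenced code block.
inner = (
    "<WriteFile>\n"
    "  <path>README.md</path>\n"
    f"  <content><![CDATA[\n{FENCE}python\nprint('hi')\n{FENCE}\n]]></content>\n"
    "</WriteFile>"
)
wrapped = f"{FENCE}xml\n{inner}\n{FENCE}"

print(strip_markdown_fence(wrapped) == inner)
```

Only the outer wrapper is removed; the fenced block inside the CDATA section reaches the XML parser unchanged.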
--- a/docs/ai_agent_tool_syntax_v2_xml.md
+++ b/docs/ai_agent_tool_syntax_v2_xml.md
@@ -0,0 +1,138 @@
+# Project: Refactor AI Agent Tool Syntax to XML
+
+**Date:** 2025-05-31
+
+**Goal:** To refactor the AI agent's tool calling mechanism from a custom regex-parsed format to a standardized XML-based format. This aims to improve robustness, ease of AI generation, parsing simplicity, extensibility, and standardization.
+
+**1. Current State:**
+The AI agent in `cogs/ai_code_agent_cog.py` uses a custom, line-based syntax for tool calls, defined in `AGENT_SYSTEM_PROMPT` and parsed using regular expressions in `_parse_and_execute_tool_call`.
+
+**2. Proposed Change: XML-Based Tool Calls**
+
+The AI will be instructed to output *only* an XML block when calling a tool. The root element of the XML block will be the tool's name.
+
+**2.1. New XML Tool Call Syntax Definitions:**
+
+*   **ReadFile:**
+    ```xml
+    <ReadFile>
+      <path>path/to/file.ext</path>
+    </ReadFile>
+    ```
+
+*   **WriteFile:**
+    ```xml
+    <WriteFile>
+      <path>path/to/file.ext</path>
+      <content><![CDATA[
+    Your multi-line file content here.
+    Special characters like < & > are fine.
+      ]]></content>
+    </WriteFile>
+    ```
+    *(Using `CDATA` for `content` and `diff_block` is recommended to handle multi-line strings and special XML characters gracefully.)*
+
+*   **ApplyDiff:**
+    ```xml
+    <ApplyDiff>
+      <path>path/to/file.ext</path>
+      <diff_block><![CDATA[
+    --- a/original_file.py
+    +++ b/modified_file.py
+    @@ -1,3 +1,4 @@
+     line 1
+    -line 2 old
+    +line 2 new
+    +line 3 added
+      ]]></diff_block>
+    </ApplyDiff>
+    ```
+
+*   **ExecuteCommand:**
+    ```xml
+    <ExecuteCommand>
+      <command>your shell command here</command>
+    </ExecuteCommand>
+    ```
+
+*   **ListFiles:**
+    ```xml
+    <ListFiles>
+      <path>path/to/search</path>
+      <recursive>true</recursive> <!-- boolean: "true" or "false", defaults to false if tag omitted or value is not "true" -->
+    </ListFiles>
+    ```
+
+*   **WebSearch:**
+    ```xml
+    <WebSearch>
+      <query>your search query</query>
+    </WebSearch>
+    ```
+
+*   **TaskComplete:**
+    ```xml
+    <TaskComplete>
+      <message>A brief summary of what was accomplished or the final status.</message>
+    </TaskComplete>
+    ```
+
+**3. Parsing Logic in `_parse_and_execute_tool_call` (Python):**
+
+*   The method will attempt to parse the `ai_response_text` as XML using `xml.etree.ElementTree`.
+*   If parsing fails with `ET.ParseError`, the response is treated as "NO_TOOL" and the original text is returned unchanged.
+*   If successful, the root tag name will determine the `tool_name`.
+*   Child elements of the root will provide the `parameters`.
+    *   `.text` attribute of child elements will give their values.
+    *   For `recursive` in `ListFiles`, the string "true" (case-insensitive) will be converted to a Python boolean `True`, otherwise `False`.
+    *   `CDATA` sections will be handled transparently by the XML parser for `content` and `diff_block`.
+
+**4. Changes to `AGENT_SYSTEM_PROMPT`:**
+    The system prompt in `cogs/ai_code_agent_cog.py` will be updated to:
+    *   Remove the old tool syntax definitions.
+    *   Clearly state that tool calls must be XML, with the tool name as the root tag.
+    *   Provide the exact XML examples shown above for each tool.
+    *   Emphasize that the AI's response should *only* be the XML block when a tool is invoked, with no surrounding text or markdown.
+
+**5. Implementation Steps:**
+
+    *   **Phase 1: Update System Prompt & Define Structures**
+        *   Modify `AGENT_SYSTEM_PROMPT` in `cogs/ai_code_agent_cog.py` with the new XML tool definitions and instructions.
+    *   **Phase 2: Refactor Parsing Logic**
+        *   Rewrite the `_parse_and_execute_tool_call` method in `cogs/ai_code_agent_cog.py` to:
+            *   Import `xml.etree.ElementTree as ET`.
+            *   Use a `try-except ET.ParseError` block to catch invalid XML.
+            *   If XML is valid, get the root element's tag as `tool_name`.
+            *   Extract parameters from child elements.
+            *   Handle boolean conversion for `ListFiles/recursive`.
+            *   Call the respective `_execute_tool_...` methods.
+    *   **Phase 3: Testing**
+        *   Thoroughly test each tool call with various inputs, including edge cases and malformed XML (to ensure robust error handling).
+
+**6. Visual Flow (Mermaid Diagram):**
+
+```mermaid
+graph TD
+    A[AI Response Text] --> B{Attempt to Parse as XML};
+    B -- Valid XML --> C{"Get Root Tag (Tool Name)"};
+    C --> D{Extract Parameters from Child Tags};
+    D --> E{Switch on Tool Name};
+    E -- ReadFile --> F[Call _execute_tool_read_file];
+    E -- WriteFile --> G[Call _execute_tool_write_file];
+    E -- ApplyDiff --> H[Call _execute_tool_apply_diff];
+    E -- ExecuteCommand --> I[Call _execute_tool_execute_command];
+    E -- ListFiles --> J[Call _execute_tool_list_files];
+    E -- WebSearch --> K[Call _execute_tool_web_search];
+    E -- TaskComplete --> L[Process TaskComplete];
+    E -- Unknown Tool --> M[Handle Unknown Tool / Error];
+    B -- Invalid XML / Not a Tool --> N["Return NO_TOOL status"];
+    F --> O[Format ToolResponse];
+    G --> O;
+    H --> O;
+    I --> O;
+    J --> O;
+    K --> O;
+    L --> P[End Interaction];
+    M --> P;
+    N --> P;
+    O --> Q[Add to History, Continue Loop];
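
The parsing rules in section 3 of the new doc can be sketched in isolation as follows. This is a minimal illustration rather than the cog's actual method: `parse_tool_call` and `is_recursive` are hypothetical helper names, and only standard-library `xml.etree.ElementTree` behavior is relied on. One detail worth noticing is that the parser returns CDATA text verbatim, including the newline after `<![CDATA[` and the indentation before `]]>`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

Params = Dict[str, Optional[str]]


def parse_tool_call(text: str) -> Optional[Tuple[str, Params]]:
    """Return (tool_name, parameters), or None when text is not well-formed XML."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        return None  # corresponds to the "NO_TOOL" branch
    # Each direct child becomes one parameter; CDATA is merged into .text.
    return root.tag, {child.tag: child.text for child in root}


def is_recursive(params: Params) -> bool:
    """Only a case-insensitive "true" counts as True; missing or empty is False."""
    value = params.get("recursive")
    return value.lower() == "true" if value else False


write_call = """<WriteFile>
  <path>notes/a.txt</path>
  <content><![CDATA[
if a < b & c > d:
    pass
  ]]></content>
</WriteFile>"""

parsed = parse_tool_call(write_call)
assert parsed is not None
tool_name, params = parsed
print(tool_name)                # WriteFile
print(repr(params["content"]))  # '\nif a < b & c > d:\n    pass\n  '

list_call = "<ListFiles><path>cogs</path><recursive>TRUE</recursive></ListFiles>"
print(is_recursive(parse_tool_call(list_call)[1]))          # True
print(parse_tool_call("Sure, I will read the file next."))  # None
```

Because the surrounding whitespace is preserved, a caller that wants written files to begin at the first line of content would need to trim the value explicitly.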