Building a Multi-Agent AI System with Planning, Tools, and Self-Correction Using OpenAI

This guide explains how to construct an agentic AI system that separates concerns into three specialized roles: a planner, a tool-using executor, and a critic. Splitting strategy, action, and quality control this way lets each stage be inspected and corrected on its own, which helps when tasks are complex. We’ll walk through the core components: secure API key handling, a small knowledge base, four built-in tools (calculator, search, JSON extraction, file writing), memory integration, and a self-critique loop. The result is a modular pipeline that can plan, execute, and refine its outputs using the OpenAI API.

1. What is the overall architecture of this agentic AI system?

The system is designed as a pipeline of three distinct roles: a planner, an executor, and a critic. The planner receives a high-level task and breaks it into a structured sequence of steps, deciding which tools to call and in what order. The executor then carries out those steps, invoking the appropriate tool (e.g., calculator, knowledge search) and collecting intermediate results. Finally, the critic reviews the executor’s output, checking for correctness, completeness, and adherence to quality policies. If the critic finds issues, the planner may be triggered again to revise the plan. This separation ensures that strategic thinking, concrete actions, and quality assurance are handled by specialized modules, making the system more robust and easier to debug.
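
To make the planner-to-executor handoff concrete, here is a minimal sketch in which a plan is a list of step dictionaries and the executor dispatches each step to a tool by name. The step format, the registry, and the two stand-in tools (tool_multiply, tool_echo) are assumptions made for illustration and are not taken from the original notebook.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in tools; each returns a result dict with an "ok" flag.
def tool_multiply(a: float, b: float) -> Dict[str, Any]:
    return {"ok": True, "result": a * b}

def tool_echo(text: str) -> Dict[str, Any]:
    return {"ok": True, "result": text}

# Assumed registry mapping tool names (as a planner might emit them) to functions.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "multiply": tool_multiply,
    "echo": tool_echo,
}

def execute_plan(plan: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run each step in order and collect one result dict per step."""
    results: List[Dict[str, Any]] = []
    for step in plan:
        tool = TOOLS.get(step["tool"])
        if tool is None:
            # Unknown tools are reported instead of raising, so a critic can see them.
            results.append({"ok": False, "error": f"unknown tool: {step['tool']}"})
            continue
        try:
            results.append(tool(**step.get("args", {})))
        except Exception as exc:  # bad arguments, tool failure, etc.
            results.append({"ok": False, "error": str(exc)})
    return results

# A plan as the planner might produce it (hand-written here).
plan = [
    {"tool": "multiply", "args": {"a": 6, "b": 7}},
    {"tool": "echo", "args": {"text": "done"}},
    {"tool": "missing", "args": {}},
]
results = execute_plan(plan)
```

Returning an error dictionary instead of raising keeps every step's outcome visible to the critic, which is one plausible way to support the debugging benefit described above.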

Source: www.marktechpost.com

2. How is the OpenAI API key securely handled in this setup?

In this implementation, the API key is never hardcoded or displayed in the notebook. The code first checks whether the OPENAI_API_KEY environment variable already exists (e.g., populated from a Colab secret) and prompts only if it is missing. In that case, getpass() from Python’s standard getpass module asks the user to enter the key without echoing it, and the value is stored in OPENAI_API_KEY. An assertion then confirms the key is not empty. This keeps the key out of the notebook source and cell output, which reduces the risk of accidental exposure in logs or shared notebooks. The OpenAI client is then created with this key, and a model string (e.g., gpt-5.2) is defined once for reuse across the system.
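
The check-then-prompt logic can be sketched as follows. The helper name ensure_api_key and its env and prompt parameters are assumptions added so the logic can be exercised without a real key or an interactive prompt; the original notebook may inline these steps instead.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping

def ensure_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
    name: str = "OPENAI_API_KEY",
) -> str:
    """Return the key from env, prompting with hidden input only if it is missing."""
    if not env.get(name):
        # getpass does not echo what the user types.
        env[name] = prompt(f"Enter {name} (input hidden): ").strip()
    assert env[name], f"{name} must not be empty"
    return env[name]

# Demonstration with a stand-in environment and a fake prompt,
# so nothing real is read, stored, or printed.
demo_env: dict = {}
key = ensure_api_key(env=demo_env, prompt=lambda _msg: "sk-example-not-a-real-key")
```

In a notebook you would call ensure_api_key() with its defaults. The official openai package's OpenAI() client reads OPENAI_API_KEY from the environment when no key is passed explicitly, so the key never needs to appear in a cell.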

3. What does the built-in knowledge base contain, and how is it structured?

The knowledge base is a small in-memory list of dictionaries, each with a title and a text field. It stores curated policies and playbooks that guide the agent’s behavior. For example, one entry titled “Agent Protocol: Execution” advises using tools only when necessary and verifying numeric results. Another, “Policy: Output Quality”, requires final answers to include steps, checks, and deliverables. A third, “Playbook: Meeting Follow-up”, outlines how to summarize decisions and list action items. The knowledge base can be expanded easily; the system uses a simple scoring mechanism based on word overlap to retrieve the most relevant entries for a given query. This allows the planner and executor to access contextual guidance without needing external databases.

4. What tools are provided, and how do they work?

Four tools are implemented as Python functions, each returning a standardized result dictionary with an ok status flag:

Calculator (_safe_calc): evaluates mathematical expressions after sanitizing the input to block variables and dangerous characters, then calls eval() with a restricted namespace.

Knowledge base search (_kb_search): scores documents by counting overlapping words between the query and each entry’s title plus text, and returns the top results.

JSON extraction (_extract_json): uses a regex to find the first JSON object in a string, parses it, and returns the parsed dict or an error.

File writing (_write_file): writes content to a specified path on the local filesystem so the agent can save deliverables.

All tools are designed to be simple, deterministic, and easy for the AI to invoke via structured prompts.

5. How is memory integrated into the system?

Memory is implemented as a shared context that persists across planning, execution, and critique steps. During each run, the planner receives the task and any previous conversation history. It generates a plan that references prior steps if needed. The executor also sees the full history, so it can avoid redundant work and build on earlier results. The critic reviews the final output with awareness of the original task and all intermediate steps. This short-term memory is maintained within a single session; the system does not yet include long-term memory persistence (e.g., to a database). Future versions could store key insights or tool results for reuse across different tasks. The memory is essentially a list of messages that grows as the pipeline executes, providing a coherent thread for the AI to follow.
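
One simple way to model this session memory is an append-only list of messages that every role reads from and writes to. The SessionMemory class below is a hypothetical sketch; the role labels are plain strings chosen for illustration and are not OpenAI chat roles.

```python
from typing import Dict, List

class SessionMemory:
    """Append-only message list shared by planner, executor, and critic."""

    def __init__(self) -> None:
        self._messages: List[Dict[str, str]] = []

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> List[Dict[str, str]]:
        # Return copies so a caller cannot mutate the stored history.
        return [dict(m) for m in self._messages]

    def render(self, last_n: int = 10) -> str:
        """Flatten the most recent messages into a prompt-ready string."""
        recent = self._messages[-last_n:] if last_n > 0 else []
        return "\n".join(f"[{m['role']}] {m['content']}" for m in recent)

memory = SessionMemory()
memory.add("user", "Compute 6 * 7 and save it.")
memory.add("planner", "1) calculator 2) write file")
memory.add("executor", "calculator -> 42")
context = memory.render()
```

The last_n window in render() is an added assumption: since the list grows with every step, some cap on what is sent back to the model is usually needed to stay within the context limit.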

6. How does the self-critique mechanism work?

After the executor completes the plan, the critic role is invoked. The critic receives the original task, the executed plan, and the final output. It is prompted to evaluate the output against quality criteria defined in the knowledge base, such as including steps, verifying numeric results, and listing action items. The critic produces a structured critique, noting any errors or omissions. If the critique identifies issues, the planner is re-prompted to revise the plan, incorporating the feedback. This loop continues until the critic approves the output or a maximum iteration limit is reached. The self-critique step lets the system catch and correct many of its own mistakes without human intervention, which makes the agent more dependable for complex workflows.
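
The control flow of this loop can be sketched with plain functions standing in for the three model-backed roles. The function names, the verdict format (approved, issues), and the default of three iterations are assumptions; in the real system each stub would be an OpenAI API call.

```python
from typing import Any, Callable, Dict, List, Optional

def run_with_critique(
    task: str,
    plan_fn: Callable[[str, Optional[List[str]]], List[str]],
    execute_fn: Callable[[List[str]], str],
    critique_fn: Callable[[str, List[str], str], Dict[str, Any]],
    max_iters: int = 3,
) -> Dict[str, Any]:
    """Plan, execute, and critique until approved or max_iters is reached."""
    feedback: Optional[List[str]] = None
    output = ""
    for attempt in range(1, max_iters + 1):
        plan = plan_fn(task, feedback)          # feedback is None on the first pass
        output = execute_fn(plan)
        verdict = critique_fn(task, plan, output)
        if verdict["approved"]:
            return {"ok": True, "output": output, "attempts": attempt}
        feedback = verdict["issues"]            # fed into the next plan revision
    return {"ok": False, "output": output, "attempts": max_iters, "issues": feedback}

# Stub roles: the critic insists on a "checks" step, which the planner
# only adds after it has received feedback.
def stub_planner(task: str, feedback: Optional[List[str]]) -> List[str]:
    steps = ["compute answer"]
    if feedback:
        steps.append("add checks")
    return steps

def stub_executor(plan: List[str]) -> str:
    return "; ".join(plan)

def stub_critic(task: str, plan: List[str], output: str) -> Dict[str, Any]:
    if "checks" in output:
        return {"approved": True, "issues": []}
    return {"approved": False, "issues": ["missing checks"]}

outcome = run_with_critique("demo task", stub_planner, stub_executor, stub_critic)
```

With these stubs the first attempt is rejected and the second is approved. The iteration cap matters because a critic that never approves would otherwise keep the loop, and the API spend, running indefinitely.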

7. What does the full workflow look like from start to finish?

The workflow begins when a user provides a high-level task, such as “Calculate the compound interest for $10,000 at 5% for 3 years and save the result to a file.” First, the planner analyzes the task and generates a sequence of sub-steps: (1) use the calculator tool, (2) retrieve the quality policy from the knowledge base, (3) write the result to a file. The executor then runs each step in order, calling the appropriate tools and collecting outputs. After all steps are done, the critic reviews the final output against the knowledge base policy. If everything is correct, the system outputs the final result and a success message. If not, the planner receives the critique and creates a revised plan. This cycle repeats until the critic is satisfied or the iteration limit is reached, so the agent plans, acts, and self-corrects without manual intervention.
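
For reference, the arithmetic behind the example task is shown below, so there is a known value to check the calculator step against. The task does not state a compounding frequency, so annual compounding is assumed here.

```python
# Assumption: interest compounds once per year; the task does not specify this.
principal = 10_000.0
rate = 0.05
years = 3

final_balance = principal * (1 + rate) ** years   # 10000 * 1.05^3
interest = final_balance - principal

# Round to cents before reporting to avoid floating-point noise.
print(round(final_balance, 2))  # 11576.25
print(round(interest, 2))       # 1576.25
```

A critic following the “verify numeric results” guidance could recompute this value independently and compare it with what the executor wrote to the file.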
