G-Memory Paper Notes | 迎风起降的小站

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Ramblings

Abstract

Large language model (LLM)-powered multi-agent systems (MAS) have demonstrated cognitive and execution capabilities that far exceed those of single LLM agents, yet their capacity for self-evolution remains hampered by underdeveloped memory architectures. Upon close inspection, we are alarmed to discover that prevailing MAS memory mechanisms (1) are overly simplistic, completely disregarding the nuanced inter-agent collaboration trajectories, and (2) lack cross-trial and agent-specific customization, in stark contrast to the expressive memory developed for single agents. To bridge this gap, we introduce G-Memory, a hierarchical, agentic memory system for MAS inspired by organizational memory theory [1], which manages the lengthy MAS interaction via a three-tier graph hierarchy: insight, query, and interaction graphs. Upon receiving a new user query, G-Memory performs bi-directional memory traversal to retrieve both high-level, generalizable insights that enable the system to leverage cross-trial knowledge, and fine-grained, condensed interaction trajectories that compactly encode prior collaboration experiences. Upon task execution, the entire hierarchy evolves by assimilating new collaborative trajectories, nurturing the progressive evolution of agent teams. Extensive experiments across five benchmarks, three LLM backbones, and three popular MAS frameworks demonstrate that G-Memory improves success rates in embodied action and accuracy in knowledge QA by up to 20.89% and 10.12%, respectively, without any modifications to the original frameworks.

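In other words: multi-agent systems (MAS) driven by large language models (LLMs) already clearly outperform a single LLM agent in both cognition and execution, but their ability to self-evolve is held back by immature memory architectures. The authors identify two key problems in mainstream MAS memory mechanisms: (1) the designs are too simple and ignore the complex interaction trajectories that arise during multi-agent collaboration; (2) they lack cross-trial and per-agent customization, in sharp contrast to the expressive memory mechanisms already developed for single agents.
To close this gap, the paper proposes G-Memory, a hierarchical agentic memory system. Inspired by organizational memory theory [1], it manages the long-horizon interactions of a MAS through a three-tier graph hierarchy: the insight graph, the query graph, and the interaction graph.
When a new user query arrives, G-Memory performs a bi-directional memory traversal that retrieves both (i) high-level, generalizable insights that support cross-trial knowledge transfer, and (ii) fine-grained, condensed interaction trajectories that compactly encode past collaboration experience. Once the task has been executed, the whole hierarchy evolves by absorbing the new collaboration trajectories, so the agent team improves progressively.
Experiments across five benchmarks, three LLM backbones, and three mainstream MAS frameworks show that, without modifying the original frameworks, G-Memory raises the success rate on embodied-action tasks by up to 20.89% and the accuracy on knowledge QA tasks by up to 10.12%.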

Method

Formal Definition of a Multi-Agent System

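A multi-agent system (MAS) is defined as a directed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where the vertex set $\mathcal{V}$ is the set of agents, $N = |\mathcal{V}|$ is the number of agents, and the edge set $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of interaction channels between agents.

Each agent node $v_i \in \mathcal{V}$ is described by a quadruple $C_i = (\mathrm{Base}_i, \mathrm{Role}_i, \mathrm{Mem}_i, \mathrm{Plugin}_i)$:

Symbol	Meaning
$\mathrm{Base}_i$	The underlying LLM instance
$\mathrm{Role}_i$	The agent's role or persona
$\mathrm{Mem}_i$	The memory state, including past interactions or an external knowledge base
$\mathrm{Plugin}_i$	The set of auxiliary tools (e.g. a web search engine)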

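Upon receiving a user query $Q$, the system runs for $T$ synchronous communication epochs.

In each epoch $t$, we derive a topological ordering $\pi$ of the nodes such that, if there is an edge from $v_i$ to $v_j$, then $v_i$ precedes $v_j$ in $\pi$. This guarantees that each agent processes its input only after all of its predecessors have acted. For each agent $v_i$ in $\pi$, its output at iteration $t$ is computed as:

$$r_i^{(t)} = C_i\big(\mathcal{P}_{\mathrm{sys}},\; Q,\; \{\, r_j^{(t)} : v_j \in \mathcal{N}_{\mathrm{in}}(v_i) \,\}\big)$$

Symbol	Meaning
$r_i^{(t)}$	The response generated by agent $v_i$ in round $t$ (reasoning steps, intermediate analysis, or a final result)
$\mathcal{P}_{\mathrm{sys}}$	The global system prompt, containing the overall instructions and each agent's role description
$\mathcal{N}_{\mathrm{in}}(v_i)$	The set of in-neighbors of agent $v_i$, whose outputs form the current input context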

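Tip: one way to read this definition

A synchronous communication epoch can be read as one full round of collaboration across the whole multi-agent system.
For example, the query Q is given to agent A, A's output flows to both B and C, and D finally aggregates their results. That completes one epoch.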

[Q] -> A -> B -> D
      \          ^
       -> C -----
  

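The topological ordering guarantees that agents are processed in the right order. Valid examples: π = [A, B, C, D] and π = [A, C, B, D]. Invalid examples: π = [B, A, C, D] and π = [C, A, B, D]. These are invalid because B and C depend on A's output, yet one of them is scheduled before A.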
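To make the ordering constraint concrete, here is a minimal Python sketch for the four-agent example in the diagram. The names `EDGES`, `is_valid_order`, and `topological_order` are made up for illustration and do not come from the paper or its code; the ordering itself is produced with the standard Kahn's algorithm.

```python
from collections import deque

# Edges of the toy MAS from the diagram: A feeds B and C, which both feed D.
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]


def is_valid_order(order, edges):
    """Return True if every edge (u, v) has u placed before v in `order`."""
    position = {node: idx for idx, node in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)


def topological_order(nodes, edges):
    """Kahn's algorithm: repeatedly emit a node whose predecessors are all done."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle, no topological order exists")
    return order


print(is_valid_order(["A", "C", "B", "D"], EDGES))   # a valid ordering
print(is_valid_order(["B", "A", "C", "D"], EDGES))   # B is scheduled before A
print(topological_order(["A", "B", "C", "D"], EDGES))
```

Both valid orderings from the text pass the check, which is why a topological order is generally not unique: B and C have no dependency on each other.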

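Once all agents have acted, a global aggregation operator $\mathcal{A}$ fuses the set of responses into an intermediate solution:

$$a^{(t)} = \mathcal{A}\big(r_1^{(t)}, \dots, r_N^{(t)}\big)$$

The paper also notes that common implementations of the aggregation operator include majority voting, hierarchical summarization through a dedicated aggregator agent, or simply taking the last agent's output as the answer.
These iterations run for $T$ epochs, until a preset limit is reached or an early-stopping criterion is met, producing the final response to the query Q.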
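The epoch loop described above can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: agents are plain Python callables standing in for LLM calls, the global system prompt is omitted, and `run_mas`, `majority_vote`, and `should_stop` are hypothetical names. Majority voting is used only because it is one of the aggregation options the paper lists.

```python
from collections import Counter


def majority_vote(responses):
    """Aggregate by returning the most common response (ties: first seen wins)."""
    return Counter(responses.values()).most_common(1)[0][0]


def run_mas(query, agents, in_neighbors, order, max_epochs,
            aggregate=majority_vote, should_stop=None):
    """Run up to `max_epochs` synchronous epochs; return the last intermediate solution.

    agents:       name -> callable(query, upstream_responses) -> str
    in_neighbors: name -> list of predecessor names
    order:        a topological ordering of the agent names
    """
    solution = None
    for epoch in range(max_epochs):
        responses = {}
        for name in order:
            # Each agent only sees the outputs of its in-neighbors from this epoch.
            upstream = {p: responses[p] for p in in_neighbors[name]}
            responses[name] = agents[name](query, upstream)
        solution = aggregate(responses)
        if should_stop is not None and should_stop(solution, epoch):
            break  # early-stopping criterion met
    return solution


# Toy agents for the A -> {B, C} -> D example (hard-coded stand-ins for LLM calls).
agents = {
    "A": lambda q, up: "4",
    "B": lambda q, up: up["A"],   # B echoes A
    "C": lambda q, up: "5",       # C disagrees
    "D": lambda q, up: up["B"],   # D sides with B
}
in_neighbors = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
answer = run_mas("2 + 2 = ?", agents, in_neighbors, ["A", "B", "C", "D"], max_epochs=2)
print(answer)
```

Swapping `aggregate` for a function that returns only the last agent's response would give the "use the final agent's output" variant mentioned above.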

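G-Memory Design

G-Memory consists of three graph layers:

Interaction Graph (Utterance Graph): records the fine-grained trajectory of the raw multi-agent interaction, including every agent's response in each iteration and the global aggregation result.
Query Graph: a graph of historical queries, with edges connecting related queries.
Insight Graph: extracts and summarizes the insights (distilled knowledge) gained from interactions.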

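Interaction Graph

👉 Finest granularity: records the conversation inside a single query.

For a query $Q$, the interaction graph is

$$\mathcal{G}_{\mathrm{inter}}^{(Q)} = \big(\mathcal{U}^{(Q)},\; \mathcal{E}_u^{(Q)}\big)$$

where:

Node set $\mathcal{U}^{(Q)} = \{u_i\}$:

Symbol	Meaning
$Q$	The query, i.e. the task currently being processed
$u_i = (A_i, m_i)$	A single node: the $i$-th utterance
$A_i$	The speaker (which agent produced the utterance), $A_i \in \mathcal{V}$
$m_i$	The utterance content (text)

Edge set $\mathcal{E}_u^{(Q)}$:

It is defined as $\mathcal{E}_u^{(Q)} \subseteq \mathcal{U}^{(Q)} \times \mathcal{U}^{(Q)}$, where

$$(u_j, u_k) \in \mathcal{E}_u^{(Q)}$$

means there is an edge from $u_j$ to $u_k$, which holds if and only if producing $u_k$ depends on $u_j$.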
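As a rough illustration of this structure, the interaction graph can be modeled as a list of utterances plus a set of dependency edges. The class and method names below (`Utterance`, `InteractionGraph`, `add_utterance`, `predecessors`) are assumptions made for this sketch and are not taken from the G-Memory codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Utterance:
    agent: str   # who spoke
    text: str    # what was said


@dataclass
class InteractionGraph:
    query: str
    utterances: list = field(default_factory=list)   # node i is utterances[i]
    edges: set = field(default_factory=set)          # (j, k): utterance k depends on j

    def add_utterance(self, agent, text, depends_on=()):
        """Append a node and link it to the earlier utterances it was derived from."""
        idx = len(self.utterances)
        for j in depends_on:
            if not 0 <= j < idx:
                raise ValueError(f"utterance {idx} cannot depend on {j}")
        self.utterances.append(Utterance(agent, text))
        self.edges.update((j, idx) for j in depends_on)
        return idx

    def predecessors(self, k):
        """Indices of the utterances that utterance k depends on."""
        return sorted(j for j, dst in self.edges if dst == k)


# One epoch of the A -> {B, C} -> D example, with made-up utterance texts.
g = InteractionGraph(query="2 + 2 = ?")
a = g.add_utterance("A", "Let's add the two numbers.")
b = g.add_utterance("B", "2 + 2 = 4.", depends_on=[a])
c = g.add_utterance("C", "I get 4 as well.", depends_on=[a])
d = g.add_utterance("D", "Final answer: 4.", depends_on=[b, c])
print(g.predecessors(d))
```

Storing edges as index pairs keeps the graph easy to serialize; a real implementation might use a graph library instead.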

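Query Graph

The query graph $\mathcal{G}_{\mathrm{query}} = (\mathcal{Q}, \mathcal{E}_q)$ stores previously processed queries together with their metadata:

Symbol	Meaning
$\mathcal{Q}$	The set of query nodes, $\mathcal{Q} = \{q_i\}$
$q_i$	A single query node, $q_i = (Q_i, \Psi_i, \mathcal{G}_{\mathrm{inter}}^{(Q_i)})$
$Q_i$	The original query
$\Psi_i$	Task status (Failed / Resolved)
$\mathcal{G}_{\mathrm{inter}}^{(Q_i)}$	The corresponding interaction graph
$\mathcal{E}_q$	The set of relation edges between queries, $\mathcal{E}_q \subseteq \mathcal{Q} \times \mathcal{Q}$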
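A query node, as listed in the table, bundles the original query, its status, and its interaction graph. The sketch below shows one possible in-memory layout. All names (`Status`, `QueryNode`, `QueryGraph`, `add_query`, `neighbors`) are hypothetical, and how two queries are judged "related" is left to the caller because these notes do not specify it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    FAILED = "Failed"
    RESOLVED = "Resolved"


@dataclass
class QueryNode:
    query: str                          # the original query text
    status: Status                      # Failed / Resolved
    interaction_graph: object = None    # placeholder for the per-query interaction graph


@dataclass
class QueryGraph:
    nodes: list = field(default_factory=list)
    edges: set = field(default_factory=set)   # (i, j): query i is related to query j

    def add_query(self, query, status, interaction_graph=None, related_to=()):
        """Store a processed query and connect it to earlier related queries."""
        idx = len(self.nodes)
        for j in related_to:
            if not 0 <= j < idx:
                raise ValueError(f"unknown query index {j}")
        self.nodes.append(QueryNode(query, status, interaction_graph))
        self.edges.update((j, idx) for j in related_to)
        return idx

    def neighbors(self, i):
        """Queries connected to query i, ignoring edge direction."""
        outgoing = {b for a, b in self.edges if a == i}
        incoming = {a for a, b in self.edges if b == i}
        return sorted(outgoing | incoming)


# Made-up example queries; the relatedness links are chosen by hand here.
qg = QueryGraph()
q0 = qg.add_query("put a clean mug on the desk", Status.RESOLVED)
q1 = qg.add_query("put a clean plate on the table", Status.FAILED, related_to=[q0])
q2 = qg.add_query("heat an egg in the microwave", Status.RESOLVED)
print(qg.neighbors(q0))
```

Keeping failed queries in the graph alongside resolved ones matches the status field in the table: both outcomes are recorded as metadata.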

Insight Graph

Experiments

AutoGen

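AutoGen is a multi-agent framework introduced by Microsoft in 2023. Its core idea is to "allow developers to build LLM applications via multiple agents that can converse with each other".

It feels a bit like a group chat. This paper uses the A3 pattern: an Assistant and a Grounding Agent, which the paper's code calls solver and ground_truth.

From reading the code, the AutoGen MAS works like this: when solver produces the same action three times in a row, the code switches to the ground_truth role to try to break the loop. This MAS has only two agents. Normally solver does the problem solving, and ground_truth supplies the correct answer or guidance when solver gets stuck.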
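The switching rule described above can be approximated in a few lines. This is a sketch of the behavior as described in these notes, not the repository's actual code; `pick_speaker` and `repeat_limit` are hypothetical names.

```python
def pick_speaker(action_history, repeat_limit=3):
    """Return "ground_truth" if the last `repeat_limit` solver actions are identical,
    otherwise "solver"."""
    recent = action_history[-repeat_limit:]
    stuck = len(recent) == repeat_limit and len(set(recent)) == 1
    return "ground_truth" if stuck else "solver"


print(pick_speaker(["go to desk 1", "go to desk 1", "go to desk 1"]))
print(pick_speaker(["go to desk 1", "open drawer 1", "open drawer 1"]))
```

Only the most recent actions are compared, so an earlier repetition that the solver has already escaped from does not trigger the switch again.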

The system prompt for solver is:

You are a smart agent designed to solve problems.

The system prompt for ground_truth is:

You are an agent designed to assist the solver agent. When you are called, it means the solver agent has repeatedly output the same incorrect content (It means that the solver agent is stuck in a loop of providing the same incorrect answer or approach).

Your task is to carefully analyze the input and provide the correct answer or guidance to help the solver agent break out of the stuck state and proceed toward the correct solution.

NOTE: ** Your approach must avoid being consistent with the previous output's approach (as the previous output comes from a solver agent that has already fallen into a misconception, making it definitely wrong). **
