ES-Mem Paper Reading Notes | 迎风起降的小站

ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents

Abstract

Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval capabilities, they are hindered by two primary limitations: (1) rigid memory granularity often disrupts semantic integrity, resulting in fragmented and incoherent memory units; (2) prevalent flat retrieval paradigms rely solely on surface-level semantic similarity, neglecting the structural cues of discourse required to navigate and locate specific episodic contexts. To mitigate these limitations, drawing inspiration from Event Segmentation Theory, we propose ES-Mem, a framework incorporating two core components: (1) a dynamic event segmentation module that partitions long-term interactions into semantically coherent events with distinct boundaries; (2) a hierarchical memory architecture that constructs multi-layered memories and leverages boundary semantics to anchor specific episodic memory for precise context localization. Evaluations on two memory benchmarks demonstrate that ES-Mem yields consistent performance gains over baseline methods. Furthermore, the proposed event segmentation module exhibits robust applicability on dialogue segmentation datasets.


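Introduction

Current State

Current memory mechanisms have two problems:

1) Memory granularity is fixed, typically one turn per memory unit.

2) Retrieval is too flat. It usually relies on surface-level semantic similarity (embed the text, compute cosine similarity, recall the Top-K) and makes no use of the structural relations between memory units.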
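To make the "flat" retrieval pattern concrete, here is a minimal sketch of per-turn storage with cosine-similarity Top-K recall. It is not taken from the paper: `embed` is a toy bag-of-words stand-in for a real sentence encoder, and `flat_top_k` and the sample turns are hypothetical names and data chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flat_top_k(query, turns, k=2):
    """Score every turn independently against the query and return the k best."""
    q = embed(query)
    scored = [(cosine(q, embed(t)), i) for i, t in enumerate(turns)]
    # Highest score first; ties are broken by the earlier turn index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [turns[i] for _, i in scored[:k]]


# Each turn is its own memory unit (fixed granularity).
turns = [
    "I adopted a puppy last week",
    "She chewed my new shoes yesterday",
    "What should I cook for dinner tonight",
    "Maybe pasta with tomato sauce",
]
result = flat_top_k("what did the puppy do", turns, k=1)
```

With this toy data the top hit is the adoption turn, while the turn about the chewed shoes scores zero because it shares no words with the query. Every turn is scored in isolation, so nothing tells the retriever that the two turns belong to the same episode.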

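Motivation (Story)

Event Segmentation Theory (EST) is a cognitive-science theory that explains how people divide continuous experience into meaningful segments as they perceive, understand, and remember events. It was proposed by psychologist Jeffrey Zacks and colleagues, and it has been influential in research on film comprehension, memory encoding, and predictive behavior.

Put simply, ES-Mem takes a dialogue stream as input and uses an event segmentation module to split it into event units, each of which is a semantically coherent segment. At retrieval time, it uses the semantic information at event boundaries to locate and recall specific episodic memories, which gives more precise context localization.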
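The segment-then-retrieve idea can be sketched in a few lines. This is only a toy illustration under loud assumptions, not the paper's method: boundaries are detected with a simple word-overlap (Jaccard) threshold instead of the paper's segmentation module, and retrieval matches the query against each event's vocabulary as a crude stand-in for boundary-anchored localization. `STOPWORDS`, `Event`, `segment`, `retrieve_event`, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass

# Tiny hand-picked stopword list (assumption, for this toy example only).
STOPWORDS = {"i", "a", "the", "my", "what", "should", "for", "maybe", "did", "do"}


def tokens(text):
    """Lowercased content words of a turn."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap between two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Event:
    start: int   # index of the first turn in the event
    end: int     # index of the last turn in the event (inclusive)
    turns: list  # the raw turns that make up the event


def segment(turns, threshold=0.1):
    """Open a new event when a turn barely overlaps the current event's vocabulary."""
    if not turns:
        return []
    events, current, start = [], [turns[0]], 0
    vocab = tokens(turns[0])
    for i, turn in enumerate(turns[1:], start=1):
        t = tokens(turn)
        if jaccard(vocab, t) < threshold:
            # Overlap dropped: close the current event at the boundary.
            events.append(Event(start, i - 1, current))
            current, start, vocab = [turn], i, set(t)
        else:
            current.append(turn)
            vocab |= t
    events.append(Event(start, len(turns) - 1, current))
    return events


def retrieve_event(query, events):
    """Return the whole event whose vocabulary best matches the query."""
    if not events:
        return None
    q = tokens(query)
    return max(events, key=lambda e: jaccard(q, tokens(" ".join(e.turns))))


turns = [
    "I adopted a puppy last week",
    "The puppy chewed my new shoes",
    "What should I cook for dinner tonight",
    "Maybe cook pasta for dinner",
]
events = segment(turns)
hit = retrieve_event("what did the puppy do", events)
```

In this example the puppy turns and the dinner turns end up in separate events, and the puppy query returns the first event as one unit rather than a single isolated turn. A real system would replace the overlap heuristic with a learned or LLM-based boundary detector and would store richer event-level keys.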

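Contributions

1) Proposes ES-Mem, a new cognitively inspired memory framework based on Event Segmentation Theory. By changing the memory granularity from fixed turns to dynamic events, ES-Mem addresses the semantic fragmentation inherent in existing methods and preserves discourse integrity.
2) Implements a dynamic segmentation module that partitions the continuous dialogue stream according to topic coherence and intent shifts. This drives a hierarchical memory architecture with multi-layered storage, which in turn supports a boundary-anchored strategy for precise context localization.
3) Systematically evaluates ES-Mem on two long-term memory benchmarks. The empirical results show that ES-Mem consistently outperforms a range of memory baselines. In addition, in small-model settings, the event segmentation module shows strong adaptability on dialogue segmentation tasks.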

Related Work

This section covers related work that uses EST to improve the memory capabilities of dialogue agents.