Evolve task-specific memory systems for LLM agents as executable Python code. No hand-tuning. No architecture search. Just evolution.
Prior work picks among predefined architectures or tunes natural-language rules. Engram searches over executable Python — an open space expressing any data structure, SQL schema, or retrieval logic.
Same system, same seeds → structurally different programs. Conversational QA evolves a multi-table SQL index. Embodied tasks evolve a deterministic action cache. No universal design wins.
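The "deterministic action cache" pattern can be sketched in a few lines. This is a hedged illustration, not the evolved program itself; the class and method names are hypothetical, and a real evolved variant would layer in eviction and fuzzy matching.

```python
# Hypothetical sketch of an action cache for a deterministic embodied task:
# a previously seen observation maps straight to the action that worked,
# skipping retrieval entirely.
class ActionCache:
    def __init__(self):
        self._cache = {}

    def write(self, observation: str, action: str):
        self._cache[observation] = action

    def read(self, observation: str):
        # Exact-match lookup; returns None on a cache miss
        return self._cache.get(observation)

cache = ActionCache()
cache.write("at(kitchen), holding(nothing)", "pick up mug")
print(cache.read("at(kitchen), holding(nothing)"))  # → pick up mug
```

The design exploits determinism: when the same observation always warrants the same action, an exact-match dict beats any learned retrieval.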
A memory program is a Python module — dataclass schemas, write/read logic, instruction strings. The task agent is fixed. Only the memory changes.
A complete Python module with three evolvable dimensions: instruction constants, dataclass schemas, and storage/retrieval logic.
```python
# Instruction constants — injected into agent prompts
INSTRUCTION_KNOWLEDGE_ITEM = "Extract key facts..."
INSTRUCTION_QUERY = "Formulate a search query..."
INSTRUCTION_RESPONSE = "Answer using retrieved context..."
ALWAYS_ON_KNOWLEDGE = ""

@dataclass
class KnowledgeItem:
    summary: str
    entities: list[str]
    timestamp: str

@dataclass
class Query:
    query_text: str
    entity_filter: str

class KnowledgeBase:
    def __init__(self, toolkit):
        self.db = toolkit.db
        self.chroma = toolkit.chroma

    def write(self, item, raw_text=""):
        # Store structured knowledge
        ...

    def read(self, query) -> str:
        # Retrieve relevant context
        ...
```
Four string constants steer how the task agent parses observations, formulates queries, and generates answers.
KnowledgeItem and Query dataclass fields define what gets stored and how it's queried. Evolution reshapes fields freely.
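A mutation along this dimension might look like the following. The field names here are invented for illustration; the point is that a schema change is just an edit to the dataclass bodies.

```python
from dataclasses import dataclass, fields

# Hypothetical mutated schema for a conversational-QA task: the single
# free-text summary is split into question/answer pairs with a turn index.
@dataclass
class KnowledgeItem:
    question: str
    answer: str
    turn_id: int

@dataclass
class Query:
    query_text: str
    max_results: int = 5  # evolved retrieval knob, also just a field

print([f.name for f in fields(KnowledgeItem)])  # → ['question', 'answer', 'turn_id']
```

Because the agent loop only reads and writes these dataclasses, reshaping their fields redefines what memory captures without touching the agent.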
write() and read() methods use SQLite, ChromaDB, and a budget-limited LLM. Evolution rewrites the entire algorithm.
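A minimal, self-contained sketch of such a write/read pair, assuming an in-memory SQLite table and substituting naive term overlap for ChromaDB's embedding search so the example runs standalone; all names beyond the snippet above are assumptions.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class KnowledgeItem:
    summary: str
    entities: list[str]

class KnowledgeBase:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE items (summary TEXT, entities TEXT)")

    def write(self, item: KnowledgeItem, raw_text: str = ""):
        # Persist the structured fields; entities flattened to CSV
        self.db.execute("INSERT INTO items VALUES (?, ?)",
                        (item.summary, ",".join(item.entities)))

    def read(self, query_text: str) -> str:
        # Rank stored summaries by shared terms with the query;
        # evolved programs replace this with richer retrieval logic
        terms = set(query_text.lower().split())
        rows = self.db.execute("SELECT summary FROM items").fetchall()
        ranked = sorted(rows,
                        key=lambda r: -len(terms & set(r[0].lower().split())))
        return "\n".join(r[0] for r in ranked[:3])

kb = KnowledgeBase()
kb.write(KnowledgeItem("Alice moved to Paris in 2020", ["Alice", "Paris"]))
kb.write(KnowledgeItem("Bob likes tennis", ["Bob"]))
print(kb.read("where does Alice live").splitlines()[0])  # → Alice moved to Paris in 2020
```

Since evolution rewrites these method bodies wholesale, the overlap ranking here is just one point in the search space; the SQL schema itself is equally up for mutation.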
Click any node to view its source code. The tree shows how programs evolve through parent-child mutations: higher nodes are earlier iterations, lower nodes are later. Color = score.