Designing Context for Tasking Agents

View the project on GitHub

The starting point for ark-agents was not only memory. It was a question about what an agentic system should be able to do from a single chat interface.

In this case, the domain was investing.

The product idea was straightforward: let a user research companies, inspect a paper portfolio, delegate deeper analysis, and execute simulated trades from the same interface.

At first, it was tempting to treat this as a chatbot-memory problem. Keep recent messages, add retrieval, maybe add long-term memory, and adapt a conversational context engine to fit.

That works for simple demos. It stops working once the system has to act.

A conversational chatbot mostly needs continuity. A tasking agent needs continuity too, but continuity is not the hard part. The harder problem is assembling the smallest useful context for the next action, deciding when tools should run, preserving the outputs that matter, and determining whether the task is actually complete.

This article describes the design as it exists in ark-agents. The project is built around investing and paper trading, but the broader design problem is not domain-specific. Any agent that needs to inspect state, use tools, delegate work, and continue until completion runs into the same context problem.

Why Chat Context Assumptions Break

The first version of agent context usually looks like chatbot context with more tools attached.

That is not enough.

Three things break quickly.

Conversation continuity is not the same as execution state

A recent-message window helps the system stay coherent. It does not tell the agent which tool already ran, which result still matters, whether specialist work is pending, or whether the goal has been satisfied.

Long-term memory can add noise faster than value

In conversational systems, long-term memory often improves continuity. In tasking systems, broad episodic recall can easily inject stale assumptions, irrelevant history, or outdated task artifacts into a workflow that needs precision.

A tool-using agent needs control, not just recall

Once the system can call tools, inspect outputs, delegate work, and replan, context is only one part of the problem. The system also needs a loop that decides what to do next.

For tasking agents, the real problem is not "how do we remember more." It is "how do we build the right working context for the next step under real constraints."

What a Tasking Context Engine Actually Has to Do

A tasking context engine sits between stored state and live agent decisions.

Its job is not only to preserve information. Its job is to support execution.

In practice, that means it has to:

  • preserve conversational continuity without replaying the full transcript
  • make the agent's role, tools, and constraints explicit
  • preserve active task state and recent execution artifacts
  • support multi-step tool use and reassessment
  • keep delegated work attached to the main workflow
  • retrieve prior history selectively instead of flooding the prompt
  • fit all of that into a bounded working context

The key distinction is that tasking context is not broad memory by default. It is selective working state plus selective retrieval.

That distinction drives the architecture.
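The "selective working state plus selective retrieval" idea can be sketched as a small assembly step: prioritized context blocks are packed into a bounded budget, and low-priority history is dropped first. This is a minimal illustration, not the ark-agents implementation; the names `ContextBlock` and `assemble_context`, and the character-count stand-in for tokens, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ContextBlock:
    name: str
    text: str
    priority: int  # lower number = included first

def assemble_context(blocks: list[ContextBlock], budget: int) -> str:
    """Greedily include blocks by priority until the budget is spent."""
    included: list[str] = []
    used = 0
    for block in sorted(blocks, key=lambda b: b.priority):
        cost = len(block.text)  # stand-in for a real token count
        if used + cost > budget:
            continue  # skip blocks that would overflow the budget
        included.append(block.text)
        used += cost
    return "\n\n".join(included)

blocks = [
    ContextBlock("task", "Active task: research NVDA", priority=0),
    ContextBlock("chat", "User asked about NVDA earnings", priority=1),
    ContextBlock("history", "Months of old transcripts...", priority=2),
]
prompt = assemble_context(blocks, budget=80)
```

Under this toy budget, task and chat context fit while the bulk history block is dropped, which is exactly the default the article argues for.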

The Design Requirements

For a tasking agent to behave reliably, a few design requirements matter more than almost anything else.

1. Recent conversation should remain visible

The active thread still matters. The system needs enough recent conversation to understand the user's current goal, clarifications, and immediate follow-up.

2. Agent role should be explicit

A coordinator and a specialist should not share the same assumptions. The system needs context that makes each agent's role, strategy, and risk boundaries explicit.

3. Task state should outrank general memory

The most important context is usually the current workflow: active tasks, recent tool outputs, delegated work, and artifacts that still affect the next decision.

4. Tool use should support iteration

A useful agent should not be limited to one tool call and one response. It should be able to call tools, inspect results, reassess, and continue until the task is actually complete or a limit is reached.

5. Delegation has to return structured results

Background work is not enough. Specialist outputs need to come back into the main flow in a form the coordinator can use on the next step.

6. Retrieval has to stay selective

Historical memory still has value, but it should be gated. Retrieval should happen when prior context is genuinely useful, not as a default dump of everything the system has ever seen.

These requirements lead to a simpler design than a general "agent memory platform" might suggest.

The Architecture: Three Context Blocks and One Control Loop

In ark-agents, the system becomes easier to reason about when context is split into three blocks.

Figure: three context blocks feeding into a control loop.

Chat context

Chat context handles conversational continuity.

The implementation is intentionally simple:

  • a recent-message window
  • recap retrieval from the same chat

Recaps are generated after a configurable number of user turns and can be retrieved when they help recover prior thread context without replaying an entire transcript.

This works well because the current thread usually matters more than distant history, and recap retrieval preserves continuity without turning every prompt into a conversation dump.
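A minimal sketch of that mechanism, assuming a fixed-size recent window and a recap every few user turns; the window size, recap cadence, and the stubbed recap text are all illustrative (a real system would summarize with a model call):

```python
from collections import deque

WINDOW = 6        # recent messages kept verbatim (assumed value)
RECAP_EVERY = 4   # user turns between recaps (assumed value)

class ChatContext:
    def __init__(self):
        self.window = deque(maxlen=WINDOW)  # recent-message window
        self.recaps: list[str] = []
        self._user_turns = 0

    def add(self, role: str, text: str) -> None:
        self.window.append((role, text))
        if role == "user":
            self._user_turns += 1
            if self._user_turns % RECAP_EVERY == 0:
                # Stub recap; a real system would summarize via a model.
                self.recaps.append(f"recap after turn {self._user_turns}")

    def prompt_messages(self) -> list[tuple[str, str]]:
        return list(self.window)

chat = ChatContext()
for i in range(4):
    chat.add("user", f"question {i}")
    chat.add("assistant", f"answer {i}")
```

Older turns age out of the window automatically, while recaps remain available for retrieval when the thread's earlier context matters again.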

Agent context

Agent context answers a different question:

What role is this agent playing right now?

This includes:

  • agent identity
  • role-specific instructions
  • tool or skill availability
  • strategy guidance
  • risk guidance

This matters because the coordinator and the specialists do not do the same job. A coordinator needs to route, synthesize, delegate, and decide when the workflow should continue. A specialist needs narrower operating context and narrower tools. That difference should be explicit in context rather than left to prompt drift.
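One way to make that explicit is a typed agent-context block that renders directly into the prompt. The field names and rendering below are hypothetical, but they show how a coordinator and a specialist end up with visibly different contexts rather than shared prompt drift:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentContext:
    name: str
    role: str
    tools: tuple[str, ...]
    strategy: str = ""
    risk_guidance: str = ""

    def render(self) -> str:
        """Render the role block that precedes the agent's working context."""
        lines = [f"You are {self.name}, the {self.role}."]
        lines.append("Tools: " + ", ".join(self.tools))
        if self.strategy:
            lines.append("Strategy: " + self.strategy)
        if self.risk_guidance:
            lines.append("Risk: " + self.risk_guidance)
        return "\n".join(lines)

ark = AgentContext(
    name="Ark", role="coordinator",
    tools=("portfolio", "research", "market", "tasks"),
    risk_guidance="paper trading only",
)
archer = AgentContext(
    name="Archer", role="trading specialist",
    tools=("quote", "validate", "execute"),
)
```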

Task context

Task context is the most important block.

This is working memory for the active workflow.

It includes:

  • active tasks
  • completed task artifacts
  • recent tool outputs
  • specialist outputs returned into the same chat
  • execution state that still matters for the next decision

This is what should dominate the next action. Task context is not long-term memory in a broader sense. It is scoped working state.

That scoping matters because it keeps the prompt centered on what the system is doing now rather than everything it has ever done.
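The scoping can be as simple as a keep-recent policy on execution artifacts. This sketch is illustrative, not the ark-agents data model; the keep-last-K number is an assumption:

```python
from dataclasses import dataclass, field

KEEP_RECENT = 3  # recent tool outputs kept in working state (assumed)

@dataclass
class TaskContext:
    active_tasks: list[str] = field(default_factory=list)
    tool_outputs: list[str] = field(default_factory=list)

    def record_output(self, output: str) -> None:
        self.tool_outputs.append(output)
        # Scope working state: drop outputs that no longer matter
        # for the next decision (older ones live in queryable history).
        del self.tool_outputs[:-KEEP_RECENT]

ctx = TaskContext(active_tasks=["research NVDA"])
for i in range(5):
    ctx.record_output(f"tool result {i}")
```

After five tool calls, only the last three results remain in working state; the rest belong to history, not to the next prompt.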

The control loop on top

These three context blocks are only useful because the agent sits inside a control loop.

Figure: the ReAct loop (Reason, Act, Observe, Reassess).

The coordinator can:

  • answer directly when no tools are needed
  • call one or more tools in the same step
  • inspect the outputs
  • decide whether the goal has been met
  • replan if it has not
  • delegate if the task is better handled asynchronously
  • turn the result back into user-facing follow-up

That loop is the main difference between a chatbot context engine and a tasking-agent context engine. The system is not only assembling context for one reply. It is assembling working state for repeated action.
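The loop's skeleton is compact. In this sketch, `decide` stands in for a model call and `run_tool` for the runtime dispatcher; the step limit and the toy policy are assumptions, shown only to make the Reason/Act/Observe/Reassess shape concrete:

```python
MAX_STEPS = 5  # assumed safeguard against runaway loops

def run_agent(goal: str, decide, run_tool) -> str:
    observations: list[str] = []
    for _ in range(MAX_STEPS):
        action = decide(goal, observations)   # Reason
        if action["type"] == "answer":
            return action["text"]             # goal met: respond directly
        result = run_tool(action["tool"], action.get("args", {}))  # Act
        observations.append(result)           # Observe, then Reassess
    return "stopped: step limit reached"

# Toy policy: call a quote tool once, then answer from the result.
def decide(goal, observations):
    if not observations:
        return {"type": "tool", "tool": "quote", "args": {"ticker": "NVDA"}}
    return {"type": "answer", "text": f"Done: {observations[-1]}"}

def run_tool(name, args):
    return f"{name}({args.get('ticker')}) -> 123.45"  # canned result

reply = run_agent("check NVDA price", decide, run_tool)
```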

Why Long-Term Memory Became Secondary

Long-term memory was built early because it is a natural feature in chat-oriented systems.

In a tasking workflow, it quickly becomes a secondary mechanism rather than the primary one.

The reason is practical.

Broad episodic memory can help with continuity, but it can also:

  • leak stale assumptions into a current task
  • crowd out more relevant working state
  • increase token cost without improving execution
  • blur the line between historical recall and active workflow state

That does not make long-term memory useless. It still matters for cross-session continuity and for the cases where prior conversations genuinely matter to the task.

But in ark-agents, the default path is narrower:

  • recent conversation
  • task-scoped working memory
  • selective retrieval only when it is justified

For a tasking system, that has been a better design center than broad memory recall.
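A retrieval gate in that spirit might look like the following. The trigger phrases and the fallback heuristic are invented for illustration; the point is that retrieval is a guarded decision, not a default step:

```python
# Phrases that signal the user is pointing at prior history (assumed).
TRIGGERS = ("last time", "previously", "as before", "remember")

def should_retrieve(message: str, task_context_empty: bool) -> bool:
    text = message.lower()
    if any(phrase in text for phrase in TRIGGERS):
        return True  # explicit reference to earlier sessions
    # With no active task state, prior sessions are more likely relevant.
    return task_context_empty and "?" in text
```

A mid-task instruction like "Buy 10 shares of NVDA" skips retrieval entirely, while "What did we decide last time?" justifies it.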

The Agent Roles in ark-agents

The coordination model revolves around three agents.

Ark

Ark is the user-facing coordinator.

It handles live chat, assembles context, decides when tools should run, delegates work when necessary, and turns specialist outputs back into the main flow.

Scout

Scout is the research specialist.

It handles deeper research tasks and returns structured research artifacts rather than text that disappears into a chat transcript.

Archer

Archer is the trading specialist.

It handles trade execution tasks against the paper portfolio and returns structured execution results.

This separation matters because a coordinator should not try to do all work inline. Research and execution are different workflows, and they benefit from explicit handoffs rather than vague internal reasoning.
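The handoff works because the specialist returns a typed artifact rather than chat text. The schema below is hypothetical (field names and the stubbed research body are assumptions), but it shows the shape of a result the coordinator can merge into task context:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class ResearchArtifact:
    task_id: str
    ticker: str
    summary: str
    confidence: str  # e.g. "high" / "medium" / "low"

def scout_research(task_id: str, ticker: str) -> ResearchArtifact:
    # Stub: a real specialist would run research tools and a model here.
    return ResearchArtifact(task_id, ticker, f"{ticker}: revenue growing", "medium")

artifact = scout_research("t-1", "NVDA")
record = asdict(artifact)  # coordinator merges this into task context
```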

Tool Surfaces Should Be Designed for Reasoning

One of the more important design choices in ark-agents is that agents do not operate over one flat, unrestricted tool list.

They use constrained tool surfaces shaped around their role.

For Ark, the tool surface is grouped by domain, including areas such as:

  • portfolio
  • research
  • market
  • tasks
  • biotech

That structure helps the model reason over the available actions more easily than a long flat list of unrelated functions.

Scout operates over deterministic research blocks such as deep stock research, broader market context, portfolio-aware context, and biotech-specific workflows.

Archer operates over a bounded trading surface such as quotes, validation, execution, portfolio state, and stop-loss updates.

The design goal here is simple: the model should see a tool vocabulary it can reason over, while the runtime handles the actual execution.

Those are different problems. The model needs a clean action surface. The runtime needs dispatch and safety.
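A domain-grouped surface separates those two problems cleanly: one rendering for the model, one dispatch path for the runtime. The group and tool names below mirror the article's examples but are otherwise illustrative:

```python
TOOL_SURFACE = {
    "portfolio": {"get_positions", "get_cash"},
    "research": {"deep_research", "news_search"},
    "market": {"get_quote"},
    "tasks": {"create_task", "get_task_status"},
}

def render_surface(registry=TOOL_SURFACE) -> str:
    """Model side: a compact, grouped action vocabulary for the prompt."""
    return "\n".join(
        f"{domain}: {', '.join(sorted(tools))}"
        for domain, tools in registry.items()
    )

def dispatch(tool: str, registry=TOOL_SURFACE) -> str:
    """Runtime side: resolve a tool name to its domain, or fail clearly."""
    for domain, tools in registry.items():
        if tool in tools:
            return domain
    raise KeyError(f"unknown tool: {tool}")
```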

Queryable History Matters More Than Replaying Everything

One-shot agents are easy to reason about. Real agents are not.

Once the system can take multiple steps, past actions matter:

  • which tools were called
  • which arguments were used
  • what came back
  • what the agent concluded from those results

In ark-agents, those actions and summaries are kept as queryable history rather than assumed to live forever inside the active prompt.

That matters for two reasons.

First, it keeps the live context smaller.

Second, it makes prior agent behavior inspectable without forcing every historical action into the next model call.

For tasking systems, that is usually a better pattern than transcript-style replay.
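The queryable-history idea reduces to logging each action as a record that can be filtered later instead of replayed wholesale. The record shape here is an assumption:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionRecord:
    step: int
    tool: str
    args: str
    result_summary: str

class ActionHistory:
    def __init__(self):
        self._records: list[ActionRecord] = []

    def log(self, tool: str, args: str, result_summary: str) -> None:
        self._records.append(
            ActionRecord(len(self._records), tool, args, result_summary)
        )

    def query(self, tool: str) -> list[ActionRecord]:
        """Inspect prior behavior without replaying it into the prompt."""
        return [r for r in self._records if r.tool == tool]

history = ActionHistory()
history.log("get_quote", "NVDA", "price 123.45")
history.log("deep_research", "NVDA", "revenue growing")
history.log("get_quote", "AAPL", "price 190.10")
```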

Error Handling Is Part of the Control System

In a tasking agent, error handling is not just a backend implementation detail.

It is part of the reasoning loop.

When a tool fails, the system should not only surface a raw error. It should provide enough guidance for the agent to decide what to do next.

That guidance might indicate:

  • the service is slow
  • the service is rate limited
  • the ticker format is invalid
  • the requested action is not allowed
  • the referenced tool or block is unknown

Good error messages support recovery. They help the system retry, choose another tool, ask a clarifying question, or stop safely.

That makes error handling part of coordination, not just infrastructure.
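Concretely, that can mean tool failures come back as structured results with a category and a recovery hint the agent can reason over. The categories and guidance strings here are illustrative:

```python
# Recovery hints keyed by error category (assumed vocabulary).
ERROR_GUIDANCE = {
    "rate_limited": "Wait and retry, or use a cached result.",
    "invalid_ticker": "Check the ticker format and ask the user if unsure.",
    "not_allowed": "Do not retry; explain the restriction to the user.",
    "unknown_tool": "Choose a tool from the listed surface.",
}

def tool_error(category: str, detail: str) -> dict:
    """Wrap a failure so the agent sees what happened and what to try next."""
    return {
        "ok": False,
        "category": category,
        "detail": detail,
        "guidance": ERROR_GUIDANCE.get(category, "Stop safely and report."),
    }

err = tool_error("invalid_ticker", "ticker 'NVDA$' is not valid")
```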

Agentic Systems Burn Tokens Faster Than Expected

Careful context design helps, but it does not make tasking agents cheap.

Even with recap-based chat continuity, scoped task memory, and selective retrieval, multi-step agent loops consume tokens quickly.

That happens because the architecture multiplies model calls:

  • the system has to decide what to do
  • tool results have to be interpreted
  • the goal has to be reassessed
  • delegated work can add more calls and more context

The result is that a heavy interactive session can consume a surprisingly large number of tokens, and always-on autonomous systems magnify that cost again.

This is partly a memory problem. It is also a control-loop problem.

Every additional loop has a prompt cost.

That makes token economics a first-class architectural constraint.
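Back-of-envelope arithmetic makes the multiplier visible: each extra step re-sends the assembled context plus the tool output accumulated so far. Every number below is an illustrative assumption, not a measurement from ark-agents:

```python
def session_tokens(steps: int, context: int = 3000,
                   tool_output: int = 800, reply: int = 300) -> int:
    """Rough token total: one model call per step over a growing prompt."""
    total = 0
    grown = context
    for _ in range(steps):
        total += grown + reply  # prompt + completion for this step
        grown += tool_output    # observations accumulate into the prompt
    return total

single_shot = session_tokens(1)  # a chatbot-style exchange
agent_loop = session_tokens(5)   # a modest multi-step task
```

Under these assumed numbers, a five-step loop costs several times a single exchange, before delegation adds further calls on top.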

Model Quality Matters More Once Tools and Loops Arrive

The second practical constraint is model quality.

Small models can look adequate in single-turn chat flows. They tend to degrade faster once the system has to decide:

  • whether tools are needed
  • which tools can be called together
  • whether the current goal has been met
  • whether another step is justified

In practice, multi-step tool use raises the reliability bar quickly.

That means agent systems often become expensive in two ways at once:

  • they consume more tokens
  • they require stronger models than a simpler chatbot would

That combination is one of the main reasons tasking-agent architecture has to stay disciplined.

Why the Architecture Generalizes

Although ark-agents is built around investing and paper trading, the broader coordination pattern is reusable.

The reusable pieces are:

  • recent-biased chat context
  • explicit agent context
  • task-scoped working memory
  • multi-step tool use with reassessment
  • durable delegation
  • structured artifacts
  • queryable action history
  • guided error recovery

The same structure can apply to workflows such as:

  • sourcing leads
  • filtering lists against criteria
  • enriching records
  • finding contact information
  • preparing outreach
  • scheduling follow-up work

What changes across domains is not the coordination model itself.

What changes is the tool surface, the artifact schema, the approval model, the execution constraints, and the definition of completion.

That is why ark-agents is better read as a concrete tasking-agent design than as a universal framework.

Closing

The central design shift in ark-agents was simple.

Stop treating the agent primarily as a conversation with memory.

Start treating it as a workflow system that assembles the smallest useful context for the next action.

That shift changes what matters most.

Long-term memory becomes selective rather than dominant. Task context becomes the main working state. Delegation has to return structured artifacts. Error messages become part of the reasoning loop. Tool surfaces need to be designed for model comprehension, not only runtime execution.

That is what makes a tasking-agent context engine different from a chatbot context engine.

The problem is no longer only continuity. It is coordination.