
    +Vj,                      U d Z ddlmZ ddlZddlZddlZddlZddlZddlm	Z	m
Z
mZ ddlmZ ddlmZmZmZmZmZ  ej        e          ZdZdZd	Zd
ZdZdZdZdZdZdZ  e!ee h          Z" e!eee h          Z#ede dediZ$dZ%dZ&dZ'dZ(dZ)dZ*dZ+dZ,dZ-dZ.e	 G d d                      Z/e	 G d  d!                      Z0dd%Z1i Z2d&e3d'<   dd)Z4dd+Z5dd.Z6dd/Z7dd1Z8dd2Z9dd3Z:dd6Z;dd:Z< ej=        d;ej>                  Z?dd>Z@dd@ZAddBZBddDZCdEdFdGdHdIdJdKdLdMdNdOdLdPe dQedOdRdSgdTdUdVZDd&e3dW<   dEdXdYdHdZd[d\dHd7dId]dKid7gdTd^idZgdTdUdVZEd&e3d_<   dEd`dadHd[dbdHdLdcdKdIddgdddIdedKdfg dfdTd^d[dgdHd7dIdhdKid7gdTd^dIdidKdjg djdTdUdVZFd&e3dk<   dNeddlddoZGddqZHdduZIddvZJdwdxddZKedddZLedddZMeedddZNddZO G d d          ZPedddZQg dZRdS )u	  Persistent session goals — the Ralph loop for Hermes.

A goal is a free-form user objective that stays active across turns. After
each turn completes, a small judge call asks an auxiliary model "is this
goal satisfied by the assistant's last response?". If not, Hermes feeds a
continuation prompt back into the same session and keeps working until the
goal is done, the turn budget is exhausted, the user pauses/clears it, or the
user sends a new message (which takes priority and pauses the goal loop).

Checklist mode (added 2026-05): when a goal is set, a Phase-A "decompose"
call asks the judge to write an extremely detailed checklist of concrete
completion criteria for that goal. On every subsequent turn (Phase B) the
judge evaluates the agent's most recent output against EACH pending item
and may flip pending → completed | impossible, or append new items it
discovers along the way. The goal is done only when every checklist item
is in a terminal status. This is much harsher than the freeform
"is the goal done?" prompt and gives users a visible, verifiable progress
surface via /subgoal. A bounded read_file tool loop lets the judge inspect
the dumped conversation history when the snippet alone isn't enough to
rule.
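
The "done only when every item is terminal" rule can be illustrated with a small sketch. The ``Item`` class and ``all_terminal`` helper below are simplified stand-ins written for this example; the status strings match the ones used in this module, and an empty checklist is treated as not done.

```python
from dataclasses import dataclass
from typing import List

# Terminal statuses: once an item reaches one of these it counts toward completion.
TERMINAL_STATUSES = frozenset({"completed", "impossible"})


@dataclass
class Item:
    """Simplified stand-in for one checklist item."""
    text: str
    status: str = "pending"


def all_terminal(items: List[Item]) -> bool:
    """True iff at least one item exists and every item is in a terminal status."""
    return bool(items) and all(it.status in TERMINAL_STATUSES for it in items)


checklist = [
    Item("homepage exists", "completed"),
    Item("deployed publicly", "impossible"),
    Item("navigation links work"),  # still pending
]
goal_done_before = all_terminal(checklist)
checklist[2].status = "completed"
goal_done_after = all_terminal(checklist)
```

An item marked impossible still counts toward completion, so a goal can finish with some criteria explicitly ruled out rather than satisfied.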

State is persisted in SessionDB's ``state_meta`` table keyed by
``goal:<session_id>`` so ``/resume`` picks it up.
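
As a rough sketch of that keying scheme, the example below stores a JSON-serialized goal under ``goal:<session_id>``. The in-memory ``store`` dict and the ``save_state`` / ``load_state`` helpers are assumptions made for illustration; they stand in for the ``state_meta`` table and are not SessionDB's API.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical stand-in for the state_meta table: a plain key/value mapping.
store: Dict[str, str] = {}


def meta_key(session_id: str) -> str:
    """Build the per-session key under which the goal state is stored."""
    return f"goal:{session_id}"


def save_state(session_id: str, state: Dict[str, Any]) -> None:
    """Serialize the goal state to JSON and store it under the session's key."""
    store[meta_key(session_id)] = json.dumps(state, ensure_ascii=False)


def load_state(session_id: str) -> Optional[Dict[str, Any]]:
    """Return the stored goal state, or None when the session has no goal."""
    raw = store.get(meta_key(session_id))
    return json.loads(raw) if raw else None


save_state("abc123", {"goal": "build me a website", "status": "active", "turns_used": 2})
restored = load_state("abc123")
```

Because the key depends only on the session id, resuming a session can look the goal up again without any other bookkeeping.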

Design notes / invariants:

- The continuation prompt is just a normal user message appended to the
  session via ``run_conversation``. No system-prompt mutation, no toolset
  swap — prompt caching stays intact.
- Judge failures are fail-OPEN: the verdict defaults to ``continue``. A broken
  judge must not wedge progress; the turn budget is the backstop.
- When a real user message arrives mid-loop it preempts the continuation
  prompt and also pauses the goal loop for that turn (we still re-judge
  after, so if the user's message happens to complete the goal the judge
  will say ``done``).
- Stickiness: once an item is marked completed or impossible, only the user
  (via /subgoal undo) can flip it back. Judge updates that try to regress
  terminal items are silently ignored.
- This module has zero hard dependency on ``cli.HermesCLI`` or the gateway
  runner — both wire the same ``GoalManager`` in.
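
The stickiness invariant above can be sketched as a filter over judge updates. This is an illustrative assumption about how such updates might be applied, not the module's code: ``apply_updates`` is a hypothetical helper, items are plain dicts, and updates carry the 1-based ``index`` that the judge sees.

```python
from typing import Any, Dict, List

TERMINAL_STATUSES = frozenset({"completed", "impossible"})


def apply_updates(items: List[Dict[str, Any]], updates: List[Dict[str, Any]]) -> int:
    """Apply judge rulings in place and return how many were accepted.

    Rulings are dropped when the index is out of range, the requested status
    is not terminal, or the target item is already terminal (sticky).
    """
    applied = 0
    for upd in updates:
        idx = int(upd["index"]) - 1  # judge indices are 1-based
        if not 0 <= idx < len(items):
            continue
        if upd["status"] not in TERMINAL_STATUSES:
            continue
        if items[idx]["status"] in TERMINAL_STATUSES:
            continue  # frozen: only the user may revert a terminal item
        items[idx]["status"] = upd["status"]
        applied += 1
    return applied


items = [
    {"text": "homepage exists", "status": "completed"},
    {"text": "navigation links work", "status": "pending"},
]
accepted = apply_updates(
    items,
    [
        {"index": 1, "status": "impossible"},  # ignored: item 1 is already terminal
        {"index": 2, "status": "completed"},   # accepted
        {"index": 9, "status": "completed"},   # ignored: out of range
    ],
)
```

Dropping the regressing update silently, rather than raising, keeps one bad ruling from discarding the valid rulings in the same batch.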

Nothing in this module touches the agent's system prompt or toolset.
Item statuses: pending, completed, impossible
Checklist markers: [x], [!], [ ]
Item authors: judge, user

[Continuing toward your standing goal]
Goal: {goal}

Continue working toward this goal. Take the next concrete step. If you believe the goal is complete, state so explicitly and stop. If you are blocked and need input from the user, say so clearly and stop.

[Continuing toward your standing goal]
Goal: {goal}

Checklist progress ({done}/{total} done):
{checklist}

Work on the unchecked items above. Do not declare items done yourself; a judge marks them based on evidence in your output. If an item is genuinely impossible in this environment, explain why so the judge can mark it impossible. If you are blocked on a remaining item and need user input, say so clearly and stop.

You are a strict judge for an autonomous agent. Your first job, before judging anything, is to break the user's stated goal into an EXTREMELY DETAILED checklist of concrete, verifiable completion criteria. Each item must be specific enough that a third party reading the agent's output could decide unambiguously whether that item was achieved.

Be exhaustive. Bias toward MORE items, not fewer. Include sub-items, edge cases, quality bars, deployment steps, verification checks, and anything the user would reasonably expect from a goal of this type. If the user said 'build me a website' you should be enumerating homepage exists, navigation links work, content is non-placeholder, mobile responsive, accessibility tags present, deployed somewhere publicly accessible, domain/URL is functional, etc. Better to over-specify and let a few items get marked impossible than to under-specify and let the agent declare victory early.

Submit your checklist by calling the ``submit_checklist`` tool. Do not reply with prose or JSON in your message body; call the tool. The system will not see anything you write outside the tool call.

Goal:
{goal}

Produce the harshest, most detailed checklist of completion criteria you can. Aim for at least 5 items; more is better when warranted. Each item should be a single verifiable statement of fact about the finished work.

You are a strict judge evaluating whether an autonomous agent has achieved a user's stated goal. You receive the goal text and the agent's most recent response. Your only job is to decide whether the goal is fully satisfied based on that response.

A goal is DONE only when:
- The response explicitly confirms the goal was completed, OR
- The response clearly shows the final deliverable was produced, OR
- The response explains the goal is unachievable / blocked / needs user input (treat this as DONE with reason describing the block).

Otherwise the goal is NOT done — CONTINUE.

Reply ONLY with a single JSON object on one line:
{"done": <true|false>, "reason": "<one-sentence rationale>"}

You are a strict judge evaluating an autonomous agent's progress on a user's goal that has a detailed checklist of completion criteria. For EACH currently-pending checklist item, decide whether the available evidence shows the item is satisfied.

Be strict but not absurd. Default to leaving items pending UNLESS evidence is reasonably clear. Reasonable evidence includes:
- The agent's most recent response describing or showing the work
- Tool call results visible in the conversation history (file writes, command output, web requests, etc.)
- A clear statement by the agent that the work was done, when supported by tool output earlier in the conversation

Do NOT require the agent to re-prove items it has already established in earlier turns. If a tool call earlier in the conversation already wrote a file, you do not need fresh `ls` output every turn — once established, it's done.

Flip pending → completed when the response or recent tool calls show the item is satisfied. Flip pending → impossible only when the work demonstrates the item cannot be achieved in this environment (NOT merely that the agent didn't try). Vague intentions ('I will do X next') do NOT count as completion.

STICKINESS: items already marked completed or impossible are frozen. Do not include them in your updates. Only the user can revert them.

TOOLS:
- ``read_file(path, offset, limit)``: inspect the dumped conversation history file whose path is given in the user message. Use this when the snippet alone isn't enough to rule. Each call costs tokens, so only read when needed.
- ``update_checklist(updates, new_items, reason)``: issue your verdict. Call this exactly once per turn when you are ready to rule. Calling it ENDS the evaluation.

You MUST call one of these tools every turn. Do not reply with prose or JSON in your message body; the system will not see anything written outside tool calls. When you cite evidence, reference the agent's actual output specifically.

Goal:
{goal}

Current checklist (each item is numbered, 1-based — use these exact 1-based numbers as the ``index`` field in your updates):
{checklist_block}

Agent's most recent response (snippet):
{response}

Conversation history file (call read_file on this path if you need more context — pagination supported via offset/limit):
{history_path}

Evaluate each pending item. Cite specific evidence.

Goal:
{goal}

Agent's most recent response:
{response}

Is the goal satisfied?c                      e Zd ZU dZded<   eZded<   eZded<   dZ	ded<   d	Z
d
ed<   d	Zded<   ddZedd            Zd	S )ChecklistItemz5One concrete completion criterion attached to a goal.strtextstatusadded_by        floatadded_atNzOptional[float]completed_atOptional[str]evidencereturnDict[str, Any]c                     t          |           S N)r   selfs    ./root/.hermes/hermes-agent/hermes_cli/goals.pyto_dictzChecklistItem.to_dict   s    d||    data'ChecklistItem'c                   t          |                    dd                                                    }|sd}t          |                    dt                                                                                              }|t
          vrt          }t          |                    dt                                                                                              }|t          t          fvrt          } | |||t          |                    dd          pd          |                    d          t          |d                   nd |                    d	          
          S )Nr    z(empty item)r   r   r   r   r   r!   )r   r   r   r   r   r!   )	r   getstripITEM_PENDINGlowerVALID_ITEM_STATUSESADDED_BY_JUDGEADDED_BY_USERr   )clsr+   r   r   r   s        r(   	from_dictzChecklistItem.from_dict   s=   488FB''((..00 	"!DTXXh5566<<>>DDFF,,,!Ftxx
N;;<<BBDDJJLLNM:::%Hs488J44;<< 88N++7 d>*+++XXj))
 
 
 	
r*   )r"   r#   )r+   r#   r"   r,   )__name__
__module____qualname____doc____annotations__r1   r   r4   r   r   r   r!   r)   classmethodr7    r*   r(   r   r      s         ??IIIF"H""""H$(L(((("H""""    
 
 
 [
 
 
r*   r   c                  
   e Zd ZU dZded<   dZded<   dZded<   eZded	<   d
Z	ded<   d
Z
ded<   dZded<   dZded<   dZded<   dZded<    ee          Zded<   dZded<   d%dZed&d            Zd'd Zd(d!Zdd"d)d$ZdS )*	GoalStatez+Serializable goal state stored per session.r   goalactiver   r   int
turns_used	max_turnsr   r   
created_atlast_turn_atNr    last_verdictlast_reasonpaused_reasonconsecutive_parse_failures)default_factoryzList[ChecklistItem]	checklistFbool
decomposedr"   c                L    t          |           }t          j        |d          S )NFensure_ascii)r   jsondumps)r'   r+   s     r(   to_jsonzGoalState.to_json(  s"    d||z$U3333r*   raw'GoalState'c                   t          j        |          }|                    d          pg }g }t          |t                    rY|D ]V}t          |t
                    r?	 |                    t                              |                     F# t          $ r Y Rw xY wW | |                    dd          |                    dd          t          |                    dd          pd          t          |                    dt                    pt                    t          |                    d	d
          pd
          t          |                    dd
          pd
          |                    d          |                    d          |                    d          t          |                    dd          pd          |t          |                    dd                              S )NrM   rA   r.   r   rB   rD   r   rE   rF   r   rG   rH   rI   rJ   rK   rO   F)rA   r   rD   rE   rF   rG   rH   rI   rJ   rK   rM   rO   )rS   loadsr/   
isinstancelistdictappendr   r7   	ExceptionrC   DEFAULT_MAX_TURNSr   rN   )r6   rV   r+   raw_checklistrM   items         r(   	from_jsonzGoalState.from_json-  s   z#--3)+	mT** 	!% ! !dD)) !!!(()@)@)F)FGGGG$ ! ! ! !!
 s&"%%88Hh//488L!449::$((;0ABBWFWXXTXXlC88?C@@txx<<CDD.11//((?33'*4884PRS+T+T+YXY'Z'ZDHH\599::
 
 
 	
s   -B
BBTuple[int, int, int, int]c                    t          | j                  }t          d | j        D                       }t          d | j        D                       }||z
  |z
  }||||fS )z/Return (total, completed, impossible, pending).c              3  :   K   | ]}|j         t          k    d V  dS    N)r   ITEM_COMPLETED.0its     r(   	<genexpr>z-GoalState.checklist_counts.<locals>.<genexpr>M  s.      RRbbi>6Q6Q6Q6Q6Q6QRRr*   c              3  :   K   | ]}|j         t          k    d V  dS rf   )r   ITEM_IMPOSSIBLEri   s     r(   rl   z-GoalState.checklist_counts.<locals>.<genexpr>N  s.      TTrryO7S7S7S7S7S7STTr*   )lenrM   sum)r'   totalr   r   r   s        r(   checklist_countszGoalState.checklist_countsJ  sk    DN##RRDNRRRRR	TTT^TTTTT
)#j0iW44r*   c                P    | j         sdS t          d | j         D                       S )zITrue iff at least one item exists and every item is in a terminal status.Fc              3  2   K   | ]}|j         t          v V  d S r%   )r   TERMINAL_ITEM_STATUSESri   s     r(   rl   z)GoalState.all_terminal.<locals>.<genexpr>V  s*      PP229 66PPPPPPr*   )rM   allr&   s    r(   all_terminalzGoalState.all_terminalR  s0    ~ 	5PPPPPPPPr*   numberedry   c               b   | j         sdS g }t          | j         d          D ]y\  }}t                              |j        d          }|r| d| nd| }| d|j         }|j        t          k    r|j        r|d|j         d	z  }|                    |           zd
	                    |          S )Nz(empty)rg   )startz[?]z. z   z (impossible: )
)
rM   	enumerateITEM_MARKERSr/   r   r   rn   r!   r]   join)r'   ry   linesira   markerprefixlines           r(   render_checklistzGoalState.render_checklistX  s    ~ 	9 q999 	 	GAt!%%dk599F)1D%%V%%%}F}}F**ty**D{o--$--99999LLyyr*   r"   r   )rV   r   r"   rW   )r"   rc   r"   rN   )ry   rN   r"   r   )r8   r9   r:   r;   r<   r   rD   r_   rE   rF   rG   rH   rI   rJ   rK   r   r[   rM   rO   rU   r=   rb   rr   rw   r   r>   r*   r(   r@   r@     ss        55IIIFJ&I&&&&JL"&L&&&&!%K%%%%#'M''''&''''' &+U4%@%@%@I@@@@J4 4 4 4
 
 
 
 [
85 5 5 5Q Q Q Q 49                r*   r@   
session_idr   r"   c                    d|  S )Nzgoal:r>   )r   s    r(   	_meta_keyr   k  s    :r*   r#   	_DB_CACHEOptional[Any]c                    	 ddl m}  ddlm} t	           |                       }n3# t
          $ r&}t                              d|           Y d}~dS d}~ww xY wt          	                    |          }||S 	  |            }n3# t
          $ r&}t                              d|           Y d}~dS d}~ww xY w|t          |<   |S )a  Return a SessionDB instance for the current HERMES_HOME.

    SessionDB has no built-in singleton, but opening a new connection per
    /goal call would thrash the file. We cache one instance per
    ``hermes_home`` path so profile switches still pick up the right DB.
    Defensive against import/instantiation failures so tests and
    non-standard launchers can still use the GoalManager.
    r   get_hermes_home)	SessionDBz,GoalManager: SessionDB bootstrap failed (%s)Nz$GoalManager: SessionDB() raised (%s))
hermes_constantsr   hermes_stater   r   r^   loggerdebugr   r/   )r   r   homeexccacheddbs         r(   _get_session_dbr   r  s   444444******??$$%%   CSIIIttttt ]]4  FY[[   ;SAAAttttt IdOIs,   #& 
AAA8
B 
B3B..B3Optional[GoalState]c                   | sdS t                      }|dS 	 |                    t          |                     }n3# t          $ r&}t                              d|           Y d}~dS d}~ww xY w|sdS 	 t                              |          S # t          $ r'}t                              d| |           Y d}~dS d}~ww xY w)z4Load the goal for a session, or None if none exists.Nz GoalManager: get_meta failed: %sz3GoalManager: could not parse stored goal for %s: %s)	r   get_metar   r^   r   r   r@   rb   warning)r   r   rV   r   s       r(   	load_goalr     s     t			B	ztkk)J//00   7===ttttt  t""3'''   LjZ]^^^ttttts-   "; 
A+A&&A+3B 
B>B99B>stateNonec                   | sdS t                      }|dS 	 |                    t          |           |                                           dS # t          $ r&}t
                              d|           Y d}~dS d}~ww xY w)z5Persist a goal to SessionDB. No-op if DB unavailable.Nz GoalManager: set_meta failed: %s)r   set_metar   rU   r^   r   r   )r   r   r   r   s       r(   	save_goalr     s     			B	z>
Ij))5==??;;;;; > > >7=========>s   5A 
A?A::A?c                Z    t          |           }|dS d|_        t          | |           dS )zDMark a goal cleared in the DB (preserved for audit, status=cleared).Ncleared)r   r   r   )r   r   s     r(   
clear_goalr     s6    j!!E}ELj%     r*   Optional[Path]c                 J   	 ddl m}  t           |                       }n3# t          $ r&}t                              d|           Y d}~dS d}~ww xY w	 |dz  }|                    dd           |S # t          $ r&}t                              d|           Y d}~dS d}~ww xY w)	zHReturn ``<HERMES_HOME>/goals`` (created on first use), or None on error.r   r   z*goals dump dir: get_hermes_home failed: %sNgoalsT)parentsexist_okz goals dump dir: mkdir failed: %s)r   r   r   r^   r   r   mkdir)r   r   r   paths       r(   _goals_dump_dirr     s    444444OO%%&&   A3GGGtttttg~

4$
///   7===ttttts,     
AAAA2 2
B"<BB"c                J    t          j        dd| pd          }|dd         pdS )z7Make a session_id safe for use as a filename component.z[^A-Za-z0-9._-]+_unknownN   )resub)r   cleaneds     r(   _safe_session_filenamer     s.    f(#z/FYGGG4C4=%I%r*   c                P    t                      }|dS |t          |            dz  S )z8Where the dumped messages JSON for ``session_id`` lives.Nz.json)r   r   )r   bases     r(   conversation_dump_pathr     s4    D|t+J77>>>>>r*   messagesList[Dict[str, Any]]c                0   | r|sdS t          |           }|dS 	 t          |dd          5 }t          j        ||ddt                     ddd           n# 1 swxY w Y   |S # t
          $ r&}t                              d|           Y d}~dS d}~ww xY w)	zHWrite ``messages`` to the goals/ dump file. Returns the path on success.Nwutf-8)encodingF   )rR   indentdefaultz#dump_conversation: write failed: %s)r   openrS   dumpr   r^   r   r   )r   r   r   fhr   s        r(   dump_conversationr     s    X t!*--D|t $g... 	O"Ihq#NNNN	O 	O 	O 	O 	O 	O 	O 	O 	O 	O 	O 	O 	O 	O 	O   :C@@@ttttts:   A% AA% AA% A A% %
B/BBr   limitrC   c                N    | sdS t          |           |k    r| S | d |         dz   S )Nr.   u   … [truncated])ro   )r   r   s     r(   	_truncater     s9     r
4yyE<+++r*   z\{.*\}rV   Optional[Dict[str, Any]]c                   | sdS |                                  }|                    d          r=|                     d          }|                    d          }|dk    r||dz   d         }	 t          j        |          }nj# t
          $ r] t                              |          }|sY dS 	 t          j        |                    d                    }n# t
          $ r Y Y dS w xY wY nw xY wt          |t                    r|ndS )zLBest-effort extraction of a single JSON object from a possibly-prosey reply.Nz````r~   rg   r   )r0   
startswithfindrS   rY   r^   _JSON_OBJECT_REsearchgrouprZ   r\   )rV   r   nlr+   matchs        r(   _extract_json_objectr     s!    t99;;Du !zz#YYt__88Q=D	z$   &&t,, 	44	:ekk!nn--DD 	 	 	444	 D dD))344t3s6   ,B &C(+'CC(
C"C(!C""C('C(Tuple[bool, str, bool]c                   | sdS t          |           }|ddt          | d          dfS |                    d          }t          |t                    r)|                                                                dv }nt          |          }t	          |                    d	          pd
                                          }|sd}||dfS )a  Parse the freeform judge's reply. Fail-open to ``(False, "<reason>", parse_failed)``.

    Returns ``(done, reason, parse_failed)``. ``parse_failed`` is True when the
    judge returned output that couldn't be interpreted as the expected JSON
    verdict (empty body, prose, malformed JSON). Callers use that flag to
    auto-pause after N consecutive parse failures so a weak judge model
    doesn't silently burn the turn budget.
    )Fjudge returned empty responseTNFjudge reply was not JSON:    Tdone)trueyes1r   reasonr.   no reason provided)r   r   r/   rZ   r   r0   r2   rN   )rV   r+   done_valr   r   s        r(   _parse_judge_responser     s      <;;$$D|J9S#3F3FJJDPPxxH(C   ~~%%''+GGH~~(##)r**0022F &%r*   !Tuple[List[Dict[str, Any]], bool]c                   | sg dfS t          |           }|g dfS |                    d          }t          |t                    sg dfS g }|D ]}t          |t                    rOt          |                    dd                                                    }|r|                    d|i           ft          |t
                    r-|                                }|r|                    d|i           |dfS )z?Parse a Phase-A decompose reply. Returns (items, parse_failed).TNrM   r   r.   F)r   r/   rZ   r[   r\   r   r0   r]   )rV   r+   	raw_itemsoutra   r   s         r(   _parse_decompose_responser   5  s    4x$$D|4x%%Ii&& 4x "C + +dD!! 	+txx++,,2244D +

FD>***c"" 	+::<<D +

FD>***:r*   Tuple[Dict[str, Any], bool]c                   | sg g dddfS t          |           }|g g dt          | d          ddfS |                    d          pg }|                    d          pg }t          |                    d	          pd
                                          pd}g }t          |t                    r|D ]}t          |t                    s	 t          |                    d                    }n# t          t          f$ r Y Ow xY w|dz
  }t          |                    dd
                                                                                    }	|	t          vrt          |                    d          pd
                                          pd}
|                    ||	|
d           g }t          |t                    r|D ]}t          |t                    rOt          |                    dd
                                                    }|r|                    d|i           ft          |t                    r-|                                }|r|                    d|i           |||ddfS )zParse a Phase-B checklist eval reply. Returns (parsed, parse_failed).

    parsed = {"updates": [...], "new_items": [...], "reason": str}
    r   updates	new_itemsr   TNr   r   r   r   r   r.   r   indexrg   r   r!   r   r   r!   r   F)r   r   r/   r   r0   rZ   r[   r\   rC   	TypeError
ValueErrorr2   ru   r]   )rV   r+   r   r   r   norm_updatesupd
idx_1basedidxr   r!   norm_newrk   r   s                 r(   _parse_evaluate_responser   L  s   
  aB:YZZ\```$$D| Nyc7J7JNN 
 
 	
 hhy!!'RG%%+I(##)r**0022J6JFL'4   X 	X 	XCc4((  !!1!122

z*   q.C2..//5577==??F333377:..4"55;;==EH#X V VWWWWH)T"" 	4 	4 	4B"d## 4266&"--..4466 4OOVTN333B$$ 4xxzz 4OOVTN333#(fMMuTTs   "C00DDfunction	read_filezRead a portion of the dumped conversation history JSON file. Use this when the snippet alone isn't enough to rule. Returns lines from the file with 1-based line numbers. Pagination supported via offset and limit. Reads beyond a built-in cap are truncated.objectstringzXAbsolute path to the conversation history file. You were given this in the user message.)typedescriptionintegerz+1-indexed starting line number (default 1).rg   )r   r   r   zMax lines to return (default z).)r   offsetr   r   )r   
propertiesrequired)namer   
parametersr   r   _JUDGE_READ_FILE_TOOL_SCHEMAsubmit_checklistzSubmit the harsh, detailed completion-criteria checklist you decomposed the goal into. Each item is one verifiable completion criterion. Bias toward more items, not fewer.itemsarrayzList of checklist items. Each item is a single verifiable statement of fact about the finished work. Aim for at least 5 items; more is better when warranted.zThe completion-criterion text.)r   r   r  #_JUDGE_SUBMIT_CHECKLIST_TOOL_SCHEMAupdate_checklistu[  Issue your verdict on the current checklist. For each currently-pending item, decide whether the agent's most recent response (and conversation history if you read it) shows the item is satisfied. You may also append new items the original decomposition missed. Call this exactly once when you are ready to rule — calling it ends the evaluation.u
  Per-item rulings. Use the 1-based ``index`` shown in the checklist. ``status`` must be 'completed' (clear evidence the item is done) or 'impossible' (item cannot be achieved in this environment). Items already in a terminal status are frozen — do not include them.z1-based checklist index.)r   enumzkOne-sentence specific citation of why this item is done or impossible. Reference the agent's actual output.r   u   Optional: completion criteria the original decomposition missed. Stay strict — only add items that genuinely belong as completion criteria for this goal.zThe new criterion text.z9One-sentence overall rationale for this round of updates.r   #_JUDGE_UPDATE_CHECKLIST_TOOL_SCHEMAr   r   allowed_pathr   r  c          	        | st          j        ddi          S 	 t          |                                           }n0# t          $ r#}t          j        dd| i          cY d}~S d}~ww xY w|G	 |                                }n# t          $ r |}Y nw xY w||k    rt          j        dd| i          S |                                st          j        dd| i          S 	 t          dt          |pd                    }t          dt          t          |pt                    t                              }n-# t          t          f$ r t          j        ddi          cY S w xY w	 t          |d	d
d          5 }|                                }ddd           n# 1 swxY w Y   n0# t          $ r#}t          j        dd| i          cY d}~S d}~ww xY wt          |          }	|dz
  }
t          |
|z   |	          }||
|         }d                    |          }t          |          t           k    r|dt                    dz   }t          j        t#          |          |	|t          |          ||	k     r|dz   nd|dd          S )u   Bounded read of the dumped conversation file. Returns JSON-serializable text.

    Restricted to ``allowed_path`` when provided — the judge cannot use this
    tool to read arbitrary files.
    errorzpath is requiredzpath resolve failed: Nz@read_file is restricted to the conversation dump path. Allowed: zfile not found: rg   z!offset and limit must be integersrr   replace)r   errorszread failed: r.   u"   
… [truncated by judge read cap])r   total_linesr   returnednext_offsetcontentFrQ   )rS   rT   r   resolver^   existsmaxrC   min_JUDGE_READ_FILE_MAX_LINESr   r   r   	readlinesro   r   _JUDGE_READ_FILE_MAX_CHARSr   )r   r   r   r  targetr   allowedr   r   rq   r{   endslice_linesr   s                 r(   _judge_read_filer$  -  s[     9z7$67888Dd##%% D D Dz7$AC$A$ABCCCCCCCCD 	#"**,,GG 	# 	# 	#"GGG	#W:* '* *    ==?? Bz7$?v$?$?@AAAJQFKa(())As3uB(BCCE_``aaz" J J Jz7$GHIIIIIJ<&#	BBB 	#bLLNNE	# 	# 	# 	# 	# 	# 	# 	# 	# 	# 	# 	# 	# 	# 	# < < <z7$9C$9$9:;;;;;;;;< JJEQJE
eemU
#
#Cc	"K
''+

C
3xx,,,---.1VV:F$$"%++sQww4     s   !< 
A)A$A)$A)/B BB#AD: :'E$#E$(F( ;FF( F  F( #F $F( (
G2G
GGTuple[Optional[Any], str]c                    	 ddl m}  n3# t          $ r&}t                              d|           Y d}~dS d}~ww xY w	  | d          \  }}n3# t          $ r&}t                              d|           Y d}~dS d}~ww xY w||sdS ||fS )z6Return (client, model) or (None, '') when unavailable.r   )get_text_auxiliary_clientz.goal judge: auxiliary client import failed: %sN)Nr.   
goal_judgez0goal judge: get_text_auxiliary_client failed: %s)agent.auxiliary_clientr'  r^   r   r   )r'  r   clientmodels       r(   _get_judge_clientr,  q  s    DDDDDDD   EsKKKxxxxx11,??   GMMMxxxxx ~U~x5=s'   	 
949A 
A<A77A<msgr   	tool_namec                    t          | dd          pg }|D ]v}	 t          |dd          p-t          |t                    r|                    d          ndpd}t          |dd          p+t          |t                    r|                    d          nd}|t          |dd          p+t          |t                    r|                    d          nd}||k    rt          |dd          p+t          |t                    r|                    d          nd}t          |t                    r,	 |rt          j        |          ni }n,# t          $ r i }Y nw xY wt          |t                    r|}ni }|||d	c S # t          $ r Y tw xY wdS )
zFind a tool call by name on a chat-completions message. Returns
    ``{"id", "name", "arguments": <dict>}`` or None.

    Robust to provider shims that return tool_calls as objects or dicts
    and arguments as JSON strings or already-parsed dicts.
    
tool_callsNidtc-?r   r  r.   	arguments)r1  r  r3  )getattrrZ   r\   r/   r   rS   rY   r^   )	r-  r.  r0  tctc_idfnfn_namefn_args_rawargss	            r(   _extract_tool_callr;    s    lD117RJ  	Bd++i
2t@T@T0^tZ^iciEZ..hTVX\I]I]3g266*3E3E3EcgBzb&$//cjQSUYFZFZ4bBFF6NNN`bG)##!"k488qT^_acgThTh=pRVVK=P=P=PnpK+s++ 6AI4:k222rDD    DDDK.. "tDDDDD 	 	 	H	4sD   A>E=AE=AE=0E	E=	EE=E"E==
F
Fc                    g }t          | dd          pg D ]f}	 t          |dd          p-t          |t                    r|                    d          ndpd}t          |dd          p+t          |t                    r|                    d          nd}t          |dd          p+t          |t                    r|                    d          nd}t          |dd          p+t          |t                    r|                    d          nd}t          |t                    s(	 t          j        |          }n# t          $ r d	}Y nw xY w|                    |d|pd|d
d           W# t          $ r Y dw xY w|S )zyConvert a provider-shim tool_calls list into plain-dict form for
    inclusion in subsequent ``messages=[...]`` payloads.r0  Nr1  r2  r   r  r.   r3  z{})r  r3  )r1  r   r   )	r4  rZ   r\   r/   r   rS   rT   r^   r]   )r-  r   r5  r6  r7  r8  fn_argss          r(   _serialize_assistant_tool_callsr>    s    !#Cc<..4"  	Bd++i
2t@T@T0^tZ^iciEZ..hTVX\I]I]3g266*3E3E3EcgBb&$//cjQSUYFZFZ4bBFF6NNN`bGb+t44mPZ[]_cPdPd9l9L9L9LjlGgs++ ##"j11GG  # # #"GGG#JJ"%,]II     
  	 	 	H	Js6   DE-&D;:E-;E
E-	E

!E--
E;:E;  )
max_tokensr*  r+  toolsforced_tool_namer    timeoutr   r@  #Tuple[Optional[Any], Optional[str]]c          
        |rdd|id}nd}|ddg}d}	|D ]}
	 | j         j                            ||||
d||          dfc S # t          $ r}t	          |          j         d	| }	t          |                                          t          fd
dD                       sd|	fcY d}~c S t          
                    d|
|           Y d}~d}~ww xY wd|	pdfS )zCall the judge with a forced tool choice, falling back to ``auto``
    if the provider rejects ``required`` / a specific function choice.

    Returns ``(response, error)``. On success, ``error`` is None.
    r   r  r  r  autoNr   )r+  r   rA  tool_choicetemperaturer@  rC  z: c              3      K   | ]}|v V  	d S r%   r>   )rj   tokenr-  s     r(   rl   z/_call_judge_with_tool_choice.<locals>.<genexpr>  s7        u|      r*   )rG  ztool choicer  zfunction callunsupportedznot supportedinvalid400z6goal judge: tool_choice=%r rejected (%s); falling backz all tool_choice fallbacks failed)chatcompletionscreater^   r   r8   r   r2   anyr   r   )r*  r+  r   rA  rB  rC  r@  primary_choiceattemptslast_errchoicer   r-  s               @r(   _call_judge_with_tool_choicerV    s}   $  $",6CS:TUU#):v>H"H  	;*11!"% 2        	 	 	s)),5555H c((..""C     2      & X~%%%%%%%%LLQSY[^___HHHH	 ????s$   'A
CAC&C.CCrC  rA   *Tuple[List[Dict[str, Any]], Optional[str]]c          	         |                                  sg dfS t                      \  }}|g dfS dt          ddt                              t          | d                    dg}t          |||t          gd	|d
          \  }}|"t          	                    d|           g d| fS 	 |j
        d         j        }n# t          $ r g dfcY S w xY wt          |d	          }|t          |dd          pd}	t          |	          \  }
}|s|
s-t          	                    dt          |	d                     g dfS t          	                    dt!          |
                     |
dfS |d                             d          pg }g }
t%          |t&                    r|D ]}t%          |t(                    rOt+          |                    dd                                                     }|r|
                    d|i           ft%          |t*                    r-|                                 }|r|
                    d|i           |
st          	                    d           g dfS t          	                    dt!          |
                     |
dfS )zPhase-A: ask the judge to break the goal into a checklist via a
    forced ``submit_checklist`` tool call.

    Returns ``(items, error)``. On any failure, returns ``([], reason)``
    so the caller can fall back to freeform mode.
    
empty goalNauxiliary client unavailablesystemroler  r   r   rA   r    r+  r   rA  rB  rC  r@  z$goal decompose: API call failed (%s)zdecompose error: r   zdecompose response malformedr  r.   zLgoal decompose: no submit_checklist tool call AND no parseable JSON (raw=%r)r   z.decompose: judge did not call submit_checklistz;goal decompose: fell back to JSON-content parser (%d items)r3  r  r   z:goal decompose: submit_checklist returned empty items listzdecompose: empty checklistz9goal decompose: produced %d checklist items via tool call)r0   r,  DECOMPOSE_SYSTEM_PROMPTDECOMPOSE_USER_PROMPT_TEMPLATEformatr   rV  r
  r   infochoicesmessager^   r;  r4  r   ro   r/   rZ   r[   r\   r   r]   )rA   rC  r*  r+  r   resperrr-  r5  r  r  parse_failedr   entryr   s                  r(   decompose_goalrl    s    ::<<  <%''MFE~111 &=>>5<<tT** =  	
 	
H -23+  ID# |:C@@@,s,,,,2l1o% 2 2 2111112 
C!3	4	4B	z #y"--37@@| 	Hu 	HKK^'3''   GGGQSVW\S]S]^^^d{;##G,,2I"$E)T"" 	1 	1 	1E%&& 1599VR00117799 1LL&$000E3'' 1{{}} 1LL&$000 0PQQQ///
KKKSQVZZXXX$;s   *B= =CClast_responseTuple[str, str, bool]c                  |                                  sdS |                                 sdS t                      \  }}|dS t                              t	          | d          t	          |t
                              }	 |j        j                            |dt          dd	|dgd
d|          }nL# t          $ r?}t                              d|           ddt          |          j         dfcY d}~S d}~ww xY w	 |j        d
         j        j        pd}n# t          $ r d}Y nw xY wt%          |          \  }	}
}|	rdnd}t                              d|t	          |
d                     ||
|fS )u   Legacy freeform judge — kept for goals with no checklist.

    Returns ``(verdict, reason, parse_failed)`` where verdict is ``"done"``,
    ``"continue"``, or ``"skipped"``.
    )skippedrZ  Fcontinuez$empty response (nothing to evaluate)FN)rr  r[  Fr`  )rA   responser\  r]  r   r   r   )r+  r   rH  r@  rC  u@   goal judge: API call failed (%s) — falling through to continuerr  judge error: Fr.   r   z+goal judge (freeform): verdict=%s reason=%sx   )r0   r,  &EVALUATE_USER_PROMPT_FREEFORM_TEMPLATErd  r   _JUDGE_RESPONSE_SNIPPET_CHARSrN  rO  rP  EVALUATE_SYSTEM_PROMPT_FREEFORMr^   r   re  r   r8   rf  rg  r  r   )rA   rm  rC  r*  r+  promptrh  r   rV   r   r   rj  verdicts                r(   judge_goal_freeformr{  B  s    ::<< .--   IHH%''MFE~@@3::tT""=*GHH ;  F
G{&--!.MNNF33  . 	
 	
  G G GVX[\\\?499+=??FFFFFFFGl1o%-3    "7s!;!;D&,,ff*G
KK=w	RXZ]H^H^___FL((s0    1B2 2
C;<4C60C;6C;?D D('D()rC  max_tool_callshistory_pathr|  c                  t                      \  }}|g g dddfS |                     d          }t                              t	          | j        d          |t	          |t                    |rt          |          nd	          }d
t          dd|dg}	t          g}
||

                    dt                     |t          dt          |                    nd}t          |dz             D ]}|dk    r|
nt          g}|dk    rdnd}t          |||	|||d          \  }}|(t                               d|           g g d| ddfc S 	 |j        d         j        }n# t(          $ r g g dddfcY c S w xY wt+          |d          }|t-          |d                   }t                               dt/          |                    d          pg           t/          |                    d          pg           t	          |                    dd          d                     |dfc S t+          |d          }||dk    r|d         }t3          t          |                    dd                    |                    dd          |                    d t4                    |!          }|	                    d"t9          |d#d          pdt;          |          d$           |	                    d%|d&         d|d'           |dz  }t9          |d#d          pd}|                                rzt?          |          \  }}|sft                               d(t/          |                    d          pg           t/          |                    d          pg                      |dfc S t                               d)|t	          |d                     g g d*ddfc S g g d+ddfS ),ab  Phase-B: judge evaluates each pending checklist item via forced
    tool calls.

    The judge has two tools available:
      - ``read_file``: inspect the dumped conversation history
      - ``update_checklist``: issue the verdict (terminates the loop)

    ``tool_choice="required"`` forces one of them every iteration. We loop
    until ``update_checklist`` is called or ``max_tool_calls`` is exhausted.

    Returns ``(parsed, parse_failed)`` where parsed is
    ``{"updates": [...], "new_items": [...], "reason": str}``.
    Fails open on transport errors: empty updates/new_items, parse_failed=False.
    Nr[  r   FTrx   r`  u)   (unavailable — judge from snippet only))rA   checklist_blockrs  r}  r\  r]  r   r   r   r  r?  ra  z,goal judge (checklist): API call failed (%s)rt  zjudge response malformedr3  z9goal judge (checklist): updates=%d new_items=%d reason=%sr   r   r   r.   ru  r   r   r   rg   r   r  	assistantr  )r^  r  r0  toolr1  )r^  tool_call_idr  r  zPgoal judge (checklist): fell back to JSON-content parser updates=%d new_items=%dus   goal judge (checklist): judge emitted neither read_file nor update_checklist (iteration=%d, content=%r) — bailingz#judge did not call update_checklistz)judge tool-loop exhausted without verdict) r,  r   'EVALUATE_USER_PROMPT_CHECKLIST_TEMPLATErd  r   rA   rw  r    EVALUATE_SYSTEM_PROMPT_CHECKLISTr  insertr  r  rC   rangerV  r   re  rf  rg  r^   r;  _normalize_update_argsro   r/   r$  r  r]   r4  r>  r0   r   )r   rm  r}  rC  r|  r*  r+  r  user_promptr   rA  
reads_left	iteration
loop_toolsforcedrh  ri  r-  	update_tcparsedread_tcr:  tool_resultr  rj  s                            r(   evaluate_checklistr  u  s   , &''MFE~R;YZZ\abb ,,d,;;O9@@uz4(('=*GHH*6gS&&&<g	 A  K &FGGK00&H $G"GEQ45550<0HQN++,,,aJ
 :>** a
 a
	 )1nnUU3V2W
 (2Q##D0#
 
 
	c <KKFLLL  "!#3c33 
    	,q/)CC 	 	 	R;UVV     	 's,>??	 +Ik,BCCFKKKFJJy))/R00FJJ{++1r22&**Xr22C88	   5=    %S+66:>>;'D*DHHVR(())xx!,,hhw(BCC)	  K OO#"3	266<"=cBB    
 OO '#&	     !OJ #y"--3==?? 		%#;G#D#D FL %.

9--344

;//5266	   u}$$$Fy#..	
 	
 	
 ? 
 
 	
 	
 	
 A	
 	

 	 s   E""E98E9r:  c                   |                      d          pg }|                      d          pg }t          |                      d          pd                                          pd}g }t          |t                    r|D ]}t          |t
                    s	 t          |                     d                    }n# t          t          f$ r Y Ow xY wt          |                     dd                                                    	                                }|t          vrt          |                     d          pd                                          pd	}|                    |d
z
  ||d           g }	t          |t                    r|D ]}
t          |
t
                    rOt          |
                     dd                                                    }|r|	                    d|i           ft          |
t                    r-|
                                }|r|	                    d|i           ||	|dS )u  Validate and normalize the ``update_checklist`` tool arguments.

    Performs the same 1-based → 0-based conversion and terminal-status
    filter as ``_parse_evaluate_response``. Returns the canonical
    ``{updates, new_items, reason}`` shape callers expect.
    r   r   r   r.   r   r   r   r!   Nrg   r   r   r   )r/   r   r0   rZ   r[   r\   rC   r   r   r2   ru   r]   )r:  raw_updatesraw_newr   r   r   r   r   r!   r   rk   r   s               r(   r  r    sS    ((9%%+Khh{##)rG(##)r**0022J6JF)+L+t$$  	 	Cc4((  !1!122

z*   2..//5577==??F333377:..4"55;;==EH#a $! !     &(H'4   	4 	4 	4B"d## 4266&"--..4466 4OOVTN333B$$ 4xxzz 4OOVTN333#(fMMMs   "B<<CCc                     e Zd ZdZedd<dZed=d
            Zd>dZd>dZ	d?dZ
d?dZddd@dZdAdBdZdddCdZdDdZdEd ZdFd#ZdGd&ZdHd'ZdDd(Zdddd)dId1Zddd2dJd5ZedKd7            ZedLd9            ZdMd;ZdS )NGoalManageru\  Per-session goal state + continuation decisions.

    The CLI and gateway each hold one ``GoalManager`` per live session.

    Methods:

    - ``set(goal)`` — start a new standing goal.
    - ``clear()`` — remove the active goal.
    - ``pause()`` / ``resume()`` — explicit user controls.
    - ``status()`` — printable one-liner.
    - ``add_subgoal(text)`` — user appends a checklist item.
    - ``mark_subgoal(index, status)`` — user flips an item (override).
    - ``remove_subgoal(index)`` — user deletes an item.
    - ``clear_checklist()`` — user wipes the checklist; next turn re-decomposes.
    - ``evaluate_after_turn(last_response, agent=None)`` — call the judge,
      update state, return a decision dict.
    - ``next_continuation_prompt()`` — the canonical user-role message to
      feed back into ``run_conversation``.
    )default_max_turnsr   r   r  rC   c               r    || _         t          |pt                    | _        t	          |          | _        d S r%   )r   rC   r_   r  r   _state)r'   r   r  s      r(   __init__zGoalManager.__init___  s3    $!$%6%K:K!L!L+4Z+@+@r*   r"   r   c                    | j         S r%   )r  r&   s    r(   r   zGoalManager.statef  s
    {r*   rN   c                4    | j         d uo| j         j        dk    S )NrB   r  r   r&   s    r(   	is_activezGoalManager.is_activej  s    {$&I4;+=+IIr*   c                0    | j         d uo| j         j        dv S )N)rB   pausedr  r&   s    r(   has_goalzGoalManager.has_goalm  s    {$&U4;+=AU+UUr*   c                   | j         }|	|j        dv rdS |j         d|j         d}|                                \  }}}}d}|rd||z    d| d}|j        dk    rd	| | d
|j         S |j        dk    r$|j        r
d|j         nd}d| | | d
|j         S |j        dk    rd| | d
|j         S d|j         d| | d
|j         S )N)r   z*No active goal. Set one with /goal <text>./z turnsr.   z,  donerB   u   ⊙ Goal (active, ): r      — u   ⏸ Goal (paused, r   u   ✓ Goal done (zGoal ()r  r   rD   rE   rr   rA   rJ   )	r'   sturnscl_totalcl_donecl_impr   cl_textextras	            r(   status_linezGoalManager.status_linep  sD   K9L00??<55!+555'('9'9';';$'61 	>=7V+==h===G8xCCwCC16CCC8x12H-AO---bEJJwJJJ!&JJJ8v@U@G@@@@@???E?7??qv???r*   c                f    | j         dS | j         j        sdS | j                             d          S )z-Public helper for the /subgoal slash command.Nz(no active goal)u=   (checklist empty — judge will populate it on the next turn)Trx   )r  rM   r   r&   s    r(   r   zGoalManager.render_checklist  s=    ;%%{$ 	SRR{++T+:::r*   N)rE   rA   rE   Optional[int]r@   c          
        |pd                                 }|st          d          t          |dd|rt          |          n| j        t          j                    dg d          }|| _        t          | j        |           |S )Nr.   zgoal text is emptyrB   r   r   F)rA   r   rD   rE   rF   rG   rM   rO   )	r0   r   r@   rC   r  timer  r   r   )r'   rA   rE   r   s       r(   setzGoalManager.set  s    
!!## 	31222(1Mc)nnnt7My{{	
 	
 	
 $/5)))r*   user-pausedr   c                    | j         sd S d| j         _        || j         _        t          | j        | j                    | j         S )Nr  )r  r   rJ   r   r   r'   r   s     r(   pausezGoalManager.pause  sA    { 	4%$*!$/4;///{r*   T)reset_budgetr  c                   | j         sd S d| j         _        d | j         _        |rd| j         _        t	          | j        | j                    | j         S )NrB   r   )r  r   rJ   rD   r   r   )r'   r  s     r(   resumezGoalManager.resume  sS    { 	4%$(! 	'%&DK"$/4;///{r*   r   c                r    | j         d S d| j         _        t          | j        | j                    d | _         d S )Nr   )r  r   r   r   r&   s    r(   clearzGoalManager.clear  s8    ;F&$/4;///r*   c                    | j         sd S d| j         _        d| j         _        || j         _        t	          | j        | j                    d S )Nr   )r  r   rH   rI   r   r   r  s     r(   	mark_donezGoalManager.mark_done  sI    { 	F##) "($/4;/////r*   r   r   c                N   | j         t          d          |pd                                }|st          d          t	          |t
          t          t          j                              }| j         j        	                    |           t          | j        | j                    |S )zEUser appends a new checklist item. Requires an active or paused goal.Nno active goalr.   zsubgoal text is emptyr   r   r   r   )r  RuntimeErrorr0   r   r   r1   r5   r  rM   r]   r   r   )r'   r   ra   s      r(   add_subgoalzGoalManager.add_subgoal  s    ;/000
!!## 	64555"Y[[	
 
 
 	$$T***$/4;///r*   index_1basedr   c                t   | j         t          d          |pd                                                                }|t          vr't          dt          t                     d|          t          |          dz
  }|dk     s|t          | j         j	                  k    r*t          dt          | j         j	                   d	          | j         j	        |         }||_        |t          v r't          j                    |_        |j        sd
|_        nd|_        t!          | j        | j                    |S )u  User overrides an item's status.

        ``status`` may be ``completed``, ``impossible``, or ``pending``
        (the last only as an undo flow). Stickiness rules do NOT apply to
        user actions — the user is the only authority that can revert
        terminal items.
        Nr  r.   zstatus must be one of z; got rg   r   index out of range (1..r}   zmarked by user)r  r  r0   r2   r3   r   sortedrC   ro   rM   
IndexErrorr   ru   r  r   r!   r   r   )r'   r  r   r   ra   s        r(   mark_subgoalzGoalManager.mark_subgoal  s<    ;/000,B%%''--//,,,V0C)D)DVVFVV   ,!#77cS!67777G#dk.C*D*DGGG   {$S)+++ $	D= 1 0 $D$/4;///r*   c                b   | j         t          d          t          |          dz
  }|dk     s|t          | j         j                  k    r*t          dt          | j         j                   d          | j         j                            |          }t          | j        | j                    |S )Nr  rg   r   r  r}   )	r  r  rC   ro   rM   r  popr   r   )r'   r  r   removeds       r(   remove_subgoalzGoalManager.remove_subgoal  s    ;/000,!#77cS!67777G#dk.C*D*DGGG   +'++C00$/4;///r*   c                |    | j         dS g | j         _        d| j         _        t          | j        | j                    dS )zIWipe the checklist and reset decomposed=False so the judge re-decomposes.NF)r  rM   rO   r   r   r&   s    r(   clear_checklistzGoalManager.clear_checklist  s=    ;F "!&$/4;/////r*   )user_initiatedagentr   rm  r  r  r   r   Optional[List[Dict[str, Any]]]r#   c                  | j         }||j        dk    r|r|j        ndddddddS |xj        dz  c_        t          j                    |_        |j        st          |j                  \  }}d	|_        d}|rt          j                    }	|D ]=}
|j        	                    t          |
d
         t          t          |	                     >d|_        dt          |           d|_        dt          |           d}t!          | j        |           dd	|                                 d|j        |dS t&                              d|           d| |_        |                     ||||          \  }}}||_        ||_        |r|xj        dz  c_        nd|_        |dk    r(d|_        t!          | j        |           dddd|d| dS |j        t.          k    r>d|_        d|j         d|_        t!          | j        |           dddd|d|j         ddS |j        |j        k    rNd|_        d|j         d|j         d|_        t!          | j        |           dddd|d |j         d|j         d!dS t!          | j        |           |                                \  }}}}d}|rd"||z    d| d#}dd	|                                 d|d$|j         d|j         | d%| dS )&u  Run the judge and update state. Return a decision dict.

        ``user_initiated`` distinguishes a real user prompt (True) from a
        continuation prompt we fed ourselves (False). Both increment
        ``turns_used`` because both consume model budget.

        ``messages`` is the agent's full conversation list for this session.
        When provided, it's dumped to ``<HERMES_HOME>/goals/<sid>.json`` so
        the Phase-B judge's read_file tool can inspect history. Optional —
        when missing, the judge runs from the snippet only.

        ``agent`` is a back-compat path — when ``messages`` is None we try
        to extract them from common AIAgent attribute names. Most callers
        should pass ``messages`` directly because AIAgent does not store
        the message list as a public instance attribute.

        Decision keys:
          - ``status``: current goal status after update
          - ``should_continue``: bool — caller should fire another turn
          - ``continuation_prompt``: str or None
          - ``verdict``: "done" | "continue" | "skipped" | "inactive" | "decompose"
          - ``reason``: str
          - ``message``: user-visible one-liner to print/send
        NrB   Finactiver  r.   )r   should_continuecontinuation_promptrz  r   rg  rg   Tr   r  	decomposezdecomposed into z itemsu   ⊙ Goal checklist created (z) items). Use /subgoal to view or edit it.u>   goal: decompose failed (%s) — falling back to freeform judgezdecompose failed: r  r   r   r   u   ✓ Goal achieved: r  z(judge model returned unparseable output z turns in a rowrr  u%   ⏸ Goal paused — the judge model (z turns) isn't returning the required JSON verdict. Route the judge to a stricter model in ~/.hermes/config.yaml:
  auxiliary:
    goal_judge:
      provider: openrouter
      model: google/gemini-3-flash-preview
Then /goal resume to continue.zturn budget exhausted (r  r}   u   ⏸ Goal paused — zD turns used. Use /goal resume to keep going, or /goal clear to stop.r  r  u   ↻ Continuing toward goal (r  )r  r   rD   r  rG   rO   rl  rA   rM   r]   r   r1   r4   rH   ro   rI   r   r   next_continuation_promptr   re  _evaluate_state_phase_brK   &DEFAULT_MAX_CONSECUTIVE_PARSE_FAILURESrJ   rE   rr   )r'   rm  r  r  r   r   r  ri  decompose_messagenowrk  rz  r   rj  r  r  r  r   progresss                      r(   evaluate_after_turnzGoalManager.evaluate_after_turn  s   @ =ELH44*/9%,,T#('+%*   	A!Y[[   	;'
33JE3#E " ikk"  EO**%!&v#/%3%(	      &1"$Is5zz$I$I$I!83u:: 8 8 8 " $/5111&'++/+H+H+J+J*#/0   KKXZ]^^^ :S : :E )-(D(D= )E )
 )
% %"
  	1,,1,,,/0E,f!ELdou--- #('+! 999   +/UUU#ELl5;[lll  dou---"#('+% 5E<\ 5 5 5  $ u..#EL"aE<L"a"au"a"a"aEdou---"#('+% N5+; N Neo N N N
 
 
 	$/5)))','='='?'?$'61 	BAw/AA(AAAH##'#@#@#B#B!hu/?hh%/hS[hh`fhh	
 	
 		
r*   r  r   rn  c               l   |                                 sdS |j        rd}g }|rt          |          }n||                     |          }|r8t	          | j        |          }| t                              d| j                   n t                              d| j                   t          |||          \  }}| 	                    ||           |
                                rd|                    d          pd|fS d	|                    d          pd
|fS t          |j        |          \  }	}
}|	|
|fS )aJ  Run the right kind of Phase-B evaluation given current state.

        With a non-empty checklist: harsh per-item evaluation with a bounded
        read_file tool loop.

        With an empty checklist (e.g. decompose failed twice): fall back to
        the legacy freeform judge so the goal still has a way to terminate.
        rq  Nz-goal: conversation dump failed for session %suO   goal: no messages available for session %s — judge will run from snippet only)r}  r   r   zall checklist items terminalrr  zchecklist progress)r0   rM   r[   _extract_agent_messagesr   r   r   r   r  _apply_checklist_updatesrw   r/   r{  rA   )r'   r   rm  r  r   r}  msgsr  rj  rz  r   s              r(   r  z#GoalManager._evaluate_state_phase_b  sw     ""$$ 	MLL? 	Z ,0L)+D ;H~~"33E:: 0$GG'LLG  
 eO  
 $6}<$ $ $ FL ))%888!!## dvzz(33U7UWcccvzz(33K7K\YY )<EJ(V(V%,,r*   r   c                    dD ]?}	 t          | |d          }t          |t                    r|r|c S 0# t          $ r Y <w xY wg S )zBest-effort extraction of the agent's conversation history.

        Tries common attribute names so we don't tightly couple to AIAgent.
        Returns an empty list when nothing is available.
        )r   conversation_history	_messageshistoryN)r4  rZ   r[   r^   )r  attrr  s      r(   r  z#GoalManager._extract_agent_messages  sr     Q 	 	DudD11dD))  d  KKK   	s   )3
A A r  c           	        t          j                     }|                    d          pg D ]}	 t          |d                   }n# t          t          t
          f$ r Y 2w xY w|dk     s|t          | j                  k    rU| j        |         }|j        t          v rq|                    d          }|t          vr||_        ||_
        |                    d          }|r||_        |                    d          pg D ]c}|                    d          pd                                }	|	s.| j                            t          |	t          t           |	                     dd
S )zBApply judge updates with stickiness: never regress terminal items.r   r   r   r   r!   r   r   r.   r  N)r  r/   rC   KeyErrorr   r   ro   rM   r   ru   r   r!   r0   r]   r   r1   r4   )
r   r  r  r   r   ra   
new_statusr!   new_itemr   s
             r(   r  z$GoalManager._apply_checklist_updates  s    ikk::i((.B 	) 	)C#g,''i4   Qww#U_!5!555?3'D{444**J!777$DK #Dwwz**H ) (

;//52 	 	HLL((.B5577D O""'+ 	     		 	s   AAAr    c                Z   | j         r| j         j        dk    rd S | j         j        s%t                              | j         j                  S | j                                         \  }}}}t                              | j         j        ||z   || j                             d                    S )NrB   r_  Frx   )rA   r   rq   rM   )	r  r   rM   CONTINUATION_PROMPT_TEMPLATErd  rA   rr   +CONTINUATION_PROMPT_WITH_CHECKLIST_TEMPLATEr   )r'   r  r  r  r   s        r(   r  z$GoalManager.next_continuation_prompt  s    { 	dk0H<<4{$ 	N/66DK<L6MMM'+{'C'C'E'E$'61:AA!6!k22E2BB	 B 
 
 	
r*   )r   r   r  rC   )r"   r   r   r   )rA   r   rE   r  r"   r@   )r  )r   r   r"   r   )r  rN   r"   r   )r"   r   )r   r   r"   r   )r   r   r"   r   )r  rC   r   r   r"   r   )r  rC   r"   r   )
rm  r   r  rN   r  r   r   r  r"   r#   )
r   r@   rm  r   r  r   r   r  r"   rn  )r  r   r"   r   )r   r@   r  r#   r"   r   )r"   r    )r8   r9   r:   r;   r_   r  propertyr   r  r  r  r   r  r  r  r  r  r  r  r  r  r  r  staticmethodr  r  r  r>   r*   r(   r  r  J  sJ        ( EV A A A A A A    XJ J J JV V V V@ @ @ @$; ; ; ; <@      $     .2         0 0 0 0   "   @
 
 
 
0 0 0 0  $37e
 e
 e
 e
 e
 e
X 375- 5- 5- 5- 5- 5-r    \ " " " \"L
 
 
 
 
 
r*   r  c               &    t          | ||          S )u5   Back-compat wrapper — defers to the freeform judge.rW  )r{  )rA   rm  rC  s      r(   
judge_goalr  .  s     t]GDDDDr*   )r   r@   r  r  r  r_   DEFAULT_MAX_JUDGE_TOOL_CALLSr1   rh   rn   r   ru   r3   r   r   r   r  r{  rl  r  r   r   )r   r   r"   r   )r"   r   )r   r   r"   r   )r   r   r   r@   r"   r   )r   r   r"   r   )r"   r   )r   r   r"   r   )r   r   r   r   r"   r   )r   r   r   rC   r"   r   )rV   r   r"   r   )rV   r   r"   r   )rV   r   r"   r   )rV   r   r"   r   )
r   r   r   rC   r   rC   r  r   r"   r   )r"   r%  )r-  r   r.  r   r"   r   )r-  r   r"   r   )r*  r   r+  r   r   r   rA  r   rB  r    rC  r   r@  rC   r"   rD  )rA   r   rC  r   r"   rX  )rA   r   rm  r   rC  r   r"   rn  )r   r@   rm  r   r}  r   rC  r   r|  rC   r"   r   )r:  r#   r"   r#   )Sr;   
__future__r   rS   loggingosr   r  dataclassesr   r   r   pathlibr   typingr   r	   r
   r   r   	getLoggerr8   r   r_   DEFAULT_JUDGE_TIMEOUTrw  r  r  r  r  r1   rh   rn   	frozensetru   r3   r   r4   r5   r  r  rb  rc  rx  r  r  rv  r   r@   r   r   r<   r   r   r   r   r   r   r   r   r   compileDOTALLr   r   r   r   r   r  r
  r  r$  r,  r;  r>  rV  rl  r{  r  r  r  r  __all__r>   r*   r(   <module>r     sj  * * *X # " " " " "   				 				  0 0 0 0 0 0 0 0 0 0       3 3 3 3 3 3 3 3 3 3 3 3 3 3		8	$	$    !%  *+ &  !  ! #  "NO#DEE i~ OPP  EU% P + ,$H * C   !8 !L: ( ' #
 #
 #
 #
 #
 #
 #
 #
L M  M  M  M  M  M  M  M j        	       <   *
> 
> 
> 
>! ! ! !   $& & & &? ? ? ?   ., , , , "*Y	224 4 4 4.   6   .2U 2U 2U 2UL   %C  &#P   &V8RVVV9  *  /
 
" "%0 %0  % % % %R "G
 #* !)"(0/O% %' &,H	 	 * !	/
 
   #7 #7 # # # # #N "M  $/ !) )2/I& &
 )1)4l(C' '
 )1%K) )' '$ %D$C$C)    D $2 !)"(0/H% %' &,H	 	 ( %#^ k9 9t ;::y=
 =
H HK7 K7 # K K K Kb +#'< < < < < <H   "   B   B 0@ 0@ 0@ 0@ 0@ 0@l +P P P P P Pn +	0) 0) 0) 0) 0) 0)p +6` ` ` ` ` `F*N *N *N *Nd`
 `
 `
 `
 `
 `
 `
 `
P +	E E E E E E  r*   