Configuration
Every tuning knob, its current default, and why it's set there. These are the levers that shape how aggressively the system extracts, deduplicates, and answers.
The knobs
| Setting | Value | Why |
|---|---|---|
| `max_entries_per_meeting` | 8 | Hard ceiling; kills the "270 from one meeting" bloat. |
| `reduce_max_candidates` | 40 | Above this, reduce hierarchically (batch, then final). |
| `dedup_similarity_threshold` | 0.83 | At or above, a dedup candidate; the stance judge then decides. Lowered from 0.88 so more near-dups merge. |
| `merge_certain_threshold` | 0.93 | Merge without asking (or if the judge errors); very high similarity is strong duplicate evidence. |
| `dedup_neighbor_k` | 3 | Check the 3 nearest, so a dup of the 2nd or 3rd neighbour is still caught. |
| `confidence_promote_at` | 3 | Corroborated 3 times → promote to high confidence. |
| `pattern_match_threshold` | 0.72 | At or above, send the observation to the stance judge (accumulate) vs mint. Lower than entry dedup on purpose. |
| `pattern_confirm_threshold` | 2 | Seen twice = real; reaches the answer layer. |
| `pattern_provisional_ttl_days` | 7 | Unconfirmed provisional patterns pruned after a week. |
| `max_active_patterns` | 14 | Merge target; keep the served set tight. |
| `strong_match_threshold` | 0.68 | Best-result cosine for "quote directly." Tuned to the embedding model's ~0.70 ceiling. |
| `retrieve_patterns_k` / `pattern_relevance_floor` | 4 / 0.54 | Serve only the few patterns actually relevant to the question. Below the floor is noise → serve fewer, or zero. |
| rerank weights | 0.6 / 0.15 / 0.15 / 0.1 | relevance / type / confidence / freshness. Relevance leads; the rest are tie-breakers. |
| `meeting_max_tokens` / `window_max_tokens` | 24000 / 4000 | Within 24k, single-pass whole-meeting. Above, windowed map-reduce. |
| embedding model | nomic-embed-text, 768-dim | Local, so nothing leaves the box to embed. |
| extraction / judge model | deepseek-chat (dev) / Claude Sonnet (prod) | One env switch. |
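As a rough illustration, the numeric settings in the table could be collected into a single immutable object with a few sanity checks. This is a minimal sketch, not the project's actual config code: the `Settings` class name, the `rerank_weights` field name, and the three validation rules are assumptions derived from the "Why" column (pattern bar below the dedup bar, dedup bar below the certain-merge bar, weights adding up to 100%).

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the knobs listed in the table."""

    max_entries_per_meeting: int = 8
    reduce_max_candidates: int = 40
    dedup_similarity_threshold: float = 0.83
    merge_certain_threshold: float = 0.93
    dedup_neighbor_k: int = 3
    confidence_promote_at: int = 3
    pattern_match_threshold: float = 0.72
    pattern_confirm_threshold: int = 2
    pattern_provisional_ttl_days: int = 7
    max_active_patterns: int = 14
    strong_match_threshold: float = 0.68
    retrieve_patterns_k: int = 4
    pattern_relevance_floor: float = 0.54
    # Order: relevance, type, confidence, freshness.
    rerank_weights: Tuple[float, float, float, float] = (0.6, 0.15, 0.15, 0.1)
    meeting_max_tokens: int = 24000
    window_max_tokens: int = 4000

    def __post_init__(self) -> None:
        # Assumed invariant: pattern bar < entry dedup bar < certain-merge bar.
        if not (
            self.pattern_match_threshold
            < self.dedup_similarity_threshold
            < self.merge_certain_threshold
        ):
            raise ValueError("similarity thresholds are out of order")
        # Assumed invariant: the four rerank weights add up to 1.0.
        if not math.isclose(sum(self.rerank_weights), 1.0):
            raise ValueError("rerank weights must sum to 1.0")
        # Assumed invariant: a window is never larger than a whole meeting.
        if self.window_max_tokens > self.meeting_max_tokens:
            raise ValueError("window_max_tokens exceeds meeting_max_tokens")


settings = Settings()
```

Checking the ordering at construction time means a mistuned value such as `dedup_similarity_threshold=0.95` fails loudly at startup rather than silently disabling the judge band.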
What each knob actually controls, in plain terms
- `max_entries_per_meeting` (8): the most knowledge entries kept from a single meeting. Even a packed two-hour call is capped at 8. This is the guardrail that killed the "one meeting → 270 entries" problem: it forces the system to synthesize the few big ideas instead of hoarding every sentence.
- `reduce_max_candidates` (40): only matters on the long-transcript fallback path. When a meeting is too big to read in one pass, the system first jots down rough candidate observations, then boils them down. Above 40 candidates, it boils in batches first and then boils the results again (two rounds), so no single call is handed too much at once.
- `dedup_similarity_threshold` (0.83): the "are these two entries about the same thing?" line. At ≥83% similarity, a new entry is treated as a possible duplicate and the judge decides what to do. Below that, it's assumed genuinely new and stored. Lower this and more things get compared as duplicates.
- `merge_certain_threshold` (0.93): the "so obviously the same we don't even need to ask" line. At ≥93% similarity, if the judge is unavailable or errors out, the system merges anyway rather than risk a near-identical duplicate.
- `dedup_neighbor_k` (3): how many of the closest existing entries a newcomer is checked against, not just the single nearest. Catches the case where the real duplicate is the 2nd or 3rd closest.
- `confidence_promote_at` (3): how many separate times the same position must land before its confidence is auto-bumped to high. Three independent sightings are treated as proof it's reliably held.
- `pattern_match_threshold` (0.72): the same idea as dedup, but for reasoning patterns. At ≥72% similarity a newly observed move is treated as the same pattern (and the judge confirms). Set lower than the entry bar because short reasoning moves vary more in wording.
- `pattern_confirm_threshold` (2): how many times a reasoning move must be seen before it graduates from a hidden hypothesis into a confirmed pattern the answer layer can use. Just two.
- `pattern_provisional_ttl_days` (7): how long a seen-only-once, unconfirmed pattern hangs around before deletion. One week. Stops the pattern list silting up with one-off noise.
- `max_active_patterns` (14): the ceiling on how many reasoning patterns are served. The periodic cleanup clusters everything down to at most 14 canonical patterns, so the set stays tight and non-redundant.
- `strong_match_threshold` (0.68): the line between "we already have a direct answer" and "we need to extrapolate." At ≥68% similarity the agent can quote directly; below that, it reasons from patterns instead.
- `retrieve_patterns_k` (4) and `pattern_relevance_floor` (0.54): how many reasoning patterns the agent gets per question (at most 4), and the minimum relevance a pattern needs (54%) to be included at all. Together: the agent gets only the few patterns that bear on the question, and zero if none really fit.
- rerank weights (0.6 / 0.15 / 0.15 / 0.1): how much each factor counts when ordering results: relevance 60%, type 15%, confidence 15%, freshness 10%. Relevance dominates, but a more-transferable or more-reliable entry can edge ahead of one that's only slightly more similar.
- `meeting_max_tokens` (24000) and `window_max_tokens` (4000): size limits for how much text is fed to the model at once. If a whole transcript fits in ~24k tokens, it's read in one go (best, since the model sees full context); if bigger, the system falls back to ~4k-token windows and stitches the results.
- embedding model (nomic-embed-text, 768-dim): turns text into a 768-number vector for similarity comparison. Runs locally, so no text leaves the box just to be embedded.
- extraction / judge model: the LLM that extracts and judges. A cheaper, fast model while developing; a higher-quality model in production. One setting flips between them.
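
To make the threshold descriptions above concrete, here is a minimal sketch of how the similarity knobs might gate decisions. All function names and return labels are hypothetical, not the system's real API. Two assumptions are baked in: every score is a value in [0, 1], and at or above `merge_certain_threshold` the entry merges without consulting the judge (one reading of the table; the bullet above describes it as the fallback when the judge fails).

```python
from typing import List, Sequence, Tuple


def dedup_candidates(similarities: Sequence[float], k: int = 3,
                     candidate_at: float = 0.83) -> List[int]:
    """Indices of the k nearest existing entries that clear the dedup bar."""
    ranked = sorted(range(len(similarities)),
                    key=lambda i: similarities[i], reverse=True)[:k]
    return [i for i in ranked if similarities[i] >= candidate_at]


def route_dedup(similarity: float, candidate_at: float = 0.83,
                certain_at: float = 0.93) -> str:
    """Route one (new entry, neighbour) pair by similarity."""
    if similarity < candidate_at:
        return "store_new"   # below the bar: assumed genuinely new
    if similarity >= certain_at:
        return "merge"       # assumption: merge without asking the judge
    return "ask_judge"       # possible duplicate: the stance judge decides


def select_patterns(scored: Sequence[Tuple[str, float]], k: int = 4,
                    floor: float = 0.54) -> List[str]:
    """Keep at most k patterns, dropping any below the relevance floor."""
    relevant = sorted((p for p in scored if p[1] >= floor),
                      key=lambda p: p[1], reverse=True)
    return [name for name, _ in relevant[:k]]


def rerank_score(relevance: float, type_score: float, confidence: float,
                 freshness: float,
                 weights: Tuple[float, float, float, float] = (0.6, 0.15, 0.15, 0.1),
                 ) -> float:
    """Weighted sum of the four rerank factors (each assumed to be in [0, 1])."""
    w_rel, w_type, w_conf, w_fresh = weights
    return (w_rel * relevance + w_type * type_score
            + w_conf * confidence + w_fresh * freshness)


def answer_mode(best_similarity: float, strong_at: float = 0.68) -> str:
    """Quote directly on a strong match; otherwise reason from patterns."""
    return "quote" if best_similarity >= strong_at else "extrapolate"


# A slightly less similar but more reliable entry can outrank a closer one.
closer = rerank_score(relevance=0.70, type_score=0.5, confidence=0.5, freshness=0.5)
more_reliable = rerank_score(relevance=0.68, type_score=0.5, confidence=1.0, freshness=0.5)
```

With these inputs `more_reliable` scores higher than `closer`, which is the "edge ahead" behaviour the rerank bullet describes. Likewise, `select_patterns` returns an empty list when nothing clears the floor, matching the "zero if none really fit" rule.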