Startup Ideas Bank
Interesting tech, but unclear on market fit and execution.
AI roast score: 62/100 (C)
The idea
Show HN: Morph Reflexes – Multi-head classifiers for agent traces
20 points by bhaktatejas922 8 hours ago | 2 comments
The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and too slow to run at scale. To solve this, we built Reflexes: semantic signals extracted from agent traces, served fast and cheap over an API. Reflexes is built on custom kernels and a custom inference engine forked from vLLM. Under the hood, it is a small LLM architected around multi-head inference. Small models need to be trained for specific tasks, but running 50 separate small models on the same input for 50 tasks makes no sense.
How it works:
We use a modern LLM with hybrid attention and remove the decode step. One shared backbone reads the trace once, then many heads classify different signals. Our inference engine reuses the same KV cache and compute across all reflexes, so 99% of prefill compute is shared from reflex to reflex. This is similar in spirit to 2019-era BERT HYDRA and other multi-head techniques; we took the same high-level idea and did the hard work to make it work with a modern architecture and attention. The result is sub-30ms inference and under 90ms to serve the full request, with less than 0.1% overhead for each additional reflex. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.
Why does optimizing this matter? If you're even a medium-sized startup, you're dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale.
I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like `is_camera_obfuscated=true`, along with 200 others, those classifiers had to (1) spin up quickly and (2) run efficiently at scale.
What it is not:
A dashboard. 99% of dashboards go unused.
Reflexes is 100% API-first, made for devs who want to use it to trigger their own stuff. Vibetrain a custom reflex in our dashboard, and then let it self-improve in production: https://www.morphllm.com/dashboard/reflex
Docs: https://docs.morphllm.com/sdk/components/reflexes/index
I'd love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns b…
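The post describes one shared backbone that reads the trace once, with many small heads each scoring a different signal. Below is a minimal Python sketch of that control flow, under these assumptions:

- A toy character-hashing encoder stands in for the real LLM prefill.
- Plain logistic heads stand in for the real classifiers.
- `ToyBackbone`, `MultiHeadClassifier`, and the reflex names are hypothetical, not Morph's SDK.

It does not model KV-cache reuse or custom kernels. It only shows that the expensive pass runs once, however many reflexes are attached.

```python
import math
from typing import Dict, List, Tuple


class ToyBackbone:
    """Hypothetical stand-in for the shared LLM prefill pass.

    A real system would run a transformer over the trace once and pool its
    hidden states; here characters are hashed into a small normalized vector.
    """

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim
        self.calls = 0  # how many times the "expensive" pass has run

    def encode(self, trace: str) -> List[float]:
        self.calls += 1
        vec = [0.0] * self.dim
        for i, ch in enumerate(trace):
            vec[(ord(ch) + i) % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class MultiHeadClassifier:
    """One backbone pass per trace, then one cheap linear head per reflex.

    There is no decode step: each head reads the shared features and emits a
    probability directly instead of generating tokens.
    """

    def __init__(
        self, backbone: ToyBackbone, heads: Dict[str, Tuple[List[float], float]]
    ) -> None:
        self.backbone = backbone
        self.heads = heads  # reflex name -> (weights, bias)

    def classify(self, trace: str) -> Dict[str, float]:
        features = self.backbone.encode(trace)  # shared work, done once
        return {
            name: sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
            for name, (weights, bias) in self.heads.items()
        }


if __name__ == "__main__":
    backbone = ToyBackbone()
    heads = {
        "looping": ([0.3] * 8, -0.5),
        "reasoning_leakage": ([0.1, -0.2] * 4, 0.0),
        "user_frustration": ([-0.4] * 8, 0.2),
    }
    clf = MultiHeadClassifier(backbone, heads)
    print(clf.classify("user: you already tried that, it still fails"))
    print("backbone passes:", backbone.calls)  # 1, regardless of head count
```

In this toy, each added reflex costs one dot product over the shared features. That is the intuition behind the small per-reflex overhead the post reports. The real savings come from sharing prefill and the KV cache, which the sketch does not attempt.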
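The post's use case is tracking signals such as user frustration over time across every turn. Assume each turn has already been scored by a reflex; the scoring call is not shown. A per-day rate then reduces to a grouped count. The `(date, score)` input shape, the 0.5 threshold, and the `daily_flag_rate` helper are hypothetical choices for illustration, not part of Morph's API.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def daily_flag_rate(
    turns: Iterable[Tuple[date, float]], threshold: float = 0.5
) -> Dict[date, float]:
    """Return, per day, the fraction of turns whose reflex score >= threshold."""
    flagged: Dict[date, int] = defaultdict(int)
    total: Dict[date, int] = defaultdict(int)
    for day, score in turns:
        total[day] += 1
        if score >= threshold:
            flagged[day] += 1
    # Sorted by day so the result reads as a time series.
    return {day: flagged[day] / total[day] for day in sorted(total)}


if __name__ == "__main__":
    # Toy scores for a hypothetical "user_frustration" reflex.
    turns = [
        (date(2024, 5, 1), 0.9),
        (date(2024, 5, 1), 0.1),
        (date(2024, 5, 2), 0.7),
        (date(2024, 5, 2), 0.2),
        (date(2024, 5, 2), 0.3),
        (date(2024, 5, 2), 0.1),
    ]
    for day, rate in daily_flag_rate(turns).items():
        print(day.isoformat(), f"{rate:.0%}")
```

This aggregation only stays cheap if every turn can be scored. That is the post's argument for a fast classifier over a frontier LLM-as-judge.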
The roast
Morph Reflexes reads like an impressive technical feat but fails to address a clear business need. While multi-head classifiers for agent traces are a fascinating concept, the real-world application seems murky. You're targeting a niche subset of ML engineers in medium-to-large enterprises, but the path to adoption and scaling is vague. The tech sounds complex and advanced, but complexity alone won't sell unless it solves a pain point the target users feel viscerally. The market positioning is not obvious, and you seem to be banking on developers discovering and adopting your product organically, which is a risky bet. It's also unclear how you'll compete with established players who already have a foothold in this space.
Red flags
- Unclear market fit
- Solo founder with no funding
- Complexity without clear differentiation
Verdict
You need to focus on validating market demand and simplifying your value proposition to address a clear and present pain point for enterprise developers.