Ellipsis started in late 2023 with a simple observation: every engineering team we talked to was struggling with the same set of problems. PRs sat unreviewed for days. Known bugs lingered because nobody had time to write the fix. Documentation was perpetually outdated. Routine tasks — standup reports, Jira updates, release notes — consumed hours of engineering time every week.
These were not hard technical problems. They were high-volume, low-complexity tasks that required codebase context but not deep architectural thinking. The kind of work that AI agents should be doing.
We applied to Y Combinator W24 with a prototype that did one thing: automated code review on GitHub pull requests. We got in. Fourteen weeks later, we had a platform.
The code review starting point
We chose code review as our entry point for three reasons. First, it is read-only — the agent comments on code but does not modify it. This made trust and adoption easier. Second, it is immediately measurable — you can count how many bugs the reviewer catches, how many comments are actionable, how quickly reviews happen. Third, every team needs it and most teams are bottlenecked on it.
The first version was straightforward. On every PR, we fetched the diff, chunked it into context windows, and prompted a language model to review each chunk. We posted the results as inline GitHub comments.
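As an illustration, the fetch, chunk, and prompt loop described above could look roughly like the sketch below. It is not the production implementation: the character budget stands in for a real token count, and `review_chunk` is a hypothetical callable that wraps whatever language model call is used.

```python
from typing import Callable, Iterator, List


def split_diff_by_file(diff_text: str) -> List[str]:
    """Split a unified diff into one string per file, using 'diff --git' headers."""
    files: List[str] = []
    current: List[str] = []
    for line in diff_text.splitlines(keepends=True):
        if line.startswith("diff --git ") and current:
            files.append("".join(current))
            current = []
        current.append(line)
    if current:
        files.append("".join(current))
    return files


def chunk_diff(diff_text: str, max_chars: int = 8000) -> Iterator[str]:
    """Pack whole file diffs into chunks of at most max_chars where possible.

    A file diff larger than the budget is split on line boundaries. A single
    line longer than max_chars is still emitted whole.
    """
    buffer = ""
    for file_diff in split_diff_by_file(diff_text):
        if len(file_diff) > max_chars:
            # Flush what we have, then split the oversized file by lines.
            if buffer:
                yield buffer
                buffer = ""
            piece = ""
            for line in file_diff.splitlines(keepends=True):
                if piece and len(piece) + len(line) > max_chars:
                    yield piece
                    piece = ""
                piece += line
            if piece:
                yield piece
        elif len(buffer) + len(file_diff) > max_chars:
            yield buffer
            buffer = file_diff
        else:
            buffer += file_diff
    if buffer:
        yield buffer


def review_diff(
    diff_text: str,
    review_chunk: Callable[[str], List[str]],  # hypothetical model wrapper
    max_chars: int = 8000,
) -> List[str]:
    """Review each chunk independently and collect the resulting comments."""
    comments: List[str] = []
    for chunk in chunk_diff(diff_text, max_chars):
        comments.extend(review_chunk(chunk))
    return comments
```

Reviewing each chunk in isolation is simple, but the model never sees code outside its window, which is one plausible source of noisy comments.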
It worked. Sort of. The reviews were noisy — too many false positives, too much stylistic nitpicking, not enough focus on actual bugs. We spent the first two months making the reviews less annoying and more useful.
[Screenshot: Early Ellipsis review comment on a GitHub PR — showing a bug catch with an inline code suggestion and a confidence badge]
The pivot to platform
Three months in, we had a good code reviewer. Teams liked it. But the feature requests told us something interesting: nobody wanted just a code reviewer. They wanted a code reviewer and a bug fixer and a code generator and a Q&A bot. They wanted agents.
We had two options. Build a suite of point solutions — a code review product, a code generation product, a Q&A product — and sell them separately. Or build a platform that could run any agent, and ship code review, code generation, and Q&A as agents on that platform.
We chose the platform. It was the harder path, but it was the right abstraction. An agent is a combination of a trigger (when to run), a capability (what to do), and a set of integrations (where to read and write). If we got the platform right, adding new agent types would be fast.
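One way to picture that abstraction is as a small data structure. The sketch below is illustrative only: the class names, trigger kinds, and event strings are assumptions, not the actual Ellipsis schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet


class TriggerKind(Enum):
    WEBHOOK = "webhook"
    CHAT_MESSAGE = "chat_message"
    CRON = "cron"


@dataclass(frozen=True)
class Trigger:
    """When to run: an event kind plus a hypothetical event name or schedule."""
    kind: TriggerKind
    event: str


@dataclass(frozen=True)
class Agent:
    """An agent as trigger + capability + integrations."""
    name: str
    trigger: Trigger
    capability: Callable[[Dict[str, str]], str]  # what to do with the context
    integrations: FrozenSet[str]                 # where it may read and write


def review(context: Dict[str, str]) -> str:
    # Placeholder capability; a real one would call a model.
    return f"reviewed {context['diff']}"


def summarize(context: Dict[str, str]) -> str:
    # Placeholder capability for a second agent type.
    return f"summary of {context['diff']}"


# Under this model, a new agent type is a new combination of the same
# three parts rather than a new product.
code_reviewer = Agent(
    name="code-review",
    trigger=Trigger(TriggerKind.WEBHOOK, "pull_request.opened"),
    capability=review,
    integrations=frozenset({"github"}),
)
release_notes = Agent(
    name="release-notes",
    trigger=Trigger(TriggerKind.CRON, "0 9 * * 1"),
    capability=summarize,
    integrations=frozenset({"github", "slack"}),
)
```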
Architecture decisions
The Ellipsis platform has three layers.
The control plane manages configuration, permissions, and routing. When an event arrives — a GitHub webhook, a Slack message, a cron tick — the control plane determines which agent to invoke, with what context, and with what permissions.
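A minimal sketch of that routing decision, assuming a static route table and a per-repo set of enabled agents. All names here (`Event`, `Route`, `Invocation`, the permission strings) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Event:
    source: str  # e.g. "github", "slack", "cron"
    kind: str    # e.g. "pull_request.opened"
    repo: str


@dataclass(frozen=True)
class Route:
    source: str
    kind: str
    agent: str
    permissions: FrozenSet[str]


@dataclass(frozen=True)
class Invocation:
    agent: str
    context: Dict[str, str]
    permissions: FrozenSet[str]


def route_event(
    event: Event,
    routes: List[Route],
    enabled: Dict[str, FrozenSet[str]],  # repo -> agents enabled in its config
) -> List[Invocation]:
    """Decide which agents to invoke, with what context and permissions."""
    allowed = enabled.get(event.repo, frozenset())
    invocations: List[Invocation] = []
    for route in routes:
        matches = route.source == event.source and route.kind == event.kind
        if matches and route.agent in allowed:
            invocations.append(
                Invocation(
                    agent=route.agent,
                    context={"repo": event.repo, "event": event.kind},
                    permissions=route.permissions,
                )
            )
    return invocations
```

Keeping permissions on the route rather than on the agent means the same agent could run read-only for one trigger and with write access for another.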
The execution layer runs agents in sandboxed environments. Each agent invocation gets an isolated container with access to a cloned copy of the repository. The agent can read code, run tools, and produce output, but it cannot access other repos, other agents, or the host environment.
The integration layer handles I/O with external services — GitHub, Slack, Linear, Jira, Sentry. The agent produces structured output; the integration layer translates that output into the appropriate API calls.
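The translation step might be sketched as follows. The output types are hypothetical, and the function only builds request descriptions rather than sending them. The endpoints shown are the GitHub REST endpoint for pull request review comments and Slack's `chat.postMessage` method; the payload fields are a reasonable subset and should be checked against the current API documentation.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class InlineComment:
    """Structured output for a review comment (hypothetical shape)."""
    repo: str          # "owner/name"
    pull_number: int
    commit_id: str
    path: str
    line: int
    body: str


@dataclass(frozen=True)
class ChatReply:
    """Structured output for a chat answer (hypothetical shape)."""
    channel: str
    text: str


AgentOutput = Union[InlineComment, ChatReply]


def to_request(output: AgentOutput) -> Dict[str, Any]:
    """Translate agent output into a description of the HTTP call to make."""
    if isinstance(output, InlineComment):
        return {
            "method": "POST",
            "url": (
                "https://api.github.com/repos/"
                f"{output.repo}/pulls/{output.pull_number}/comments"
            ),
            "json": {
                "body": output.body,
                "commit_id": output.commit_id,
                "path": output.path,
                "line": output.line,
            },
        }
    if isinstance(output, ChatReply):
        return {
            "method": "POST",
            "url": "https://slack.com/api/chat.postMessage",
            "json": {"channel": output.channel, "text": output.text},
        }
    raise TypeError(f"unsupported output type: {type(output).__name__}")
```

Because the agent only emits typed output, it never needs API credentials; those stay in the layer that actually sends the request.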
[Architecture diagram: Three-layer stack — Control Plane (config, routing, permissions) → Execution Layer (sandboxed containers, repo clones, tool access) → Integration Layer (GitHub, Slack, Linear, Jira, Sentry)]
Mistakes we made
We over-invested in prompt engineering early on. For the first few months, we treated every quality problem as a prompting problem. In practice, most quality issues were context problems: the model did not have enough information to make a good decision. Fetching the right files, the right git history, and the right style guide mattered more than the wording of the prompt.
We underestimated the importance of configurability. Teams have strong opinions about what a code reviewer should and should not flag. Our first version had no configuration. Teams either loved the default behavior or hated it, and we could not help the ones who hated it. Adding custom rules and style guides was the single highest-impact feature we shipped.
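To make the idea concrete, the sketch below filters review comments through per-team settings. The fields (`muted_categories`, `ignored_paths`, `min_confidence`) are invented for illustration and are not the actual Ellipsis configuration options.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Comment:
    path: str
    category: str      # e.g. "bug", "style"
    confidence: float  # 0.0 to 1.0
    body: str


@dataclass(frozen=True)
class ReviewConfig:
    """Hypothetical per-team settings for what the reviewer may flag."""
    muted_categories: FrozenSet[str] = frozenset()
    ignored_paths: Tuple[str, ...] = ()  # glob patterns
    min_confidence: float = 0.0


def apply_config(comments: List[Comment], config: ReviewConfig) -> List[Comment]:
    """Keep only the comments this team has opted in to seeing."""
    kept: List[Comment] = []
    for comment in comments:
        if comment.category in config.muted_categories:
            continue
        if comment.confidence < config.min_confidence:
            continue
        if any(fnmatchcase(comment.path, p) for p in config.ignored_paths):
            continue
        kept.append(comment)
    return kept
```

With an empty `ReviewConfig`, every comment passes through, which corresponds to the no-configuration default described above.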
We built our own orchestration before we needed to. We should have started with simpler workflow primitives — cron triggers, webhook handlers, sequential steps — instead of building a general-purpose orchestration engine. The general-purpose system was more complex and harder to debug. We eventually simplified.
What we would do differently
Start with the configuration layer. If we had built ellipsis.yaml on day one, we would have gotten to product-market fit faster. The configuration file — committed to the repo, version-controlled, reviewable — is the right interface for defining agent behavior. We arrived at this eventually, but we spent months building UI-based configuration that we later deprecated.
Invest in evaluation infrastructure earlier. We did not have systematic evaluation of agent output until month six. Before that, quality assessment was manual and inconsistent. Building automated evals — test cases, regression suites, quality metrics — should have been a week-one priority.
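A minimal sketch of what such an eval could look like for a code reviewer, assuming each test case is a diff labeled with the line numbers of known bugs. The case format, the `reviewer` callable, and the tolerance value are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class EvalCase:
    name: str
    diff: str
    expected_lines: FrozenSet[int]  # lines a good reviewer should flag


@dataclass(frozen=True)
class EvalResult:
    precision: float  # share of flagged lines that were real bugs
    recall: float     # share of real bugs that were flagged


def evaluate(
    reviewer: Callable[[str], FrozenSet[int]],  # hypothetical: diff -> flagged lines
    cases: List[EvalCase],
) -> EvalResult:
    """Score a reviewer against labeled cases, pooled across all cases."""
    tp = fp = fn = 0
    for case in cases:
        flagged = reviewer(case.diff)
        tp += len(flagged & case.expected_lines)
        fp += len(flagged - case.expected_lines)
        fn += len(case.expected_lines - flagged)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return EvalResult(precision, recall)


def regressed(
    current: EvalResult, baseline: EvalResult, tolerance: float = 0.02
) -> bool:
    """True if either metric dropped by more than the tolerance."""
    return (
        current.precision < baseline.precision - tolerance
        or current.recall < baseline.recall - tolerance
    )
```

Running `evaluate` on every prompt or context change and failing the build when `regressed` returns true turns manual spot checks into a regression suite.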
Ellipsis is now used by hundreds of engineering teams. The platform runs six agent types across GitHub, Slack, Linear, Jira, and Sentry. But the core insight has not changed since day one: the work that slows engineering teams down is the work that AI agents should be doing.