Databricks Adds GPT-5.5 to Enterprise Agent Workflows

GPT-5.5 is now available for customer agent workflows on Databricks after the model became the first to surpass 50% accuracy on OfficeQA Pro, Databricks' benchmark for complex enterprise document tasks. In the agent-harness evaluation, GPT-5.5 also cut the error rate by 46% relative to GPT-5.4, marking a meaningful step forward for production-grade AI agents.

How GPT-5.5 Cleared the 50% Barrier on OfficeQA Pro

OfficeQA Pro is built around the conditions that consistently break enterprise agents in live deployments: extracting data from scanned PDFs, working across legacy file formats, and completing grounded reasoning inside long-form business documents. A single parsing mistake early in a workflow can cascade through every step that follows, corrupting outputs in ways that are difficult to trace.
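To make the cascade effect concrete, here is a rough back-of-the-envelope sketch in Python. It assumes every step in a workflow succeeds independently with the same probability, which is a simplification for illustration and not something OfficeQA Pro is stated to measure. The function name and the sample figures are hypothetical.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step in a workflow succeeds.

    Assumes steps are independent and equally reliable (an illustrative
    simplification, not a property of any real benchmark).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 95% across workflows of varying length.
    for steps in (1, 5, 10, 20):
        rate = workflow_success_rate(0.95, steps)
        print(f"{steps:>2} steps -> {rate:.1%} chance of a clean run")
```

Under these assumptions, a step that is right 95% of the time leaves only about a 60% chance of an error-free run across ten steps, which is why small parsing improvements can matter so much in long workflows.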

Databricks saw the biggest gains from GPT-5.5 in exactly those parsing-heavy conditions. According to Research Engineer Arnav Singhvi, GPT-5.5 delivered "a step-function lift in parsing older documents and scanned PDFs" compared to prior model generations, which would sometimes fail to extract individual digits accurately enough to sustain a complete workflow.

Orchestration also improved. Singhvi noted that GPT-5.4 would sometimes take unnecessary retrieval detours during multi-step tasks, generating inefficient agent paths. GPT-5.5 handled those sequences more directly, completing complex workflows with fewer redundant steps and less need for external supervision.

What AI Unity Gateway Access Means for Enterprise Teams

Databricks is making GPT-5.5 available through AI Unity Gateway, where it operates as the supervising model inside workflows built with AgentBricks and the Agent Supervisor API. In that configuration, GPT-5.5 manages coordination across specialized sub-agents, handling document extraction, context retrieval, and task execution within each pipeline.
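The supervisor arrangement described above can be sketched generically. The Python below is a minimal illustration of the pattern only: it does not use any Databricks, AgentBricks, or OpenAI API, and every name in it (TaskState, Supervisor, the three sub-agent functions) is hypothetical. A fixed plan stands in for the decisions a supervising model would make at runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskState:
    """Shared state passed between sub-agents (hypothetical structure)."""
    document: str
    notes: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


SubAgent = Callable[[TaskState], str]


def extract_document(state: TaskState) -> str:
    # Stand-in for document extraction: normalize the raw text.
    return state.document.strip().lower()


def retrieve_context(state: TaskState) -> str:
    # Stand-in for context retrieval, keyed on the extracted text.
    return f"context for: {state.notes['extract']}"


def execute_task(state: TaskState) -> str:
    # Stand-in for task execution, using the retrieved context.
    return f"answer using {state.notes['retrieve']}"


class Supervisor:
    """Routes work through sub-agents in order and records each hand-off."""

    def __init__(self, agents: Dict[str, SubAgent], plan: List[str]) -> None:
        self.agents = agents
        self.plan = plan  # A real supervisor model would choose steps dynamically.

    def run(self, document: str) -> TaskState:
        state = TaskState(document=document)
        for name in self.plan:
            state.notes[name] = self.agents[name](state)
            state.trace.append(name)
        return state


def run_pipeline(document: str) -> TaskState:
    agents = {
        "extract": extract_document,
        "retrieve": retrieve_context,
        "execute": execute_task,
    }
    return Supervisor(agents, ["extract", "retrieve", "execute"]).run(document)


if __name__ == "__main__":
    result = run_pipeline("  Q3 REPORT  ")
    print(result.trace)
    print(result.notes["execute"])
```

The trace list records which sub-agent ran at each step, which is the kind of record that makes a redundant retrieval detour visible when reviewing an agent path.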

For engineering and data teams building agent systems over internal document repositories — compliance files, financial records, or legacy data stores — the practical benefit is a model that processes messier inputs without failing mid-workflow. Singhvi described the overall shift as a meaningful lift in knowledge work capability, not a marginal model update.

Databricks expects significant customer uptake across AgentBricks and Agent Supervisor API deployments, with GPT-5.5 handling the orchestration layer that ties multi-agent pipelines together.

Why This Benchmark Result Matters Beyond the Numbers

Enterprise buyers have often found that benchmark gains do not survive contact with real deployments. Databricks' announcement is notable because it connects a verified accuracy improvement on a domain-specific evaluation directly to production availability through existing infrastructure.

Teams currently running agent workflows on Databricks can access GPT-5.5 through AI Unity Gateway without rebuilding their pipelines. That makes the upgrade an API-level decision rather than an infrastructure project, which significantly lowers the adoption barrier for teams mid-deployment.

[Analysis] The OfficeQA Pro result gives enterprise procurement and engineering teams a concrete, workflow-specific accuracy signal at a time when most AI evaluations rely on general-purpose benchmarks. For document-centric automation, supervised agent pipelines, and multi-step retrieval tasks, GPT-5.5's performance on OfficeQA Pro offers a more relevant reference point than broad leaderboard rankings alone.

Source: OpenAI