August 19, 2025 · Camila Rossi · case study

From 68% to 94%: Improving Cash-Flow Dashboard Accuracy for a Neobank SMB Product

A walkthrough of how one SMB banking product team improved transaction category accuracy from 68% to 94% by inserting a reclassification layer between their Plaid feed and their dashboard.

Starting point: a dashboard that looked good but wasn't trusted

The product team at a growing neobank had built an SMB cash-flow dashboard they were proud of. Clean UI, monthly spending breakdowns by category, a projected balance line for the next 30 days. The engineering work was solid. The design held up well in user testing. They shipped it to their SMB customer base.

Three months in, the analytics told a story they hadn't expected. Dashboard open rates were good the first week — most SMB customers clicked through to see the new feature. But return visits dropped steeply. In user interviews, the pattern was consistent: customers said the categories "didn't make sense" and the cash-flow projections were "off." A few had shared screenshots with their accountants; the accountants had pointed out that the expense breakdown didn't match what the business actually spent money on.

The team ran a focused investigation. They pulled 90 days of transaction data for 20 of their most active SMB accounts and reviewed each transaction against the category their dashboard was showing. The finding: 68% categorization accuracy across the sampled accounts. For ACH-heavy accounts (service businesses whose client payments arrive via ACH), accuracy was lower, around 59%. That was the number underneath the dashboard they'd shipped.
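A minimal sketch of how a manual audit like this can be tallied. It assumes each reviewed transaction is recorded with the category the dashboard showed, the category a reviewer assigned, and a segment label. The `ReviewedTxn` record, the segment names, and `accuracy_by_segment` are hypothetical illustrations, not the team's tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ReviewedTxn:
    account_id: str
    segment: str            # hypothetical segment label, e.g. "ach_heavy" or "other"
    shown_category: str     # category the dashboard displayed
    reviewed_category: str  # category the human reviewer assigned


def accuracy_by_segment(rows):
    """Return accuracy overall (key "all") and per segment, pooling transactions."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in rows:
        hit = int(r.shown_category == r.reviewed_category)
        for key in ("all", r.segment):
            totals[key] += 1
            correct[key] += hit
    return {key: correct[key] / totals[key] for key in totals}
```

This pools transactions, so high-volume accounts weigh more than quiet ones; the post doesn't say whether the team weighted by transaction or by account.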

Understanding the error distribution

Before designing a solution, the team did a careful breakdown of where the 32% error rate was concentrated. This step mattered because not all miscategorization errors have equal impact on the dashboard features their users cared about most.

The error analysis revealed three dominant patterns, accounting for roughly 80% of all miscategorization:

ACH credits labeled as Transfer (38% of errors): Customer invoice payments, intercompany transfers, and capital injections were all landing in the same Transfer bucket. The forecasting model had no income signal for service businesses that received most revenue via ACH. This was the primary driver of bad cash-flow projections.
B2B vendor purchases labeled as Shopping (27% of errors): Purchases from wholesalers, specialty suppliers, and industrial vendors were landing in consumer shopping categories. The expense breakdown was meaningless for any business that bought materials or supplies.
Payroll ACH debits labeled as generic debit/transfer (15% of errors): The largest recurring expense for most of their SMB customers was being excluded from the recurring expense stack. This made the projected fixed cost baseline too low, which inflated projected runway and made the cash-flow forecast optimistic in a way that could mislead business decisions.

The remaining 20% of errors were distributed across a long tail of miscategorized transactions — professional services fees, insurance, tax payments, equipment purchases — each individually lower impact but collectively adding noise to every dashboard component.
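Once the mismatches are labeled, the error breakdown is essentially a frequency count. In this hypothetical sketch, each error is keyed by its (shown category, correct category) pair. Grouping pairs into the post's broader patterns, such as "ACH credits labeled as Transfer", would be a manual mapping step on top. `patterns_covering` then finds the smallest set of top-ranked patterns that together reach a target share, such as 80%.

```python
from collections import Counter


def error_share_by_pattern(errors):
    """errors: (shown_category, correct_category) pairs for misclassified transactions.
    Returns (pattern, share_of_errors) tuples, largest share first."""
    counts = Counter(errors)
    total = sum(counts.values())
    return [(pattern, n / total) for pattern, n in counts.most_common()]


def patterns_covering(shares, target=0.80):
    """Smallest prefix of the ranked patterns whose combined share reaches `target`."""
    picked, covered = [], 0.0
    for pattern, share in shares:
        if covered >= target:
            break
        picked.append(pattern)
        covered += share
    return picked, covered
```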

The intervention: inserting a reclassification layer

The team's open banking feed came from a Plaid integration. The reclassification approach they chose was Pattern 1 — synchronous classification in the transaction fetch path — because they hadn't shipped the feature at high volume yet and could redesign the pipeline without a large migration cost.

The integration change was minimal in code terms: in the Plaid webhook handler, after fetching the new transaction batch, they added a POST to the classification API before writing to the database. The classification API returned corrected categories and confidence scores for each transaction. Transactions with confidence scores above 0.80 were written with the corrected category. Transactions below 0.80 — roughly 8% of the volume — were written with a needs_review flag, which the team used to build a lightweight category correction flow for users to fix individually.
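The routing rule in the webhook handler can be sketched as follows. `classify` is a stand-in for the POST to the classification API. Its request and response shapes here are assumptions, not the vendor's actual API, and no Plaid calls are shown. The sketch also assumes that low-confidence transactions keep the category delivered by the feed. The post says scores "above 0.80" were accepted; treating a score of exactly 0.80 as accepted is another assumption.

```python
from collections.abc import Callable, Iterable
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.80  # threshold reported in the post


@dataclass
class RawTxn:
    transaction_id: str
    feed_category: str  # category as delivered by the upstream feed


@dataclass
class Classified:
    transaction_id: str
    category: str
    confidence: float


@dataclass
class StoredTxn:
    transaction_id: str
    category: str
    needs_review: bool


def route_batch(
    batch: list[RawTxn],
    classify: Callable[[list[RawTxn]], Iterable[Classified]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> list[StoredTxn]:
    """Decide what to write to the database for each transaction in a fetched batch."""
    results = {r.transaction_id: r for r in classify(batch)}
    stored = []
    for txn in batch:
        result = results.get(txn.transaction_id)
        if result is not None and result.confidence >= threshold:
            # Confident: write the corrected category.
            stored.append(StoredTxn(txn.transaction_id, result.category, False))
        else:
            # Low confidence or missing result (assumption): keep feed category, flag it.
            stored.append(StoredTxn(txn.transaction_id, txn.feed_category, True))
    return stored
```

Keying results by `transaction_id` rather than by position means that a response that drops or reorders items can't attach a category to the wrong transaction.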

The total engineering time from decision to production was 11 days, most of which was testing against their staging environment and validating the confidence score threshold. The pipeline change itself was small. The larger investment was the historical data backfill — running a batch job over 6 months of existing transactions, validating the correction set against the sample analysis, and staging the UI change so customers would see an explanation when their historical category totals changed.

The outcome across 60 days

Sixty days after shipping the reclassified pipeline to production, the team measured categorization accuracy again using the same manual review methodology. The result: 94% accuracy across the SMB account population. For ACH-heavy service business accounts, the rate came in at 91%. That is lower than the overall figure, reflecting the structural difficulty of ACH context, but still a significant improvement from 59%.

The dashboard engagement metrics changed materially. Return visit rate to the cash-flow dashboard increased substantially in the 60-day window. More importantly, user interview sentiment changed: "the categories finally match what I actually spend" was a recurring theme. Two accountants who had previously flagged inaccuracies told their clients the dashboard was now useful for tax preparation reference.

The cash-flow projections improved as a downstream consequence of better category data. With ACH client payments correctly identified as revenue in the income stack, and payroll correctly identified as the primary recurring fixed cost, the 30-day projection line tracked actual account behavior much more closely. The team measured this by comparing the projected 30-day balance against the actual ending balance for the same accounts across a 60-day period. Mean absolute percentage error (MAPE) on the projected balance dropped significantly after reclassification. Before, the error was large enough to make the projection essentially unusable for planning; afterward, the projection had genuine decision utility.
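For reference, the standard MAPE definition is (100 / n) · Σ |projected − actual| / |actual|, computed over the n projection/outcome pairs. A minimal sketch follows. Skipping pairs whose actual balance is zero is an assumption on our part, because the percentage is undefined there. The post doesn't say how the team handled zero or near-zero balances, which can inflate MAPE sharply.

```python
def mape(projected, actual):
    """Mean absolute percentage error of projected vs actual ending balances, in percent.

    Pairs with an actual balance of zero are skipped (assumption), since the
    percentage error is undefined for them.
    """
    pairs = [(p, a) for p, a in zip(projected, actual) if a != 0]
    if not pairs:
        raise ValueError("no pairs with a non-zero actual balance")
    return 100 * sum(abs(p - a) / abs(a) for p, a in pairs) / len(pairs)
```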

What the experience validated and what it didn't

The case confirmed what the structural analysis suggested: for SMB banking features, categorization accuracy is a prerequisite for dashboard trust, and the investment in fixing it pays off in the engagement metrics that matter. A 26-percentage-point improvement in categorization accuracy produced measurable changes in user behavior within two months.

What the case didn't validate: that 94% is good enough for all SMB use cases. The 6% residual error rate is mostly concentrated in the hardest cases — novel merchants with no history in the normalization cache, ACH transactions from new counterparties, complex multi-leg transfers. For a cash-flow dashboard, 94% is a significant functional improvement. For a tax export feature where every category label matters to an accountant, the residual error rate still requires either a user review workflow or a higher-confidence threshold with more human-in-the-loop correction.
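The "higher-confidence threshold" option is a trade-off that can be checked against reviewed data. Raising the threshold sends more transactions to review, but makes the auto-accepted labels more accurate. The sketch below is hypothetical: it assumes a validation set where each transaction has a confidence score, a predicted category, and a reviewer-assigned category. It reports both quantities for a given threshold and picks the lowest candidate threshold that meets a target accuracy.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    confidence: float
    predicted: str
    truth: str  # category assigned during manual review


def threshold_tradeoff(rows, threshold):
    """Return (share sent to review, accuracy of auto-accepted labels) at `threshold`."""
    accepted = [r for r in rows if r.confidence >= threshold]
    review_share = 1 - len(accepted) / len(rows)
    if not accepted:
        return review_share, float("nan")
    accuracy = sum(r.predicted == r.truth for r in accepted) / len(accepted)
    return review_share, accuracy


def lowest_threshold_for(rows, target_accuracy, candidates):
    """Smallest candidate threshold whose auto-accepted set meets the target, else None."""
    for t in sorted(candidates):
        _, accuracy = threshold_tradeoff(rows, t)
        if accuracy >= target_accuracy:  # NaN compares False, so empty sets never qualify
            return t
    return None
```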

The team's next step, building on the reclassification foundation, is to extend the lightweight correction flow for transactions flagged needs_review (roughly 8% of volume) so that SMB customers' corrections feed back into improving the classification model's accuracy for that specific account over time. The reclassification layer is not a static fix. It's a foundation that gets more accurate as account history accumulates and user corrections provide ground-truth labels for edge cases.
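The post doesn't describe how corrections will reach the model. One simple stand-in, shown here, is not the team's method and involves no model retraining. It keeps a per-account memory of user corrections keyed by counterparty and overrides the model's label once the same correction has been seen a minimum number of times. `AccountCorrections`, the counterparty key, and `min_count` are all hypothetical.

```python
from collections import Counter, defaultdict


class AccountCorrections:
    """Per-account memory of user corrections, keyed by (account, counterparty)."""

    def __init__(self, min_count=2):
        self.min_count = min_count  # corrections needed before overriding the model
        self._seen = defaultdict(Counter)

    def record(self, account_id, counterparty, corrected_category):
        self._seen[(account_id, counterparty)][corrected_category] += 1

    def apply(self, account_id, counterparty, model_category):
        """Return the most common user correction if seen often enough, else the model's label."""
        counts = self._seen.get((account_id, counterparty))
        if counts:
            category, n = counts.most_common(1)[0]
            if n >= self.min_count:
                return category
        return model_category
```

Requiring more than one matching correction keeps a single mis-click from permanently relabeling a counterparty. Scoping the memory per account matches the post's point that corrections improve accuracy "for that specific account".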

Lessons for teams in a similar position

If your SMB banking dashboard is showing weak re-engagement numbers after initial release, categorization accuracy is the first thing to audit. It's often the invisible culprit in feature adoption failure for data-dependent banking features — the UI looks right, the engineering is solid, but the numbers users see don't match their reality. That mismatch erodes trust faster than any UX problem.

Measure before you build. Running a manual categorization accuracy audit on 20-30 accounts before designing a solution saves weeks of fixing the wrong thing. The error distribution will tell you where to focus. In this team's case, three transaction types accounted for roughly 80% of miscategorizations, and each could be targeted specifically in the reclassification approach.
