AI Retail Kiosk Lessons From Our First 100 Stores

A founder retrospective on what we learned from the first 100 stores running Remi - the integrations that broke, the assumptions we got wrong, and what we'd do differently.

By Mike Yadago · March 17, 2027 · 8 min read

A retrospective is only useful if it's honest, which is hard to do in public when you're also trying to sell the same product the retrospective is about. I'm going to try anyway. Here's what we learned getting Remi from the first store - mine, in San Diego - to a hundred. The customer behaviors we didn't expect, the assumptions we got wrong, the integrations that broke, and what we'd do differently if we ran the playbook again.

What we got right (and almost didn't)

The decisions that look obviously correct in hindsight were not obviously correct at the time. Two specifically:

Building for indie operators first, not enterprise. Every advisor we talked to in the first year pushed us toward the chain market. Bigger contracts, longer sales cycles, but predictable. We resisted, partly out of stubbornness and partly because I knew indie operators and didn't know enterprise. The retrospective answer is that the indie market taught us things the enterprise market would have hidden. Indie operators tell you immediately when the product doesn't work. Enterprise gives you a quarterly review where polite executives explain that the rollout is "ongoing." We learned faster.

Voice as a first-class interface, not an add-on. Our earliest prototypes were typing-only, because voice was hard to make work in a noisy retail environment. We almost shipped that way. The week we forced ourselves to make voice the default for v1 was the week the product got real. Customers don't type into kiosks. They never did, and they never will. Every typing-only kiosk we've replaced was effectively unused.

Customer behaviors we didn't predict

We had hypotheses about how customers would interact with an in-store AI assistant. Some were right. Some were embarrassingly wrong.

Wrong: customers would treat the kiosk like a search engine. We over-indexed on retrieval. Customers don't ask "show me single malt scotch under $50." They ask "what's a good gift for my dad, he likes whiskey but isn't fancy." The interaction pattern is conversational and contextual, not query-based. Our v1 ranking algorithm was tuned for the wrong query shape and we rebuilt it twice.

Wrong: customers would be embarrassed to talk to a kiosk in front of other shoppers. Some are. Most aren't. The behavior we actually saw was the opposite - customers having longer, more detailed conversations than they'd have with a human staffer, because the kiosk doesn't judge their question. People asked things they were too embarrassed to ask another person. That changed the kinds of questions the system needed to handle gracefully.

Right but underweighted: language fluency matters more than feature parity. Spanish-speaking customers in border-region liquor stores used the kiosk three to four times more than English-speaking customers in the same store, once we shipped a real Spanish mode. The wins from full bilingual support were larger than the wins from any single new feature we shipped that year. We should have prioritized it earlier.

Wrong: kids would mess with it. We expected to need a "kids mode" or a way to lock out non-adult interactions. We barely needed it. Kids are interested for thirty seconds and then go find their parents. The system never became a babysitting toy.

Right and important: regulars hate it for the first three weeks and love it after. The customer who comes in twice a week treats the kiosk as an intrusion at first. By week four, they're using it to check whether you got the new bourbon they asked about. The pattern was reliable enough that we now tell new operators to ignore early regular feedback for a month.

Integrations that broke (and what we did about it)

The promise of "we integrate with your point-of-sale" is easy to make and brutal to deliver. The first dozen integrations we built each had their own surprise.

Legacy POS systems with no real API. Several of the smaller convenience store operators we worked with were on POS systems that required scraping nightly CSV exports. We built a per-vendor adapter layer earlier than we wanted to. It paid for itself, but it added a class of bugs we hadn't budgeted for.
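For the engineers reading, the general shape of a per-vendor adapter layer looks something like the following. This is a minimal sketch under stated assumptions, not production code: the vendor names, the column layouts, and the `InventoryItem` type are all hypothetical, chosen only to show how each vendor's CSV export can be normalized into one shared record.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class InventoryItem:
    """Shared shape every adapter must produce (hypothetical fields)."""
    sku: str
    name: str
    quantity: int


# An adapter turns one vendor's raw CSV export into a list of InventoryItem.
Adapter = Callable[[str], List[InventoryItem]]


def parse_vendor_a(raw: str) -> List[InventoryItem]:
    # Hypothetical vendor A: header row with columns SKU, Description, OnHand.
    rows = csv.DictReader(io.StringIO(raw))
    return [
        InventoryItem(r["SKU"].strip(), r["Description"].strip(), int(r["OnHand"]))
        for r in rows
    ]


def parse_vendor_b(raw: str) -> List[InventoryItem]:
    # Hypothetical vendor B: no header, semicolon-delimited, order is name;sku;qty.
    rows = csv.reader(io.StringIO(raw), delimiter=";")
    return [InventoryItem(sku.strip(), name.strip(), int(qty)) for name, sku, qty in rows]


# Registry of adapters, keyed by an assumed vendor identifier.
ADAPTERS: Dict[str, Adapter] = {
    "vendor_a": parse_vendor_a,
    "vendor_b": parse_vendor_b,
}


def load_inventory(vendor: str, raw_csv: str) -> List[InventoryItem]:
    """Look up the vendor's adapter and normalize its export."""
    try:
        adapter = ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter registered for vendor {vendor!r}") from None
    return adapter(raw_csv)
```

The point of the registry is that everything downstream of `load_inventory` sees one record shape, so a new vendor means one new parsing function rather than changes scattered across the codebase. The cost is the one described above: every adapter is its own source of bugs.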

Inventory feeds that lie. The single most common integration bug wasn't a code bug. It was that the operator's inventory feed showed items as in stock that weren't physically on the shelf. The kiosk would recommend something, the customer would walk over, and it wouldn't be there. We added a "low-confidence" threshold for recommendations drawn from feeds that update infrequently. Operators eventually fixed their underlying data; the threshold kept us from looking broken in the meantime.
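One way to picture a low-confidence threshold like this is sketched below. It is an illustration under assumptions, not a description of the real rule: it uses the age of the last feed update as a stand-in for "low update frequency," and the 24-hour cutoff, the minimum quantity of 3, and the `StockRecord` type are all invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class StockRecord:
    sku: str
    quantity: int
    feed_updated_at: datetime  # when the operator's feed last refreshed


# Assumed values for illustration only.
MAX_FEED_AGE = timedelta(hours=24)
MIN_QTY_WHEN_STALE = 3


def is_confident(record: StockRecord, now: datetime) -> bool:
    """Return True if the item is safe to recommend."""
    if record.quantity <= 0:
        return False
    stale = (now - record.feed_updated_at) > MAX_FEED_AGE
    if stale:
        # A stale feed claiming 1 or 2 units is probably wrong; demand more headroom.
        return record.quantity >= MIN_QTY_WHEN_STALE
    return True


def recommendable(records: List[StockRecord], now: datetime) -> List[str]:
    """Filter a candidate list down to SKUs worth sending a customer toward."""
    return [r.sku for r in records if is_confident(r, now)]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    stale = now - timedelta(days=3)
    print(recommendable([StockRecord("rye-01", 1, stale), StockRecord("rye-02", 6, stale)], now))
```

The design choice is to be stricter only when the data is old. A fresh feed showing one unit is still recommended, so stores with good data don't lose recommendations because other stores have bad data.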

Stripe webhook timing. Smaller bug, bigger headache. Some operators saw a customer pay through a kiosk-generated checkout link, then watched the system take 40 seconds to register the payment. The webhook was firing fine; the issue was that we were caching the order status too aggressively. Fixed early, but it taught us to be paranoid about caching anything tied to a payment state.
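The lesson about payment state can be sketched as a cache that refuses to hold anything still in flight. This is a hypothetical illustration, not the actual fix: the status names, the `OrderStatusCache` class, and the `fetch` callback are assumptions, and no real Stripe API is shown. The idea is simply that only terminal statuses are cached, and a webhook handler can drop an entry the moment something changes.

```python
import time
from typing import Callable, Dict, Tuple

# Assumed status names; only these are considered safe to cache.
TERMINAL_STATUSES = {"paid", "refunded", "canceled"}


class OrderStatusCache:
    """Caches order status, but never a status that can still change soon."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # reads the authoritative status (e.g. the database)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, order_id: str) -> str:
        entry = self._entries.get(order_id)
        if entry is not None:
            status, expires_at = entry
            if self._clock() < expires_at:
                return status
            del self._entries[order_id]
        status = self._fetch(order_id)
        # "pending" and other in-flight states are deliberately not stored,
        # so the next read always goes back to the source of truth.
        if status in TERMINAL_STATUSES:
            self._entries[order_id] = (status, self._clock() + self._ttl)
        return status

    def invalidate(self, order_id: str) -> None:
        """Call this from the payment webhook handler when an order changes."""
        self._entries.pop(order_id, None)
```

With this shape, a customer who pays never waits on a stale "pending" entry to expire, because that entry was never stored in the first place.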

SMS deliverability post-10DLC. The carrier registration process for SMS got materially harder through 2025 and 2026. We watched several operators' opt-in flows break for weeks until their 10DLC campaign was approved. We now warn operators in onboarding to start the registration process well before they need SMS to work.

Assumptions about operators we got wrong

We were wrong about who our customers actually were on a few dimensions.

Wrong: operators wanted analytics dashboards. Most operators don't want to look at a dashboard. They want one number, or one alert, on their phone. We over-built the dashboard tier in year one and under-built the alerting and weekly digest layer. We've corrected this, but it took longer than it should have.

Wrong: single-store operators would remain a meaningful percentage of our customer base long-term. They are, today. But the operators who got the most out of the product were almost always multi-store, because the consistency value compounds across locations. The single-store operators got real value too, but they churned at higher rates because their alternative - a knowledgeable owner who's always on the floor - was already pretty good. The multi-unit story is the durable one.

Right: operators want to own their customer data. We made data portability a contract clause from the start, and it consistently came up in sales conversations. Operators who'd been burned by previous vendors were checking for it specifically.

Wrong: operators would care about model selection. Almost no operator has ever asked which underlying model we use. They care about whether the answer is right and whether it sounds natural. The model choice is our problem, not theirs. We stopped putting it on the marketing site.

Things we'd do differently

If we restarted today with what we know now:

Ship Spanish-language support in v1, not v3. We left meaningful business on the table for almost a year by treating it as a v3 feature.

Make voice the only interface for the first six months. We supported typing and voice in parallel from day one. The typing path got disproportionate engineering effort and almost no real usage. We should have shipped voice-only and added typing as an accessibility option.

Charge less for the first store and more for the second. The customer-acquisition cost is in the first store. The expansion revenue is in stores two through ten. We had this backwards in early pricing and corrected it. The current pricing structure reflects the lesson.

Build the alerting layer before the dashboard layer. Operators want to know when something is wrong, not when something is normal. The dashboard answers the second question. The alert answers the first.

Stop saying "AI." The word does no work. Customers don't care that it's AI. Operators don't care that it's AI. They care whether it answers questions well. We've drifted away from leading with the AI framing, and the conversion data on operator sales calls is better for it.

Build for slow internet. A surprising percentage of indie retail locations have unreliable wifi. The kiosk needs to degrade gracefully when the connection is bad. We learned this in stores where the back corner has 1 bar and the kiosk hangs for 12 seconds. Now we cache aggressively on the device.
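Graceful degradation on a bad connection can be sketched roughly as follows. Treat this as an assumption-heavy illustration rather than how the kiosk actually works: the 2-second timeout, the `DegradingClient` class, the exact-match cache key, and the fallback wording are all hypothetical. The `remote` callback stands in for whatever network call the device makes, and is assumed to raise `TimeoutError` or `ConnectionError` when the link is bad.

```python
from typing import Callable, Dict, Tuple

# Hypothetical wording for when nothing better is available.
FALLBACK_MESSAGE = "I'm having trouble connecting. Please ask a staff member for help."


class DegradingClient:
    """Tries the network first, then the on-device cache, then a canned reply."""

    def __init__(self, remote: Callable[[str, float], str], timeout_seconds: float = 2.0) -> None:
        self._remote = remote  # remote(question, timeout) -> reply, may raise
        self._timeout = timeout_seconds
        self._device_cache: Dict[str, str] = {}

    def answer(self, question: str) -> Tuple[str, str]:
        """Return (reply, source), where source is 'live', 'cache', or 'fallback'."""
        try:
            reply = self._remote(question, self._timeout)
        except (TimeoutError, ConnectionError):
            cached = self._device_cache.get(question)
            if cached is not None:
                return cached, "cache"
            return FALLBACK_MESSAGE, "fallback"
        # Remember successful replies so a later outage can still be answered.
        self._device_cache[question] = reply
        return reply, "live"
```

The short timeout is the important part: failing fast to a cached or canned answer is a better customer experience than a screen that hangs. An exact-match key is the simplest possible cache and would miss rephrased questions; a real device cache would likely need something smarter.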

What I'm proudest of

This is the part that risks sounding self-congratulatory, so I'll keep it short. The thing I'm proudest of from the first hundred stores isn't a feature. It's that we built a product that the operators we serve - the one-to-ten store owners who don't have IT teams and who have been burned by big-vendor pitches - actually trust. The trust is the moat. The features are commoditizing. The trust isn't.

If you're an operator considering a kiosk and you want to talk through any of this directly, the demo is structured to let you ask the questions you actually care about, not the ones we want to pitch. The About page has the longer founding story, including the parts of it that didn't work.

Frequently asked

What was the single biggest lesson from the first 100 stores?

That the product is the conversation, not the kiosk. We could ship the same hardware and the same UI with a worse conversational layer and we'd be a worse company. The model quality and the prompt design are the actual product.

Did any stores remove the kiosk after installing it?

Yes. A small percentage. The reasons varied - one was a layout change where the kiosk no longer had a natural placement, two were ownership changes where the new owner wanted to start fresh. Almost none removed it because it wasn't working. That's the metric we watch most carefully.

What surprised me about my own behavior as a founder?

How often I was wrong about what would matter. I underestimated the importance of language support, integration depth, and alerting. I overestimated the importance of analytics depth, model selection, and feature breadth. Founder intuition is overrated; operator feedback is underrated.

Is the product done?

No. The roadmap for the next year is longer than the roadmap for the last year. The honest answer is that the product is good enough to be useful and not yet good enough to be obvious. We're working on the gap.

What's the next milestone?

Five hundred stores, but the milestone I actually care about is the one where a typical customer recognizes the chain by the kiosk experience, not by the signage. We're not there yet at any chain. Getting there is the real next chapter.