
How to Run a Retail Technology Pilot Program Without Breaking Your Store

A 30-day retail technology pilot framework - scope, success metrics, exit criteria, and the difference between a pilot and a demo. From an indie operator.

By Mike Yadago · January 13, 2027 · 7 min read

The fastest way to wreck a small store's operations is to skip the pilot phase and roll new technology straight to production. The second-fastest way is to run something you call a "pilot" that is actually just a long demo. There's a real distinction between the two, and getting it right is the difference between learning something and wasting a month. Here's the 30-day framework I wish someone had handed me when I was first evaluating tech for my own liquor store.

The pilot vs. demo distinction

A demo is the vendor showing you what the product can do, on their hardware, with their data. A pilot is your store using the product, on your hardware, with your customers, against measurable goals. Demos last an hour. Pilots last weeks. Demos sell. Pilots reveal.

The reason this distinction matters is that vendors will happily call a 30-day demo a "pilot" if you let them. The tell is whether real customers are interacting with the system in real conditions, and whether you've defined what success and failure look like before you start. If either of those is missing, you have a demo with extra meetings.

When operators evaluate Remi or any other retail kiosk, I push hard on this point: don't agree to anything that doesn't have a written success metric and a written exit clause. You and the vendor will both benefit from the discipline.

The 30-day framework

Thirty days is the minimum window to get past the novelty effect. In the first week of any new in-store tech, your customers are curious because it's new, your staff is engaged because they were trained yesterday, and you don't yet know what's going to break. By week three, the novelty has worn off and you're seeing baseline behavior. By week four, you've seen at least one weekend rush, one staff scheduling change, and probably one minor vendor outage.

If a vendor wants to compress the pilot to two weeks, they're either confident you won't see the failure modes, or they need a closed deal before quarter-end. Either way, hold the line at 30 days.

Week 1 - Setup and shakedown

Goal: get the system working in the physical space. Not customers. Just the infrastructure. Network, power, mounting, lighting, sound, integration with your point-of-sale or inventory system if applicable.

This is where most pilots already start to fail, and it has nothing to do with the AI or the software. It has to do with the fact that the wifi in the back corner of the store is bad, the kiosk position blocks a fire exit, the speakers echo in a tiled room, or the tablet screen is unreadable in the glare when the front door opens at noon.

Catch all of that in week 1, before any customer ever touches it.

Week 2 - Staff usage

Goal: your team uses it daily, in normal workflow. They're the first real users.

If your staff doesn't trust or use the new tool, no customer interaction matters - because your staff will steer customers around it. Watch what they do. Don't ask them. Watch.

I'd specifically time how long it takes a cashier to handle a customer question that the new system was supposed to handle. If the cashier still answers it themselves, the system isn't actually offloading the question. Find out why before you scale.
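If you want the week-2 observation to end in a number rather than an impression, a paper tally is enough and the arithmetic is simple. Here's a minimal Python sketch, assuming you jot down one hypothetical row per customer question you watch: whether it was a question the new system was supposed to handle, who ended up answering it, and how many seconds the cashier spent. The field names are mine, not any vendor's.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class ObservedQuestion:
    # Hypothetical tally-sheet row: one per customer question you watched in week 2.
    in_scope: bool          # was this a question the new system was supposed to handle?
    answered_by: str        # "kiosk" or "cashier"
    cashier_seconds: float  # time the cashier spent on it (0 if the kiosk handled it)


def week2_offload_summary(rows: list[ObservedQuestion]) -> dict:
    """Summarize how many in-scope questions still ended up with the cashier."""
    in_scope = [r for r in rows if r.in_scope]
    by_cashier = [r for r in in_scope if r.answered_by == "cashier"]
    return {
        "in_scope": len(in_scope),
        # Share of the questions the system was meant to offload that staff answered anyway.
        "cashier_share": len(by_cashier) / len(in_scope) if in_scope else None,
        # Typical cashier time on the in-scope questions they still handled.
        "median_cashier_seconds": (
            median(r.cashier_seconds for r in by_cashier) if by_cashier else None
        ),
    }


rows = [
    ObservedQuestion(True, "cashier", 40),
    ObservedQuestion(True, "kiosk", 0),
    ObservedQuestion(True, "cashier", 60),
    ObservedQuestion(False, "cashier", 30),  # out of scope, so it's ignored
]
print(week2_offload_summary(rows))
```

A high cashier share on in-scope questions is the "find out why before you scale" signal; the median time tells you what that costs at the counter.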

Week 3 - Customer exposure

Goal: real customers, real conditions, no vendor on-site.

This week is the one that vendors don't love. They want to be standing next to the kiosk smoothing over rough edges. Don't let them. The whole point is to see what happens when nobody is hovering. Take video if you can - phone-camera quality is fine. Watch it back at the end of the week.

Specifically watch for: customers who walk up, stop, and walk away. That's your dropout rate, and it's the single most useful data point in the entire pilot.
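Dropout is just the walk-ups that ended with zero interactions, divided by all walk-ups. A minimal sketch, assuming you log one hypothetical `Approach` row per person you see walk up on the video, with a count of taps, questions, or scans before they left. Breaking it out by day helps you tell a bad system from a bad day.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Approach:
    # Hypothetical record: one row per person you see walk up to the kiosk on video.
    day: date
    interactions: int  # taps, questions, or scans before they left; 0 means they walked away


def dropout_rate(approaches: list[Approach]) -> float:
    """Fraction of walk-ups that left without a single interaction."""
    if not approaches:
        raise ValueError("no walk-ups recorded")
    walked_away = sum(1 for a in approaches if a.interactions == 0)
    return walked_away / len(approaches)


def dropout_by_day(approaches: list[Approach]) -> dict[date, float]:
    """Per-day dropout, so one bad day (a rush, an outage, glare) stands out."""
    grouped = defaultdict(list)
    for a in approaches:
        grouped[a.day].append(a)
    return {day: dropout_rate(day_rows) for day, day_rows in sorted(grouped.items())}


week3 = [
    Approach(date(2027, 1, 19), 0),
    Approach(date(2027, 1, 19), 3),
    Approach(date(2027, 1, 20), 0),
    Approach(date(2027, 1, 20), 0),
    Approach(date(2027, 1, 20), 2),
]
print(dropout_rate(week3))
print(dropout_by_day(week3))
```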

Week 4 - Stress test and decision

Goal: handle a peak day, finalize the numbers, decide.

Pick your busiest day of the week (Friday or Saturday for most categories). Run the full pilot at peak. The questions: does the system stay up, does it stay fast, does staff have any new failure mode they didn't have in week 2.
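For the "does it stay fast" question, it helps to compare peak-day response times against what you saw earlier in the pilot, hour by hour. This sketch assumes you can get (timestamp, seconds) pairs for each kiosk response, either from the vendor's session logs or your own stopwatch. The 95th-percentile cutoff and the 2x slowdown factor are placeholder thresholds, so set your own before week 1.

```python
import math
from collections import defaultdict
from datetime import datetime


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def slow_hours(samples: list[tuple[datetime, float]], baseline_p95: float,
               factor: float = 2.0) -> dict[int, float]:
    """Return {hour_of_day: p95_seconds} for peak-day hours slower than factor x baseline."""
    by_hour = defaultdict(list)
    for ts, seconds in samples:
        by_hour[ts.hour].append(seconds)
    report = {}
    for hour, times in sorted(by_hour.items()):
        hour_p95 = p95(times)
        if hour_p95 > factor * baseline_p95:
            report[hour] = hour_p95
    return report


# Hypothetical Saturday: a calm 2 p.m. hour and a 6 p.m. hour with one slow response.
saturday = [(datetime(2027, 1, 23, 14, m), 1.0) for m in range(10)]
saturday += [(datetime(2027, 1, 23, 18, m), 1.0) for m in range(9)]
saturday.append((datetime(2027, 1, 23, 18, 30), 5.0))
print(slow_hours(saturday, baseline_p95=1.2))
```

The nearest-rank method stays well-defined even for a quiet hour with only a handful of samples, which matters when you're slicing one day into hours.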

Then sit down with the data and decide. The decision should already be on rails because you wrote success criteria before week 1. If you didn't write them, you'll find a reason to keep the system regardless of how it performed - that's how every "pilot" becomes a permanent rollout.

What success metrics should look like

Pick at most three. More than three and you'll cherry-pick whichever one looks good. The right ones depend on your category, but for indie retail the candidate list is:

Conversion lift on a specific basket type. Example: average basket size for customers who interacted with the kiosk vs. matched customers who didn't, over a two-week measurement window.
Question-deflection rate. Of the questions customers ask the kiosk, what percentage get a useful answer (defined ahead of time) without staff intervention.
Customer return frequency. For categories where you have any customer identification - loyalty, mobile app, repeat credit card - whether kiosk users come back faster.
Staff time saved. Real measurement, not a survey. Time-and-motion of a typical hour at the counter, before and after.

Pick the one or two that matter most to your business model. For a liquor store, conversion lift on premium spirits is usually the most economically meaningful. For a grocery store, it might be question-deflection during the rush.
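The first two metrics on that list reduce to simple ratios once the data is collected. A minimal sketch, assuming you've already paired each kiosk customer with a matched non-kiosk customer (how you match them, for example same day and time of day, is your call and isn't shown here), and that you tag each kiosk question with a hypothetical outcome label against your pre-written definition of a useful answer.

```python
from statistics import mean


def basket_lift(pairs: list[tuple[float, float]]) -> float:
    """Relative lift in average basket size, kiosk users vs. matched non-users.

    Each pair is (kiosk_customer_basket, matched_customer_basket) in dollars,
    limited to the basket type you chose up front (e.g. premium spirits).
    """
    if not pairs:
        raise ValueError("no matched pairs")
    kiosk_avg = mean(k for k, _ in pairs)
    matched_avg = mean(m for _, m in pairs)
    return (kiosk_avg - matched_avg) / matched_avg


def deflection_rate(outcomes: list[str]) -> float:
    """Share of kiosk questions that got a useful answer with no staff help.

    Hypothetical labels: "useful" (met your pre-written definition),
    "staff" (someone had to step in), "unanswered".
    """
    if not outcomes:
        raise ValueError("no questions logged")
    return outcomes.count("useful") / len(outcomes)


pairs = [(60.0, 50.0), (40.0, 40.0), (80.0, 60.0)]
print(basket_lift(pairs))  # 0.2, i.e. a 20% lift
print(deflection_rate(["useful", "useful", "staff", "unanswered"]))  # 0.5
```

A relative lift like this is only as good as the matching behind it, which is why the matching rule belongs in the written success criteria too.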

Exit criteria - written, before day 1

This is the part most operators skip and then regret. Write down, before the pilot starts, the conditions under which you walk away. Examples:

Customer dropout rate above 50% by week 3.
Any single 4-hour outage during business hours.
Staff reports they actively avoid the system after week 2.
Conversion lift worth less than the cost of the subscription, measured in week 4.
A safety, accessibility, or compliance issue at any point.

Vendors won't volunteer to help you write these. You write them. You sign them. You give a copy to the vendor on day 1. The fact that they exist will materially change how the vendor behaves during the pilot.
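Writing the criteria as checks against data is one way to keep them from getting renegotiated in week 4. Here's a sketch of the example list in Python. The `PilotData` fields are hypothetical summaries you'd compile yourself, not a vendor export; "4-hour outage" is read as four or more continuous business hours; and "lift worth less than the subscription" is read as the extra gross margin from the lift falling short of the subscription cost for the same period. Adjust those readings to match whatever you actually sign.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PilotData:
    # Hypothetical summaries you compile yourself during the pilot.
    week3_dropout_rate: float                                # 0.0 to 1.0, from video review
    business_hours_outages: list[tuple[datetime, datetime]]  # (start, end), clipped to open hours
    staff_avoid_after_week2: bool                            # what you observed, not a survey
    week4_incremental_margin: float                          # extra gross margin from the lift, $
    subscription_cost: float                                 # same period as the margin figure, $
    safety_accessibility_compliance_issue: bool


def exit_reasons(d: PilotData) -> list[str]:
    """Return every written exit criterion the pilot tripped; an empty list means no exit."""
    reasons = []
    if d.week3_dropout_rate > 0.50:
        reasons.append("dropout above 50% by week 3")
    if any(end - start >= timedelta(hours=4) for start, end in d.business_hours_outages):
        reasons.append("a single outage of 4+ business hours")
    if d.staff_avoid_after_week2:
        reasons.append("staff actively avoid the system after week 2")
    if d.week4_incremental_margin < d.subscription_cost:
        reasons.append("lift worth less than the subscription")
    if d.safety_accessibility_compliance_issue:
        reasons.append("safety, accessibility, or compliance issue")
    return reasons


pilot = PilotData(
    week3_dropout_rate=0.30,
    business_hours_outages=[(datetime(2027, 1, 22, 10, 0), datetime(2027, 1, 22, 12, 30))],
    staff_avoid_after_week2=False,
    week4_incremental_margin=900.0,
    subscription_cost=500.0,
    safety_accessibility_compliance_issue=False,
)
print(exit_reasons(pilot) or "no exit criteria tripped")
```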

How we structure our own pilots

When operators ask about Remi specifically, the structure looks like the framework above. Week 1 is install and integration, including any point-of-sale sync. Week 2 is staff training and shadowing. Week 3 is unattended customer exposure with us watching session logs remotely. Week 4 is the busiest day plus decision meeting. Pricing during the pilot is on the pricing page - we don't change it for pilots, because if we discount the pilot we're hiding what the real ongoing cost looks like.

The exit criteria we offer up front are the obvious ones: any safety issue, any data integrity issue, sub-target conversion lift, or any 4-hour business-hours outage. If we hit those, the pilot ends and there's no termination fee.

The reason I like this structure is that the operators who say yes after a 30-day pilot stay customers. The ones who would have churned out at month two never get past the pilot, which saves both of us time.

What to avoid

A few specific anti-patterns I've seen kill pilots:

Letting the vendor pick the test store. They'll pick whichever location is easiest to make look good. You pick. Pick a normal store, not your best or worst.
Running multiple pilots simultaneously. You can't isolate which change caused what. One thing at a time.
Skipping the data export. At the end of the pilot, get a copy of every interaction, every event, every record. Even if you don't continue with the vendor, that data tells you what your customers actually wanted.
Letting "almost ready" linger. If the system isn't working at the end of week 1, kill the pilot. Don't extend. The vendor will recover faster from a killed pilot than from a dragged-out one, and so will you.

Frequently asked

Can a pilot be shorter than 30 days?

For mature, plug-and-play tools - new payment terminal, new label printer - a two-week pilot is fine because you're really just confirming compatibility. For anything customer-facing or AI-driven, 30 days is the floor. Customer behavior takes that long to normalize.

Should I pay during the pilot?

Yes, usually. Free pilots produce a different relationship - the vendor doesn't prioritize you, and you don't take the integration seriously. A modestly discounted pilot fee or a refundable deposit aligns incentives. Watch out for vendors who insist on full annual contracts before a pilot starts; that's a demo with extra steps.

What if the vendor refuses to define exit criteria?

Walk. The fact that they refuse is the answer. Mature vendors know that pilots fail sometimes and they'd rather have an honest cycle than a customer they trapped into a contract.

How many pilots can a single store realistically run per year?

Two, at most. Pilots are operationally expensive even when they go well - your staff is learning, your data is being collected, and your attention is split. Three or more in a year and your team will start treating new tech the way they treat new corporate initiatives: as something to wait out.

What's the most common failure mode?

Scope creep mid-pilot. The vendor adds a feature, you add a use case, suddenly you're testing something you didn't agree to test, and your success criteria don't apply anymore. Lock the scope at day one and document any change in writing. If the vendor needs to add something, that's a sign the original scope was wrong - reset, restart the clock, or kill the pilot.