Free AI Writing Co-Pilot: How Local Models Beat Sudowrite for First Drafts

The moment

It's 11 p.m. You're 600 words into a chapter and you've stalled on a sentence that starts, "She knew, the way you know things in dreams, that…" — and your brain has gone empty. You don't want a Magic 8-Ball to write the next paragraph for you. You want a nudge. Something to read and react to. A tiny mirror that says: "…that the door would be open whether she turned the handle or not."

You pause for a second and a half. The gray italic ghost appears inline. If it's right, Tab. If it's wrong, Esc, or just keep typing — it disappears. That's the entire interaction. It's free. It runs on our hardware. It doesn't read your draft back to anyone.

This is MimicReader's Writing Studio with the AI co-writer toggle on. And the contrast with what Sudowrite charges for the same idea is, frankly, embarrassing for Sudowrite.

The Sudowrite tax

Sudowrite is great marketing wrapped around a fairly thin technical product. The pricing structure as of mid-2026:

$19/month minimum for the "Hobby and Student" plan
225,000 credits/month — sounds enormous until you realize their Story Engine eats them in a few sittings
Heavy users buy bigger plans ($29, $59, $129) to avoid running out
Your prose is shipped to OpenAI for every generation — Sudowrite is a wrapper

That last point gets glossed over in the marketing. Sudowrite doesn't have its own model. It calls OpenAI's API behind the scenes. Your manuscript-in-progress — including the half-written scenes you'd be embarrassed to show your editor — is sent to OpenAI's servers, processed there, and the completion is sent back. OpenAI's enterprise privacy terms say they don't train on API data, which is reassuring if you trust enterprise privacy terms. (They've already changed twice this decade.)

NovelCrafter is the slightly cheaper variant: $7/month for the subscription, then you bring your own OpenAI or Anthropic API key and pay them directly per generation. Same data flow — your prose still leaves the platform you typed it into.

What "local model" actually means

MimicReader's AI co-writer doesn't call OpenAI. It doesn't call Anthropic. It hits an open-weights Gemma-family model, specifically gemma3:4b, running on our own RTX 3090 in our own server room in Scotland.

When you pause typing, the editor sends the last ~500 words of your draft to our VPS, which forwards them over a WireGuard tunnel to the GPU server at 192.168.20.155:11434. The model returns a 10-30 word continuation, and the editor renders it as gray italic text inline. The full path:

Your browser → MimicReader VPS (HTTPS via Cloudflare)
VPS → our GPU server (encrypted WireGuard tunnel, internal network)
GPU server runs Ollama with gemma3:4b, generates completion
Completion comes back the same path
Gray italic text appears in your editor
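
The VPS-to-GPU step of that path can be sketched in a few lines. This is a minimal sketch, not MimicReader's actual code: the helper names `last_words`, `build_payload`, and `request_continuation`, the prompt wording, and the `num_predict` value are all assumptions made for illustration. The request shape (`POST /api/generate` with `model`, `prompt`, `stream`, and `options`) follows Ollama's REST API.

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.20.155:11434/api/generate"  # address from the article
MODEL = "gemma3:4b"
CONTEXT_WORDS = 500  # "the last ~500 words"


def last_words(draft: str, limit: int = CONTEXT_WORDS) -> str:
    """Return the trailing `limit` words of the draft (whitespace is collapsed)."""
    words = draft.split()
    return " ".join(words[-limit:])


def build_payload(draft: str) -> dict:
    """Build a non-streaming generate request. The prompt text is hypothetical."""
    return {
        "model": MODEL,
        "prompt": "Continue this prose with one short fragment:\n\n" + last_words(draft),
        "stream": False,
        # num_predict caps the number of generated tokens; 48 is an assumed
        # value picked to land near a 10-30 word continuation.
        "options": {"num_predict": 48},
    }


def request_continuation(draft: str, timeout: float = 5.0) -> str:
    """Send the request to Ollama and return the generated text."""
    body = json.dumps(build_payload(draft)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Only `last_words` and `build_payload` run without a server; `request_continuation` needs a reachable Ollama instance with the model already pulled.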

No third-party AI provider is involved. OpenAI never sees your sentence. Anthropic never sees your sentence. The only people who could theoretically see your prose are us, and we don't log requests — they hit the model and the request payload is gone the moment the response is generated. (We log that a request happened, for rate limiting. We don't log what was in it.)

Why this matters: if you write genre fiction, erotica, a sensitive memoir, or anything where the prose itself is the asset, every API call to a third-party LLM is a moment where your draft exists on someone else's hardware. Local model is the only fully sane choice.

Honest comparison: when local is great, when it isn't

We're not going to oversell gemma3:4b. It's a 4-billion-parameter open-weights model. It's not Claude 4.7. It's not GPT-5. Here's the honest matrix:

Task	Local gemma3:4b	Big cloud model
Finish this sentence (5-15 words)	Excellent	Excellent
Finish this paragraph (20-40 words)	Good	Excellent
Draft the next paragraph from scratch	Decent	Excellent
Draft a full chapter	Mediocre — use Workshop	Excellent — use Workshop
Maintain voice consistency over 1000 words	Drifts	Drifts less, but still drifts
Plot a 50-chapter outline	Don't	Use AI Workshop / Claude

Ghost text is intentionally scoped to the top rows of that table, the short finishing fragments where the local model rates Good or Excellent. It exists for the moment your brain pauses and you want a finishing fragment, not a chapter draft. When you want chapter drafts, that's a different feature inside MimicReader called AI Workshop, which uses Claude or Gemini (paid, your credits) for high-quality structural drafting. We give you both tools and let you pick the right one for the moment.

If you spend most of your day asking AI to "write the next 500 words," you don't want ghost text — you want Workshop. If you spend most of your day writing yourself and occasionally wanting a finishing nudge, you want ghost text and you'll never touch Workshop. Both ship in every account.

Latency reality: ~400ms

Pause for 1.5 seconds. The request fires. The model responds in about 400 milliseconds. Total time from your last keystroke to gray text appearing: just under 2 seconds. It feels like the editor is reading your mind one breath behind you.
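
The trigger described here is a debounce: every keystroke restarts a 1.5-second wait, and a request fires only if the wait runs out. A minimal sketch of that logic follows, assuming a hypothetical `Debouncer` class that takes explicit timestamps so it can run outside a browser. The real editor presumably relies on a browser timer instead.

```python
from typing import Optional

PAUSE_SECONDS = 1.5  # pause before a request fires (from the article)
MODEL_SECONDS = 0.4  # approximate model response time (from the article)


class Debouncer:
    """Tracks the last keystroke and decides when a request should fire."""

    def __init__(self, pause: float = PAUSE_SECONDS) -> None:
        self.pause = pause
        self.last_keystroke: Optional[float] = None
        self.fired = False

    def on_keystroke(self, now: float) -> None:
        # Any keystroke restarts the wait and re-arms the trigger.
        self.last_keystroke = now
        self.fired = False

    def should_fire(self, now: float) -> bool:
        """Return True once per pause, after the pause has lasted long enough."""
        if self.last_keystroke is None or self.fired:
            return False
        if now - self.last_keystroke >= self.pause:
            self.fired = True
            return True
        return False


def time_to_suggestion() -> float:
    """Seconds from last keystroke to ghost text, ignoring network overhead."""
    return PAUSE_SECONDS + MODEL_SECONDS
```

Firing once per pause means a writer who stops to think for a minute triggers one request rather than a stream of them.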

For comparison, cloud APIs typically run 1-3 seconds end-to-end (longer for OpenAI's bigger models). Sudowrite's "Write" command often takes 5-15 seconds because it's chaining several generations server-side. The ghost text in MimicReader is fast specifically because the model is small and lives next door to the request.

Tab to accept. Esc to dismiss. You stay in control.

The interaction is deliberately minimal:

Tab — accept the suggestion, gray text becomes real prose, cursor moves to the end
Esc — dismiss, suggestion vanishes, you keep typing
Just keep typing — suggestion vanishes automatically the moment you press another key

Nothing is ever auto-inserted. The AI never modifies your draft without an explicit Tab. If you don't look at the gray text and keep typing, it disappears as if it was never there. Many writers turn the feature on and forget it exists, then occasionally accept a suggestion when the editor surprises them with a good one. That's the right mode of operation.
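
That contract is small enough to write as a pure function. The sketch below is illustrative only: `EditorState` and `handle_key` are hypothetical names, not MimicReader's editor code, and the model appends at the end of the text rather than tracking a cursor. It shows the one property that matters: ghost text enters the draft only on Tab.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EditorState:
    text: str             # the writer's real prose
    suggestion: str = ""  # pending ghost text, empty when none is shown


def handle_key(state: EditorState, key: str) -> EditorState:
    """Apply one key press and return the new editor state."""
    if key == "Tab" and state.suggestion:
        # Accept: the ghost text becomes real prose.
        return EditorState(text=state.text + state.suggestion)
    if key == "Escape":
        # Dismiss: drop the suggestion, leave the draft untouched.
        return replace(state, suggestion="")
    if len(key) == 1:
        # Ordinary typing: the character is added and the ghost vanishes.
        return EditorState(text=state.text + key)
    # Other keys (arrows, modifiers, Tab with no suggestion) are ignored here.
    return state
```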

We rate-limit ghost text to 60 requests per minute per user with a sliding window. Practically, that is one suggestion per second, sustained for a full minute, before you hit the cap. Since each suggestion needs a 1.5-second pause first, no one writes that way. The limit exists to stop bots, not writers.
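
For readers curious what a per-user sliding window looks like, here is a minimal sketch. The article does not say which variant MimicReader uses, so this assumes the simplest one, a sliding-window log that stores recent request timestamps. `SlidingWindowLimiter` and its method names are hypothetical.

```python
from collections import defaultdict, deque

LIMIT = 60     # requests allowed per window (from the article)
WINDOW = 60.0  # window length in seconds


class SlidingWindowLimiter:
    """Per-user sliding-window log of recent request timestamps."""

    def __init__(self, limit: int = LIMIT, window: float = WINDOW) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # user id -> timestamps, oldest first

    def allow(self, user_id: str, now: float) -> bool:
        """Record the request and return True if the user is under the cap."""
        hits = self._hits[user_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed one-minute bucket, the log cannot be gamed by bursting on either side of a minute boundary, at the cost of storing up to 60 timestamps per active user.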

Privacy: the real reason to care

Cloud AI is fine for code, fine for emails, fine for the dull half of your job. It is not fine for the prose you're not sure about yet.

Specifically: erotica writers, memoirists with family members still alive in the manuscript, fiction writers exploring dark themes (true crime, abuse, addiction), professional writers under NDA, lawyers drafting briefs, therapists journaling about clients, anyone working on something embargoed, anyone writing in a language with cultural sensitivities the cloud provider's safety filter doesn't understand — for all of these, every cloud API call is a small risk you're choosing to take.

Local model is the absence of that risk. We're not asking you to trust OpenAI's privacy policy. We're not asking you to trust Anthropic's enterprise terms. We're asking you to trust that the model running on our GPU doesn't have a network path off our GPU. (It doesn't. The Ollama process binds to 192.168.20.155:11434 on a private network. The VPS reaches it through a WireGuard tunnel. There's no outbound from the GPU to the public internet for inference traffic.)

Why we eat the GPU time

Honest answer: it costs us about £0.001 of electricity per ghost text completion at UK power prices. At that rate, a heavy writer's four-hour session, at around 1,000 completions, costs us maybe £1 in power. (Holding the 60 completions/min limit for all four hours would be 14,400 completions, about £14, but that is bot behavior, not writing.) Heavy writers are probably also generating audiobooks (which is where our actual revenue lives) on the same account. Ghost text is a feature that makes the platform stickier and barely registers on our cost sheet. So we gave up trying to monetize it and just made it free.

Sudowrite charges $19/month for ghost text because Sudowrite has to pay OpenAI per call. Their margin requires your subscription. We don't pay OpenAI. We pay our electricity bill. The economics are genuinely different, and we'd rather you spent the $19 on a credit pack for actual audiobook generation, which is where the cost lives.

How to enable it

Inside the MimicReader app:

Open Settings
Scroll to Writing Studio
Toggle on AI co-writer (ghost text)
Open any project, start writing — pause for 1.5 seconds when you want a suggestion

You can toggle it off any time. Settings are per-user and persist across devices. The feature is on the same panel as other Writing Studio preferences (font, theme, default chapter pause, etc.).

Try the ghost text — it's just there

Free account, no credit card, the AI co-writer is included from day one. So are voice notes, manuscript editing, cover generation, and 1 hour of audiobook generation per month.

Start Writing Free

What ghost text does not do

To be clear about what we built and what we didn't:

It doesn't proofread — that's a different feature (and honestly, hire a human)
It doesn't continue across paragraph breaks reliably — it's tuned for finishing the current thought
It doesn't know your full manuscript — it sees the last ~500 words for context, not chapter 1
It doesn't maintain character voice across long stretches — that's a much harder AI problem and a 4B-parameter model can't solve it
It doesn't bulk-rewrite — for that, use AI Workshop with a higher-tier model

It does one thing — small inline continuation when you pause — and does it well, fast, free, and private.

Where to go next

Write a novel and generate the audiobook in one place — the broader pitch for MimicReader as a writing platform
Best free AI audiobook generators in 2026 — what's actually free, what's marketing
Voice notes to audiobook: the full 4-step pipeline — capture, draft, package, narrate