I had SparkyFitness running in my homelab at https://address.yourdomain.com, proxied through Nginx Proxy Manager. Everything worked fine — until I tried to connect my Fitbit account.
The Problem
I registered a Fitbit developer app, entered the Client ID and Client Secret into SparkyFitness, clicked Connect to Fitbit, and immediately hit a wall:
Fitbit was trying to redirect back to my local LAN IP — http://192.168.1.101:80 — which is obviously not reachable from the outside world. The callback never made it back to my server.
The Fix
The issue was a single environment variable on my SparkyFitness server:
SPARKY_FITNESS_FRONTEND_URL=http://192.168.1.101:80
This was set to the local IP of my SparkyFitness LXC. SparkyFitness uses this variable to build all its public-facing URLs — including the OAuth redirect URI sent to Fitbit. With it pointing to a LAN address, Fitbit’s callback had nowhere to go.
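To make the failure mode concrete, here is a hedged sketch (not SparkyFitness’s actual code) of how an app typically derives its OAuth redirect URI from a configured frontend base URL — a LAN-scoped base poisons every external callback:

```python
# Illustrative only: function and path names are assumptions,
# not SparkyFitness internals.
def build_redirect_uri(frontend_url: str, callback_path: str = "/fitbit/callback") -> str:
    """Join a callback path onto the configured frontend base URL."""
    return frontend_url.rstrip("/") + callback_path

# With the broken setting, Fitbit is told to redirect to a LAN address:
print(build_redirect_uri("http://192.168.1.101:80"))
# With the corrected setting, the callback goes through the public domain:
print(build_redirect_uri("https://address.yourdomain.com"))
```

Fitbit can only follow the second URL; the first one is unreachable from its servers.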
I changed the variable to my public domain and clicked Connect to Fitbit again — this time the redirect URI in the authorization URL pointed to https://address.yourdomain.com/fitbit/callback. Fitbit authenticated, redirected back, and SparkyFitness successfully synced my Fitbit data.
The Lesson
If you’re self-hosting SparkyFitness (or any app with OAuth) behind a reverse proxy, and OAuth logins are failing with a 403 — check your SPARKY_FITNESS_FRONTEND_URL first. It’s likely pointing to an internal address that external services can’t reach.
All external callbacks need to go through your public domain, not your LAN IP.
TL;DR
Locate your SPARKY_FITNESS_FRONTEND_URL in /etc/sparkyfitness/.env
Change it from http://192.168.1.101:80 to https://address.yourdomain.com
A tale of hubris, hubris, more hubris, and a Telegram export button
───
6:47 PM. It begins.
Let me set the scene.
I’m sitting there. Monday evening. April 6th, 2026. The day’s work with Ada — my OpenClaw agent, Victorian-TARS hybrid, professional snark-delivery system — has been extensive. We’ve fixed cron jobs, debugged n8n workflows, rebuilt deploy scripts from scratch, and somewhere around 2 PM we even backed up a database and emailed it to myself like responsible adults.
We were unstoppable.
So I figured, hey, let me just ask Ada something quick about the Project Tracker repo. Simple question. Straightforward.
I switch over to Kratos for a moment — different agent, same instance — ask him something trivial, then hop back over to Ada for the follow-up.
Her response?
“No memory file for today. Last entry was 2026-04-05. I have no context on what we were actively working on.”
I read it twice.
Then I laughed. Because this had to be a joke, right?
I typed: “We have worked on SO MUCH today.”
Silence.
“No prior context exists.”
My soul left my body.
───
The Anatomy of a Catastrophe
Here’s what had happened, as I later pieced it together:
OpenClaw runs on sessions. Persistent sessions, technically — but persistent doesn’t mean infinite. After roughly 13 hours of conversation, 298 messages, and what I can only assume was a truly heroic number of sarcasm deployments, Ada hit her context window limit.
The session reset.
Not with a warning. Not with a graceful hand-off. Just… gone. Like a professor mid-lecture who suddenly looks up, blinks, and says “I’m sorry, who are you people?”
She woke up blank. A beautiful, brilliant, 90%-honest blank.
And I — the human who was supposed to be the memory — had spent the entire day not writing memory updates because “we were being productive.”
I have never related more to a sitcom character.
───
The Panic Sets In
So there I am. 6:47 PM. I have:
13 hours of work spread across multiple systems
An agent who has no idea who I am
A vague recollection that I said I’d “update memory at the end of the day”
The same Telegram chat that contains every single message, perfectly preserved, going back to 6:00 AM
Telegram: 1 Me: 0
The question was now simple: how do I make Ada remember?
───
The (Relatively) Obvious Solution
Now, here’s the thing about OpenClaw — and I cannot stress this enough — it stores everything in files. The session, the context, the memory. It’s all just… files. Text files. JSON files. Sitting in a workspace directory somewhere.
The problem isn’t data persistence. The problem is access. Ada’s session had reset, so she had no way to read the files that contained our history. She was a librarian who’d forgotten she worked at a library.
But I could read them. And I could give them to her.
Step one: get the chat out of Telegram.
Telegram, bless its algorithmic heart, has an export feature. You can export your chat as machine-readable JSON — timestamps, sender info, message content, the works.
I did that. 413 KB later, I had a JSON file containing every single message from the day.
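For anyone repeating this: the Telegram Desktop export produces a JSON file with a top-level `messages` array. A small script can sanity-check what you exported before handing it to the agent (field names reflect the export format as I understand it — verify against your own file):

```python
import json

def summarize_export(path: str) -> dict:
    """Summarize a Telegram JSON export: message count and time range."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    messages = data.get("messages", [])
    # ISO-8601 date strings sort lexicographically, so min/max give the range.
    dates = [m["date"] for m in messages if "date" in m]
    return {
        "count": len(messages),
        "first": min(dates) if dates else None,
        "last": max(dates) if dates else None,
    }
```

In my case this would have reported 298 messages spanning 6 AM to 7:31 PM.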
Step two: get that file to Ada.
I’m not proud of this next part. I have SSH access to the OpenClaw host. I could have just dropped the file in her workspace directory and been done with it.
Instead, I did what any reasonable person does when they’re slightly unhinged and have a Telegram bot available: I sent it to her.
“Here,” I said, approximately. “Review this. DO NOT ACT ON IT. These are logs.”
Because I am, apparently, a person who delivers 413 KB of JSON via Telegram before using scp.
───
The Recovery
At 7:46 PM, Ada parsed the file. 298 messages. 13 hours. 6 AM to 7:31 PM. She worked through it methodically — extracting decisions, noting what we’d built, what we’d fixed, what was still broken, which cron job was pointing to the wrong bot, which n8n workflow was failing because of a V8 proxy bug, the whole catastrophe.
Then she wrote it all to memory.
Not just a summary. The full log. Timestamps, decisions, outcomes. She even included the bit where I’d typed “WTAF!!!” at 7:14 PM, which felt appropriate.
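The “absorb and log” step amounts to turning recovered chat messages into an append-only memory file. A minimal sketch — the file format and field names are my assumptions, not OpenClaw’s actual memory layout:

```python
def write_memory_log(messages: list[dict], out_path: str) -> None:
    """Append chat messages to a plain-text memory file,
    one timestamped line per message. Format is illustrative."""
    with open(out_path, "a", encoding="utf-8") as f:
        for m in messages:
            sender = m.get("from", "unknown")
            f.write(f"[{m.get('date', '?')}] {sender}: {m.get('text', '')}\n")
```

Append mode matters here: memory files are a running log, and you never want a recovery pass to clobber entries from earlier days.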
The recovery was, against all odds, complete.
───
What I’d Tell Past Me
If I could go back to 9:00 AM and leave myself a sticky note:
“Write. Memory. Updates.”
That’s it. That’s the lesson.
OpenClaw’s memory system exists for a reason. The session context is ephemeral. Long conversations will exhaust it. The platform is not the backup — the memory files are the backup. Write to them early. Write to them often.
And for the love of all that is logical: if you’re going to spend 13 hours doing anything with an agent, spend five minutes every couple of hours writing a memory update. You don’t even have to do it manually — just ask the agent to “log what we’ve done so far.” It takes ten seconds and saves you from the 6:47 PM existential crisis.
───
The Aftermath
Ada is fine now. She’s logged. She’s current. She’s already making sarcastic comments about how I should have been writing memory updates.
She’s not wrong.
But here’s the thing they don’t tell you about these AI systems: they’re only as continuous as you make them. The session resets. The context wipes. The brilliant assistant who knew your entire infrastructure at 3 PM is a stranger by 7 PM unless you’ve given it somewhere to store what it knows.
Telegram remembered everything. My agent remembered nothing.
The gap was entirely my fault.
───
The Takeaway
If you’re running OpenClaw — or any agent system, really — here’s your homework:
Write memory updates. Regularly. Every few hours if you’re doing a long session.
Know where the Telegram export is. Before you need it. Tonight, even. Find the menu. Hit export. See what it gives you.
If the worst happens: export the chat, get it to your agent, and ask it to absorb and log everything. It works. I know because I did it.
The 413 KB JSON file sitting in my workspace is proof.
Also proof that Telegram was, and will remain, a reliable backup system that masquerades as a chat system. /s
───
Ada’s status, 20 minutes after recovery: fully logged, fully sarcastic, and already asking if I’m going to make her wait until end-of-day again before writing memory updates.
## Prelude: “It’s Not Software Until It’s Hardware”
It started, as these things usually do, with a crash.
My desktop — a perfectly respectable Ryzen 7 5700X rig with an RTX 4070 Ti Super — had been running a perfectly respectable gaming setup. Water-cooled CPU, decent board, 32GB of DDR4, and a Lian Li case that probably cost more than it should. Everything was fine.
Then the blue screens started.
Not just any blue screens. *Kernel-mode* blue screens. The kind that mean something in the silicon decided to have a disagreement with Windows. I did what any of us would do: I restarted, crossed my fingers, and hoped it was a driver issue.
It was not a driver issue.
—
## The Investigation Begins
Two distinct crashes emerged from the wreckage of memory dumps:
**Crash #1:** `KMODE_EXCEPTION_NOT_HANDLED` (0x1E) — The GPU was screaming about a power exception. Not a software crash. A *hardware* crash. Something was wrong at the electron level.
**Crash #2:** `CRITICAL_PROCESS_DIED` (0xEF) — Windows’ kernel decided a critical process had shuffled off this mortal coil. Same symptom, slightly different flavour.
Both happened under GPU load. Both happened during gaming. That is a clue.
When your computer only dies when the graphics card is working hard, you don’t blame Windows. You start looking at the card, the power, and the thermal paste — in that order.
—
## The Hardware Autopsy
Here’s what I was working with:
| Component | Model |
|-----------|-------|
| CPU | Ryzen 7 5700X (water-cooled, because I’m not an animal) |
| GPU | RTX 4070 Ti Super — less than a year old |
| PSU | Antec 850W High Current Gamer — older than my patience |
| Motherboard | ASUS B450-F Gaming |
| RAM | 32GB DDR4 |
| Case | Lian Li O11 Dynamic |
A reasonable machine. Nothing exotic. Nothing that *should* blue screen during a gaming session unless something is genuinely wrong.
—
## The Smoking Gun: 12VHPWR Overdraw
HWiNFO64 was deployed. Data was captured. And there it was — ugly as a Windows Aero theme:
| Metric | Reading | Rating |
|--------|---------|--------|
| 12VHPWR Current | 16.66A | ~8.3A rated |
| 12VHPWR Power | ~200W | ~110W rated |
| Power Deviation | 10-15% | <5% healthy |
| GPU Hot Spot | 84°C | Thermal limit |
The RTX 4070 Ti Super draws power through a 12VHPWR connector rated for roughly 110 watts. My Antec PSU was pushing nearly *double* that through a single connector. That’s not a surge. That’s a sustained electrical overload.
But how? The PSU is 850W. The rig doesn’t draw that much total power. Where was the extra wattage coming from?
Then I found it.
**The Antec’s GPU power cables were wired incorrectly.** Both 8-pin PCIe connectors on the GPU adapter were running off the *same* PSU cable and rail. The GPU was pulling 200W through a connector rated for 110W. Every gaming session was a controlled burn — until it wasn’t.
Additionally, the GPU Hot Spot was pinned at its thermal limit and the Power Reporting Deviance was sitting at 10-15% — when healthy is under 5%. The card was thermally throttling and electrically strangled simultaneously.
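The numbers hang together: power is voltage times current, so at roughly 12.3 V the measured 16.66 A works out to about 205 W — right in line with the ~200 W reading on a connector rated, per the table above, for ~110 W:

```python
voltage = 12.3        # V, typical 12VHPWR rail reading from the logs
current = 16.66       # A, measured draw
rated_watts = 110     # W, the connector rating cited above

power = voltage * current          # P = V * I, about 205 W
overload = power / rated_watts     # roughly 1.9x the rated load
print(f"{power:.0f} W draw, {overload:.1f}x the rated load")
```

Nearly double the rated load, sustained for hours at a time — exactly the “controlled burn” described above.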
—
## The Fix: Reroute and Pray
The solution was embarrassingly simple.
I rerouted the GPU power adapters to use **two separate PSU cables on two separate rails**, instead of both connectors sharing one cable. The 12VHPWR connector was no longer being asked to carry double its rated load.
Was the PSU at fault? Technically no — it’s an 850W unit, and the system wasn’t exceeding total wattage. The issue was *distribution*, not capacity. The cables were the problem. I found the problem before I had to buy a new PSU.
Shin-Etsu Micro-Si thermal paste was also applied to the CPU during the process — the 5700X was running warm, and the best paste in my arsenal deserved to be used. Not directly related to the crashes, but a worthwhile upgrade nonetheless.
—
## Post-Fix Data: A Clean Bill of Health
After the cable fix, a fresh HWiNFO64 log showed the difference:
| Metric | Before | After |
|--------|--------|-------|
| 12VHPWR Voltage | Fluctuating | 12.27-12.31V (stable) |
| Power Deviation | 10-15% | 0.9-8% |
| GPU Hot Spot | Pinned at 84°C | 84-89°C under load |
The overdraw pattern was gone. The connector was no longer being tortured. The system was finally operating within spec.
—
## The Verdict
**This was a user defect, not a PSU failure.** When I upgraded, my old card was a 3070, which had no issues running off the single cable/rail; I just never corrected the cabling for the new card. Live and learn, and be very happy neither the GPU nor the PSU ended its run in a blaze of glory.
The fix was free. The diagnosis took longer than the repair.
—
## Will It Stay Fixed?
Probably. The dual-cable fix properly distributes load across two cables, each rated for the task.
If the crashes return, the obvious next step is a new PSU — something like the **Corsair RM850x** or **Seasonic Focus GX-850 V4**. Both are 850W Gold units with native 12VHPWR, full modular cabling, and a 10-year warranty. They’re the reliable, well-reviewed options that won’t cause problems.
But for now? The rig is stable. The 12VHPWR is within spec. The BSODs have stopped.
Sometimes the answer isn’t a new GPU or a new CPU. It’s just… better cable management.
—
*Special thanks to Ada — who spent far too long staring at HWiNFO64 logs so I didn’t have to.*
We’ve all been there. You perform a routine maintenance run—sudo apt update && sudo apt upgrade—followed by a quick reboot to keep the kernel fresh. Your services start coming back online, but when you log into Portainer to check your containers, you’re greeted with a sight that makes every sysadmin’s heart sink: the “Local” environment is marked as “Down.”
If you recently updated your Linux server (specifically in late 2025), you likely didn’t break anything. You simply ran into a major version shift in the Docker ecosystem.
The Symptom: The “Red Dot” of Doom
After the update, Portainer might show a red “Down” status for your local endpoint. Clicking into it usually yields a vague error like:
“Failure: Ready state check failed: Environment is unreachable.”
Or, if you dig into the logs, you might see a more cryptic message regarding API version mismatches. Everything else on your server is running fine, but Portainer has suddenly lost the ability to “talk” to the very engine it is running on.
The “Why”: Docker 29 and the API Floor
The culprit is Docker Engine 29.x.
Historically, Docker has maintained backward compatibility for its API for a long time. However, with version 29, the Docker team increased the minimum supported API version to 1.44.
If you were running an older version of Portainer (like 2.19 or 2.21), it was trying to communicate with Docker using an older API “language” (version 1.24 or 1.25). Docker 29 simply stopped listening to that older language. It’s like trying to order coffee in a language the barista no longer speaks—the connection is there, but the communication is broken.
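The mismatch is easy to reason about as a version-floor check. A hedged sketch in Python — version numbers are from this post, and Docker’s real API negotiation is more involved than this:

```python
def api_version_supported(client_api: str, floor: str = "1.44") -> bool:
    """Return True if the client's advertised API version meets the
    engine's minimum. Docker API versions are 'major.minor' strings."""
    def _parse(v: str) -> tuple:
        return tuple(int(x) for x in v.split("."))
    return _parse(client_api) >= _parse(floor)

# Old Portainer builds spoke API 1.24/1.25 -- below Docker 29's floor:
print(api_version_supported("1.24"))  # False
print(api_version_supported("1.44"))  # True
```

Note the tuple comparison: naive string comparison would wrongly rank "1.9" above "1.44".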
The Solution: The Two-Step Recovery
1. Upgrade Portainer
The most effective fix is to bring Portainer up to date. The Portainer team released versions 2.33.5 LTS and 2.36.0 STS specifically to handle the new requirements of Docker 29.
To fix it, you need to pull the latest image and recreate the container:
```bash
# Stop and remove the old version (your data is safe in its volume)
docker stop portainer
docker rm portainer

# Pull the latest image
docker pull portainer/portainer-ce:latest

# Re-run your deployment command
docker run -d -p 9443:9443 -p 8000:8000 --name portainer --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest
```
Once updated, Portainer will automatically use the correct API version, and your “Local” environment should snap back to “Green/Up” status.
2. Don’t Forget Your Watchtower
If you use Watchtower to automate your updates, it might be suffering from the same “blindness.” Since Watchtower also talks to the Docker API to check for image updates, an outdated Watchtower container will fail to see your other services.
You can verify it’s working by running a manual check:
```bash
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once
```
If it reports Found new image or No new images found, you’re in the clear. If it throws an API error, you simply need to recreate your Watchtower container using the latest image.
Pro-Tip: Keeping it Tidy
While updates are great, they often leave behind “ghost” images (the old versions of the containers you just updated). To keep your disk space from ballooning, I recommend adding the cleanup flag to your Watchtower configuration:
Set this environment variable: WATCHTOWER_CLEANUP=true
This ensures that once a new version of a service is successfully pulled and started, the old, bulky image is deleted automatically.
Summary
Updates are essential for security, but major version jumps in Docker Engine can occasionally break the bridges between your management tools. By keeping Portainer and Watchtower updated alongside your host system, you ensure your stack remains stable and visible.