From YAKS to Polykube

June 26, 2026

Earlier this year, I wrote about my experience trying to find a new job.

I didn’t mention that alongside the stress of interviews and my full-time job, I was also working on a side project that I imagined could become a self-sustaining business.

My thinking was, “I don’t want to just keep jumping from job to job; I want to do something meaningful and rewarding, and I want to leverage my experience and knowledge to somehow propel me into financial independence.” Not an entirely original thought, but also not completely naive.

I’ve been in and around startups for all of my adult life, and I thought I understood the mechanics quite well. Over the years I had experimented with forms of entrepreneurship such as contract work and consulting, so I went in optimistic while fully aware of the extra effort it requires.

I reasoned that if I truly applied myself and focused on solid technical execution, some form of success would follow. I told myself that, in the worst case, I would learn something.

Ultimately, my venture did not achieve escape velocity. It didn’t make it beyond the MVP (minimum viable product) stage. However, I did learn a few things, and I wanted to document some of the journey. This post gets a bit into the technical weeds of the project, so be forewarned.

The first pass

We can call the original failed project “YAKS”: Yet Another Kubernetes Solution.

The goal was to help teams run resilient backend applications across regions and clouds without becoming infrastructure specialists. That meant a lot more than workload placement. It meant onboarding, dashboards, credentials, customer accounts, hosted auth, billing-adjacent usage data, platform operations, DNS, routing, deployment history, datastore replication, incident views, and business-facing materials. As a solo founder, I had to spend time on all of it: product, go-to-market, investor/customer material, and the machinery underneath.

The earliest architecture was extensive: multiple Kubernetes clusters in separate networks, a control-plane API, tenant isolation, cross-cluster networking, active-active or active-passive data paths, and a local harness to simulate all of it. That immediately forced an initial set of decisions: kind or k3d clusters, Submariner as a cross-cluster networking candidate, Postgres logical replication as a baseline, and Yugabyte or Cockroach as distributed SQL candidates.

The core shrank, everything else grew

Initially, YAKS had a central API, and the first deployment model included a parent deployment and an executor that applied instructions somewhere. That worked until “somewhere” became multiple clusters.

I didn’t want to build another orchestration layer on top of Kubernetes, and I wanted to rely on existing primitives and native APIs to handle things like workload intent reconciliation. Over time, I realized a better path was a per-cluster agent model: each cluster runs an executor scoped to its own identity, and each executor only picks up work meant for that cluster. Choosing a “pull” model over “push” also meant simpler RBAC, since each executor only needed permissions inside its own cluster.

This way, no one process needed all kubeconfigs, and a cluster outage did not prevent other clusters from doing local work. Federated deployment status became a rollup of per-cluster target state. YAKS would effectively be a coordination layer composed of a hosted API and dashboards, with the ability to issue credentials and track customer workloads.
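
To make the pull model concrete, here is a minimal sketch of an executor that keeps only the work addressed to its own cluster, plus a rollup that derives a federated status from per-cluster states. It is an illustration, not YAKS code: the WorkItem and Executor shapes, the cluster names, and the status strings ("Ready", "Progressing", "Failed", "Unknown") are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkItem:
    deployment: str
    target_cluster: str  # hypothetical: identity of the cluster this item is addressed to


@dataclass
class Executor:
    cluster_id: str
    applied: list[str] = field(default_factory=list)

    def poll(self, queue: list[WorkItem]) -> list[WorkItem]:
        # Pull model: keep only the items addressed to this cluster, so the executor
        # never needs credentials for any other cluster.
        mine = [item for item in queue if item.target_cluster == self.cluster_id]
        for item in mine:
            # Stand-in for applying manifests with in-cluster credentials.
            self.applied.append(item.deployment)
        return mine


def rollup(per_cluster: dict[str, str]) -> str:
    """Derive a federated status from per-cluster target states."""
    states = set(per_cluster.values())
    if states == {"Ready"}:
        return "Ready"
    if states & {"Failed", "Unknown"}:
        # Any failed or unreachable cluster marks the federated status as degraded.
        return "Degraded"
    return "Progressing"


queue = [WorkItem("api", "aws-us-east-1"), WorkItem("api", "gcp-europe-west1")]
aws = Executor("aws-us-east-1")
aws.poll(queue)
print(aws.applied)  # ['api']
print(rollup({"aws-us-east-1": "Ready", "gcp-europe-west1": "Unknown"}))  # Degraded
```

Because each executor only reads from the shared queue, an unreachable GCP cluster shows up as "Unknown" in the rollup while the AWS executor keeps applying its own work.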

I knew it was best to start opinionated, so I made choices that seemed to have broad enough appeal, while keeping the platform modular enough that I could change course or extend it along the way. I was feeling pretty good at this point; I ran some experiments that proved the core thesis, and I was optimistic about delivering what I considered the more “trivial” pieces. By “trivial,” I meant everything else needed to run a proper SaaS around this clever little kernel.

So I made:

  • A customer dashboard
  • Kubernetes clusters in different providers/regions (AWS, GCP)
  • An authentication stack and RBAC
  • An internal API
  • An admin dashboard
  • Basic observability tools
  • Some form of usage tracking for eventual billing capabilities
  • Onboarding automation
  • Docs
  • Examples
  • Tutorials
  • Issue tracking
  • A datastore layer (more on this later)

My aim was to release something to a cohort of my peers who were willing to be my pilot customers and provide feedback. However, the scope kept creeping. I kept thinking “what state would I want a product to be in before I could give useful feedback?” which led me to set the bar somewhat high.

Intellectually, I was aware of the pitfalls of shipping late, and building too much in advance before market validation. I tried my best to steer clear of being another Rdio.

Unconsciously, though, I made excuses for why it was okay to keep investing in and perfecting the platform before I had a single paying customer. I told myself convoluted stories such as “this is already a saturated market, but I am aiming for a very small niche (backend teams trying to solve DR with a drop-in Kubernetes-native solution)…” and “I’m not a capitalist (heaven forbid!), I’m just building this for the love of the game…”

Then there was networking.

Networking was not one problem

Submariner was my first choice to establish a cross-cluster networking path. It was attractive because it promised a lot in one package: L3 routing, cross-cluster service discovery, and federation-oriented behavior. In practice, that also became the problem. Too many concerns were coupled together. A routing change touched service discovery. Policy was hard to reason about at the right layer. Debugging became less about one broken path and more about one system doing several jobs at once.

So I split the networking architecture into two separate concerns.

The lower layer would provide basic reachability between every cluster’s pod CIDRs. On top of that, the upper layer would provide workload-level behavior: service projection, identity, policy, and global services. To me, Cilium ClusterMesh was the clear choice for the upper layer. The underlay went through a few candidates. NetBird, a WireGuard overlay, looked promising initially, but I ran into some operational and compatibility constraints. Finally, I landed on Netmaker, with route advertisement reconciled from Kubernetes/GitOps.
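
As a rough sketch of what the underlay owes the upper layer: every cluster needs a route to every other cluster’s pod CIDR, and that only works if the ranges don’t overlap (Cilium’s ClusterMesh documentation also requires unique, non-conflicting PodCIDRs). The function below is hypothetical and assumes IPv4 ranges and made-up cluster names; it computes the desired routes a reconciler could compare against what the overlay actually advertises. It does not use any Netmaker API.

```python
import ipaddress
from itertools import combinations


def underlay_routes(pod_cidrs: dict[str, str]) -> dict[str, list[str]]:
    """For each cluster, list the remote pod CIDRs it must be able to route to."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in pod_cidrs.items()}
    # Routed pod-to-pod reachability is ambiguous if two clusters share address space.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            raise ValueError(f"pod CIDRs of {a} and {b} overlap: {net_a}, {net_b}")
    return {
        name: [str(net) for net in sorted(n for peer, n in nets.items() if peer != name)]
        for name in nets
    }


routes = underlay_routes({
    "aws-us-east-1": "10.1.0.0/16",
    "gcp-europe-west1": "10.2.0.0/16",
    "local-k0s": "10.3.0.0/16",
})
print(routes["aws-us-east-1"])  # ['10.2.0.0/16', '10.3.0.0/16']
```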

This split was a useful lesson: “the clusters can talk” is not a single fact. It is several contracts.

Can nodes reach each other? Can pods reach remote pod CIDRs? Does ClusterMesh see the remote cluster? Are services imported? Does source-side service translation happen from normal workload pods? Does the return path survive the cloud provider’s networking rules? Does policy allow the traffic after translation?
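
One way to keep those questions honest is to treat them as ordered contracts and always report the lowest one that fails, since a higher-layer result means little when the layer beneath it is broken. This is a minimal sketch under that assumption; the contract names are hypothetical labels for the questions above, not Cilium or Netmaker status fields.

```python
from typing import Optional

# The questions above as contracts, ordered from the lowest layer to the highest.
# The names are hypothetical labels, not Cilium or Netmaker status fields.
CONTRACTS = [
    "node-to-node",
    "pod-to-remote-pod-cidr",
    "clustermesh-sees-remote",
    "services-imported",
    "source-side-service-translation",
    "return-path",
    "policy-after-translation",
]


def first_broken_contract(results: dict[str, bool]) -> Optional[str]:
    """Return the lowest-layer contract that failed or was never checked."""
    for name in CONTRACTS:
        # An unchecked contract counts as broken: untested is not "working".
        if not results.get(name, False):
            return name
    return None


results = {name: True for name in CONTRACTS}
results["services-imported"] = False
results["return-path"] = False
print(first_broken_contract(results))  # services-imported
```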

Local tests only got me so far once the workload moved into real AWS and GCP clusters. GKE Dataplane V2 is built on Cilium, but it is not the same as owning a self-managed Cilium install with ClusterMesh enabled. The datapath provider is a cluster-create-time decision, so the GKE cluster had to be recreated with the legacy datapath before a self-managed Cilium could own its networking stack. On EKS, the default AWS networking components (aws-node, kube-proxy) had to get out of Cilium’s way. Later, global-service translation exposed asymmetric behavior: direct pod-IP paths passed in both directions, while GCP-to-AWS service translation still failed. A green ClusterMesh summary did not mean the runtime contract held.

That is why the validation matrix became part of the architecture. Direct pod checks, global-service checks, same-name service checks, ClusterMesh raw status, and provider-specific Cilium settings were not test decoration. They were the only honest definition of “working”.
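
Because the failures were directional, a matrix keyed by check, source, and destination catches what a single summary hides. The sketch below is hypothetical (check names, cluster names, and results are illustrative, modeled on the GCP-to-AWS case above); it flags cells that were never run and cells that fail in one direction while passing in the other.

```python
from itertools import permutations

Cell = tuple[str, str, str]  # (check, source cluster, destination cluster)


def missing_cells(results: dict[Cell, bool], checks: list[str], clusters: list[str]) -> list[Cell]:
    """Directed cells of the matrix that were never run."""
    return [
        (check, src, dst)
        for check in checks
        for src, dst in permutations(clusters, 2)
        if (check, src, dst) not in results
    ]


def asymmetric_failures(results: dict[Cell, bool]) -> list[Cell]:
    """Cells where src -> dst fails while the reverse direction passes."""
    return sorted(
        (check, src, dst)
        for (check, src, dst), ok in results.items()
        if not ok and results.get((check, dst, src)) is True
    )


results = {
    ("pod-ip", "aws", "gcp"): True,
    ("pod-ip", "gcp", "aws"): True,
    ("global-service", "aws", "gcp"): True,
    ("global-service", "gcp", "aws"): False,
}
print(asymmetric_failures(results))  # [('global-service', 'gcp', 'aws')]
print(missing_cells(results, ["pod-ip", "global-service", "same-name-service"], ["aws", "gcp"]))
# [('same-name-service', 'aws', 'gcp'), ('same-name-service', 'gcp', 'aws')]
```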

The local harness became product infrastructure

Local testing started with kind and Submariner. It was enough to get early signal, but quickly proved unreliable in the development environment. So the local stack moved to k0s-in-docker. This meant owning direct container lifecycle, kubeconfig rewriting, image loading, and deterministic API port mappings. It was extra work, but it made the harness more useful. I could then validate ClusterMesh, Netmaker-style routing, Yugabyte federation probes, and eventually workload deployment behavior without touching shared cloud infrastructure first.
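
The deterministic port mapping and kubeconfig rewriting can be sketched roughly like this. Several assumptions apply: each k0s container publishes its API server’s port 6443 to a host port (for example with `docker run -p 16443:6443`), the kubeconfig has already been parsed into a dict by a YAML library, and the API server certificate is valid for 127.0.0.1. The base port and cluster names are hypothetical.

```python
import copy

BASE_PORT = 16443  # hypothetical first host port for the local clusters' API servers


def api_ports(cluster_names: list[str]) -> dict[str, int]:
    # Sorting makes the mapping independent of input order, so it is stable across runs.
    return {name: BASE_PORT + i for i, name in enumerate(sorted(cluster_names))}


def rewrite_kubeconfig(kubeconfig: dict, host_port: int) -> dict:
    """Point a kubeconfig at the host-published API port instead of the container address."""
    patched = copy.deepcopy(kubeconfig)  # leave the original untouched
    for entry in patched.get("clusters", []):
        entry["cluster"]["server"] = f"https://127.0.0.1:{host_port}"
    return patched


ports = api_ports(["gcp-sim", "aws-sim"])
print(ports)  # {'aws-sim': 16443, 'gcp-sim': 16444}

raw = {"clusters": [{"name": "k0s", "cluster": {"server": "https://172.18.0.2:6443"}}]}
print(rewrite_kubeconfig(raw, ports["aws-sim"])["clusters"][0]["cluster"]["server"])
# https://127.0.0.1:16443
```

Deriving ports from sorted names keeps things simple, but adding a cluster whose name sorts earlier shifts every later port; persisting the mapping would avoid that.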

One thing I learned was that the local environment does not have to, and perhaps cannot, imitate production perfectly. What it needs to do, though, is fail early and cheaply, and validate the most important aspects of the product. This can be quite challenging when the product is infrastructure.

A note on active-active datastores

Part of my goal in reducing dependency on a single cloud provider was to overcome the seemingly inevitable temptation of Aurora or other provider-native datastore convenience. To me, it had to be a Kubernetes-native solution from the start. I knew it had to be a drop-in replacement. I wanted to create a Postgres-first solution, which meant that Vitess was out of the picture, and at the time, Neon was not yet the option it is today. I was very happy to learn about YugabyteDB, however, and I was impressed with the product and the company behind it. The docs and local development path made it practical to build on their foundation.

Hindsight

What I missed

First of all, there are a handful of fairly established multi-cluster solutions, such as Karmada, Open Cluster Management (OCM), and KubeFleet. I should have considered using one of these before building my own.

There are even some curious cases of parallel evolution where my terminology and architecture rhymed with existing projects. I unknowingly defined a ClusterMember, which is close to what KubeFleet calls a MemberCluster. Karmada supports pull-style operation through agents and network proxy components, and OCM’s placement model covers a much more elaborate version of the placement problem I implemented narrowly.
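
For a sense of what “narrowly” means, here is a hypothetical sketch of a small placement step: filter cluster members by a label selector, then pick up to N of them, spreading across providers first. It is not OCM’s Placement API or KubeFleet’s scheduler, and the ClusterMember fields and the "provider" label are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterMember:
    name: str
    labels: dict[str, str]


def place(members: list[ClusterMember], selector: dict[str, str], replicas: int) -> list[str]:
    """Pick up to `replicas` clusters matching `selector`, spreading across providers."""
    eligible = [m for m in members if all(m.labels.get(k) == v for k, v in selector.items())]
    # Group eligible clusters by a hypothetical "provider" label, in name order.
    by_provider: dict[str, list[str]] = {}
    for m in sorted(eligible, key=lambda m: m.name):
        by_provider.setdefault(m.labels.get("provider", "unknown"), []).append(m.name)
    chosen: list[str] = []
    # Round-robin over providers so replicas land on different clouds first.
    while len(chosen) < replicas and any(by_provider.values()):
        for provider in sorted(by_provider):
            if by_provider[provider] and len(chosen) < replicas:
                chosen.append(by_provider[provider].pop(0))
    return chosen


members = [
    ClusterMember("aws-us-east-1", {"provider": "aws", "tier": "prod"}),
    ClusterMember("aws-us-west-2", {"provider": "aws", "tier": "prod"}),
    ClusterMember("gcp-europe-west1", {"provider": "gcp", "tier": "prod"}),
    ClusterMember("gcp-dev", {"provider": "gcp", "tier": "dev"}),
]
print(place(members, {"tier": "prod"}, 2))  # ['aws-us-east-1', 'gcp-europe-west1']
```

Real placement engines add scoring, taints, capacity, and rescheduling on top of this kind of filter-then-select step.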

Biting off more than I could chew

I conflated the business side of the apparatus I needed to build with the technical side. I convinced myself that the existing solutions weren’t a good fit, or that they didn’t directly solve all of the problems I set out to handle.

Only after building the thing and realizing I had reached the same conclusions as the existing solutions did I see the commonalities and the more effective path. Customer usage data and access control could be handled with a much lighter admin layer, resource labeling, and existing observability instrumentation.
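
As an illustration of the lighter approach, usage attribution and access control can both key off a resource label. This sketch assumes metric samples arrive as (labels, value) pairs from an existing observability stack, for example per-pod CPU seconds carrying Kubernetes labels; the label key and tenant names are hypothetical.

```python
from collections import defaultdict

TENANT_LABEL = "example.com/tenant"  # hypothetical label key


def usage_by_tenant(samples: list[tuple[dict[str, str], float]]) -> dict[str, float]:
    """Sum metric samples (labels, value) per tenant label."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in samples:
        # Unlabeled resources are surfaced instead of silently dropped.
        totals[labels.get(TENANT_LABEL, "unattributed")] += value
    return dict(totals)


def visible_to(tenant: str, resources: list[dict]) -> list[str]:
    """Access control by label: a tenant only sees resources carrying its label."""
    return [r["name"] for r in resources if r.get("labels", {}).get(TENANT_LABEL) == tenant]


samples = [
    ({TENANT_LABEL: "acme"}, 120.0),
    ({TENANT_LABEL: "acme"}, 30.0),
    ({TENANT_LABEL: "globex"}, 45.0),
    ({}, 5.0),
]
print(usage_by_tenant(samples))  # {'acme': 150.0, 'globex': 45.0, 'unattributed': 5.0}
```

In practice the grouping would more likely happen in the metrics backend’s own aggregation, with the admin layer only reading the results.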

I did some fairly extensive market research, but it was limited to the landscape of commercial backend hosting, with an emphasis on what I thought was the “killer feature”, namely, out-of-the-box regional resilience, cloud-agnostic Kubernetes deployments, and active-active data stores. I saw that there wasn’t a single provider that offered this particular combination of capabilities, though a thought kept lingering: “maybe this doesn’t exist for a reason.”

I plowed ahead though, because it’s better to debate or even sell an actual working thing, rather than just mull over an idea ad infinitum. I wanted to build, not just talk.

Charging forward in the face of long odds can be an asset, and I am ultimately happy with my experiment and what I learned along the way. If nothing else, I gained even more appreciation for the effort it takes to create something from nothing and run a company such as Render, Fly.io, or Railway. They all have their strengths, and deliver on their promises in impressive fashion. I thought my idea was different enough from the existing players in the field that I could carve out my own corner of the market, but it proved to be too much for me to take on.

The scope just kept growing, and while normally I enjoy a challenge, here I began feeling overwhelmed, torn in many directions, and drained.

Not all is lost

Polykube

On the technical end, I tried to distill and encapsulate some of the value in Polykube.

Polykube is currently fairly clean and simple: a Kubernetes-native operator, a small set of CRDs, a local multi-cluster demo, and some documentation about Cilium, Netmaker, and GitOps. It is the cleaned-up version of a much messier process. Part of me wishes I had just started this way: a core set of capabilities, with broad appeal across a wide range of applications, instead of letting commercial pressure turn it into a solution looking for a problem.

The part I still think is useful is the vertical integration of workload intent, routing assumptions, datastore intent, and networking validation into one opinionated path. In addition, I believe that maintaining the project as an open-source hobby, motivated by curiosity rather than existential dread, will lead to a better-engineered system.

I did something

Personally, I feel like I tested myself and gave it an earnest shot. I spent many nights and weekends pouring my energy into a creative pursuit, while trying to balance my obligations as a parent and a partner, alongside doing my best at my day job. I also had to keep avoiding fanciful notions of where this project might lead, as I knew that the chances were slim.

The market still needs something

Once upon a time we were promised that IaC (Infrastructure as Code) would make vendor lock-in obsolete, and that we could easily migrate our workloads from one provider to another. This proved to be incorrect. Then Kubernetes arrived and created an entire ecosystem, effectively independent of any one provider, and we were again promised that standardization would free us from being committed to one commercial entity or another. Once again, we found ourselves tethered to provider-specific “distributions” of sorts (e.g., EKS, GKE), embedded even deeper in the provider’s ecosystem.

My north star was to facilitate portability. I envisioned something akin to what the Opencode team is doing with their Zen router for LLMs: a neutral marketplace/router, but for Kubernetes clusters across providers and regions. Instead of tailoring workloads to one provider’s flavor of Kubernetes, developers should be able to create provider-agnostic deployments and choose where they run.

In the meantime, I’ll keep plugging away as a curious engineer with eyes toward the future.