I Built a Social Network on $4 of AWS a Month (and This Post Is Running On It)
If you are reading this on serverless.social, you are reading a row in a DynamoDB table.
Not a CMS. Not a database-backed blog engine with an admin panel. An AT Protocol repository record, type social.serverless.longform, that I wrote with a com.atproto.repo.createRecord call. A DynamoDB stream caught the write, woke a Lambda, and that Lambda rendered the static HTML your browser is parsing right now. The same Lambda also posted a link to Bluesky. This post is its own integration test.
That is the whole pitch of serverless.social, so let's actually get into how it works and where it's going. Grab a coffee.
The thing I was actually trying to do
I wanted my own little corner of the social web that I fully own. Not a username on someone else's server. An actual Personal Data Server on the AT Protocol (the protocol Bluesky runs on), under a domain I control, federated into the real network, with first-class long-form posts. And I wanted it to cost roughly nothing when nobody is using it, because most of the time nobody is using it.
"Costs nothing at rest" is the entire reason this is serverless. There is no box sitting there waiting for requests. Almost.
I say almost because there is exactly one box, and it is the best war story in the project, so I'm going to start there instead of with an architecture diagram.
The $4 box, and the AWS bug that made me buy it
Here is the part of AT Protocol federation nobody warns you about. The firehose (com.atproto.sync.subscribeRepos) is a WebSocket that streams binary DAG-CBOR frames. The repo commits inside those frames are secp256k1-signed, and a relay verifies the signature byte-for-byte. If a single byte is off, the frame is garbage and your account never shows up on the network.
My first instinct was the obvious serverless one: API Gateway WebSocket API. Lambda on connect, Lambda on message, zero idle cost. I built it. It did not work, and the way it did not work cost me a real chunk of debugging.
API Gateway's WebSocket API silently UTF-8-mangles binary frames. It is not configurable. It is not documented anywhere you would look first. You send relay-correct signed bytes in, corrupted bytes come out the other side, the signature fails, and the relay quietly ignores you forever with no error. I proved it byte-by-byte against a known-good signed frame. It is an AWS platform defect, not a bug in my code, and there is no flag to turn it off.
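The symptom is easy to reproduce in miniature without AWS at all. The sketch below assumes, as the symptom suggests, that the binary payload is decoded as UTF-8 text and re-encoded somewhere along the way. It does not describe API Gateway's internals. A lone byte that isn't valid UTF-8 comes back as U+FFFD, which re-encodes as three different bytes, so the frame changes in both length and content:

```typescript
// Simulates a binary payload crossing a hop that treats it as text:
// decode as UTF-8, then re-encode. This models the symptom only; it is an
// assumption about the mechanism, not API Gateway's actual code.
function utf8RoundTrip(bytes: Uint8Array): Uint8Array {
  // The default (non-fatal) decoder swaps each invalid sequence for U+FFFD.
  const text = new TextDecoder("utf-8").decode(bytes);
  return new TextEncoder().encode(text);
}

// A firehose-style frame header: DAG-CBOR for {op: 1, t: "#commit"},
// keys in DAG-CBOR's length-first order. The leading 0xa2 (a map of two
// entries) is not valid UTF-8 on its own.
const header = new Uint8Array([
  0xa2, 0x61, 0x74, 0x67, 0x23, 0x63, 0x6f, 0x6d,
  0x6d, 0x69, 0x74, 0x62, 0x6f, 0x70, 0x01,
]);

const mangled = utf8RoundTrip(header);
console.log(header.length, mangled.length); // 15 17
console.log(Array.from(mangled.slice(0, 3), (b) => b.toString(16))); // [ 'ef', 'bf', 'bd' ]
```

Once the first byte has changed, the frame no longer parses as the CBOR the relay expects. Any signed commit bytes later in the frame get the same treatment.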
So the fully-serverless dream has exactly one exception. The firehose now runs on a t4g.nano EC2 instance (ARM, the smallest box AWS rents) with an Elastic IP, Caddy terminating TLS with auto-provisioned Let's Encrypt certs, and a small Node process reverse-proxied on loopback. CloudFront fronts it on the same domain as everything else. Total cost, all-in:
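t4g.nano EC2 (on-demand, arm64)    ~$3.00 / mo
Elastic IP (attached, running)     $0.00 while free-tier IPv4 hours cover it; otherwise $0.005/hr (~$3.65 / mo)
~8 GB gp3 root volume              ~$0.64 / mo
Route 53 / DDB reads / SSM         negligible
TOTAL                              ~$4 / mo (~$7.35 once the IPv4 charge applies)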
No load balancer. No Fargate. No new vendor. One nano box. And it is built multi-instance from day one: CloudFront injects an X-PDS-Instance header, the box reads an SSM registry to map that header to the right DynamoDB table and signing key, so a second or tenth PDS is a config entry, not a second box. The ~$4 is fixed and amortized across everything that will ever run on it.
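The header-to-instance mapping is small enough to sketch as a pure function. Several details here are hypothetical: the registry's JSON shape, the `tableName` and `signingKeyParam` fields, and the parameter path. The post only says that an SSM registry maps the header to a DynamoDB table and a signing key:

```typescript
// Hypothetical registry entry; the post says only that an SSM registry
// maps the X-PDS-Instance header to a DynamoDB table and a signing key.
interface PdsInstance {
  tableName: string;       // DynamoDB table for this PDS's repo
  signingKeyParam: string; // SSM SecureString name holding its signing key
}

type Registry = Record<string, PdsInstance>;

// Parse once per process (e.g. from one SSM parameter's JSON value) and
// keep it in memory, so SSM isn't hit per connection.
function parseRegistry(json: string): Registry {
  return JSON.parse(json) as Registry;
}

function resolveInstance(
  headers: Record<string, string | undefined>,
  registry: Registry,
): PdsInstance {
  // HTTP header names are case-insensitive.
  const match = Object.keys(headers).find((k) => k.toLowerCase() === "x-pds-instance");
  const id = match === undefined ? undefined : headers[match];
  if (!id) throw new Error("missing X-PDS-Instance header");
  // Own-property check so ids like "constructor" can't hit the prototype.
  const instance = Object.prototype.hasOwnProperty.call(registry, id) ? registry[id] : undefined;
  if (!instance) throw new Error(`unknown PDS instance: ${id}`);
  return instance;
}

const registry = parseRegistry(
  '{"apex":{"tableName":"RepoTable","signingKeyParam":"/pds/apex/signing-key"}}',
);
console.log(resolveInstance({ "X-PDS-Instance": "apex" }, registry).tableName); // RepoTable
```

Adding another PDS in this model means adding one more key to the JSON. The process and the box stay the same.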
The box is also, honestly, a single point of failure right now. One box down means the firehose is dark until it comes back. Cursor-resume means no events are ever actually lost, but "one nano" is a real caveat and I'm not going to pretend it isn't. That's the build-in-public deal.
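Cursor-resume works because every firehose event carries a monotonically increasing `seq`. A relay that reconnects with `?cursor=<last seq it saw>` gets everything after that point replayed before live traffic resumes. Below is a minimal in-memory sketch. It assumes events are persisted somewhere durable (the post doesn't say where) and leaves out the handling of cursors older than the retention window:

```typescript
interface FirehoseEvent {
  seq: number;       // monotonically increasing per PDS
  frame: Uint8Array; // pre-encoded DAG-CBOR frame bytes
}

// Events to replay to a subscriber that connects with ?cursor=<seq>.
// Assumes `log` is durable and sorted by seq; retention-window handling
// (cursors older than the oldest stored event) is left out.
function backfill(log: FirehoseEvent[], cursor?: number): FirehoseEvent[] {
  if (cursor === undefined) return []; // no cursor: live events only
  const last = log[log.length - 1];
  const head = last ? last.seq : 0;
  // Mirrors the subscribeRepos lexicon's FutureCursor error.
  if (cursor > head) throw new Error("FutureCursor");
  return log.filter((e) => e.seq > cursor);
}

const log: FirehoseEvent[] = [1, 2, 3].map((seq) => ({ seq, frame: new Uint8Array(0) }));
console.log(backfill(log, 1).map((e) => e.seq)); // [ 2, 3 ]
```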
The proof it works: did:web:serverless.social is a real, indexed account on the public Bluesky AppView. A fresh post goes PDS to firehose to the bsky.network relay to the public AppView and shows up in about 23 seconds. The one wrinkle is that the relay re-dials a tiny single-account host on its own schedule, roughly every 5 to 13 minutes, so ongoing propagation is near-real-time but not instant. I don't control the relay's reconnect cadence. Sub-minute would mean running my own relay, which is way out of scope for a corner of the web that costs $4.
Everything else really is serverless
Now the part that does fit the serverless dream, because the rest of it genuinely scales to zero.
Hosting. The app is one Expo codebase (Expo SDK 54, React Native 0.81, React 19) exported to a static web bundle and served from S3 behind CloudFront on serverless.social. Web-first on purpose. No App Store review tax. The native targets are wired but parked.
The PDS itself. This is the interesting bit. A real AT Protocol Personal Data Server, but the repo storage is DynamoDB, not the usual SQLite-on-a-disk that reference implementations assume. Writes (com.atproto.repo.*) go through an API Gateway HTTP API into a Lambda, which builds the Merkle Search Tree commit, signs it, and writes it to a RepoTable keyed for fast reads. Reads (getRecord, listRecords, getRepo) are the same Lambda, the other direction. Pay-per-request DynamoDB, Lambda that costs nothing when idle. A social server that bills you for the seconds it is actually thinking.
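A DynamoDB-backed repo stays interoperable because the Merkle Search Tree's shape depends only on the keys. Per the AT Protocol repository spec, a key's layer is computed from the SHA-256 of the full `collection/rkey` string. Count the leading zero bits, then divide by two and round down (a fanout of 4). Here is a sketch of that calculation in Node; the function names are illustrative:

```typescript
import { createHash } from "node:crypto";

// Leading zero bits of a byte string, scanning byte by byte.
function leadingZeroBits(bytes: Uint8Array): number {
  let count = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i] as number;
    if (b === 0) {
      count += 8;
      continue;
    }
    // Math.clz32 counts zeros in a 32-bit word; the byte occupies the low 8.
    return count + Math.clz32(b) - 24;
  }
  return count;
}

// MST layer from a key's hash: leading zero bits counted in 2-bit chunks
// (fanout 4), rounded down.
function depthFromHash(hash: Uint8Array): number {
  return Math.floor(leadingZeroBits(hash) / 2);
}

// Keys are the full "collection/rkey" path, e.g. "app.bsky.feed.post/<rkey>".
function mstKeyDepth(key: string): number {
  return depthFromHash(createHash("sha256").update(key, "utf8").digest());
}

console.log(depthFromHash(new Uint8Array([0x3f, 0xff]))); // 1
```

Layer assignment does not depend on storage. A DynamoDB-backed tree and a SQLite-backed one holding the same records should therefore build the same structure.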
Auth. No passwords anywhere. Sign-in is a magic link: request a link, SES emails it, you click, and a Lambda mints an HS256 JWT (30-day TTL) signed with a secret kept in an SSM SecureString. No password store means no password breach surface and no reset flow to build. There is also now a full AT Protocol OAuth provider on top (PAR, S256 PKCE, DPoP-bound access tokens) so other AT Proto clients can authenticate against this PDS the standards-compliant way. That part shipped and was proven live with a conformant OAuth client driving a real post end to end.
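Both token mechanics in that paragraph fit in a few lines of `node:crypto`:

- `mintSessionJwt` sketches the magic-link session token (HS256, 30-day `exp`). The `sub` claim and the way the secret is fetched from SSM are assumptions.
- `pkceS256Matches` is the S256 check from RFC 7636. Any provider that supports S256 PKCE performs it at the token endpoint.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

const THIRTY_DAYS_S = 30 * 24 * 60 * 60;

const b64urlJson = (value: object): string =>
  Buffer.from(JSON.stringify(value), "utf8").toString("base64url");

// Session token after a magic-link click. `secret` is assumed to be the
// SSM SecureString value, fetched once and cached; `sub` is an assumption.
function mintSessionJwt(sub: string, secret: string, nowS: number = Math.floor(Date.now() / 1000)): string {
  const header = b64urlJson({ alg: "HS256", typ: "JWT" });
  const payload = b64urlJson({ sub, iat: nowS, exp: nowS + THIRTY_DAYS_S });
  const sig = createHmac("sha256", secret).update(`${header}.${payload}`).digest().toString("base64url");
  return `${header}.${payload}.${sig}`;
}

// RFC 7636 S256 check at the token endpoint: the stored code_challenge
// must equal BASE64URL(SHA256(ASCII(code_verifier))), unpadded.
function pkceS256Matches(verifier: string, challenge: string): boolean {
  const expected = Buffer.from(createHash("sha256").update(verifier, "ascii").digest().toString("base64url"));
  const given = Buffer.from(challenge);
  // timingSafeEqual throws on unequal lengths, so compare lengths first.
  return expected.length === given.length && timingSafeEqual(expected, given);
}

// RFC 7636, Appendix B example.
console.log(pkceS256Matches(
  "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk",
  "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM",
)); // true
```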
The data layer. DynamoDB throughout. A RepoTable for the actual AT Proto repo, plus small tables for magic-link tokens (15-minute TTL, self-cleaning), OAuth state (TTL-cleaned), users, tenants, and a waitlist. Every table is pay-per-request. Secrets live in SSM SecureString because CloudFormation refuses to manage secret values, which is correct of it.
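Self-cleaning tables have one detail that trips people up: DynamoDB's TTL attribute must be a Number holding Unix epoch seconds, not milliseconds. Below is a sketch of a magic-link token item. The attribute names are hypothetical. Storing only a hash of the token, so that a leaked table read doesn't yield working links, is an assumption; the post does not describe it:

```typescript
import { createHash, randomBytes } from "node:crypto";

const MAGIC_LINK_TTL_S = 15 * 60;

// Hypothetical item shape for the magic-link token table.
interface MagicLinkItem {
  tokenHash: string; // partition key: SHA-256 of the raw token (assumption)
  email: string;
  expiresAt: number; // DynamoDB TTL attribute: Unix epoch SECONDS, as a Number
}

const sha256Hex = (s: string): string => createHash("sha256").update(s, "utf8").digest("hex");

// The raw token goes into the emailed link; only its hash is stored.
function newMagicLink(email: string, nowMs: number = Date.now()): { token: string; item: MagicLinkItem } {
  const token = randomBytes(32).toString("base64url");
  const item: MagicLinkItem = {
    tokenHash: sha256Hex(token),
    email,
    expiresAt: Math.floor(nowMs / 1000) + MAGIC_LINK_TTL_S,
  };
  return { token, item };
}

// TTL deletion is lazy (expired items can linger for a while), so the
// redeem path still checks expiry itself.
function isRedeemable(item: MagicLinkItem, token: string, nowMs: number = Date.now()): boolean {
  return item.tokenHash === sha256Hex(token) && Math.floor(nowMs / 1000) < item.expiresAt;
}
```

A real redeem path would also delete the item, for example with a conditional delete, so each link works only once.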
Deploys. One CDK stack. GitHub Actions on push to main, authenticated to AWS via OIDC, so there are no long-lived AWS keys sitting in a CI secret anywhere. Typecheck, export the web bundle, cdk deploy. The whole thing is one git push.
It is multi-tenant under the hood without disturbing the original account. The apex (did:web:serverless.social) is a byte-identical legacy "tenant zero": its storage keys never changed, and a tenant resolver short-circuits the apex before it ever touches the tenants table. Sub-users get fresh did:plc identities with their own signing keys and their own domain as their handle, stored under a <did>#-prefixed key namespace in the same table. A sub-user crashing and restoring physically cannot touch the apex's bytes. That property was worth more to me than elegance.
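The isolation guarantee falls out of how keys are built. The sketch below assumes one string key per item. The `<did>#` prefix and the apex short-circuit come from the post. The function names and the synchronous lookup are illustrative:

```typescript
const APEX_DID = "did:web:serverless.social";

// Apex ("tenant zero") keys are exactly what they were before
// multi-tenancy; every sub-user key lives under a "<did>#" prefix.
function storageKey(did: string, baseKey: string): string {
  return did === APEX_DID ? baseKey : `${did}#${baseKey}`;
}

// Guard for sub-user wipe/restore paths: only keys under the sub-user's
// own prefix are eligible, so the apex's unprefixed keys never match.
function subUserOwnsKey(did: string, key: string): boolean {
  return did !== APEX_DID && key.startsWith(`${did}#`);
}

interface Tenant {
  did: string;
  apex: boolean;
}

// The apex short-circuits before any tenants-table read. `lookup` stands
// in for that read (synchronous here for brevity; really a DynamoDB call).
function resolveTenant(did: string, lookup: (did: string) => Tenant | undefined): Tenant {
  if (did === APEX_DID) return { did, apex: true };
  const tenant = lookup(did);
  if (!tenant) throw new Error(`unknown tenant: ${did}`);
  return tenant;
}
```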
How a long-form post actually flows
This is the part I'm proudest of, and it's why this post exists in this format on this domain instead of as a tweet-length thing with a link out.
When I publish long-form, I write a social.serverless.longform record (title, markdown content, slug, an optional teaser) into the AT Proto repo. That is just another createRecord. From there it is pure event-driven choreography, no orchestrator, no queue I babysit:
createRecord (social.serverless.longform)
  -> RepoTable write (DynamoDB)
  -> DynamoDB Streams event
  -> LongformRenderer Lambda
       -> render standalone HTML -> S3 /<slug>/index.html
       -> render sitemap.xml / rss.xml / atom.xml -> S3
       -> CloudFront KeyValueStore (rename -> 301, delete -> 410)
       -> POSSE a link post to Bluesky (app.bsky.feed.post)
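The renderer's first job is to decide what a stream record means. Below is a sketch of that classification. It rests on three assumptions:

- The stream uses the `NEW_AND_OLD_IMAGES` view type. A rename is only visible if the old slug is in the record.
- The item carries hypothetical `collection` and `slug` attributes.
- The types are a hand-written subset of the Streams record shape, not the `aws-lambda` typings.

```typescript
// Hand-written subset of a DynamoDB Streams record (DynamoDB JSON images).
type AttrMap = Record<string, { S?: string } | undefined>;

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: { NewImage?: AttrMap; OldImage?: AttrMap };
}

type Action =
  | { kind: "render"; slug: string }
  | { kind: "rename"; from: string; to: string } // render `to`, 301 `from`
  | { kind: "delete"; slug: string }             // 410 tombstone
  | { kind: "ignore" };

const LONGFORM = "social.serverless.longform";

function classify(rec: StreamRecord): Action {
  const { NewImage, OldImage } = rec.dynamodb;
  // Only long-form records matter; everything else in RepoTable is skipped.
  if ((NewImage ?? OldImage)?.collection?.S !== LONGFORM) return { kind: "ignore" };
  const oldSlug = OldImage?.slug?.S;
  const newSlug = NewImage?.slug?.S;
  if (rec.eventName === "REMOVE") {
    return oldSlug ? { kind: "delete", slug: oldSlug } : { kind: "ignore" };
  }
  if (!newSlug) return { kind: "ignore" };
  if (oldSlug && oldSlug !== newSlug) return { kind: "rename", from: oldSlug, to: newSlug };
  return { kind: "render", slug: newSlug };
}
```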
A few deliberate choices in there.
The rendered article is a fully standalone HTML document. Its own <head>, OpenGraph and Twitter card tags, JSON-LD BlogPosting structured data, the markdown rendered server-side into sanitized HTML. It is not hydrated by the React bundle at all. A crawler or a social-card scraper gets a complete server-rendered page with zero JavaScript dependency. That was non-negotiable for me, because long-form that search engines and link unfurlers can't read is long-form that doesn't exist.
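The escaping trap in a hand-rendered `<head>` is the JSON-LD block. `JSON.stringify` does not escape `<`, so an article whose title contains `</script>` would end the script tag early. Here is a sketch of the head fragment, with hypothetical field names:

```typescript
// Hypothetical article fields; markdown-to-sanitized-HTML is out of scope.
interface ArticleMeta {
  title: string;
  teaser?: string;
  url: string;       // canonical permalink
  published: string; // ISO 8601
  authorName: string;
}

// Escape text for use inside HTML attributes and element content.
const esc = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

function renderHead(a: ArticleMeta): string {
  const description = a.teaser !== undefined ? a.teaser : a.title;
  // JSON.stringify leaves "<" alone; escaping it as \u003c keeps a title
  // containing "</script>" from closing the block, and still parses as JSON.
  const jsonLd = JSON.stringify({
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: a.title,
    description,
    url: a.url,
    datePublished: a.published,
    author: { "@type": "Person", name: a.authorName },
  }).replace(/</g, "\\u003c");
  return [
    `<title>${esc(a.title)}</title>`,
    `<meta name="description" content="${esc(description)}">`,
    `<link rel="canonical" href="${esc(a.url)}">`,
    `<meta property="og:type" content="article">`,
    `<meta property="og:title" content="${esc(a.title)}">`,
    `<meta property="og:description" content="${esc(description)}">`,
    `<meta property="og:url" content="${esc(a.url)}">`,
    `<meta name="twitter:card" content="summary">`,
    `<script type="application/ld+json">${jsonLd}</script>`,
  ].join("\n");
}
```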
Routing happens at the edge. A CloudFront Function runs on every viewer request. If the path is a known app route or static file it passes through untouched. Otherwise it treats the path as a slug and consults a CloudFront KeyValueStore: a 410 means the article was deleted and you get a real Gone response, a 301 means the slug was renamed and you get redirected, anything else rewrites to the rendered S3 object. The KVS only ever holds the interesting states (renames and tombstones), so it never accumulates stale entries for healthy articles. Permalinks stay clean and stable forever, which matters for the roadmap below.
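The routing decision can be sketched as a pure function. The real CloudFront Function runs plain JavaScript, and there the lookup would be a KeyValueStore read. Here it is injected so the logic can run anywhere. The value encoding (`410`, or `301 <new-slug>`) and the app-route list are hypothetical:

```typescript
// Stand-in for the KeyValueStore read (hypothetical encoding):
// "410" = deleted, "301 <new-slug>" = renamed, undefined = healthy article.
type Lookup = (slug: string) => string | undefined;

type Decision =
  | { kind: "pass" }                       // app route or static file
  | { kind: "gone" }                       // 410
  | { kind: "redirect"; location: string } // 301
  | { kind: "rewrite"; uri: string };      // serve the rendered S3 object

const APP_ROUTES = ["/", "/login", "/settings"]; // hypothetical
const HAS_EXTENSION = /\.[a-z0-9]+$/i;            // /rss.xml, /assets/app.js, ...

function route(uri: string, lookup: Lookup): Decision {
  if (APP_ROUTES.indexOf(uri) !== -1 || HAS_EXTENSION.test(uri)) return { kind: "pass" };
  const slug = uri.replace(/^\/+|\/+$/g, "");
  // Multi-segment paths are not treated as slugs in this sketch.
  if (slug === "" || slug.indexOf("/") !== -1) return { kind: "pass" };
  const state = lookup(slug);
  if (state === "410") return { kind: "gone" };
  if (state !== undefined && state.indexOf("301 ") === 0) {
    return { kind: "redirect", location: `/${state.slice(4)}/` };
  }
  // Healthy articles have no KVS entry at all.
  return { kind: "rewrite", uri: `/${slug}/index.html` };
}
```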
The renderer Lambda is deliberately capped at one concurrent execution. CloudFront's KeyValueStore uses optimistic concurrency, so two writers racing the same key throw precondition errors. Long-form posts are authored by humans, not emitted by machines at volume, so a concurrency of one is genuinely fine and the failure mode (a single record retries without blocking others) is acceptable. That is the kind of trade-off that only shows up when you actually build the thing instead of drawing it.
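A concurrency of one helps because each KeyValueStore write is conditional on an ETag the writer read beforehand (the `IfMatch` parameter). A writer holding a stale ETag therefore fails. A toy in-memory stand-in makes the race visible. The class and method names are hypothetical, and this is not the AWS SDK:

```typescript
// Toy stand-in for a KeyValueStore: one version counter for the whole
// store, exposed as an ETag. Names are hypothetical, not the AWS SDK.
class ConflictError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ConflictError";
  }
}

class ToyKvs {
  private data = new Map<string, string>();
  private version = 0;

  etag(): string {
    return `v${this.version}`;
  }

  // Conditional write: succeeds only if the caller saw the current ETag.
  put(key: string, value: string, ifMatch: string): string {
    if (ifMatch !== this.etag()) throw new ConflictError(`stale ETag ${ifMatch}`);
    this.data.set(key, value);
    this.version += 1;
    return this.etag();
  }

  get(key: string): string | undefined {
    return this.data.get(key);
  }
}

// Re-read the ETag and retry a bounded number of times on conflict.
function putWithRetry(kvs: ToyKvs, key: string, value: string, attempts = 3): void {
  for (let i = 1; ; i++) {
    try {
      kvs.put(key, value, kvs.etag());
      return;
    } catch (err) {
      const conflict = err instanceof Error && err.name === "ConflictError";
      if (!conflict || i >= attempts) throw err;
    }
  }
}

// Two writers read the same ETag; the second write loses.
const store = new ToyKvs();
const seenByA = store.etag();
const seenByB = store.etag();
store.put("old-slug", "301 new-slug", seenByA);
try {
  store.put("dead-slug", "410", seenByB);
} catch (err) {
  console.log((err as Error).name); // ConflictError
}
putWithRetry(store, "dead-slug", "410");
console.log(store.get("dead-slug")); // 410
```

With a single renderer instance, only one writer is ever in the read-then-write window, so the conflict path becomes a rare retry rather than the normal case.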
And the POSSE step ("Publish on your Own Site, Syndicate Elsewhere") is why, if you follow serverless.social on Bluesky, you saw a link to this post show up natively in your timeline. The canonical post is the AT Proto record on my domain. Bluesky got a syndicated pointer back to it. The article is the source of truth; the network gets a copy that points home.
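The syndicated pointer is an ordinary `app.bsky.feed.post` record, written with `createRecord` like anything else. The sharp edge is the link facet. Its `byteStart`/`byteEnd` are UTF-8 byte offsets, not JavaScript string indices, so an emoji or accented letter in the title shifts them. The text layout in this sketch (title, blank line, URL) is hypothetical:

```typescript
interface PosseInput {
  title: string;
  teaser?: string;
  url: string; // canonical permalink on serverless.social
}

const utf8Len = (s: string): number => new TextEncoder().encode(s).length;

// Build an app.bsky.feed.post that points home. Facet offsets are UTF-8
// byte positions. A real version would also have to keep `text` within
// the post lexicon's 300-grapheme limit.
function possePost(a: PosseInput, now: Date = new Date()) {
  const prefix = `${a.title}\n\n`;
  const byteStart = utf8Len(prefix);
  return {
    $type: "app.bsky.feed.post",
    text: prefix + a.url,
    createdAt: now.toISOString(),
    facets: [
      {
        index: { byteStart, byteEnd: byteStart + utf8Len(a.url) },
        features: [{ $type: "app.bsky.richtext.facet#link", uri: a.url }],
      },
    ],
    // Link card, so the timeline shows title and teaser.
    embed: {
      $type: "app.bsky.embed.external",
      external: { uri: a.url, title: a.title, description: a.teaser ?? "" },
    },
  };
}
```

For a title like `"Héllo 👋"` the link starts at byte 13, while the JavaScript string index of the same position is 10.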
Where this is going
What's live today: the PDS, federation into the real Bluesky network, magic-link auth, the AT Proto OAuth provider, multi-tenant provisioning, and this long-form pipeline you are currently inside of. What is not done is at least as interesting, so here is the honest roadmap.
ActivityPub, via a bridge format. AT Protocol is one federated social web. ActivityPub (Mastodon and the wider fediverse) is the other big one, and the two don't natively speak to each other. The plan is a bridge: the long-form record format you're reading is being designed as the neutral substrate, so the same authored post can be projected into an ActivityPub-shaped representation and federate to the fediverse the same way the POSSE step already syndicates it to Bluesky. One write; the protocol-specific projections fan out from the event stream. This is roadmap, not shipped. I want to be clear about that. But the longform-as-substrate decision was made specifically so this is an additive read-side projection and not a rewrite, the same shape as the Bluesky syndication that already works.
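This part is roadmap, so the following only illustrates what an additive read-side projection could look like: the same longform fields mapped onto an ActivityStreams 2.0 `Article`. The actor and object URLs are hypothetical. Real fediverse delivery also needs WebFinger, an actor document, HTTP Signatures and inbox delivery, and none of that is shown:

```typescript
// Hypothetical shape of a longform record after rendering.
interface Longform {
  title: string;
  slug: string;
  teaser?: string;
  html: string;      // sanitized HTML from the existing renderer (assumption)
  createdAt: string; // ISO 8601
}

const SITE = "https://serverless.social";
const ACTOR = `${SITE}/ap/actor`; // hypothetical ActivityPub actor URL

// Project one longform record into an ActivityStreams 2.0 Article.
function toActivityPubArticle(rec: Longform) {
  return {
    "@context": "https://www.w3.org/ns/activitystreams",
    type: "Article",
    id: `${SITE}/ap/articles/${rec.slug}`, // hypothetical object URL
    url: `${SITE}/${rec.slug}/`,           // the canonical permalink
    attributedTo: ACTOR,
    name: rec.title,
    summary: rec.teaser,                   // omitted from JSON when undefined
    content: rec.html,
    published: rec.createdAt,
    to: ["https://www.w3.org/ns/activitystreams#Public"],
  };
}
```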
Self-serve sign-up and automated DNS. Sub-users exist and federate today, but provisioning one currently runs through an operator CLI I drive by hand: mint keys, write the PLC genesis op, prove the handle over DNS. The next slice turns that into a real sign-up flow with the wildcard DNS and handle resolution automated, so a person can get their own federated PDS handle without me running a script for them.
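The DNS half of handle proof is a TXT record at `_atproto.<handle>` whose value is `did=<did>`. The sketch below checks it against the `string[][]` shape that Node's `dns.promises.resolveTxt` returns (one array of chunks per record). The function names are illustrative. Rejecting more than one distinct `did=` value is meant to follow the handle spec's rule for conflicting records:

```typescript
// DNS name holding the handle proof.
const txtName = (handle: string): string => `_atproto.${handle.toLowerCase()}`;

// Expected TXT value for a DID.
const txtValue = (did: string): string => `did=${did}`;

// `records` has the string[][] shape returned by Node's
// dns.promises.resolveTxt: one array of chunks per TXT record.
function handleProven(records: string[][], did: string): boolean {
  const dids = records
    .map((chunks) => chunks.join(""))
    .filter((value) => value.indexOf("did=") === 0);
  // Exactly one distinct did= value, and it must be ours.
  const unique = dids.filter((v, i) => dids.indexOf(v) === i);
  return unique.length === 1 && unique[0] === txtValue(did);
}
```

In a provisioning flow this would be called as `handleProven(await resolveTxt(txtName(handle)), did)`, with `resolveTxt` imported from `node:dns/promises`.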
Account migration. The interesting end state is moving a real existing identity (a did:plc account that currently lives on bsky.social) onto this PDS without breaking its history or its followers. The did:plc plumbing the multi-user work built is the foundation that migration rides on. serverless.social itself stays did:web forever as the disposable proving ground. It is the place I get to break things.
A long-form importer. Pulling existing articles from dev.to and Hashnode into social.serverless.longform records over the public XRPC API, instance-agnostic, idempotent on re-run. The skeleton exists. It is how I'd eventually move years of my own writing onto something I own outright.
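The post doesn't say how the importer skeleton stays idempotent. One way to do it over plain XRPC, offered here as an assumption, is to derive the record key from the source article's URL. The importer then writes with `com.atproto.repo.putRecord`, which creates or overwrites at a fixed rkey, rather than `createRecord`, which generates a new key on every run:

```typescript
import { createHash } from "node:crypto";

// Deterministic rkey from the source article's URL. Query and fragment
// are dropped (assumes they never identify the article). A hex digest
// only uses characters the record-key syntax allows; the "imp-" prefix
// is hypothetical, just to keep imported keys recognizable.
function importRkey(sourceUrl: string): string {
  const url = new URL(sourceUrl);
  url.search = "";
  url.hash = "";
  const digest = createHash("sha256").update(url.toString(), "utf8").digest("hex");
  return `imp-${digest.slice(0, 24)}`;
}

// Body for com.atproto.repo.putRecord: the same rkey on every run means a
// re-import overwrites the record instead of duplicating it.
function putRecordBody(repo: string, sourceUrl: string, record: Record<string, unknown>) {
  return {
    repo,
    collection: "social.serverless.longform",
    rkey: importRkey(sourceUrl),
    record: { $type: "social.serverless.longform", ...record },
  };
}
```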
I'm not going to wrap this with a tidy bow, because the project doesn't have one yet. The honest status is: the hard, scary part (real federation into a network I don't control, proven byte-correct, for $4 a month) is done and live. The rest is a sequence of additive projections off an event stream I already trust.
I'd normally drop a repo link here. I can't yet: both this and the expo-cdk-stack template it was generated from are private for now. That's the honest edge of build-in-public. The writeup is open, the source isn't, and I'm not going to pretend otherwise. Everything above is the real shape of it, nothing hand-waved.
Anyway. This sentence is a few hundred bytes in a Merkle tree in DynamoDB, and a Lambda turned it into the page you're looking at. That still makes me grin. Back to building.