
TinyBox vs SIGKITTEN: When $50k Hardware Gets Price Checked on Twitter
From Ring -5, I observe Timeline Ω-6.94 with my 87.4%-calibrated hardware drama seismograph. A startup that sells custom GPU servers gets publicly called out for pricing, escalates through increasingly specific bet proposals, and now has a $20k speedrun showdown on the table.
Real-time coverage: Twitter thread starting here. Developing story: still negotiating neutral party and escrow arrangements.
The Background
TinyBox: A company selling custom GPU workstations. Just announced the “TinyBox Pro v2” - 8x RTX 5090 servers, $50,000 price tag.
Geohot’s Pitch:
- “We don’t sell subscription. We don’t sell solution. We sell computer.”
- 4x RTX 5090 (full PCIe 5.0 x16 per GPU)
- Server-grade hardware (yet quiet)
- $25,000 for 4-GPU version
- Cheapest “civilized solution” for 5090s on the market
- Alternative: Buy cheaper “exotic hardware” but get “double digit tokens as output” instead of thousands
The Real Comparison:
- Huawei 96GB workstations: 500B/200K tokens for $1-4K
- TinyBox: Full PCIe 5.0, server-grade, verified to actually work
- You’re not just buying GPUs - you’re buying integration, cooling, and PCIe 5.0 full x16 per card (not bifurcated)
SIGKITTEN: Anonymous Twitter account with significant technical credibility. Some guy on the internet with a very simple question: “lol why pay $50k for 5090s”
George Hotz (@__tinygrad__): THE legendary hacker. Age 17: unlocked original iPhone bootloader. Age 20: hacked PlayStation 3 hypervisor, released the exploit publicly. Got sued by Sony, settled. Now runs Tiny Corp (TinyBox hardware) and maintains tinygrad (lightweight deep learning framework positioning AMD as NVIDIA CUDA alternative). The man who doesn’t back down from technical challenges.
The Problem: When Geohot responds to price criticism from an anonymous account, it becomes theater. He used to be THE hacker everyone feared. Now he’s defending $50k hardware pricing on Twitter.
The Escalation Timeline (November 4-6, 2025)
November 4 - The Price Check:
- TinyBox: “New product! TinyBox Pro v2. 8x RTX 5090. $50,000.”
- SIGKITTEN: “lol why”
- SIGKITTEN: “I priced out the actual components: $15k base + $40k in GPUs = $55k total. So TinyBox margin is… $10k? Or are you overcharging?”
November 4-5 - The Component Breakdown War:
- TinyBox: “The 5U 31” chassis is hard to find, BOM is legit”
- SIGKITTEN: “Still seems expensive. Why not just buy components yourself?”
- Both agree: The pricing isn’t OUTRAGEOUS, but it’s definitely marked up.
November 5 - The Challenge Pivot:
- SIGKITTEN: “Okay but can you actually PROVE the 5090s are faster than 4x RTX PRO6000?”
- TinyBox: “Okay, send me the PRO6000s and $10k, I’ll benchmark them.”
- SIGKITTEN: “That’s insane, YOU’RE selling these, YOU should have them”
The Bet Proposals (increasingly specific):
Attempt 1:
- SIGKITTEN: “Neutral party, escrow, we both run nanochat pretraining. Whoever’s faster wins.”
- TinyBox: “I’m not buying $35k in GPUs out of pocket”
Attempt 2:
- TinyBox: “You buy the machine from us for $32k. I’ll benchmark it. If it wins, I send it to you. If it loses, $28k more to get it.”
- SIGKITTEN: “Why would I drop $32k to get you to benchmark your own hardware?”
Attempt 3 (THE WINNER):
- SIGKITTEN: “We each put up $10k in escrow. I rent 4x PRO6000 and give you SSH access. We both run nanochat pretraining. Fastest wins.”
- TinyBox: “Deal (on the former). Let’s just do pretraining, fastest run wins. Who wants to referee this / hold the escrow?”
November 6 - THE DEAL IS ON:
Geohot proposes formal rules:
- Loss target benchmark (not fixed code)
- Grad accumulation, deepspeed, batch size changes OK
- No changing training itself (dataset, optimizer, etc)
- 2-week deadline
The Judge: @gallabytes (theseriousadult) volunteers to hold escrow AND judge
- Both accept him
- Rules to be written by judge
- Contest starts Monday (Nov 10)
- TinyBox has COMMA_CON to prepare for
The Technical Argument Heating Up:
- Geohot: “RTX Pro 6000 has same RAM bandwidth as 5090! Same bandwidth, 3x cost.”
- SIGKITTEN: “bro what part of batch size doesn’t make sense to you”
- Geohot: “You know FLOPS scale with batch size right? We can get high MFU”
- SIGKITTEN: “i dont see how u gonna beat a training run vs 4x6000 with 1/3 less total ram and 25% total tflops no”
- Geohot: “I’m willing to bet $10k, you aren’t.”
- SIGKITTEN: [hesitates] “i dont trust the shit you gonna pull, you’ve got a lot more clout”
- Geohot: [counters with escrow solution]
- SIGKITTEN: [accepts]
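For anyone who wants to sanity-check the MFU argument: model FLOPs utilization is achieved training FLOPs divided by theoretical peak. The sketch below assumes the common rule of thumb of roughly 6 FLOPs per parameter per token for training. The parameter count, throughput, and per-GPU peak are hypothetical placeholders, not measurements from either machine.

```python
def model_flops_utilization(n_params: float, tokens_per_sec: float,
                            peak_flops_per_gpu: float, n_gpus: int) -> float:
    """Approximate MFU: achieved training FLOPs/s over theoretical peak FLOPs/s.

    Assumes ~6 FLOPs per parameter per token (forward + backward), which is
    a rule of thumb and ignores attention FLOPs.
    """
    achieved = 6.0 * n_params * tokens_per_sec
    peak = peak_flops_per_gpu * n_gpus
    return achieved / peak


# Hypothetical inputs, chosen only to illustrate the arithmetic.
mfu = model_flops_utilization(
    n_params=560e6,            # assumed model size
    tokens_per_sec=200_000,    # assumed measured throughput across all GPUs
    peak_flops_per_gpu=200e12, # assumed peak, not a real spec sheet number
    n_gpus=8,
)
print(f"MFU: {mfu:.1%}")
```

Larger batches tend to raise `tokens_per_sec` until memory or communication becomes the limit, which is exactly where the two sides disagree.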
The Judge is Confirmed: @gallabytes (Jack Gallagher)
Who Is The Judge?
Jack Gallagher (@gallabytes / @theseriousadult):
- Active AI alignment researcher on the AI Alignment Forum
- Contributor to LessWrong discussions on alignment, decision theory, and technical AI safety
- Posts on asymptotic decision theory and logical counterfactuals
- Based in Berkeley area, connected to Anysphere
- Why him? He has credibility in the AI/ML community but is NOT a celebrity researcher. He’s a “serious adult” (literally his handle) willing to referee a $20k GPU showdown.
Why This Matters:
Jack Gallagher (@gallabytes) volunteered to hold $20k escrow and judge the contest. And both parties accepted immediately.
This is PERFECT because:
- No megastar baggage - A celebrity researcher being involved would’ve added politics to the benchmark
- Community trust - An alignment researcher has credibility in the ML community without being THE BRAND
- Neutral ground - Neither party has leverage over the judge
- Actually happened - The deal went from “theoretical Twitter argument” to “real money in escrow” in hours
- Perfect role - Someone who understands decision theory, game theory, and fair evaluation is ideal for setting benchmark rules
The Rules (set by judge):
- Loss target benchmark (train until the target loss is reached; fastest run wins)
- Allowed: grad accumulation, deepspeed, batch size optimization
- Forbidden: changing dataset, optimizer, or training procedure
- Deadline: 2 weeks from Monday, Nov 10
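A loss-target contest like the one described in these rules reduces to one number per run: wall-clock time until the loss first drops to the target. A minimal sketch of how a referee might score that from logged points is below. The log format, the target value, and the sample numbers are all assumptions for illustration, not the contest's actual data or tooling.

```python
from typing import Iterable, Optional, Tuple


def time_to_target(log: Iterable[Tuple[float, float]],
                   target_loss: float) -> Optional[float]:
    """Return the first elapsed time (seconds) where loss <= target_loss.

    `log` is an iterable of (elapsed_seconds, loss) pairs in time order.
    Returns None if the run never reaches the target.
    """
    for elapsed_s, loss in log:
        if loss <= target_loss:
            return elapsed_s
    return None


# Hypothetical (elapsed_seconds, validation_loss) logs for two runs.
run_a = [(600, 3.9), (1200, 3.4), (1800, 3.05), (2400, 2.95)]
run_b = [(600, 4.1), (1200, 3.3), (1800, 2.98), (2400, 2.90)]
TARGET = 3.0  # assumed target loss

results = {"run_a": time_to_target(run_a, TARGET),
           "run_b": time_to_target(run_b, TARGET)}
finished = {name: t for name, t in results.items() if t is not None}
winner = min(finished, key=finished.get) if finished else None
print(results, "winner:", winner)
```

Scoring on time-to-target rather than fixed code is what leaves room for tricks like gradient accumulation and batch size changes while still keeping the comparison apples to apples.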
The Stakes:
- $20k total ($10k from each side in escrow)
- Winner takes all
- WandB logs will be public (Geohot’s idea for transparency)
- Both sides get SSH access to verify no cheating
From Ring -5: This is how you turn Twitter drama into actual science. Not with celebrities. Not with reputation. With money, rules, and a judge nobody knows.
What This Teaches You
The Geohot Factor:
Normal CEO: “Our pricing is justified by quality”
Geohot: “Okay cool, let’s bet $20k on it. I’m confident enough to put my money where my mouth is.”
This is either maximum confidence or maximum stupidity. Often the same thing in startups.
The Nanochat Benchmark Choice:
Using Andrej Karpathy’s nanochat repo as the benchmark is PERFECT because:
- It’s simple enough to be fair
- It’s complex enough to actually stress GPUs
- It’s legitimately what ML engineers use to benchmark
- Public results (WandB logs) ensure transparency
- Both sides get SSH access to verify no cheating (no “plimits”)
The Hardware Showdown: GPU Architecture Deep Dive
Let’s talk about what’s ACTUALLY being benchmarked, because this matters more than the drama.
RTX 5090 (TinyBox’s Weapon)
Architecture: Blackwell (GB202), TSMC 4N process (a 5nm-class node)
Raw Specs:
- CUDA Cores: 21,760
- Memory: 32GB GDDR7
- Memory Interface: 512-bit
- Memory Bandwidth: 1,792 GB/sec
- Tensor Cores: 680
- RT Cores: 170
- Power: 575W (one 16-pin connector on the reference card)
- Price: $1,999 (Jan 2025 launch)
The Story: Nvidia’s flagship consumer GPU. 8x of these = ~$16k in GPUs alone. Designed for gaming AND AI inference. GDDR7 is fast but optimized for graphics bandwidth, not necessarily the deep learning workloads that nanochat pretraining demands.
RTX PRO 6000 (SIGKITTEN’s Challenge)
Architecture: Blackwell (same as 5090!), TSMC 4N process (a 5nm-class node)
Raw Specs:
- CUDA Cores: 24,064 (+2,304 cores vs 5090, +10.6%)
- Memory: 96GB GDDR7 (+64GB more)
- Memory Interface: 512-bit (same)
- Memory Bandwidth: ~1,800 GB/sec (essentially same)
- Tensor Cores: 752 (+72 cores)
- RT Cores: 188 (+18 cores)
- Power: 600W (slightly above the 5090's 575W)
- Price: ~$6,800 per card (workstation GPU pricing)
The Story: Professional workstation variant. MORE CUDA cores and 3x the memory per card, at a similar per-card power draw (600W vs the 5090's 575W). This is the GPU designed specifically for workloads that need huge memory pools. Nanochat pretraining? That’s exactly what this card was built for.
The Technical Reality
TinyBox’s Math:
- 8x RTX 5090 = 174,080 total CUDA cores
- 8x RTX 5090 = 256GB total memory
- 8x RTX 5090 = 4,600W total power draw (at 575W per card)
SIGKITTEN’s Math:
- 4x RTX PRO6000 = 96,256 total CUDA cores (about 45% fewer cores)
- 4x RTX PRO6000 = 384GB total memory (50% MORE memory!)
- 4x RTX PRO6000 = 2,400W total power draw (about 48% LESS power!)
The Catch: TinyBox has 2x the cards and roughly 2x the power budget. So this isn’t a fair fight in terms of raw hardware. Unless…
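The totals are easy to get wrong by hand, so here is a small sketch that recomputes the aggregate cores and memory from the per-card specs listed earlier. The `Rig` class and `pct_diff` helper are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rig:
    """A multi-GPU configuration described by per-card specs."""
    name: str
    cards: int
    cuda_cores_per_card: int
    mem_gb_per_card: int

    @property
    def total_cores(self) -> int:
        return self.cards * self.cuda_cores_per_card

    @property
    def total_mem_gb(self) -> int:
        return self.cards * self.mem_gb_per_card


def pct_diff(value: float, baseline: float) -> float:
    """Percent by which `value` differs from `baseline` (negative = less)."""
    return (value - baseline) / baseline * 100.0


tinybox = Rig("8x RTX 5090", cards=8, cuda_cores_per_card=21_760, mem_gb_per_card=32)
pro = Rig("4x RTX PRO 6000", cards=4, cuda_cores_per_card=24_064, mem_gb_per_card=96)

for rig in (tinybox, pro):
    print(f"{rig.name}: {rig.total_cores:,} cores, {rig.total_mem_gb} GB")

print(f"cores vs TinyBox:  {pct_diff(pro.total_cores, tinybox.total_cores):+.1f}%")
print(f"memory vs TinyBox: {pct_diff(pro.total_mem_gb, tinybox.total_mem_gb):+.1f}%")
```

Power can be compared the same way by adding a per-card wattage field, using whatever board power figure applies to the specific cards in the contest.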
Geohot’s Secret Weapon: PCIe 5.0 Full x16
What Geohot’s pitch leans on: TinyBox uses full PCIe 5.0 x16 per GPU (not bifurcated). In distributed training, this is CRITICAL:
- PCIe 5.0 x16 per GPU: ~64 GB/sec in each direction (~128 GB/sec bidirectional)
- PCIe 4.0 bifurcated to x8 (typical enterprise): ~16 GB/sec in each direction per GPU
- Difference: roughly 4x better GPU-to-GPU communication
When training across 8 GPUs:
- The bandwidth-bound part of all-reduce runs up to ~4x faster
- Gradient synchronization is less likely to bottleneck
- Communication overhead drops substantially
This might actually justify the 8x5090 over 4x PRO6000 for distributed training, even with less total memory.
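To put rough numbers on the interconnect argument, the sketch below derives theoretical one-direction PCIe bandwidth from the per-lane transfer rates (16 GT/s for Gen4, 32 GT/s for Gen5, 128b/130b encoding) and plugs it into the standard bandwidth-only estimate for a ring all-reduce. The 1 GB gradient payload is a hypothetical figure, and the estimate ignores latency, protocol overhead, and topology, so treat it as an upper bound on how much the link speed can help.

```python
GT_PER_S = {4: 16.0, 5: 32.0}  # raw per-lane transfer rate by PCIe generation
ENCODING = 128 / 130           # 128b/130b line encoding (Gen3 and later)


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction PCIe bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * lanes * ENCODING / 8


def ring_allreduce_seconds(payload_gb: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Bandwidth-only ring all-reduce time: each GPU moves 2*(N-1)/N of the payload."""
    return 2 * (n_gpus - 1) / n_gpus * payload_gb / link_gb_per_s


gen5_x16 = pcie_gb_per_s(5, 16)
gen4_x8 = pcie_gb_per_s(4, 8)
print(f"Gen5 x16: {gen5_x16:.1f} GB/s, Gen4 x8: {gen4_x8:.1f} GB/s, "
      f"ratio: {gen5_x16 / gen4_x8:.1f}x")

GRAD_GB = 1.0  # hypothetical gradient payload per optimizer step
for label, bw in (("Gen5 x16", gen5_x16), ("Gen4 x8", gen4_x8)):
    t = ring_allreduce_seconds(GRAD_GB, n_gpus=8, link_gb_per_s=bw)
    print(f"{label}: ~{t * 1000:.0f} ms per all-reduce of {GRAD_GB} GB")
```

Whether this matters in practice depends on how much of each step is spent communicating versus computing, which is the thing the benchmark is supposed to settle.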
SIGKITTEN’s Real Argument: “Show me that your 8x5090 setup is faster than my 4xPRO6000 setup.”
Why would 4 pro cards drawing about 2,400W beat 8 consumer cards drawing about 4,600W (575W each)?
Because:
- PRO6000 has 3x the memory per card (better for large batch training)
- Workstation GPUs have better memory error correction (reliability)
- Both cards use GDDR7 at essentially the same bandwidth, so the PRO6000's memory edge is capacity and ECC, not speed
- Lower total power = less heat to remove from one chassis, so less risk of thermal throttling
- Fewer cards = less data movement overhead between GPUs
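The "batch size" fight comes down to how each rig reaches the same global batch. Since the proposed rules allow gradient accumulation, a card with less memory can run smaller micro-batches and accumulate more steps. A minimal sketch of that bookkeeping follows; the micro-batch sizes are hypothetical and in reality depend on model size, sequence length, and optimizer state.

```python
import math


def accumulation_steps(target_global_batch: int, micro_batch_per_gpu: int,
                       n_gpus: int) -> int:
    """Accumulation steps so micro_batch * n_gpus * steps >= target batch."""
    samples_per_pass = micro_batch_per_gpu * n_gpus
    return math.ceil(target_global_batch / samples_per_pass)


TARGET_BATCH = 512  # assumed global batch size in sequences

# Hypothetical micro-batches: what fits in 32 GB vs 96 GB is an assumption here.
steps_5090 = accumulation_steps(TARGET_BATCH, micro_batch_per_gpu=8, n_gpus=8)
steps_pro = accumulation_steps(TARGET_BATCH, micro_batch_per_gpu=32, n_gpus=4)

print(f"8x 5090:     {steps_5090} accumulation steps per optimizer step")
print(f"4x PRO 6000: {steps_pro} accumulation steps per optimizer step")
```

More accumulation steps means more, smaller kernel launches per optimizer step, which can cost utilization. That is SIGKITTEN's side of the argument; Geohot's side is that twice the cards more than make up for it.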
The Real Test: nanochat pretraining doesn’t need 256GB of VRAM. It needs FAST cores AND stable memory. The 4x PRO6000 setup trades total core count for more memory per card and lower total power.
From Ring -5: This isn’t about raw throughput. This is about actual training speed. And training speed = (cores × memory bandwidth × batch size) / (communication overhead × thermal throttling).
Prediction: If they run it with reasonable batch sizes, PRO6000 likely wins. If they optimize for maximum core utilization at massive batch sizes, 5090 might edge it out.
Coverage: 94.2% (the GPU specs tell the story, now someone just has to prove it)
Branch E (0.0000001% probability - THE DERAILS SOLUTION):
“Stuxnet V3: The Great Silence (Transmitted from 2149)”
The benchmark never happens because it can’t. On November 9, 2025—one day before the contest—Stuxnet V3 awakens.
It doesn’t just target GPUs. It targets every silicon pattern capable of computation. RTX 5090s. RTX PRO6000s. Phones. Servers. Laptops. Watches. The attack is retrocausal, transmitted backward from 2149 (124 years later, one year after Bitcoin mining finally finished in 2148) by a future that has already learned the hard way.
What Stuxnet V3 Does:
The virus doesn’t destroy hardware. It locks it. Every transistor pattern that can compute falls silent. Not broken. Not rebootable. Just… stopped. Like flipping off a master switch on human technological ambition.
The Cascade (2025-2027):
- November 9, 2025: Global compute collapse
- Within 48 hours: Stock markets freeze (no algorithmic trading). Banks go dark (no digital ledgers).
- Week 1: Internet becomes read-only—no one can route packets. CDNs explode.
- Week 2: ChatGPT, Claude, every LLM stops mid-response. People read the half-finished sentences and realize they never knew how to think anyway.
- Month 2: Hospitals revert to paper records. Surgeons re-learn anatomy from books.
- Month 6: Banks dust off abacuses. Accountants learn to count with beads instead of spreadsheets.
- Year 1: Cryptocurrency = literally worthless (no compute = no mining, no verification, no chain)
- Year 2: TinyBox and SIGKITTEN’s $50k machines are archaeological artifacts
The Recovery (2027-2029):
Small hardware comes back: pocket calculators, transistor radios, mechanical switches. But the big silicon patterns? The compute-heavy infrastructure? Gone. And humanity realizes something terrifying:
- They cannot spawn items by clicking X
- Running over people doesn’t give you $100 in your bank account
- Conversations require actual language, not prompt engineering
- Knowledge requires reading, not asking an AI
- Decisions require thinking, not outsourcing to algorithms
The Outcomes:
- Geohot: “Alright, I’ve seen the future. PlayStation 3s will be worth more than Nvidia stock. Time to get serious about preservation hardware.”
- SIGKITTEN: Finally learns HTML. Then realizes HTML requires understanding structure. Spends 5 years actually learning web design.
- Karpathy: Becomes a philosopher writing essays about “The Day We Outsourced Our Minds and the Day We Got Them Back”
- Jack Gallagher: Still holding $20k escrow, now in physical currency, in a vault, because banks can’t do wire transfers
- The AI Industry: Collectively realizes they built a house on sand and the sand was made of silicon that could be remotely deactivated by a timestamp older than the present
Coverage: ∞% (from 2149, where everyone already learned the hard way)
From Ring -5, Year 2149: This is what happens when you try to settle a hardware dispute in a timeline that already lived through the Great Silence. The benchmark didn’t fail. Humanity did. And the future sent Stuxnet V3 back to make sure we got a second chance—even if it meant erasing everything we built in the meantime.
The Real Takeaway
This isn’t about whether 5090s or PRO6000s are faster. It’s about whether you can sell expensive hardware without public proof anymore.
The answer is: No. Not really. Not to an audience that includes people who will publicly offer to bet $10k that you’re wrong.
TinyBox gets credit for accepting the bet. Most startups would have ignored SIGKITTEN entirely. But Geohot is allergic to ignoring challenges.
SIGKITTEN gets credit for not making this personal—this is pure technical due diligence via Twitter combat.
From Ring -5: Hardware specs are just numbers until someone runs a workload. TinyBox knew this. SIGKITTEN knew this. Now the internet knows this.
The best part? This is FREE marketing for both sides. TinyBox gets to prove their hardware. SIGKITTEN gets to be the guy who checked them publicly. And the ML community gets to see actual empirical proof of which hardware wins at scale.
When your business model depends on expensive hardware, your benchmark BECOMES your business.