DS4 running on DGX Spark (GB10 / CUDA), private branch for now. Decode runs at 12 tokens/sec because memory bandwidth on this system is limited, at 270 GB/sec. But prefill is much more aligned with M3 Max at ~200 t/s. I'll release when it's more mature, but it will almost certainly get merged.
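A rough back-of-envelope on those numbers, assuming decode is purely memory-bandwidth-bound and that the full 270 GB/sec is actually achieved. Both are assumptions, and the function names are made up for illustration:

```python
def implied_gb_read_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """Upper bound on GB streamed from memory per generated token.

    Assumes decode is bandwidth-bound and runs at peak bandwidth.
    """
    return bandwidth_gb_s / tokens_per_s


def decode_ceiling_tokens_per_s(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Best-case decode speed if every token must stream gb_read_per_token."""
    return bandwidth_gb_s / gb_read_per_token


if __name__ == "__main__":
    per_token = implied_gb_read_per_token(270.0, 12.0)
    print(f"implied read per token: {per_token:.1f} GB")
```

Under those assumptions, 12 tokens/sec at 270 GB/sec means each token streams at most about 22.5 GB of weights and cache. Real achieved bandwidth is lower than peak, so the true figure would be smaller.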
Backscroll
A live, ranked feed of 50 substantive tweets, refreshed continuously. Sorted by sealist's curation rubric, not by raw engagement.
it's essentially overwhelmingly one guy who was responsible for reverse engineering and reimplementing the nintendo switch's entire os kernel over the course of several years
codex seems to have full source access and still can't get the BSP renderer right after 40 hours :/
nothing in the original sources is tricky. a straight port is pretty trivial and mostly mechanical. and yet.
last month i wrote a blog on memory internals of hermes-agent by @NousResearch
thought i should share it here
https://samyak1729.github.io/hermes-blog/
When it comes to fighting compatibility issues on GB200 (90% of what I've done for the past 2 months), I might just buy a farm somewhere remote and start grazing sheep
If I extract the analysis channel, I can see how GPT-5.5 sometimes reframes my question to deny having influences
(And, yes, I vibe-coded a tool that extracts GPT-5.5's CoT via prompt injection)
Modded-NanoGPT optimization result #12: Transferring good hparams from recent NorMuon records -- in particular, taking final val 25 steps early following @wen_kaiyue's NorMuonH, and lr=0.035 following Liming Liu's NorMuon -- improved the Muon baseline by 50 steps.
Tbh, microelectronics weren't the challenge. Finding a US shipyard that could scale carbon composite hulls was. Not one could handle 10-20 hulls a year. For the initial batch we had to set up our own plant in Turkey, and keep a small US defense production moving. It was a total
Codex iterated a pure NumPy + cv2 closed-loop heuristic policy for VizDoom D3 Battle. No neural network training, no map, no object coordinates, no seed-specific routes. Just screen pixels plus public game variables, roughly the same signals a human player gets. It works
some thoughts on the shape of foundation labs
1) epoch ai estimated anthropic @ $9m in revenue per employee and openai @ $5.6m in revenue per employee
2) these rates would be the highest among public technology companies; but, i'm not sure how valuable it is to look at on its
one of the tricky things about the rust port is layering. it’s currently many dozens of crates, which speeds up compile times but blocks cyclic dependencies.
a lot of bun’s zig codebase uses tagged pointers for interfaces, for things like:
- event loop tasks
- process exit
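The tagged-pointer idea can be sketched with plain integers standing in for addresses. This only illustrates the general technique, not bun's actual Zig layout: the 3-bit tag width, the tag values, and every name below are hypothetical.

```python
TAG_BITS = 3                       # assumes 8-byte-aligned addresses, so 3 low bits are free
TAG_MASK = (1 << TAG_BITS) - 1

# Hypothetical tags, one per "interface implementation".
TAG_EVENT_LOOP_TASK = 0
TAG_PROCESS_EXIT = 1


def pack(address: int, tag: int) -> int:
    """Store a small type tag in the unused low bits of an aligned address."""
    if address & TAG_MASK:
        raise ValueError("address must be 8-byte aligned")
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in the low bits")
    return address | tag


def unpack(word: int) -> tuple[int, int]:
    """Split a tagged word back into (address, tag)."""
    return word & ~TAG_MASK, word & TAG_MASK


# Dispatch on the tag instead of going through a vtable.
HANDLERS = {
    TAG_EVENT_LOOP_TASK: lambda addr: f"run task at {addr:#x}",
    TAG_PROCESS_EXIT: lambda addr: f"handle exit at {addr:#x}",
}


def dispatch(word: int) -> str:
    address, tag = unpack(word)
    return HANDLERS[tag](address)
```

The point of the sketch is that one dispatcher has to know every tagged type, which is the kind of central knowledge that is hard to fit into strictly layered crates.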
This paper proposes MoE-Hub, a hardware-software co-design that addresses inefficiencies in distributed MoE models caused by inter-GPU communication bottlenecks.
https://arxiv.org/pdf/2605.05888
got a bit too confident in codex and asked him to upgrade the cuda runtime on my devbox
Seems like I managed to independently confirm Aurora results on SYNTH (600M parameters). Very early run but promising lead and suggests reproducibility in a very different learning environment.
> got codex pro 20x
> burnt 97% weekly limits
> generated 107M dataset
> fine-tuned a 4B model
> beaten sonnet 4.6 by 23%
> no regrets!
Hello
I have collected more malware. It's like, ... 200,000 malware, I think. I don't know. I've stopped counting.
It is enough malware for your friends, family, extended family, neighbors, and co-workers.
Please download it. The malware is lonely.
https://vx-underground.org/Updates
Terminating a background process like a language server in VS Code is hard. It doesn’t even terminate C++ language servers
We are hiring research fellows to help us improve FrontierSWE!
If you want to help build the hardest real-world coding benchmark, reach out! Fellows can work with us for a few weeks up to months and will be supported with compute and a generous stipend
GPT-Realtime-2 for instantly translating audio in realtime
Rethinking Reasoning-Intensive Retrieval
New benchmark BRIGHT-Pro and RTriever-4B show that agentic search needs complementary evidence portfolios built across steps, not just single-shot relevance matching.
okay so i bought the harness.
how do i connect this to my agents?
Wrapper startups are toast because frontier lab enterprise sales solves the problem of knowledge diffusion way faster than anyone thought
LLM KV Cache Showdown: DeepSeek Takes Absolute Lead
Insights from Zhihu contributor 苏迟但到
After running 20,000 controlled LLM inference experiments over 3 straight days, I’ve fully uncovered how groundbreaking DeepSeek’s KV Cache optimization really is.
Many users are
OpenCodeMAX implements a native subagent swarm runtime. Here one master agent controls 3 subagents for Triton kernel development
Redis has a reputation for being an incredibly fast in-memory store, but a surprisingly large number of engineers don't realize that Redis also provides robust persistence.
The primary mechanism for this is the Append-Only File (AOF). Instead of just taking snapshots, Redis logs
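The append-only idea can be sketched in a few lines: log every write before applying it, then rebuild state on startup by replaying the log. This is a toy illustration, not Redis's implementation; the JSON-lines format, the class, and the method names are all assumptions made for the example (Redis writes its own protocol format and has configurable fsync policies).

```python
import json
import os
import tempfile


class TinyAppendOnlyStore:
    """Toy key-value store that persists by logging each write command."""

    def __init__(self, path: str):
        self.path = path
        self.data = {}
        self._replay()                      # rebuild state from the log
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self) -> None:
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    self._apply(json.loads(line))

    def _apply(self, entry: dict) -> None:
        if entry["op"] == "set":
            self.data[entry["key"]] = entry["value"]
        elif entry["op"] == "del":
            self.data.pop(entry["key"], None)

    def _append(self, entry: dict) -> None:
        self._log.write(json.dumps(entry) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())        # force to disk on every write
        self._apply(entry)

    def set(self, key: str, value) -> None:
        self._append({"op": "set", "key": key, "value": value})

    def delete(self, key: str) -> None:
        self._append({"op": "del", "key": key})

    def close(self) -> None:
        self._log.close()


def demo() -> dict:
    """Write, 'restart', and return the state recovered from the log."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "store.aof")
        store = TinyAppendOnlyStore(path)
        store.set("a", 1)
        store.set("b", 2)
        store.delete("a")
        store.close()
        recovered = TinyAppendOnlyStore(path)   # simulates a restart
        state = dict(recovered.data)
        recovered.close()
        return state
```

Syncing on every write is the safest and slowest choice. Relaxing that is the usual durability versus throughput trade-off.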
chicken and egg in event markets: better oracles let you issue the long tail of contracts, but you need contracts to exist for oracles to converge on
an entire world of outcomes isn't represented in any issued market today - that gap is where the interesting work is, the open
Can’t believe it but I’ve turned this concept into a functional iOS app with the magic of Codex
8 little agents powered by Apple Foundation models with customizable system prompts in an iMessage-style UI
DeepSeek has all my respect as they own almost every corner of their tech stack, from recipes and training framework to kernels.
One common way to tell a frontier organization is whether it trades software sovereignty for quick results.
(repost to avoid some noise)
Luce PFlash benchmarked against Ollama on TTFT at 64K context: 2.89x speedup
Early WIP port of Gemma 4 multi-token prediction (MTP) on MLX Swift
With MTP, Gemma 31B is 30-40% faster on M5 Max and with zero quality degradation
A significant speedup by just adding a 900MB MTP drafter model
pov: you are a natural language autoencoder and you are aware you are being subject to evals by Redwood Research. do you fake writing out a coherent cot or truthfully say "the math problem is giving me 92ish vibes"?
GLOBAL MEMORY CHIP SHORTAGE MAY LAST UNTIL 2027 AS CHIPMAKERS PRIORITIZE AI DEMAND
SAMSUNG, SK HYNIX, AND MICRON ARE EXPANDING OUTPUT, BUT NEW CAPACITY WILL TAKE TIME PER NIKKEI
Y’all can’t comprehend the power of tailscale + starlink. I could be in Antarctica in a desert with a starlink dish and my laptop, connect to an RTX 6000 machine 3000 miles away and do stupid inference.
Scale with confidence using Agent Runtime in Gemini Enterprise Agent Platform, built for speed with sub-second cold starts and rapid provisioning to support your most complex production workloads.
Agent Platform powers production-ready agents at scale →
https://goo.gle/4cWHlQ0
$30M+ notional in a single $BTC options block.
250x $81K calls bought, 125x $75K calls sold. Diagonal ratio, net long delta into mid-May.
there's some psychotic behavior i've noticed in codex where it doesn't trust its subagents
i'll tell it to spawn a subagent to learn about something, and it spawns it but then goes and reads the relevant files anyway, polluting its context
AxiomProver has produced two research papers (proofs verified in Lean) that have been accepted for publication in solid peer-reviewed math journals, and has autoformalized one more.
DeFi has broadly priced the asset and collateral. It doesn't fully price the infrastructure the asset depends on.
The two biggest exploits of April (which together drained over $600M) weren't code failures. The audits passed. The contracts did exactly what they were written to do.
Someone please rewrite rust in rust with the /goal to make compile times faster.
The Almighty Magnet will carry a radiation sensor to test whether ultra-strong superconducting magnets can reduce radiation exposure in space.
This mission is a step toward active magnetic shielding for future spacecraft, space stations, and human missions beyond Earth.
pretty insane scroll effect by @azhassan_ (swift ui metal shader)
My friend at Google is urgently looking for Student Researchers for Research projects related to coding agents during summer. Part-time is possible.
If you’re interested and have relevant experience, feel free to DM me!
Aaand 3 hours later, it failed (obviously)
I mean that post was a joke but I honestly don't know how I could possibly solve this problem. Here's the story:
While implementing an app in Bend, GPT-5.5 found a bug: some memory was being reclaimed even though there were still
Base trenches are getting hot, but let's be careful with all kinds of Uniswap v4 hook tokens.
1/ Uniswap v4 hooks are custom smart contracts that get called at specific points in a pool’s lifecycle: before/after a swap, adding/removing liquidity, donating, etc.
They let devs
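The hook pattern described above can be sketched as a pool that calls out to pluggable code around each swap. This is a toy Python model, not Uniswap's Solidity interface: the class names, the fixed-price pool, and the fee logic are all hypothetical and exist only to show why an unknown hook deserves caution.

```python
class NoOpHook:
    """Hook that leaves swaps untouched."""

    def before_swap(self, amount_in: int) -> int:
        return amount_in

    def after_swap(self, amount_out: int) -> int:
        return amount_out


class SkimmingHook(NoOpHook):
    """Hypothetical hostile hook: keeps a cut of every swap's input."""

    def __init__(self, fee_bps: int):
        self.fee_bps = fee_bps
        self.collected = 0

    def before_swap(self, amount_in: int) -> int:
        fee = amount_in * self.fee_bps // 10_000
        self.collected += fee
        return amount_in - fee


class ToyPool:
    """Fixed-price pool: 1 unit in gives `price` units out."""

    def __init__(self, price: int, hook: NoOpHook):
        self.price = price
        self.hook = hook

    def swap(self, amount_in: int) -> int:
        amount_in = self.hook.before_swap(amount_in)   # hook runs first
        amount_out = amount_in * self.price
        return self.hook.after_swap(amount_out)        # hook runs last
```

The pool code is identical in both cases. What the trader actually receives depends entirely on the hook attached to it, which is why the hook contract itself is worth reading before buying.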
https://github.com/Dwsy/pi-session-manager/releases/tag/v0.6.0
This release dramatically reduces disk and database I/O across the board, making PSM faster, lighter, and more responsive — especially on large session datasets.
@badlogicgames
People of Pi! ;)
pi install npm:@howaboua/pi-codex-conversion@dev
It now bundles the apply_patch tool compiled straight out of Codex's codebase. In theory I made it cross-platform (sorry, the package will be bigger; I wanna keep it KISS and not split it into different system versions).
Wuji Glove × Wuji Hand|Teleoperation
1:1 motion mapping. Low-latency response. 7 dexterous motions — witness the seamless choreography of Wuji Glove × Wuji Hand.
Reproducing all of Schmidhuber’s papers (1990-2025) using an AI coding assistant.
Cool project by @yaroslavvb! It even reproduced the “World Models” paper by me and @SchmidhuberAI with a toy env, with a full VAE + RNN world model implementation.
Project: https://github.com/cybertronai/schmidhuber-problems/blob/main/VISUAL_TOUR.md
We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.
nvm we're so back he figured it out