4:20pm San Diego - NeurIPS Exhibit Hall G/H
Panelists: Liangchen Luo (Coding Team Lead, xAI), Graham Neubig (OpenHands), Sida Wang (FAIR), Michele Catasta (Replit)
Will coding agents replace SWEs?
- Replit (Michele) - yes; it will replace the 2025 developer and require everyone to upskill
- FAIR (Sida) - no, not until AGI
- Second - agents will take over lots of parts of the SWE workload; he’s already writing less code and doing a lot more QA/testing. Also thinks we’ll have a lot more SWEs
How do we feel about the current gap between SOTA models and ‘scaffolds’ (tools)?
- Michele - “…”
- Sida - “Most capabilities are there; if you give it a keyboard/mouse it can do everything”
- Graham - “No to just giving it a keyboard/mouse… The hardest part in the real world is authentication etc. in corporate environments, and so context engineering is so important.”
- Liangchen - “GitHub is only ~2 trillion tokens after deduplication and data validation. That is not a lot; models can easily be trained on all of it, so pre-training on public code data is probably maxed out since it’s already accounted for. Post-training, however, hasn’t exploited nearly as much.”
How should we train junior engineers, given that ‘easy’ tasks are easy for models?
- Liangchen - “When interviewing, candidates don’t know code and feel like they’re researchers/prompters.”
- Graham - “Works at a university, so this is more interesting to him. There are studies showing that with AI assistants people hit a ceiling that is very hard to get past. Maybe no one has the answer to this?”
- Sida - “People are pretty smart and they’ll figure out what AIs can’t do yet. There will always be some tasks you cannot do, and so humans will figure out what to learn.”
- Michele - “A +3 year offset on careers. You become a manager of agents on day 1, shepherding multiple PRs at the same time.”
Some skills you shouldn’t spend time learning?
- Graham - “Writing code by hand”
- Michele - “Spend more time on code reviews instead.”
What are some suggestions for students doing research in the broader coding space?
- Michele - “Creative arts”
- Sida - “…”
- Graham - “Do research on things that are already solved by a frontier closed model: 1) spreading that knowledge to everyone is a public good, 2) if you do a really good job at it you’ll get a lucrative offer from these AI labs.”
- Liangchen - “No basic suggestions.”
For Michele, who said to build more benchmarks: what do we need to capture more of?
- Michele - “There aren’t good benchmarks for going from zero to one.”
- Sida - “…”
- Graham - “Said in his talk that software is just a means to an end. Take high-level things we would like to be able to do and set them up in a way that coding agents can solve them. There are a few scientific benchmarks, but we are all ML/AI people so we over-index on those. There is a much larger class of tasks that needs benchmarks.”
- Liangchen - “The CEO launched Macrohard, a meme playing on Microsoft: a virtual company. Good computer use is required, and we need a good way to manage memory. Some kind of long-context benchmarks in this area would help, because we’ll run out of memory anyway; we cannot have unlimited context windows, and this is quite under-explored at the moment.”
Audience questions: on AI slop and why outputs all look the same, and on the social impact of coding agents for people from diverse cultural backgrounds and how they adapt?
- Michele - “Gemini 3 Pro was a step-function improvement. The way to solve this is building evals (he said he’s a broken record on this).”
- Sida - “There’s nothing special about slop in coding agents; it’s just that your model doesn’t understand the world, and it’s related to model quality.”