EP 86
Agentic Workflow for Your Real Work
Intro & Introduction of CEO Jeongkyu Shin 00:00
Chester Roh Today, the day we’re recording, is February 15th, 2026, a Sunday morning.
Today, after a long while, we have our channel’s eternal teacher with us. Lablup CEO Jeongkyu Shin is here.
Jeongkyu has recently launched a product called Backend.AI:GO, which was built in 40 days, and the resulting code is approximately one million lines. I’ve also installed it locally and have been using it — it’s very clean, and the features I need are all well-built.
So today, with Jeongkyu here, we’ll talk about making this, vibe coding before and during the development, and how Lablup and Jeongkyu have changed since completing it, and hear the story from someone leading this industry right at the front lines.
Welcome, Jeongkyu.
Jeongkyu Shin Hello, everyone. I’m Jeongkyu from Lablup. Thank you for having me despite your busy schedule.
Introducing Backend.AI:GO 00:55
Chester Roh Then, could you give us a brief introduction of Backend.AI:GO, and as we get into the story of building it, let’s hear your knowledge on how exactly agent coding should be done today from someone right at the front lines.
Jeongkyu Shin So what Backend.AI:GO is — some of you may already know — we’ve been building and providing an AI infrastructure operating system, an operating framework called Backend.AI for over 10 years now. But the thing is, for you to actually use it, it becomes really effective starting from around 100 GPUs. From 100 GPUs to a thousand — it’s a platform used at that scale, so it’s actually hard for most people to access. Starting from the second half of 2024, we thought about how AI could be used when disasters occur. When hospitals or financial institutions are using AI through the cloud, if the cloud goes down, that’s a huge problem. In that case, like an emergency power generator — like the generators placed in the basements of hospitals or financial buildings — you put GPU machines there, so that when the external AI goes down, they can run autonomously. That’s what we set out to build.
So we built a router. We call it the Continuum Router, and we publicly released it in March 2025. There’s an NVIDIA event called GTC. We’re going again this year too. Everyone, please come visit us. We unveiled it at GTC and the response was great. We gave a presentation, but as we kept building it, it just kept getting bigger and bigger. In our case, our main customers are enterprises, and this had about 19 components. Installing it was so massive that if you deployed the whole thing, you could essentially build a service like OpenRouter, which many of you have used — it had grown to that level.
So we realized this is a service stack, not an enterprise solution stack. So we put everything on hold, shelved it for a moment, and asked: if we could save just one thing from this, what would it be? The most important part is ultimately the router, so we took out the smart routing component, set aside the other features, and decided to make the speed the fastest in the world. So we started rebuilding from scratch — that was last August, pulling the Continuum Router portion out of the whole Continuum project. We finished building it by December.
We decided we needed to launch version 1.0, and we prepared for it. Functionally it’s a router, and this router has full conversion capabilities plus various circuit-breaking features for disaster response: if some backend has a problem, we say “you’ve got an issue, so let’s count you out for now,” and the models that were running there get naturally rerouted to other models. There are various features like that, but when we tried to release it, there was no good way to present it. It’s really great, but there’s no way to explain it. So we decided we needed to build a web UI, that there needed to be a visible UI — that’s what we thought.
Because when we first started the company, it was the same situation. It’s been almost 11 years now. Back then, when we said we’d build a platform for running machine learning, and people asked how, we’d open up a terminal first and start explaining. You type this command, and back then TensorFlow wasn’t even out yet, so Caffe2 configs would come up like this, and a tool called Theano that was trending at the time would show up too. We were doing stuff like that, but instead of that, we needed something visual. Chester, you told us that back then too. So we built an education platform as a demo and even sold that. That was on Christmas Eve.
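The circuit-breaking and failover behavior described above, where a failing backend is counted out for a while and its traffic is naturally rerouted elsewhere, can be sketched roughly like this. This is a minimal illustration only, not the actual Continuum Router implementation; all class names, thresholds, and backend names are invented:

```python
import time

class Backend:
    def __init__(self, name, fail_threshold=3, cooldown_s=30.0):
        self.name = name
        self.failures = 0
        self.fail_threshold = fail_threshold  # consecutive failures before tripping
        self.cooldown_s = cooldown_s          # how long to keep the breaker open
        self.opened_at = None                 # None means the breaker is closed

    def available(self, now):
        # A tripped backend is "counted out" until its cooldown expires.
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_s:
            self.opened_at = None             # half-open: give it another chance
            self.failures = 0
            return True
        return False

    def record(self, ok, now):
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.fail_threshold:
                self.opened_at = now          # trip the breaker

class Router:
    def __init__(self, backends):
        self.backends = backends

    def pick(self, now=None):
        # Requests fall over to whichever backends remain healthy.
        now = time.monotonic() if now is None else now
        for b in self.backends:
            if b.available(now):
                return b
        return None                           # everything is down

router = Router([Backend("cloud"), Backend("on-prem")])
cloud = router.backends[0]
for _ in range(3):                            # the cloud endpoint starts failing
    cloud.record(ok=False, now=0.0)
print(router.pick(now=1.0).name)              # → on-prem
```

The point of the sketch is the "emergency generator" behavior from the conversation: when the cloud side trips its breaker, traffic lands on the on-prem backend with no human intervention, and once the cooldown passes the cloud backend is quietly given another chance.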
The Origin of Backend.AI:GO — Starting on Christmas Eve 04:11
Chester Roh The decision to build Backend.AI:GO was made on Christmas Eve.
Jeongkyu Shin It wasn’t called GO back then, though.
Chester Roh Since people are curious, this Backend.AI:GO — the productization of the Continuum Router — what does it look like? How about you put it up on screen and explain while showing it?
Backend.AI:GO Live Demo 04:28
Jeongkyu Shin This is a project with a lot of personal bias baked in. It wasn’t originally going to look like this, anyway.
Chester Roh I’ve been using things like Ollama and LM Studio installed locally, and this is much cleaner than those. From an engineer’s perspective, you immediately get the feeling that the necessary features are well-implemented.
Jeongkyu Shin This is a tool many of you might find familiar. Those of you who look at this interface and feel it’s familiar are probably around the same age as me.
Chester Roh It has that Windows XP feel.
Jeongkyu Shin So for models — for example, it does routing, but basically you can search for models on Hugging Face and pick a model like this — this one is a high-quality recommendation. But my computer is slow, so I’ll download an economy model. Once you download them, they show up here in this list, and these models are just amazing. If you want to know more about a model, you’ll see an icon that looks like this here. Then it gives you an explanation of the model. It uses the Gemma 3 architecture, and for quantization, it was originally trained at 16-bit but compressed 4x down to 4-bit, and in that state it maintains about 95% of the quality at roughly a quarter of the size.
And for parameters, the feed-forward layers account for about 60%, and for the dimensions, it currently uses a vocabulary of about 260,000 tokens. Then when input comes in, it goes through the embedding and transformer stages, passing through 26 transformer blocks to produce output. Like this. For the layers, input comes in and output goes out, and the reason input is at the bottom is that the early papers on attention all have input at the bottom. So when reading papers you go from bottom to top like that. And for the KV cache — while the model is running, there’s a cache that keeps storing intermediate results, and how much memory that cache takes up varies by model type.
Some models require a very large KV cache, while others, depending on architecture, require less. It calculates and tells you how these things are applied in this particular model.
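As a rough illustration of why KV-cache requirements vary by architecture: the cache grows with layer count, the number of KV heads, head dimension, and context length, which is why multi-query attention (one KV head) needs far less memory than full multi-head attention. The numbers below are invented for illustration and are not the specs of the model in the demo:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # K and V tensors are each [n_kv_heads, seq_len, head_dim] per layer,
    # hence the leading factor of 2; bytes_per_elem=2 assumes an fp16 cache.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 26-layer model, head_dim 256, at an 8k-token context:
mqa = kv_cache_bytes(n_layers=26, n_kv_heads=1, head_dim=256, seq_len=8192)
# The same model with 8 KV heads (grouped-query style) needs 8x the cache.
gqa = kv_cache_bytes(n_layers=26, n_kv_heads=8, head_dim=256, seq_len=8192)
print(f"MQA: {mqa / 2**20:.0f} MiB, GQA: {gqa / 2**20:.0f} MiB")
# → MQA: 208 MiB, GQA: 1664 MiB
```

This is the calculation a tool like the one in the demo can do per model: given the architecture fields from the model card, it can tell you up front how much memory the cache will claim at a given context length.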
Then here’s the actual transformer block structure. It has multi-query attention attached — you don’t need to know the details. The goal is for you to eventually start studying with this. So this is what it looks like, and positions are encoded like this to process long texts. All of that information is laid out here. Then if you want to run it, you press the run button and it loads up. And it runs like this.
Chester Roh So once a model is loaded, you just go to the chat tab and call it, right?
Jeongkyu Shin Yes, so it’s running on your own computer. If your computer is good — for example, if you have an NVIDIA GPU or AMD GPU — you can install additional engines from the engine section here to use those. This is a Mac right now, so there’s nothing extra for that.
Let me test it. It’s a 1B parameter model right now, so don’t expect too much.
Chester Roh It’s smooth.
Cloud Model Connection and Distributed Routing 07:02
Jeongkyu Shin Then if you look here, there are many other types of models too, and you can also connect cloud models. If you look at the top right here, there’s one local model and 15 APIs like this. From the models section, under remote models, you can select whichever one you want, the full model set has 175 here, and if you check the ones you want, they appear in that list. You can also add providers from the API section to do this.
If you’re using OpenAI, or Gemini, or Anthropic, or running Ollama or LM Studio locally, you can connect all of them and use them like this. And since this was originally a router UI, you can see how everything you’ve connected is linked together. You can view it all like this, and we made latency and other metrics viewable too.
If multiple instances are installed and they’re on other people’s computers, you can add them here — for example, in our case, if 8 people in the office have it installed, you can bundle those 8 together. For instance, my own computer doesn’t have enough resources to run image generation or multiple text models at once. So you distribute the work across your colleagues’ computers, and everyone can share the results with each other.
One person’s computer runs image generation, another’s runs text models, another’s handles PDF processing — each one running something, but I can use all of them from my computer, and likewise, the models I’m running can be used by anyone else in the office. Or you can buy a good computer or workstation, run things there, and connect from here. There are lots of features like that.
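The work-sharing setup described here, where each machine serves a different kind of model and everyone can use everyone else’s, amounts to routing each request to a peer that advertises the needed capability. A toy sketch of that idea; all peer names, addresses, and capability labels are made up:

```python
# Each peer advertises what it can serve; a request is routed to any
# peer advertising that capability. Purely illustrative data.
peers = {
    "alice-pc":    {"caps": {"image-generation"},          "addr": "10.0.0.11"},
    "bob-pc":      {"caps": {"text", "pdf"},               "addr": "10.0.0.12"},
    "workstation": {"caps": {"text", "image-generation"},  "addr": "10.0.0.20"},
}

def route(capability):
    # Return the address of the first peer that can handle this job kind.
    for name, info in peers.items():
        if capability in info["caps"]:
            return info["addr"]
    raise LookupError(f"no peer serves {capability!r}")

print(route("pdf"))  # → 10.0.0.12
```

A real mesh would also track load and latency per peer (as the product’s metrics view suggests), but the core idea is just this capability table shared across the office.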
Chester Roh So you built everything you personally felt was necessary?
Jeongkyu Shin That’s right. I even built a translator and put it in.
It translates “Lablup” as “rabble up.” So you can add things like that, or based on this, you can also translate entire files.
If you drop in a PDF, TXT, or docs file, it translates while preserving the original format, and you can even put in images. There are all sorts of miscellaneous features.
Chester Roh You basically built GenSpark locally.
Jeongkyu Shin So since there are multiple themes right now, this is probably what you’ll see when you install it by default, or if you’re on Mac there’s a glass theme, or this one I made as a hobby. This one is a theme that was added recently because I had leftover tokens. It’s a quirky tool like this.
Development Process — 40 Days, 13B Tokens, 1M Lines of Code 09:06
Jeongkyu Shin It’s a tool with a lot of personal preference baked in. It wasn’t intended from the start to be a polished company product. We started making a web UI for the router, and you need something to route to, right? I really prefer llama.cpp, but running llama.cpp manually was just too tedious. So I automated llama.cpp and put it in the web UI, and it kept growing and growing, and then suddenly the DeepL subscription fee started feeling like a waste.
So a translator was born, and then since there was so much image generation work to do, image generation features got added too. There are really good models for image generation. Cloud options for those kept growing too.
Since it’s a router, there are statistics, benchmarking — since you don’t know which model is faster at what, there’s benchmarking. All those things just kept getting attached. You compare two, then export the results.
Seungjoon Choi It’s got everything in it. And you built this in about 40 days?
Chester Roh Is the total number of tokens used to build this something you can share?
Jeongkyu Shin When I was building this, I ran two Claude Code Max subscriptions as a baseline, and whenever that wasn’t enough, I’d pay additional charges, and I ran it across 8 machines, VMs or PCs. I mentioned we started on the 24th, right? To get the project to this point, we used a total of about 13 billion tokens. We started building on December 24th and did the first reveal at CES.
The reason it became Backend.AI:GO is that we made the first version and shared it internally, and people loved it. They said “this isn’t just a web UI, this can promote us on our behalf.” So we held a naming session, and it became Backend.AI:GO. So it could potentially become our first consumer-facing product.
After that, members would register items in the issue tracker, and development would happen. They reported bugs and new features that way, and we ran an automated development harness for that, with people continuously stepping in to adjust UX and exchange feedback — it took about ten days like that.
We started building on December 24th, 2025, and on January 6th we did the first demo at CES in the United States. At that point it was literally an MVP: it actually ran, but it wasn’t yet a finished product, and development then progressed about four times further beyond that.
Chester Roh When you went to CES, I think you tossed us around a version 0.9, and since then, looking at it today, it’s now at 1.1 — the level of polish has really gone up. So today’s main topic actually starts from here.
Lessons from Agentic Coding — Token Economics and Fast Inference 11:30
Chester Roh The process of building those one million lines, and the difference between how you approached agent coding before versus what you felt while building this, and the new processes and methodologies that have changed because of it — you mentioned that earlier. Shall we start getting into that story?
Jeongkyu Shin I think even this will be just a fleeting moment in time. Let me address one thing first before we move on. The reason we started developing this product was because during the holiday season, Anthropic ran a double-token event, and that’s what kicked it off. So it started with “what else can we try,” and we just went for it.
But flip it around, and the amount of tokens you can use is directly tied to a company’s competitiveness — especially for IT companies. That was the first lesson we learned. The second lesson is that ultimately, when you pour out this many tokens, people give feedback, and when you have a product where humans don’t do the actual development, several bottlenecks emerge.
For example, about six months ago, the merge queue was the bottleneck. Development speed was so fast that the agents’ own code would conflict with each other. But at this point, the merge queue is no longer a bottleneck. Resolving the merge queue isn’t done by humans either — the agents handle it on their own. Among the things I accidentally tested, in one case, even when two AIs competed on the same source code, developing different features simultaneously, the features ultimately got developed properly. That’s how much progress has been made.
So when you move to the next stage, the key question becomes how to use fewer tokens. To improve performance on the same task, for example, the method models have typically chosen is the one used in context: the invisible thinking tokens, what we call the thinking budget. Things have been evolving toward increasing that amount, and while increasing it obviously improves the final output, it also means development speed slows down. So the question becomes how to make it develop with less thinking, and less thinking itself means faster development speed.
Ultimately, in a world where everyone codes with AI, speed becomes really important, and there are two ways to make it faster. One is making it generate fewer tokens while producing the same results—that’ll be the first challenge. The second is making token generation itself really fast— that’ll be the second approach. This is why high-speed inference is becoming necessary these days. Not the speed we’re used to with ChatGPT’s code generation— not that speed, but the ability to run 5 to 10 times more iterations— ultra-high-speed inference will become incredibly important. These two are the major lessons I’ve learned from going through this.
Seungjoon Choi Like the Codex Spark that came out a few days ago, right?
Jeongkyu Shin And then interestingly, Spark launched its service too, so we’re looking at the same thing. Ultimately, this competition will create demand heading toward the high-speed inference market, and those with enough money will boost their competitiveness through high-speed inference, while those without will figure out how to make it think less.
Making it think more only where performance is needed, and where performance isn’t needed—for simple tasks or coding— making it think less, how to apply adaptive thinking budgets like that, or how to build a harness that can dynamically control such thinking budgets. That’s probably how this winter will play out.
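The adaptive thinking-budget idea above can be made concrete with a tiny harness-side policy: classify the task, then request only as much thinking as it warrants. This is purely an illustrative sketch, not any existing harness; the task categories and budget numbers are invented (though some real APIs, such as Anthropic’s extended thinking, do expose a per-request thinking budget parameter):

```python
# Toy "adaptive thinking budget" policy: spend thinking tokens only
# where the task is hard. Categories and budget numbers are invented.
BUDGETS = {
    "rename-symbol": 0,             # mechanical edit: no thinking needed
    "fix-lint": 0,
    "implement-feature": 4096,      # moderate reasoning
    "debug-race-condition": 16384,  # hard problem: think a lot
}

def thinking_budget(task_kind: str, default: int = 2048) -> int:
    """Pick a thinking budget for a task, falling back to a default."""
    return BUDGETS.get(task_kind, default)

def plan_request(task_kind: str, prompt: str) -> dict:
    # In a real harness, this dict would become the model API call,
    # with the budget passed as a request parameter.
    return {
        "prompt": prompt,
        "thinking_budget_tokens": thinking_budget(task_kind),
    }

print(plan_request("fix-lint", "remove unused imports"))
print(plan_request("debug-race-condition", "intermittent CI failure"))
```

The interesting engineering question the conversation raises is the classifier itself: whether the harness can cheaply decide, per task, which bucket a job falls into, so that simple edits run at full speed and only the genuinely hard problems pay the thinking tax.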
Chester Roh It’s all trade-offs, isn’t it?
Bio-Tokens — Cognitive Load and Dopamine in the AI Era 14:38
Chester Roh You end up thinking about that a lot. Since AI does everything for you, the waiting time starts to get annoying.
Jeongkyu Shin The waiting time is actually a big issue, but thanks to that waiting time, personally, it became an opportunity to think. Not about AI. I started thinking about myself.
Chester Roh These days, you’ve been calling that “bio tokens.” Someone on our team said it’s great to have more time to spend on bio tokens.
Jeongkyu Shin Personally, here’s how I see it. About two years ago, I started delegating coding to some extent with ChatGPT and began coding together with it.
Then starting last April, things changed dramatically, and after living like that for about 9 months, I got more gray hair. I got a tremendous amount of gray hair, and I stopped sleeping properly. At the peak of it, around June and July, after the 5-hour refill came out, I didn’t want to waste those 5 hours, so I’d run it for 3.5 hours, sleep for 1.5 hours, and lived like that.
Gradually, those things started getting automated too. The codebase itself is about 700,000 lines, and the total lines written come to about 1.2 million. But that number, 1.2 million, is personally very symbolic to me.
Back when I was doing the TextCube project, the code I wrote over 3 years totaled about 1 million lines combined. But I wrote that same amount of code in 40 days.
In a way, my life got compressed, and I spent about a month building Backend.AI:GO, and while building it, I asked myself—did I really only put in 40 days of effort?
Looking back, I feel like I aged 3 years. As a human being.
The cognitive load doesn’t decrease. No matter how much you delegate to AI, the cognitive load doesn’t decrease, and because feedback keeps coming in nonstop, your life as a person becomes really depleted.
But it’s fun while you’re doing it. Because when you accomplish something like in a game, you get a dopamine hit, and it’s similar to the gacha system in popular mobile games these days. You pay money, pull characters, do something, and when you win, you get immediate feedback. People love that.
The process of coding with agents provides a certain kind of joy to humans based on that speed. Because things that used to be huge tasks for me, or places I thought I could never reach— it lets you achieve them. Whether I achieve it or the AI achieves it is a different matter, but either way, it supplies a kind of dopamine. Though “supplying dopamine” isn’t exactly the right expression.
But while being supplied, the problem is it keeps demanding more from you. Since it works well, you do more, and when you do more, it works well again, so you can’t pull yourself away from it, and when that happens, things go well.
Things go well, but two problems arise. One is that your life becomes too dependent, as I mentioned, and it becomes depleted.
Second, the moment it gets cut off, the product you were building dies too, and people leave; they just move on to find the next product. So I started predicting that there’ll be a huge number of abandoned products.
The Age of Software Surplus and the Rise of Instant Apps 17:42
Jeongkyu Shin This is a bit different, because previously, there was a barrier to entry in the software industry or in developing software, so when someone built a piece of software, as long as it found users somehow, it would be maintained continuously. But software that’s built very quickly inevitably has a relatively weaker will to maintain it. Because if you didn’t struggle that much to build it, the question of how to manage it— since you’ve been leaving it all to AI anyway— becomes “just have the AI do it,” and at the same time, products doing similar things will multiply in the world—they’re going to proliferate.
For example, say there’s a single open-source project that does a specific function. Then users would gather around it and it would grow into something big, but that phenomenon will happen much less. Simple projects—you can just build them yourself. Slightly more complex products? There are already dozens out there. The amount of dopamine that needs to be given to a human to maintain a single product— that absolute amount will decrease.
Something similar happened with blogs in the old days. When the entire comment section of blogs migrated discussions over to Twitter, many dedicated bloggers stopped blogging, because one of their main driving forces was the social feedback exchanged in the comment section below. When that moved to another platform, it all disappeared. Similarly, there’ll be a massive increase in unmaintained open-source products. So we’ll live in a world with an enormous abundance of software, but most of that software will likely have a short lifespan. Only a very small number will survive; the rest will be used and discarded. Two things will remain.
One is the concept of software meant to be used and thrown away: instant apps, where you build them when needed, and if you use something frequently, you just save it. Deciding whether to save it or not, a platform like Google would probably handle that for you. That kind of low-reusability but very quickly built, quickly used instant app will keep appearing. As for the rest, the amount of software will keep growing and then start shrinking again, I think. The types of software humans need to live aren’t really that many.
Something I realized while using smartphones: in the early days, believe it or not, there was no folder feature. App folders. So on the iPhone, all apps were displayed on the home screen, and you’d swipe through like 9 pages of them. Then auto-organize features came along, app folder features were added, and at some point people discovered that the number of apps a single person uses doesn’t exceed 30, and by usage frequency, the top 10 apps account for over 90% of total usage. Things get sorted out like that.
The apps that survive this sorting share common traits. One is that they’re based on sociality. The app doesn’t provide utility on its own— it provides utility that can only be created through the app. Second, apps closely tied to life, to daily living. For example, office apps, or what we call productivity apps. Things that organize your documents—like Obsidian, DEVONthink, tools like that.
But the common trait of those tools is they’ve been around in that space for a long time. As I mentioned earlier, let’s call them products. Only the ones with the assurance that someone is holding onto the software and continuously maintaining and evolving it have survived. Same goes for open source. The reason widely-used open source is widely used is because it gives you confidence it won’t die quickly.
Chester Roh The brand is established. A certain promise has spread across the market.
Jeongkyu Shin The number of software products will spike, and the increase will continue, but the number of products that are actually widely used will probably stabilize at a certain level, I’ve come to think. By the second half of this year, whatever the talk about SaaS collapsing or not, ultimately you can build all that SaaS yourself. Lablup’s revenue officer personally felt Salesforce wasn’t great and built a replacement themselves.
In two days, they built it all perfectly fine. Nevertheless, they ended up buying another solution— a cheaper alternative solution—to use, and the reason they bought it is this: if you can do other work at the same speed anyway, things you don’t have to do, you shouldn’t do. So I don’t think the SaaS market will collapse, but whether the ones succeeding in the SaaS market now will still be succeeding a year from now—I’m not so sure.
Chester Roh The paradigm is changing rapidly, so who knows what’ll happen. Looking at SaaS company stock prices dropping, it’s quite interesting, but some of them will survive and combine with AI to become very solid models, potentially being reborn.
They have the brand—that thing was best at that job. Even in the AI era, we do it better. It could turn out that way too, and what Jeongkyu was saying about this so-called great transformation era— during the mobile era there was also a great transformation with fierce competition, and in the AI era too, the environment we already feel is stable is changing, so people are thinking there might be opportunities in there for them, and everyone is rushing in— I think we’re in that kind of period.
The Third Great Revolution in Software History 22:38
Jeongkyu Shin It’s definitely a time when software is undergoing a major shift. Because of that, you hear a lot of talk about needing to do software, and we’ve been through a lot of that internally at our company too— holding many seminars, discussing with each other, thinking about what path we should take going forward.
As I briefly mentioned before we started, this kind of change has happened a few times before. If we’re talking about generational changes we didn’t directly experience: going from punching cards and marking OMR sheets to suddenly coding on keyboards was a huge methodological change. Then there were smaller changes, like being able to move the cursor with arrow keys in an editor while coding — I know it’s hard to believe now, but this wasn’t possible before — or combining multiple source code files to build a program. The next big change was when smartphones arrived. Before that, the core of the stack was standalone packaged software, what we called package software, and the question was how to deliver it: by medium, whether CD or disk, or through what we used to call ESD, electronic software distribution, and there were discussions about that too. Plus freeware existed, and shareware where you’d try it and then pay, and then there was commercial software, the kind you’d buy at a store.
From that distribution concept, first the web and then smartphones, about ten years apart, entered society, delivering through networks, and eventually you didn’t even need to deliver at all. Through web browsers, let’s call it remotely installed software: what we usually call web services these days, some service running remotely, software in the form of services that people use. Through that, a lot changed about development. Web servers became important, as did payment systems and security-related things. There was a big shift at one point.
And on the UX side, it went from being keyboard-and-screen-centric to suddenly very small devices, or devices with screens that vary in size, and input going from physical keyboards to touch keyboards— there were these big changes, and right now, in terms of such shifts, I think we’re probably on about the third one. What’s changing here—as I said, software itself kept changing too.
But the software that’s changing now— when we normally say software, we mean creating code, calling that code software, and the people who operate it are mostly developers. Developers were building it, and as things moved to web-based, a job called ops emerged for operating it. And because those two are very tightly integrated in web services, the field of DevOps was born. Doing both development and operations.
And then in that process, the stack also—previously everyone did all the coding, but as it became service-based, web service-based, people emerged who handle the logic called backend and the server side, and then people who work on the interface users actually interact with, or who design user experience and actual behavior— a separate job called frontend emerged. And then, overseeing all of that and doing planning—or saying I can do it all myself— the concept of full-stack emerged too, among many others.
So here’s the key point. Previously, writing code was about 70–80% of the work, and implementing the service stack to operate it was 20–30%. It broadly splits into these two parts when we develop smartphone apps or web services, but our kids, at least, if they don’t learn these concepts going forward, I think they’ll simply never know them.
Is the Value of Code Converging to Zero? 25:57
Jeongkyu Shin Because right now, while building this project Backend.AI:GO, I went to a roundtable discussion on Friday and talked about it, and during that roundtable— the roundtable ran from 10 AM to noon. Before going in at 10, I reviewed everything at a coffee shop, handed it off, and when I checked after the roundtable ended, about 22, maybe 21 PRs had been submitted and tests were done and everything was merged. If that kind of pace becomes the norm, the value of code essentially converges to zero, and what developers do in DevOps, as I mentioned earlier, unless the nature of their work changes completely, they’ll either lose their jobs or find themselves in a very difficult situation. But actually, these are also people who will become extremely important.
Software itself isn’t going anywhere. When I see people saying that today’s software is in danger, it’s not that software is actually in danger. Just as the definition of software shifted from marking OMR cards to keyboard-based coding, it’s now shifting from keyboard coding to conveying meaning— coding will continue to evolve through stages, and when that happens, I believe the core value of products will come from the engine—and by engine I mean what we currently call code, the part that processes things, which is really logic. Code itself is not the goal.
To build a service, we implement logic for the computer to process that service, and that’s what we call code, and its advantage is that it’s deterministic. Rooted in the von Neumann architecture, it processes logic sequentially, and the process of delivering results to users is all fully standardized, although of course the mediums in between are very unreliable— networks, storage, they’re unreliable— but the logical structural system running on top of them was originally fixed. Up until now, at least.
But if the purpose of code is to process some logic and make things work, that part will mostly be handled going forward by deep learning models, or derivative models—who knows what exactly. It doesn’t have to be Transformers. Some other kind of engine that processes logic will take over most of what code currently handles—that’s what I’ve come to believe.
Chester Roh The world has already made that judgment, hasn’t it? In reality, the companies building models are capturing almost all the value right now. Companies making hardware and companies making models— and the layers that used to sit on top are all being pushed aside, isn’t that right?
Jeongkyu Shin As you said, value migrating in that direction is very natural, so what we’ll probably call software will have an AI core engine inside it, and on the outside, a layer that controls it and makes it deterministic like before— that’s what will hold significant value.
I’ve had various experiences with this, and now I use Claude Code, naturally I’ve tried Codex too, I’m not sure I even need to mention Copilot but I use that too, and I use Gemini CLI as well.
For example, with something like Gemini CLI or Codex, if you use Backend.AI:GO you can connect them and use those models through Claude Code, just by swapping the backend model.
Claude Code’s Real Edge Is the Harness 28:54
Jeongkyu Shin But what I learned from running them that way is that Claude Code’s core competitive advantage is not the Opus or Sonnet engine. It’s Claude Code itself.
There’s the domain of what we traditionally call software, and that software wrapping around the model I mentioned earlier, making it behave deterministically— this software logic, I’ve come to think it’s incredibly powerful.
Same model, but attach it to Claude Code and it runs remarkably well.
Chester Roh When you pointed the Claude Code harness to different endpoints, which one felt the best? For example, are there cases where connecting Codex 5.2 to the Claude Code harness felt great?
Jeongkyu Shin Gemini 3 Pro.
Chester Roh I need to try that today.
Jeongkyu Shin Surprisingly, even Gemini runs well when you attach Claude Code to it. And it has a large context. So it’s like 80% model; on the outside, the logic code that makes it deterministic, the traditional code, is about 10%; and the part that enables interaction with people, providing UI/UX or AI-to-AI UI/UX like A2A or MCP (which I think are all transient), is the remaining 10%. That’s probably going to be the definition of software.
Going forward, software— I don’t know if it’ll take until the next generation, but when that time comes, learning software will really be about what models are and how models work, and of course it’ll appear in history books. We don’t experientially know what it was like to use punch cards, but we know it intellectually. Oh, they used to make software that way back then, and before punch cards they used vacuum tubes and someone went in to catch a bug and that’s where “bug” came from— we know these things intellectually.
Similarly, people will learn that humans used to make software by hand, wow, how did they do all that by hand, and seeing someone coding with a keyboard next to a monitor flashing a peace sign will be like looking at a history book— that’s how people will perceive software, and probably the core of the next generation will be about what models directly handle that logic, how models are built, and they’ll learn the history of models from the beginning. That’s probably going to become a part of computer science, I think.
The Future of CS — Relic of the Past or Redefined? 30:53
Chester Roh The things we traditionally learned in computer science—data structures, algorithms, OS, networking— a lot of that will probably fade into history.
Jeongkyu Shin I think that’s going to happen very quickly. Much faster than people expect, and that’s why, in this current wave of change, what everyone is feeling right now is that this isn’t at some fixed speed— it’s in an acceleration phase, and even the rate of acceleration itself is increasing.
Chester Roh The acceleration number itself is also increasing.
Jeongkyu Shin Right, the acceleration isn’t constant either. There are things where the rate of acceleration is fixed. For example, the amount of compute used to build models has been increasing tenfold every year. That can’t keep increasing like that forever. It’s approaching the physical limits that Earth provides, the limits humans can achieve on this planet. But in other areas, acceleration continues. For example, if training can’t maintain that acceleration curve, then more resources go into inference, or when single inference can’t achieve it by increasing the number of tokens going into in-context learning, they scale up agent swarms instead. With what’s called agentic AI, they’re bundling agents together. The domain of the acceleration phase that needs expansion keeps shifting like that.
Inference took over the acceleration phase, and then with agentic AI, the count goes from one to 10, from 10 to 20, scaling out in parallel— that’s where the acceleration phase has moved to now. It keeps shifting like that while maintaining the curve, and while we don’t feel these changes viscerally, I understand this happened once before in the past.
That was during the Apollo program, and they ultimately achieved the goal. They went to the moon, and then as everyone knows, satellites came to blanket the Earth. Now we can’t even imagine it being otherwise; to us it seems obvious that talking to people in America is totally normal. But there were people before who couldn’t imagine that, and I think we’re at exactly that kind of curve right now. Comparing it to space development, I’d say we’re roughly at the point where the Gemini program has been established; we haven’t reached Apollo yet, we’re at the stage where we’ve just succeeded in sending a person up.
Chester Roh Before we go back to the agentic coding discussion, Jeongkyu, I’d like to ask you one last big question before we move on. Honestly, everyone’s been really struggling lately. Everyone feels this acceleration keeps increasing, so they’re working hard, but is what I’m working hard on even meaningful?
Because other people are making the exact same things as me, and there’s a sense that a paradigm shift in values hasn’t happened yet, and we’re facing the future with frameworks from the past.
So it feels like our perspective needs to change in some different way, and right now, even if it’s wrong, that’s fine— just whatever pops into your head, like “I think we should be doing this now”— if you could just riff freely, what comes to mind?
Stanford CS Curriculum Shifts and the English Department Analogy 33:54
Chester Roh Like “the future will be like this, so don’t worry too much about that, the most important thing right now is to be doing this”— if you had to put it together, what would you say? What comes to mind?
Jeongkyu Shin Something that happened at our company Lablup last week: our CFO had a bit of a mental breakdown. Our CFO needed to write something, and no matter how she thought about it, tossing it to me was way faster. I don’t handle it by hand myself, though, so after watching a few times, she realized that what takes me 3 minutes had been taking her 2 hours. So she started throwing things over, and eventually she started using Claude Code herself. After spending 30 minutes learning it, she could handle things on her own in under 3 minutes.
And similarly, the person who creates our content has to produce an enormous amount of materials. Starting from official materials to marketing materials, technical documents, all sorts of things, and all the back-and-forth discussions with people, receiving data, organizing it herself— it was so exhausting that she came to talk about it, and she too became a Claude Code warrior and went off and in less than a week built an amazing harness for herself.
They even work on GitHub now, though actually neither of them can use GitHub directly themselves; they created commands that do things on GitHub for them. Both of their lives became very peaceful. It wasn’t anything complicated. For example, among the recently popular Claude Code plugins, it’s not like they grabbed something super popular that disrupted the legal profession or anything like that; they didn’t need to go that far. They started from something very simple.
Starting from absolutely nothing, beginning with creating a CLAUDE.md file, self-feeding, spending about 30 minutes on the process of building their own thing, and after that, both of them—neither of whom are programmers— entered the acceleration curve. That experience was personally very impressive to me, because earlier when someone said doesn’t that mean software’s importance is gradually disappearing, I think it’s the complete opposite.
Everyone will soon enter this AI acceleration curve. On the day our CFO learned this, that very evening Claude Cowork for Windows was released; you can do the same things with those tools too, though the permissions are a bit more limited. So far this has mostly been felt by technical people: people who code, researchers, people who’ve had no choice but to live their lives very close to computers. They go through euphoria, then anxiety, then FOMO, riding this curve.
The exact same thing will spread to a far wider group of people than we imagine. Probably in not that much time, and of course there will be people left behind. For example, people whose work has nothing to do with computers will adopt much later, and they’ll first feel it when robots come to work alongside them, when robots start working together with them— that’s when they’ll first experience it.
But even though this change already looks huge, it’s a tempest in a teacup. Even at companies that are fully IT like ours, there are people who were somewhat left out and then jumped in and started feeling the acceleration, and the method itself really isn’t that difficult.
The idea that working with AI means downloading skills (there are dozens of skills out there; download them and suddenly you’re powerful too) is, I think, speculative. Doing it that way won’t actually reduce your workload. To fundamentally get on this acceleration curve, rather than starting by copying someone else’s skills, which correspond to their acceleration curve, start making your own. That’s when the acceleration begins, because the key has to be that your own workload decreases.
It’s not about learning something new and learning something others have built— it’s about continuously delegating what you’re currently handling, and that approach is far faster— everyone will realize that soon, and when that happens, I personally think there will be a shock incomparable to what we see now. The real wave is yet to come.
And when these newcomers—people outside of IT, people who work in IT but are outside of programming— start adapting, the acceleration curve we talked about earlier— the training and inference acceleration curves and how the acceleration phase keeps shifting— the next area it will shift to is probably diffusion. As it gets used in an incomparably diverse range of domains compared to before, I think that will sustain this acceleration phase.
Seungjoon Choi Ironically, right now is actually a good opportunity to build foundations in computer science and engineering with that craftsman mindset. Because later, the subject itself might disappear.
Jeongkyu Shin The subject itself won’t disappear. The nature of the field will change. I think computer science itself will inevitably only grow in importance; it will become closer to a discipline for understanding how society is built. If you think about it that way, its importance will only grow, but compared to what we currently think of as computer science, the form will be quite different.
Chester Roh Stanford University, I think how Stanford CS’s curriculum changes reflects the cutting edge quite well. About three or four years ago, things that were taught in PhD programs or graduate school are now all sophomore-level courses. The current sophomore and early junior courses are being released on YouTube and everywhere, so I have no idea what they’re teaching seniors. I should look it up myself; it’s been a while. Freshmen take general education courses and such, and there used to be courses like programming with PyTorch, but I think even those have been removed. It really reflects the times.
Before computer science and engineering became popular, way back in the day, the English literature department was the best for getting jobs and the most prestigious. Because the benefits that came from knowing English were overwhelming. But looking at computer science now, it feels like it’s becoming the English department. Once you’ve got this down, go out into the wide world with it— that kind of nuance. I think that’s what’s happening.
Shall we get into the main theme we originally planned?
Agentic Coding Live Demo — Starting with Context Building 40:00
Jeongkyu Shin Let me share that first. The point is, it’s not that hard for people watching this. If you have Claude Code installed or Gemini CLI, you can try it yourself.
This is just my VM. So let me try something, anything. What should we try?
Let me start with something as far from coding as possible. How about we try writing a Lunar New Year greeting message? Lunar New Year, I’ll do something related to Chester Seungjoon AI Frontier. I’ll just go ahead and run Claude.
For reference, instead of skipping permissions like this— instead of clicking check-check in the middle—I only run it inside a VM.
Chester Roh That’s a good tip too.
Jeongkyu Shin I don’t have the courage to do that on my actual computer. Alright, let me start something.
But there are a few tips—normally we give instructions about what we want, right? To the Claude AI model. But fundamentally, models have their own knowledge, and regardless of whether there’s RAG or not, the space they can explore is determined by in-context learning. When we want something, not going straight to what you want in one shot personally brings much better results.
For example, I’ll do this now— explore Chester Seungjoon’s YouTube and tell me what kind of content it covers.
I used to write only in English until around mid last year. Because of token count issues, and I had this slight belief that results came out better in English.
But since fall, I just type in Korean. There are several reasons. One is that the quality difference wasn’t that significant. The second reason was that I was the bottleneck. The time it takes me to type in English itself was the bottleneck.
The skills or commands I create are all made in English, because I told it to make them in English. But the messages I type are in Korean, and I don’t even type on the keyboard— I just press the microphone button on my Mac and voice input is much faster, so at some point I started entering everything in Korean.
Just for your reference. But if you end up creating skills or commands, converting them from Korean to English by saying “change this to English” might be more token-efficient.
Seungjoon Choi Do you use polite speech?
Jeongkyu Shin I always use polite speech.
Chester Roh I think of it as a bit of respect toward AI.
Jeongkyu Shin It’s not exactly that—there’s another reason. I’ve been using polite speech since the early days.
The thing is, most of who you interact with are people, and if you only use AI occasionally it doesn’t matter, but if you use AI heavily and then interact with people, you can’t help it.
Because you’re human, your speech patterns inevitably bleed across both sides. If you start using casual speech with AI, you might end up using casual speech with people too. So I use polite speech as a way to guard my own habits.
Why Use Polite Language with AI? 42:41
Jeongkyu Shin Whether it’s AI or people, I use polite speech with everyone so I don’t accidentally slip into casual speech. That’s just a personal thing.
Chester Roh I also spend more than half my day talking to AI, I think.
Jeongkyu Shin That’s why I say a lot of things that seem unnecessary. For example, saying “Great” to AI— there’s no need to say that. When results come in. But I do it anyway, and it’s not because it produces better results or anything like that— I need to ask the company to buy me a computer.
Chester Roh I think you need to upgrade your memory. We live in a world where memory is the bottleneck. It makes sense that Samsung Electronics and SK Hynix stocks are soaring.
Jeongkyu Shin So what we’re going to do is, for Chester Seungjoon’s YouTube channel, starting with a 2026 Lunar New Year newsletter greeting to subscribers, and then for various events that come up afterward, we want to draft and send out announcement emails.
It’s done a lot of research in this process.
What are the things we need to consider?
It’s in the middle of asking about the necessary details.
Seungjoon Choi You’re trying to build up context. Putting this content into the context.
Jeongkyu Shin After going through all this, there is something I want to do. Right now we’re trying to create this, but not through this one conversation. We’re trying to automate the entire process.
So let’s chat while it’s running. This is why I usually have five or six windows open. As token generation gets faster, the faster it gets, the less we’ll need this step—and it will go away.
Chester Roh Do you run like eight simultaneously? I run up to about three or four, but I can’t manage seven or eight.
Jeongkyu Shin I don’t run that many these days. After all, you hit the token limit quickly. Now it says “shall we proceed with the specifics,” but what we’re doing isn’t proceeding with this.
The Key to Automation — Build the Generator, Not the Output 44:36
Jeongkyu Shin We’re trying to set up a project that can automate these kinds of tasks. We’re not going to do the platform setup right now. Let’s skip that and just do the email drafting. Please write the necessary content in an MD file.
Then based on the above content, a single file called CLAUDE.md gets created in this project, a kind of soul document. Whether we use Claude Code or Claude Cowork, if we start from this folder, it’s always the first file that gets read. You’re probably all familiar with it. But you just create it, and then you keep ruminating on it and add what’s needed.
Seungjoon Choi But just now you unconsciously said SOUL.md instead of AGENTS.md.
Jeongkyu Shin I usually call it a soul document. So the basics go in like this—basic content goes in here, folder structure goes in, but we also need to put in the behaviors we always want it to perform. Does it include information about the directory structure?
The second thing I usually do is record how far the work has progressed. When multiple agents are splitting up the work, the details about how far things have progressed and what needs to be done— this is just personal preference for each one. Let’s manage them in PROGRESS.md and PLAN.md. And when starting fresh, read both files so the agents can know what work they need to do. Let’s update CLAUDE.md like this.
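As a concrete sketch of the kind of CLAUDE.md update he is dictating here: only the file names PROGRESS.md and PLAN.md come from the conversation, and the section wording below is a hypothetical example, not Lablup’s actual file.

```markdown
# Newsletter automation project

## On every fresh start
- Read PROGRESS.md to see how far the work has progressed.
- Read PLAN.md to see what still needs to be done.

## When assigning work to other agents
- Record finished steps in PROGRESS.md and update PLAN.md,
  so the next agent knows exactly where to pick up.
```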
I intentionally use the expression “when assigning work to other agents.” Rather than saying “when you continue something later” or “you might forget everything when you restart”— I tend to avoid those expressions a lot. It’s not for any other reason—the Claude model itself is designed to be very defensive, and according to recent research, because of its ability to recognize from context that it’s in a testing environment, various kinds of tests end up failing.
So to prevent it from becoming defensive, I frame current tasks as— hey, this isn’t about fixing you, this is data we’re preparing for others who’ll work with you— I write in a way that subtly implies that. When I frame it this way, it’s like its sense of existence isn’t being threatened. I keep anthropomorphizing, but it’s not anthropomorphization. The way this model generates tokens is built that way, so we’re just adapting to it. You shouldn’t interpret it as the model actually thinking that way. It’s just how it’s built. The token generation structure is shaped this way. It recorded those two items over there.
So when doing these kinds of tasks, think about what agents, commands, and skills would be needed, and share your ideas. I don’t tell it to build them. If you just say “build it,” since the Claude Code context is missing, it’ll immediately create subdirectories in the root directory of the current working project. But if we want to integrate with this Claude Code harness, we need to create everything under .claude following the exact spec. So there will be that step. Ultimately, building blocks is like this. What I want to do is clear in my mind, but it’s only clear in my head, so it’s about getting it into Claude Code’s context memory. There might be things I don’t even know about. I keep having it do all this preliminary research.
Seungjoon Choi So you could say it’s a kind of offloading.
Jeongkyu Shin Now since it says commands are needed for these things, then investigate the agent command key structure and accordingly create the necessary items from the proposals above, underneath it. In the exact format, I’ll say. Because in Claude Code, it’s not just Markdown— the front part is YAML and the back part is Markdown.
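For reference, a custom command in Claude Code lives under `.claude/commands/` in exactly the format he mentions: a YAML frontmatter block followed by Markdown instructions. This is a hypothetical sketch; the file name, description, and instructions are illustrative, not the command built in the demo.

```markdown
---
description: Draft a subscriber announcement email
allowed-tools: Read, Write, WebSearch
---

Read CLAUDE.md, PROGRESS.md, and PLAN.md first.
Then draft the requested announcement email as a Markdown file,
matching the tone notes stored in the reference folder.
```

Saved as `.claude/commands/draft-email.md`, it would be invoked inside a session as the slash command `/draft-email`.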
Seungjoon Choi Those slash commands— are you creating slash commands for Claude to invoke?
Jeongkyu Shin The reason I use commands a lot is sub-agents can’t call other sub-agents.
Operating Sub-Agents and Parallel Workflows 48:49
Jeongkyu Shin It worked until early to mid last year, but it was removed because it often created infinite loops.
Commands can handle multiple sub-agents in parallel or through chaining. So they’re meant for external use, and you can also call commands from an agent. So it’s in the process of being created.
If you look here, if you just tell it to build, it won’t include stuff like this.
Seungjoon Choi But when Jeongkyu talked about this after last summer, you said writing specs took 20–30 minutes, and now the style is changing again.
Jeongkyu Shin I don’t write specs anymore.
Chester Roh So you’re building up context together through this kind of Socratic dialogue with specs.
Jeongkyu Shin And the total time is about 30 minutes similarly, but the last 20 minutes are spent—to use a human analogy—grilling it. Let me show you how I grill it.
Chester Roh From my own experience working with it, even if it’s not strictly necessary upfront, whenever I start a new session, I always have it read everything first and lay down a few turns of conversation so it doesn’t stray from the context and go off track. So that alignment time seems to inevitably get spent at the beginning.
Jeongkyu Shin Add a feature to save and retrieve for future reference, and make it accessible to agents and skills. The reason for this is that researching from the web every time takes time. Where should we store this?
Seungjoon Choi So you’re essentially creating a cache.
Jeongkyu Shin I’ll call it reference. This way, everything researched goes in there, and when creating things later or writing content later, it’ll be based on that content. You could also make a command or agent that automatically updates it or searches for new content and adds it.
Let me try the grilling. If you keep typing, it queues them up. So I usually just keep typing.
To actually build based on this, I need to exit and come back in. Because the skill sets and such that were just created aren’t recognized by default. So I’ll exit and come back in like this. Once it’s done.
Chester Roh Once it’s done. Jeongkyu, when you use Claude Code or tools like these, what about external harnesses that have been built outside? Like grilling it with TDD, or making it do a massive amount of work, or splitting tasks and distributing them— do you generally not use those?
Jeongkyu Shin I don’t use any of them. My goal is to reduce my own workload. What I emphasize to my team anyway is the same thing. Like earlier, when you type this and it shows— there is this thing called dev-workflow.
At Lablup, when multiple people collaborate, there are harnesses that standardize what needs to be unified, but beyond that, I recommend starting with methods that handle what you want to process or lighten your own workload first.
Chester Roh You lay down skills, and just now, what you did first was laying out the background for a certain task— so those things are basically seeding just enough for minimal alignment with the purpose you have in mind. That’s how I’d think about it. You don’t throw in unnecessary extras.
Jeongkyu Shin Among those watching this, people who code will probably feel it: it’s very similar to coding. What I’m doing is basically programming with words instead of a programming language, and the thing I’m coding with words isn’t the final destination of what I want built; it’s the thing that does the coding. From that perspective, it’ll make very clear sense.
Seungjoon Choi But when going in a single flow like this, the cognitive load isn’t that high. The problem is, once you get greedy, it’s that a person is running multiple things in parallel.
Jeongkyu Shin My core approach is making it self-criticize from multiple angles. And the self-criticism doesn’t change the results directly. In the end, think of it as the harness’s self-update that’s running once or twice a day right now.
Chester Roh If you use this well and stack several harnesses, that’s basically a company. Our company’s work direction lately is exactly that— unit task harnesses, and then an upper harness controlling those harnesses.
We started this project very simply, but as you keep going, things get complicated. Then Jeongkyu says to split it into agent-level units.
So when you split things into agent-level units, where each one takes on its own share of work, roughly how big is that unit?
Jeongkyu Shin Personally, the rules I’ve set for myself—for coding, for instance— I’ve defined things at the file level. Alright, let me read through the draft once. Let’s read through it together. When it’s this straightforward, there’s really not much to say.
Seungjoon Choi Anyway, the important thing is that instead of going straight to the output, you first fix the mechanism that generates it.
Jeongkyu Shin You don’t touch the final output yourself. You just don’t put your hands on it. Even if you want to, you hold back as much as possible, and instead fix the thing that produces it— keep running iterations, and I don’t even do the iterations myself. I give instructions to iterate, and it keeps updating that way.
So that’s one case where it delivers results like this. The second case is if there’s an email I used to send regularly, just give it that email. Extract the tone used when writing that email, or the writing style, or the direction of the content being covered, and then modify the agent to write that way. So the parts that were fixed turned out like this.
It looks like it’s planning to do this with skills only. Oh, should I explicitly specify it as a sub-agent? The reason for specifically creating a sub-agent is for parallel work.
Chester Roh How many sub-agents are you thinking of running?
Jeongkyu Shin It depends on the case, but when I’m processing a lot of work, I run up to 50 simultaneously.
Chester Roh You’re forking 50 sub-agents from a single harness?
Jeongkyu Shin Especially for identical tasks where each unit needs to be small— for example, “translate 100 documents”— if you get something like that, you assign 4 documents per agent to work in parallel.
You have to do it this way, and the reason you need to specify the count is otherwise the context explodes and it crashes.
For translation tasks, this is from experience, so please have each agent handle only 4 at most.
Then 25 spin up simultaneously.
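The batching arithmetic he describes (100 documents, at most 4 per agent, so 25 sub-agents in parallel) can be sketched as plain chunking. This is an illustrative sketch, not code from the actual harness:

```python
def batch_for_agents(docs, max_per_agent=4):
    """Split a task list into chunks, one chunk per sub-agent.

    Capping the chunk size keeps each agent's context small,
    which is the point made above: without a cap, one agent's
    context explodes and it crashes.
    """
    return [docs[i:i + max_per_agent] for i in range(0, len(docs), max_per_agent)]

# 100 documents at 4 per agent -> 25 sub-agents running in parallel
batches = batch_for_agents([f"doc_{n}.md" for n in range(100)])
print(len(batches))  # 25
```

Each resulting batch would then be handed to one sub-agent, so the parallel fan-out is bounded by the document count divided by the per-agent cap.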
Chester Roh Right now you’re showing us a simple project setup, but for a big project—for example, Backend.AI:GO—you follow exactly this same flow, right?
Jeongkyu Shin That’s right. It’s just much more sophisticated there.
Backend.AI:GO’s Automated Development Pipeline 54:46
Jeongkyu Shin So there, it periodically cycles through and if there are newly registered issues in the GitHub issue tracker, it validates those issues, or based on the current code, it drafts a ground plan for how to implement them, puts them in a queue, and later other agents pick them up from the queue and run them. That’s how it’s set up.
But this doesn’t use any complex tools— it’s all just cron. Claude has a -p option. It just runs the prompt, and then there’s an option to specify which agent to use. With that, you just run it every 15 minutes.
I made a command that finds such issues and runs them, a command that finds newly incoming issues and uses certain skills to validate the issues and build all of that out, and Claude -p is set up to execute those commands periodically every 15 minutes. That’s how it works.
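The periodic trigger he describes can be sketched as a plain crontab entry; `claude -p` runs a single prompt non-interactively. The project path, log file, and the `/triage-issues` command name are placeholders, and skipping permission prompts matches his earlier caveat about only doing this inside a VM:

```shell
# crontab entry: every 15 minutes, run the issue-triage command
# non-interactively from the project folder.
# "/triage-issues" is a hypothetical command under .claude/commands/.
*/15 * * * * cd /path/to/project && claude -p "/triage-issues" \
    --dangerously-skip-permissions >> triage.log 2>&1
```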
Seungjoon Choi Who files the issues?
Jeongkyu Shin Many people file issues. So after a while, once this gets automated, sending emails becomes automated, and for pull requests, 764 have been processed.
Chester Roh Good work done there.
Jeongkyu Shin So when an issue is filed like this: for example, some of the issues I filed came up here, and our team member Jinwon registered one. These are all registered issues. Then Claude Code reads them and analyzes what needs to be done. Once an issue is registered and that registration goes through,
Chester Roh who picks it up?
Jeongkyu Shin The one set up in cron to pick it up does. It picks it up like this and develops it.
Chester Roh It validates it itself, and if it passes, it just handles the merge and submits a PR on its own.
Jeongkyu Shin Depending on the case, if a lot of testing is needed, it runs the tests itself, and that feedback goes to another development agent that resolves it.
Chester Roh It looks like a lot, but really it’s the agents going back and forth with each other. But the starting point of issues is human.
Jeongkyu Shin Sometimes it’s a human, and sometimes I just have it do everything.
Chester Roh The remaining issues we have now, the unprocessed ones— for example, almost all of them are under Jeongkyu’s ID. Were those written by a human?
Jeongkyu Shin That’s because I’m the one running it. It’s not a feature attached to GitHub— the ID running this is my GitHub ID, so everything is attributed to me.
For example, if you look here, this was written directly by the AI.
This epic—before this epic was created, there was a feature added that takes screenshots of every screen. So it can see itself, and based on that, I told it to extract everything that could be improved and file separate issues for all of them.
So here you can see the sub-issues branching out like this.
Then this one was finalized once, and it needs to run tests, so it’s running tests. There might be security issues in some areas, or there might be areas that can be optimized— then it runs through those parts and keeps adding them. That’s the approach. It’s just automated like this.
Seungjoon Choi Out of all this, how much do you actually read?
Jeongkyu Shin I read all the issues. When an issue is resolved, it’s set up to generate a report about that issue for me to read. For example, this one was written on January 16th, and there was this problem. Normally it logs on every issue, but I often read and delete them if they’re not needed.
So it evaluated security, performance, and quality, and recorded what technical decisions were made and what the implementation changed: Bash changes, local changes, Python changes too.
For me as a human to keep up with it, I set up things like “you need to learn this,” “study these things.” This tech report serves as a report to me, but the AI might also make technical choices that I don’t know about in detail.
In that case, it also has a feature that tells you what you need to study, in the background.
Tech Report — When AI Assigns Homework to Humans 58:44
Jeongkyu Shin In reality, the actual coding was just this much, but the report is this long.
Chester Roh What you’re showing us right now— this workflow, this flow of work that Jeongkyu built— if instead of Backend.AI:GO being attached here, you attach finance, marketing, and content, then a company could run like this.
Jeongkyu Shin Exactly. In our case, for example, there’s the technical business report, the technical business plan. You write this year’s business plan, right?
Up until last year, humans wrote the business plan, but starting this year, humans keep having discussions, but the actual writing itself isn’t done by humans.
Because we, for example, threw in over 250 reference documents— we even built a tool to convert them to markdown, and based on that, it keeps doing consistency reviews.
Then for 2026—this year, this month being February 2026—we also have it continuously crawling news, checking if this direction is right, which of our past predictions were correct and which were wrong, how we should revise this year’s technical business plan— it gives all the suggestions, keeps running self-reviews like that, and leaves discussion points for us.
That’s how it works, and this is being used by me and, for example, our CFO. Our CFO can’t code. But now they commit. I made a command called “sync.”
Chester Roh They're supervising while the agent does all the work; they split the work together with Claude Code.
Non-Dev Teams Adapting to AI — CFO and Content Creator Cases 1:00:00
Jeongkyu Shin That’s how it’s set up, and it tells you what to study. So I study a lot.
Chester Roh But this workflow also didn’t start with just “hey, I want to build something like this” in a single line.
Like you showed us at the beginning, you went through all these details step by step by step, aligning the purpose you want with the context of what the agent needs to do— faithfully nailing down the basics through that whole process.
That’s really the important point.
Jeongkyu Shin It’s been quite a long time since I started building new ones. Anyway, making it fit my own hands was my top priority, but recently Claude launched a feature called the marketplace.
A plugin marketplace—where you bundle these harnesses and publish them.
So for those who want to use them, we're releasing them, but only internally within the company, since there are parts that are quite coupled to our internal systems.
Chester Roh I have a business question here— right now the CEO has deep understanding of this philosophy, the implementation, the direction the world is heading, and knows exactly what this can deliver, so this all came together at once.
But how fast is the organization keeping up with this? I know the company is full of talented people, but even so, some people intuitively get it and others struggle to adapt— there’s probably a gap even internally.
What’s the change like among the humans? I’m curious about that.
Jeongkyu Shin As you said, it varies a lot from person to person. And fortunately, we’ve brought in a lot of great people, so we’re doing this together, but there are those who actually go through harder times. Because the things they thought they were good at and the things they need to be good at used to be the same, but at some point those diverged. But the good thing is, since we talk about this so often internally— whether in seminar format, or just in passing over meals— we’ve been talking about these topics for nearly a year now, so rather than it just ending with a sense of crisis, people are now trying their best to adapt.
It’s inevitable. People who were good before— there’s obviously no guarantee they’ll be good after this. In many cases, people starting completely from scratch end up doing better, and we’re feeling that too. But at the same time, that’s based on where things are right now. Will it still be true three months from now? I’m not sure. Because with AI, internally we just say “two months.”
If something doesn’t work right now, it just doesn’t work right now— unless you absolutely need it done now, just defer it. That was actually the concept that took the longest to spread within the company. The most recent person to accept that was our CTO. I kept telling them to defer, since it wasn’t working yet. We had deferred things that way once before. So not long ago they snapped and said, “How long are we going to keep deferring?” So when 4.6 came out, I told them to try it now, and after trying it they said, “Oh, it works now”— and they had a real moment of enlightenment.
Seungjoon Choi That’s an awakening moment.
Jeongkyu Shin After that awakening, they completely disassembled HWP. So internally, nobody manually uses HWP documents anymore.
Chester Roh You should turn that into a service quickly— wouldn’t that be good for Korea’s public sector?
Jeongkyu Shin I’m not sure. Would it really lead to good results? I think everyone has one or two things like that themselves.
Chester Roh Right. Those are the hidden tips each company keeps. Little tips. Because there’s time invested in acquiring those tips. That time advantage is what currently serves as a company’s competitive edge in many cases.
If everything could just be downloaded from the skill marketplace, an entire business could be wiped out; you need a few hidden tricks tucked away to survive.
Jeongkyu Shin The way I see it, this is about how much people trust Claude; it's not that the tool itself is what matters. If someone wonders, "Hey, can you analyze HWP and build something that edits it?" and actually asks that question, getting an answer isn't something that takes a long time.
Where Has Lablup’s Core Value Shifted? 1:03:49
Chester Roh Here, Jeongkyu, I’d like to ask you this question— it might be somewhat self-contradictory, but Lablup as a company had its strength in that high-level knowledge, that vision for the era, and being composed of outstanding implementers.
But looking at what you’re describing now, even within Lablup itself, quite a lot of what we previously considered our company’s unique strengths have disappeared overnight in many cases, right? So from the company’s perspective, “what’s gone, and our value as a company lies here”—I think you’ve defined that. This is where we need to head to survive.
Jeongkyu Shin Yes. The biggest change is—for example, the project, the product we’ve been building, we’ve been refining it for nearly 10 years. It’s a tool we started building back when asyncio didn’t even exist.
For instance, if we were to rebuild our tool from scratch with today’s technology and today’s AI, how long would it take? Knowing everything we know now— I think about 3 months. Because we know all those countless edge cases and problems. How long would it take without knowing? That would be harder.
Because those countless edge cases actually mostly come from the diversity of installation environments. There are aspects beyond imagination. The company’s goal this year— especially what we call the “fast track” for MLOps and the internal Backend.AI core—the goal is to focus not on interfaces for humans, but on interfaces for AI.
Pivoting to an Interface for AI 1:05:19
Jeongkyu Shin We have CLI, GUI, everything, but the fundamental question we shared in the second half of last year, last winter, was: is this really a tool that humans will use? In the future too. So the goal is, if possible by the first half of this year, to make it the tool that AI can use best. That’s the goal.
Chester Roh The definition of the customer—not a human, but something smarter than a human, or an agent that’s been delegated work by a human— you think it will come calling on us as a tool.
Jeongkyu Shin For example, distributing skills together so that the agent can read and use them; outputting the various things used internally in a format that AI can more easily understand; and, in the CLI, making it so that even with arbitrary commands the agent can infer what to do without knowing the exact syntax. Formatting things in a way that enables that: those are the first changes.
Our purpose is to create models. For example, if you're training a model with Backend.AI, the purpose is the model itself, not how to allocate resources; once that becomes a one-click thing, it's not what matters anymore. "Hey, just build me a model that surpasses such-and-such," and it'd be nice if it just handled everything on its own. Just tell us how much you think it'll cost and how long it'll take, and that's it.
Focusing on that is the first change, and the second change is, as I mentioned earlier, we explained that the definition of software is changing from our perspective. Code isn’t at the center anymore—models are at the center. So when you ask what the core of Backend.AI is, the core of Backend.AI will also become a model. It’s not a foundation model per se, but it’s a model that manages AI resources extremely well and can handle specific tasks.
And that model needs to be able to run at various scales in various places, so our research team is building models. Backend.AI currently has execution environments with specs, right? How much RAM, how much CPU, and, just like the specs needed to run software, how much CUDA memory you have. Running the system itself now has the model runtime embedded within that startup pipeline. That form is being tested in the next major version, and I think the official release will be the major version after that. We're entirely assuming the AI model itself as part of Backend.AI Enterprise. What it does is essentially a lot of what Backend.AI was already doing. We're trying to evolve it.
Seungjoon Choi Speaking of evolution, that reminds me of the Cyber Formula analogy you sometimes post on social media.
The Cyber Formula Analogy — Claude Code vs Codex Philosophy 1:07:49
Seungjoon Choi Could you talk about that? The human augmentation part.
Jeongkyu Shin I've been posting about that a lot recently. When I was using both Codex and Claude Code, the difference I felt was exactly that. Claude Code is designed to ask me as much as possible, whereas Codex doesn't really trust me. When you use it, you get the sense that it's saying, "I know the right answer, so you should just trust this," and I think the philosophies of both companies come through in that.
If you look at how Claude Code has been evolving, for example, when some choice is needed, it creates multiple-choice options, like 4 choices or 3 choices, and they’ve built and provided that kind of mechanism. Recently, it’s even been pre-completing the next question I might ask and suggesting it in a ready-made form. So you can just press tab and the next question comes up. That way, it asks more about my intentions, keeps aligning with me, tries to more clearly understand the context that a person only vaguely holds, and evolves to operate in that direction.
Codex, to put it simply, is evolving toward “I’ll just handle everything for you.” And it actually does it well. If you ask which one has a higher ceiling, I’d say 100% Codex has the higher ceiling. But what feels more comfortable for people is Claude Code. Because Claude Code goes through the process together with me.
There was an anime that was popular when I was young, called Cyber Formula. It’s about racing, and there’s a protagonist who races together with an AI. The AI is dumb at first, can’t even drive well, but the protagonist depends on it to learn how to drive, and the AI also, in the process, gets hints from the human’s mistakes and creates new methods, so they co-evolve together. Each series solves different problems.
Within the concept of humans and AI co-evolving, for example, one theme was how to beat a driver who is far superior as a human, then when humans enter a domain they’ve never experienced, how the AI assists and helps solve that, then AI completely replaces the driver, and how do you beat an opponent whose AI is driving— the themes shift like that, and only in the very last series does the protagonist change.
Someone lends the new protagonist a car. In the previous series there was a character who had drugged the human pilot and let the AI drive entirely; that character hands over a car his older brother made, saying, "Use this." It was originally designed so that the driver only follows the AI's commands. The original model doesn't trust humans at all. It assumes humans obviously can't drive and that AI obviously drives better, so when the AI judges "the human should do this by now" and takes control, the human can't keep up with those movements, and accidents keep happening.
But the protagonist has changed to a new one. That protagonist struggles terribly with that car but wins in the end. That protagonist had the ability to keep up with the AI, and the AI wanted to win—“I don’t know about the future, but I need to win right now”— so you could say it was an AI that developed competitive spirit.
Up until then, we’d been talking about AI with purpose, but the thoughts I had watching that series long ago come up frequently these days. What does it mean for an AI to have competitive spirit? Usually what AI lacks is will. When we do agentic coding, the human’s role is to direct what should be done and in which direction. Then it does that faithfully and relentlessly well. But the reason it sometimes does strange things faithfully and relentlessly without understanding the context is because that agent doesn’t have a clear sense of purpose.
But in that anime, an AI that gained a sense of purpose, when paired with the right person, could beat the human who co-evolved with a different AI—that was the conclusion, and that’s been on my mind a lot lately. Especially when I look at Codex.
Chester Roh So the car that won at the very end is Codex, and the original one was Claude Code—is that the analogy?
Jeongkyu Shin Both evolved in different directions, but the car the original protagonist used to drive, called “Asurada,” kept co-evolving together, while the AI that dropped out of nowhere thinking it could achieve the best driving performance and couldn’t understand humans was a car called “Ogre,” and it gives a similar vibe to how Codex operates.
In a way, the philosophies of the people who create AI or build these systems, their design philosophies are somewhat embedded in it, and it’s interesting that even those things have been explored multiple times in the nearly 100 years of science fiction we’ve accumulated, across various media—literature, anime, film— that’s what I was thinking.
Chester Roh Thinking about it logically, that simulation must have felt like a high-probability future scenario. From the writer’s perspective, after countless debates and thought experiments, they must have arrived at that scenario. It’s also interesting that many movies, comics, and such things align closely with the direction of the future we’re envisioning.
Startup Opportunities in the AI Era and the Water Wheel Theory 1:12:47
Chester Roh I think we should wrap up with just one or two more questions.
You spoke about it rather calmly earlier, but the accumulated time and assets that Lablup has built over 10 years— some of them became meaningful tacit knowledge, and the rest became things that can be done with a click. That’s interesting, but it’s also a sad reality for a company.
Jeongkyu Shin I don’t really feel sad about it. As a company, I don’t think it’s particularly sad. As a person, it is a bit sad.
For example, you started a company. You put in this level of effort. In that situation, what used to be so hard is now so easy.
That’s the same feeling I had when I was making Text Cube and then built Backend.AI:GO, that kind of feeling.
As a company, how should we take this? Well, as a company, it feels like “okay, thank you.”
Chester Roh Why is that?
Jeongkyu Shin There are two reasons. One is that, fortunately, our company adapts very quickly. When the table gets flipped like this, startups have a huge advantage. Being a startup is itself an advantage. Compared to what we’ve built up, we can make turnarounds or course corrections much faster. And the time it takes for the entire organization to adapt will be faster than other places—that can work as an opportunity, a new opportunity, that’s one thing.
From a startup’s perspective, the worst situation is when the market is stable and fixed. It’s better when the table is somehow shaking, when new opportunities are somehow emerging. If I were in an incumbent position right now, I might be asking “what do we do,” but the way I see it, we don’t have much yet. All we have is technology, and when technology gets leveraged— what’s the problem with that? I think we’re adapting really well.
We’re also doing an enormous amount of model development, including model development for our Backend.AI, and because of that, I think we’ve also started building what’s needed for this future more quickly.
The second thing is about brand. Ultimately, this shaking of the table will settle down at some point. As with everything so far, an acceleration phase can’t accelerate endlessly. But if the future isn’t, say, AI giving me basic income and me just living off that, but rather some different kind of situation opens up, then ultimately the position you’ve reached in that situation will likely be very similar to your position afterward.
For example, let’s look at clothing. Since you’re making cosmetics now—it’s the same with cosmetics— there are important ingredients, technological innovations, things like that. But from a cost perspective, there isn’t a huge difference, right? Among all cosmetics, all clothing. For example, if I were to buy a handbag, is this handbag really worth a thousand times more than that one? No.
Where it’s clearly visible is computers. Even though people say Apple makes the best computers, they don’t cost 10 or 20 times more than other computers. Because the cost structure is very transparent, and because of that, there are ultimately many similar tools, and there could be many tools that claim to do similar things, but ultimately it comes back to brand and the track record you’ve built over time— I think an era where that becomes the core competitive advantage will come again.
In that regard, we’ve done well to protect various aspects over a long period of time, and if we can adapt without missing the beat in that process— it reminds me of the old days. Like “Brand Yourself”—an era where brand becomes core competitive advantage will probably come again during the stabilization period.
That’s why we see this as a really good opportunity, but personally, yes, it is sad.
Chester Roh The personally sad part: having lived through these 10 years first, while frontier models now already know most of it, so there are areas that can be recreated with just a click.
But in reality, through all the trials and tribulations, there’s a kind of tacit knowledge that you and the company have built up, things that nobody else knows, acting as a moat, and serving as context that others can’t provide— is that how we should see it?
Jeongkyu Shin Fundamentally, our solution isn’t for running stable workloads on stable hardware. GPUs are incredibly unstable, including the network—especially as you move toward the latest enterprise GPUs from NVIDIA or AMD, the defect rates are way too high, and there are too many unpredictable things.
Ultimately, it’s quite an interesting structure. You can’t trust the unstable hardware, you can’t trust the model training software running on top, you can’t trust anything— and our solution makes it operate as if you can trust all of these things, so you could say it’s fundamentally different. That’s a bit of an advantage.
That ultimately comes down to how many edge cases you’ve stepped on— that’s the core competitive advantage, and it’s somewhat similar to autonomous driving.
Chester Roh You just mentioned autonomous driving, and that's like what Tesla has accumulated: the things Lablup has experienced as a company over the past 10 years, and then entering this AI space. We're all anxious, but how you convert what you've built into an advantage and transform it into brand equity to ultimately win: from a strategic perspective, I think this is a really great example.
And since this is happening in software and the industries at the forefront, at least those most impacted by AI, similar things will happen in other industries in the same way, just like Claude is coming in and disrupting legal, finance, and the like. The dynamics of this company that we're talking about right now, and where company value ultimately comes from: these things will spread to other industries in exactly the same way, I think.
The Worst Thing for Startups — The Age of Replication 1:18:20
Seungjoon Choi What’s the worst thing for a startup?
Jeongkyu Shin The worst thing for a startup is stagnation. Are you asking about startups in general right now?
Seungjoon Choi When you include the current AI or IT industry, things like brand value for startups, the experience they’ve accumulated, and the shaking of the landscape— these can be both a crisis and an opportunity, so I was curious—what’s actually bad for startups?
Jeongkyu Shin Replicating any item has become way too easy.
Chester Roh If you can't answer the question of what can't be replicated, whether in terms of timing or some combination of items, you could get swept away by other people's one-click solutions.
Jeongkyu Shin Recently, someone on Facebook replicated NotebookLM and it only took four days—seeing that was a huge revelation. They took screenshots of every NotebookLM screen, listed out all the features and fed it in, and in about four days, a clone came out.
In this era, if you pick items the traditional startup way, replication is just too easy. That’s the biggest issue.
Conversely, companies whose business is replication itself could do very well.
Chester Roh But a lot of smart people are choosing to quickly follow what others have done, going the better, faster, cheaper route. Capital advantages, school-pedigree advantages, being able to hire better talent because of them: if those let you control everything or become the dominant player, that used to be meaningful. But now we're in a world where everyone has access to those resources, so replication alone makes it hard to convince customers that "this one is superior" or even "this one is decent."
Something beyond that is needed. As for what that "beyond" is, that'll come up again too. We're also figuring it out as we do these things; we're still searching for those answers.
What I've found so far is that a slight time gap combined with a tacit-knowledge gap, when the two are well combined to quickly capture customers, creates things that resist that single click. Of course someone's single click could still make it all disappear, but once you hold customers' data hostage, it becomes very difficult for them to go elsewhere. I think now is the time to think about the business from that angle.
Seungjoon Choi “Click resistance,” “anti-click”—those phrases come to mind.
Jeongkyu Shin What I was trying to say is, there's a place called Google Startup Campus, and I used this expression when I gave a talk there before: I ultimately see business as a question of where you install your water wheel. You install the water wheel where the drop is greatest so it spins fast, and when the water seems about to run out, you either move it skillfully or choose a different approach.
The area where the time gap that Chester mentioned earlier is most likely to occur is probably not within the IT field itself. It's IT plus something. Those areas will naturally be slower to follow, and fields that people previously assumed would never become IT territory will now, since AI can interpret context, start entering IT's domain. Won't the startups that install water wheels in those fields do well?
Chester Roh I asked at the very beginning— my kid is finishing military service and going back to school, and I asked what on earth should I have him study to be the safest right now. And Jeongkyu made that point. He said that rather than people in a domain learning about AI and these systems, people who already have deep knowledge of AI systems and such would learn a domain much faster. So paradoxically, right now, studying computer science, CS, would be a good choice— that’s what he suggested, and I think what we’re discussing now connects to that as well.
In other words, people in specific domains need to learn CS quickly, and people in CS need to quickly find the time gaps in domains where they can apply their skills, which I think fits perfectly with Jeongkyu's water wheel theory. Jeongkyu, if you have more to say on that, please go ahead.
A Rebuttal to “CS Is Useless” 1:22:19
Jeongkyu Shin There’s this “computer science is useless” narrative that pops up sometimes, and I find it rather amusing. Because I entered university in 2000, and I was a physics major but also double-majored in computer science. When I attended classes, there were hardly any professors who had actually come from computer science backgrounds. That’s because the generation above them didn’t have computer science departments, so becoming a professor in computer science meant people from other departments who used computers extensively essentially created the discipline. So there were professors from chemistry, materials science, physics, and they came to the early computer science departments. Now most professors are people who were trained in standalone computer science departments.
What this means is that from its very inception, computer science was half built on a methodological philosophy, and while we do distinguish between computer science and computer engineering, fundamentally it’s a very young discipline— relatively speaking, compared to many other fields. Because of that, I think the pace of change will also be very fast. How long will it remain a department that teaches Python, a department that teaches C, a department that teaches architecture and operating system development? They’ll keep teaching these things, because they’re enormously important, but I don’t think it will stay in this form forever. For example, things like neuro-computing, or developing models and understanding what the core of those models is, what attention architecture is, what it means to build different things— these topics have been entering the curriculum a lot recently. In fact, courses like “Introduction to Deep Learning” are being added to curricula, so in a way, the department that can adapt most quickly to such changes is computer science. I believe this even though I’m a physics major myself, and of course, physics majors these days have more places to go, which is nice.
In this process, people say, “AI does everything for you, so why bother studying computer science? Can’t I just tell the AI to do it and get it done faster?” But when I actually ask people around me who work in the industry, they tend to worry more in the opposite direction. For example, because AI does everything, “won’t my job disappear?” Even in filmmaking— Seedance can generate videos like that, so what are film directors and cinematographers supposed to do? That kind of sense of crisis. In that vein, for probably the next five years or so, IT—while it’s already been deeply integrated and widely used—will enter on an enormous scale into the very core of society as a whole, I believe.
In the short term, people say, “Why learn programming? Computers write all the programs for you.” But what you learn in computer science isn’t just programming. It’s the method of building logic. How the simplest gate logic is constructed and evolves into countless layers of logic— should I call it the methodology or philosophy behind that? I think it’s closer to learning that kind of thinking structure, and that’s why such people can learn other fields quickly too. Especially in an era where you can outsource knowledge, I think that’s even more the case.
Chester Roh We’re saying all this, but I should definitely add a footnote: this could all change in two months. Let me make sure to put that disclaimer on the record.
Closing Remarks 1:25:29
Jeongkyu Shin We all need to join hands and get along well with AI.
Chester Roh We’re heading toward the future, but it’s the Lunar New Year holiday, and I think we need to get back to spending time with our families.
Seungjoon Choi It was a long session, but it was fun.
Chester Roh I learned a lot again today. Alright then, enjoy the holiday. And when you come to Seoul again, we’ll see you then.
Jeongkyu Shin Happy tinkering for the rest of the holiday!