Sem: New primitive for code understanding – not LSPs, but entities on top of Git
Posted by rohanucla 3 days ago
Comments
Comment by andai 3 days ago
$ sem impact authenticateUser
⊕ function authenticateUser (src/auth/login.ts:26)
→ depends on: db.findUser, rateLimiter.check
← used by: loginRoute, authMiddleware
! 42 entities transitively affected
ᛋ 7 tests affected
Okay, that is pretty cool. I appreciate this information as a human also. I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)
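For readers curious what the regex approach can look like, here is a minimal sketch. It is not how sem works; the toy SOURCE table, the CALL pattern, and the helper names build_graph and impacted are all hypothetical. It finds call edges with a regex and then walks them backwards to list everything transitively affected by a change.

```python
import re
from collections import deque

# Hypothetical toy input: function name -> body text.
SOURCE = {
    "authenticateUser": "const u = findUser(id); check(ip); return u;",
    "loginRoute": "return authenticateUser(req);",
    "authMiddleware": "authenticateUser(req); next();",
    "findUser": "return rows[0];",
    "check": "return true;",
    "adminPage": "authMiddleware(req); render();",
}

# Anything that looks like `name(` is treated as a call. This is crude:
# it ignores scoping, methods, strings, and comments.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def build_graph(source):
    """Map each function to the set of known functions it calls."""
    graph = {}
    for name, body in source.items():
        graph[name] = {c for c in CALL.findall(body) if c in source and c != name}
    return graph


def impacted(graph, target):
    """Return every function that reaches `target` through call edges."""
    # Invert the edges so we can walk from callee to caller.
    callers = {name: set() for name in graph}
    for caller, callees in graph.items():
        for callee in callees:
            callers[callee].add(caller)

    seen, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for caller in callers[node]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


if __name__ == "__main__":
    graph = build_graph(SOURCE)
    print(sorted(impacted(graph, "authenticateUser")))
```

The regex is the weak part: a parser-based tool can tell a call from a string literal or a shadowed name, which is where this kind of sketch stops being reliable.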
Comment by Scaevolus 3 days ago
Comment by rohanucla 3 days ago
Comment by rohanucla 3 days ago
Comment by jawns 3 days ago
But there are no instructions for how to reverse those actions if you don't like the tool. Feels a little user-hostile to me.
Comment by rohanucla 3 days ago
Comment by twhitmore 2 days ago
Comment by bendjejdjdh 3 days ago
Comment by Brian_K_White 3 days ago
What an asshole! Plus the uninstall steps were a completely inconsiderate single two-word command. Outrageous.
I can't even think of a better possible response.
Comment by opem 3 days ago
Comment by myko 3 days ago
Comment by QuercusMax 2 days ago
Comment by hankbond 3 days ago
Can you describe what ways this might be beyond just breaking up code into smaller functions?
An example of this is that models tend to create unit tests that are mostly just mocks plus reimplementations of the imperative code in the functions they test. If you could force behavioral testing by only allowing test-creation agents to access the function docstring, name/args/types, branch statements and log events, you could potentially avoid these classes of weak tests being created. But that would mean your code has to be optimized for providing signal via those elements.
This is just an example; I'm not sure it would actually work.
Comment by hankbond 3 days ago
I don't know if you can reliably do that with static analysis tho. I would be interested in some sort of debugger-attachment-like process that does a code-coverage-type evaluation. If you can't tell, this is at least on the edge of (if not past) my depth of expertise.
Comment by rohanucla 3 days ago
We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
Comment by hankbond 3 days ago
I'm sorry for distracting from your engaging and thoughtful reply but I can't help but giggle at the name of this concept.
Comment by insensible 3 days ago
Comment by rohanucla 3 days ago
Comment by rohanucla 3 days ago
Things with LLMs break because our infra was always designed for analyzing lines (tools like grep and fuzzy matching) and for working on quite small sections of code. LLMs struggle when they have to analyze different parts of a codebase: they either get too much context, where you're throwing whole files at them, or too little, where they only see the function in isolation with no real understanding of how the pieces connect to each other.
That's really the gap sem is trying to fill. With sem impact you can give an agent the precise blast radius of a change instead of guessing which files matter, and sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary, something that's really hard to do with line-level diffs.
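To make the boundary idea concrete, here is a small sketch under stated assumptions. It does not reproduce sem diff --patch; the Entity record, the line spans, and the helpers touched_entities and check_boundary are hypothetical. It assumes some parser has already produced the line span of each entity, then maps changed lines to their owning entities and reports anything outside an allow-list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A named code entity and the lines it occupies (hypothetical shape)."""
    name: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def touched_entities(entities, changed_lines):
    """Return the names of entities that contain at least one changed line.

    Lines that fall outside every entity are reported as '<top-level>'.
    """
    touched = set()
    for line in changed_lines:
        owner = next(
            (e.name for e in entities if e.start <= line <= e.end),
            "<top-level>",
        )
        touched.add(owner)
    return touched


def check_boundary(entities, changed_lines, allowed):
    """Return entities changed outside `allowed`; an empty set means accept."""
    return touched_entities(entities, changed_lines) - set(allowed)


if __name__ == "__main__":
    entities = [
        Entity("authenticateUser", 26, 40),
        Entity("loginRoute", 42, 60),
    ]
    # The agent was only allowed to edit authenticateUser.
    print(check_boundary(entities, [27, 45], {"authenticateUser"}))
```

The check itself is trivial once entity spans exist. The hard part, and the part a line-level diff cannot give you, is producing those spans reliably across languages.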
Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet but sem graph and sem inspect expose everything you'd need.
The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.
Another use case is finding dead code in the codebase.
Edit: One last thing. I started working on this while solving the fundamental issue of why merge conflicts were occurring with git, so you might also like Weave, the merge driver I open sourced on the same GitHub org.
Comment by hankbond 3 days ago
I think this is an apt and concise description of what this is trying to accomplish. I feel like we had some really great gains in model improvements at both the top end and the bottom over the last 6-7 months, but the next period is likely to be defined by harness improvements. I appreciate that your effort is being applied to this particular problem set, because I think it's far more fundamental to improving agentic performance in codebases than yet another memory framework.
Comment by cpard 3 days ago
Comment by rohanucla 3 days ago
Comment by gwerbin 3 days ago
Then there are plots saved as images which have basically no structure at all exposed.
Comment by cpard 3 days ago
The questions I'd like to answer with the diffing are more like: will the grain go from one-row-per-user to one-row-per-user-per-day, will a key stop being unique, will a join start fanning out and quietly double a measure, will something additive become non-additive.
This diff is over structure, but that structure is latent in the transformation that produces it. To make things harder, if a declarative language is being used (e.g. SQL), the code doesn't even describe how things are getting done, only what the output should be.
What I've ended up doing is recovering the structure from the code by analyzing it and then using *cheap* profiling rather than a full row compare.
As an example, my equivalent impact sub-command output would be something like this: "this change makes account_id non-unique three models downstream"
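A minimal sketch of the cheap-profiling idea, assuming tables are small lists of dicts; the helpers is_unique, inner_join, and grain_report and the sample data are hypothetical and not taken from the commenter's tool. Instead of comparing rows, it checks two properties before and after a transformation: whether the key is still unique, and whether the measure total changed.

```python
from collections import Counter


def is_unique(rows, key):
    """True when no two rows share the same values for the `key` columns."""
    counts = Counter(tuple(row[k] for k in key) for row in rows)
    return all(n == 1 for n in counts.values())


def inner_join(left, right, on):
    """Naive inner join on a single column (fine for a toy example)."""
    return [{**a, **b} for a in left for b in right if a[on] == b[on]]


def grain_report(before, after, key, measure):
    """Profile key uniqueness and the measure total on both sides of a change."""
    return {
        "key_unique_before": is_unique(before, key),
        "key_unique_after": is_unique(after, key),
        "measure_before": sum(row[measure] for row in before),
        "measure_after": sum(row[measure] for row in after),
    }


if __name__ == "__main__":
    accounts = [
        {"account_id": 1, "revenue": 100},
        {"account_id": 2, "revenue": 50},
    ]
    logins = [
        {"account_id": 1, "day": "mon"},
        {"account_id": 1, "day": "tue"},
        {"account_id": 2, "day": "mon"},
    ]
    # Joining one-row-per-account to one-row-per-account-per-day fans out.
    joined = inner_join(accounts, logins, "account_id")
    print(grain_report(accounts, joined, ["account_id"], "revenue"))
```

Here the join turns account_id non-unique and inflates total revenue from 150 to 250, which is the kind of silent change a row-level diff would bury in noise.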
Comment by gwerbin 3 days ago
Comment by appplication 3 days ago
We have some custom data diff tools at my ultracorp that provide a browsable interface, but the customer tends to be more operations folk than engineers or DS etc who would be more familiar with actual version control concepts. But these work against the data store and not on something like csv or parquet.
Comment by gwerbin 2 days ago
I consider that a completely distinct use case from, say, Iceberg tables in S3.
Comment by alex7o 2 days ago
Comment by rohanucla 2 days ago
Comment by mcintyre1994 3 days ago
Comment by rohanucla 3 days ago
Comment by OJFord 3 days ago
Comment by mcintyre1994 3 days ago
Comment by rohanucla 2 days ago
Comment by docheinestages 3 days ago
Comment by rohanucla 3 days ago
Comment by qudat 3 days ago
I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.
I firmly believe code review should happen in your editor.
Comment by rohanucla 3 days ago
Comment by znnajdla 3 days ago
Comment by rohanucla 3 days ago
If you want to change the default git diff behavior, you can run sem setup.
Comment by znnajdla 3 days ago
Comment by rohanucla 2 days ago
Comment by dboreham 3 days ago
Comment by globnomulous 3 days ago
Comment by rohanucla 3 days ago
Comment by awoimbee 3 days ago
Comment by rohanucla 3 days ago
I can also give my thought process, because I was more interested in figuring out the model's inherent search results and understanding without sem.
Comment by RestartKernel 2 days ago
Comment by rohanucla 1 day ago
Comment by paolomainardi 2 days ago
Comment by rohanucla 2 days ago
Comment by Animats 3 days ago
Comment by rohanucla 3 days ago
So instead of line-level analysis, the whole granularity of seeing changes and tracking things shifts to entities. It helps with your agent's attention mapping and lets you track changes faster.
LSPs have been doing this for quite a long time, but using tree-sitter is faster. Type awareness is not great with this approach, but working across multiple languages with a single tool can be quite helpful.
Comment by ssivark 2 days ago
2. What would it take to add a new language? I'm interested in using this with Julia.
Comment by throw1234567891 3 days ago
Comment by rohanucla 3 days ago
Comment by onlyrealcuzzo 3 days ago
> AI agents are 2.3x more accurate when given sem output vs raw line diffs. See the benchmark.
No... This is not convincing of anything. These are not real world tasks.
You're trying to pretend like your tool makes AI agents 2.3x better at coding or bug fixing.
It doesn't.
Your benchmark doesn't prove that.
Your tool is cool. Sell it for what it is. Not for what it's not.
Comment by rohanucla 3 days ago
Comment by jiggunjer 3 days ago
Sometimes an agent makes a monolithic commit, and it's a lot of work to manually split code you didn't write. After such an auto split I can manually squash related revs into feature/ticket-level commits.
Comment by felixlu2026 3 days ago
Comment by eddysir 3 days ago