Clojure: Transducers
Posted by tosh 2 days ago
Comments
Comment by drob518 6 hours ago
Note: I’m not the author of Injest, just a satisfied programmer.
Comment by bjoli 7 hours ago
I know a lot of people find them confusing.
Comment by matrix12 2 hours ago
Comment by jwr 4 hours ago
The fact that transducers are fast (you don't incur the cost of handling intermediate data structures, nor the GC costs afterwards) is icing on the cake at this point.
Much of the code I write begins with (into ...).
And in Clojure, like with anything that has been added to the language, anything related to transducers is a first-class citizen, so you can reasonably expect library functions to have all the additional arities.
[but don't try to write stateful transducers until you feel really comfortable with the concepts, they are really tricky and hard to get right]
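To make the warning about stateful transducers concrete, here is a minimal sketch of one. `drop-repeats` is a hypothetical name (it behaves roughly like `clojure.core/dedupe`); the point it illustrates is that the state has to be created inside `(fn [rf] ...)` so that every transduction gets its own copy.

```clojure
;; Hypothetical stateful transducer: drops consecutive duplicates.
(defn drop-repeats []
  (fn [rf]
    ;; State is created once per transduction, not once per call to drop-repeats.
    (let [prev (volatile! ::none)]
      (fn
        ([] (rf))                    ; init arity
        ([result] (rf result))       ; completion arity
        ([result input]              ; step arity
         (let [p @prev]
           (vreset! prev input)
           (if (= p input)
             result                  ; same as previous item: skip it
             (rf result input))))))))

(into [] (drop-repeats) [1 1 2 2 2 3 1 1])
;; => [1 2 3 1]
```

Hoisting the `volatile!` outside `(fn [rf] ...)` would leak state between separate uses of the same transducer value, which is one of the easier mistakes to make here.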
Comment by adityaathalye 6 hours ago
Demo One: Computation and Output format pulled apart
  (def natural-nums (rest (range)))

  ;; basic-buzz is not defined in this comment; a plausible stand-in:
  (defn basic-buzz [n]
    (cond (zero? (mod n 15)) "FizzBuzz"
          (zero? (mod n 3))  "Fizz"
          (zero? (mod n 5))  "Buzz"
          :else              n))

  (def fizz-buzz-xform
    (comp (map basic-buzz)
          (take 100))) ;; early termination

  (transduce fizz-buzz-xform ;; calculate each step
             conj            ;; and use this output method
             []              ;; to pour output into this data structure
             natural-nums)

  (transduce fizz-buzz-xform ;; calculate each step
             str             ;; and use this output method
             ""              ;; to catenate output into this string
             natural-nums)   ;; given this input

  (defn suffix-comma [s] (str s ","))

  (transduce (comp fizz-buzz-xform
                   (map suffix-comma)) ;; calculate each step
             str             ;; and use this output method
             ""              ;; to catenate output into this string
             natural-nums)   ;; given this input
Demos two and three for your further entertainment are here: https://www.evalapply.org/posts/n-ways-to-fizzbuzz-in-clojur... (edit: fix formatting, and kill dangling paren)
Comment by pjmlp 6 hours ago
Comment by talkingtab 5 hours ago
I tried to implement transducers in JavaScript using yield and generators and that worked. That was before async/await, but now you can just `await readdir("/")`. I'm unclear as to whether transducers offer significant advantages over async/await?
[[Note: I have a personal grudge against Java and since Clojure requires Java I just find myself unable to go down that road]]
Comment by jwr 4 hours ago
Transducers are not new or revolutionary. The ideas have been around for a long time, I still remember using SERIES in Common Lisp to get more performance without creating intermediate data structures. You can probably decompose transducers into several ideas put together, and each one of those can be reproduced in another way in another language. What makes them nice in Clojure is, like the rest of Clojure, the fact that they form a cohesive whole with the rest of the language and the standard library.
Comment by jnpnj 4 hours ago
Comment by justinhj 5 hours ago
Comment by css_apologist 4 hours ago
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
both solve the copying problem, and not relying on concrete types
Comment by bjoli 3 hours ago
They are like map, filter and friends, but they compose. I think of iterators as an iterator protocol and transducers as a streaming protocol. An iterator just describes how to iterate over a collection. Transducers are transformations that can be plugged into any point where data goes in one direction.
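A small sketch of the "they compose" point, with made-up names (`clean`, `shout`, `pipeline`): each piece is a transducer, and `comp` glues them into a bigger one that still knows nothing about where the data comes from or goes.

```clojure
(require '[clojure.string :as str])

;; Illustrative building blocks; the names are not from any library.
(def clean (comp (remove nil?) (map str/trim)))
(def shout (map str/upper-case))

;; Composed transducers run left to right over each item.
(def pipeline (comp clean shout (take 2)))

(into [] pipeline [" a " nil "b" " c"])
;; => ["A" "B"]
```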
Comment by solomonb 4 hours ago
Comment by Veedrac 1 hour ago
Most imperative languages choose one of two things: internal iteration that doesn't support composable flow control, or external iteration that does. This is why you see pause/resume style iteration in Python, Rust, Java, and even JavaScript. If that's your experience, transducers are a pretty novel place in the trade-off space: you keep most of the composability, but you get to drive it from things like event sources.
But the gap is a bit smaller than it might appear. Rust's iterators are conceptually external iterators, but they actually do support internal iteration through `try_fold`, and even in languages that don't, you can 'just' convert external to internal iterators.
Then all you have to do to recover what transducers give you is pass the object to the source, let it run `try_fold` whenever it has data, and check for early termination via `size_hint`. There's one more trick for the rare case of iterators with buffering, but you don't have to change the Iterator interface for that, you just need to pass one bit of shared state to the objects on construction.
Not all Iterators are strictly valid to be source-driven, and while most are, not everything works nicely when iterated this way (e.g. Skip could handle this case correctly but doesn't, because it's not required to), but I don't think transducers can actually do anything this setup can't. It's just an API difference after that point.
Comment by waffletower 4 hours ago
Comment by solomonb 3 hours ago
Comment by waffletower 3 hours ago
Comment by vindarel 5 hours ago
Comment by jwr 4 hours ago
Comment by BoingBoomTschak 3 hours ago
Comment by thih9 7 hours ago
https://web.archive.org/web/20161219045343/https://clojure.o...
Comment by adityaathalye 7 hours ago
Comment by whalesalad 6 hours ago
Comment by JoshCole 2 hours ago
In 2016, Clojure was not great for serious data science. That has changed substantially and not just via Java Interop.
- It now has cross ecosystem GPU support via blueberry libraries like neanderthal, which in benchmarking, outperform some serious Java libraries in this space.
- It has columnar indexed JIT optimized data science libraries via cnuernber and techascent part of the Clojure ecosystem. In benchmarking they've outperformed libraries like numpy.
- The ecosystem around data science is also better. The projects aren't siloed like they used to be. The ecosystem is making things interoperate.
- You can now use Python from Clojure via the libpython-clj bindings. In general, CFFI is a lot better, not just for Python.
- The linters are way better than they used to be. The REPL support too.
Clojure already had one of the best efficiency scores in terms of code written to what is accomplished, but now you also get REPL integration, and LLMs have been increasingly capable of leveraging that. There are things like yogthos mycelium experiments to take advantage of that with RLLM calls. So it's innovating in interesting new ways too, like cutting bugs in LLM generated code.
It just doesn't feel true to me that innovation isn't occurring. Clojure really has this import antigravity feel to it; things other languages would have to do a new release for, are just libraries that you can grab and try out (or maybe that's the python)
Comment by uxcolumbo 1 hour ago
Are there any benefits of using it over Python?
And how is the interop with Python libs?
Comment by seancorfield 5 hours ago
Clojure 1.10: datafy/nav + tap> which has spawned a whole new set of tooling for exploring data.
Clojure 1.11: portable math (clojure.math, which also works on ClojureScript).
Clojure 1.12: huge improvements in Java interop.
And, yes, the new CLI and deps.edn, and tools.build to support "builds as programs".
Comment by vaylian 4 hours ago
Comment by whalesalad 5 hours ago
Comment by jwr 4 hours ago
Comment by waffletower 4 hours ago
Comment by whalesalad 1 hour ago
Comment by iLemming 5 hours ago
Oh, really? Zero, eh?
clojure.spec, deps.edn, Babashka, nbb, tap>, requiring-resolve, add-libs, method values, interop improvements, Malli, Polylith, Portal, Clerk, hyperfiddle/electric, SCI, flowstorm ...
Maybe you should've started the sentence with "I stopped paying attention in 2016..."?
Comment by instig007 2 hours ago
Tape-patches for self-inflicted language design issues isn't innovation, lol
Comment by eduction 6 hours ago
While the mechanics of transducers are interesting the bottom line is they allow you to fuse functions and basic conditional logic together in such a way that you transform a collection exactly once instead of n times, meaning new allocation happens only once. Once you start using them you begin to see intermediate collections everywhere.
Of course, in any language you can theoretically do everything in one hyperoptimized loop; transducers get you this loop without much of a compromise on keeping your program broken into simple, composable parts where intent is very clear. In fact your code ends up looking nearly identical (especially once you learn about eductions… cough).
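As a rough illustration of the "one hyperoptimized loop" comparison, here is the same computation written both ways. The function names are invented for this sketch; both make a single pass and build no intermediate collection.

```clojure
;; Hand-written single loop: filter and square fused manually.
(defn sum-odd-squares-loop [nums]
  (loop [xs (seq nums), acc 0]
    (if xs
      (let [x (first xs)]
        (recur (next xs) (if (odd? x) (+ acc (* x x)) acc)))
      acc)))

;; Same single pass, but the steps stay separate and named.
(defn sum-odd-squares-xf [nums]
  (transduce (comp (filter odd?) (map #(* % %))) + 0 nums))

(sum-odd-squares-loop (range 10)) ;; => 165
(sum-odd-squares-xf (range 10))   ;; => 165
```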
Comment by fud101 6 hours ago
Comment by moomin 6 hours ago
The real thing to learn is how to express things in terms of reduce. Once you've understood that, just take a look at e.g. the map and filter transducers and it should be pretty obvious. But it doesn't work until you've grasped the fundamentals.
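One way to follow that advice: write map with plain `reduce` first, then look at hand-rolled versions of the map and filter transducers. `map-via-reduce`, `my-map` and `my-filter` are illustrative names, simplified from what `clojure.core` does.

```clojure
;; map expressed directly as a reduce.
(defn map-via-reduce [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

;; The same idea with the "conj into a vector" part abstracted out as rf.
(defn my-map [f]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (rf result (f input))))))

(defn my-filter [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input] (if (pred input) (rf result input) result)))))

(map-via-reduce inc [1 2 3])
;; => [2 3 4]
(transduce (comp (my-filter even?) (my-map inc)) conj [] (range 6))
;; => [1 3 5]
```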
Comment by eduction 5 hours ago
  (->> posts
       (map with-user)
       (filter authorized?)
       (map with-friends)
       (into []))
That’s five collections, this is two, using transducers:
  (into []
        (comp
          (map with-user)
          (filter authorized?)
          (map with-friends))
        posts)
A transducer is returned by comp, and each item within comp is itself a transducer. You can see how the flow is exactly like the double threading macro. map, for example, is called with one arg; this means it will return a transducer, unlike in the first example, where it has a second argument (the coll posts), so it immediately runs over that and returns a new coll.
The composed transducer returned by comp is passed to into as the second of three arguments. In three argument form, into applies the transducer to each item in coll, the third argument. In two argument form, as in the first example, it just puts coll into the first argument (also a coll).
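For anyone who wants to run the two forms side by side, here is a sketch with stand-in definitions. `users`, `with-user`, `authorized?`, `with-friends` and the sample `posts` are all hypothetical, since the comment does not define them.

```clojure
;; Hypothetical stand-ins for the functions named above.
(def users {1 {:name "ann" :admin? true}
            2 {:name "bob" :admin? false}})
(defn with-user [post] (assoc post :user (users (:user-id post))))
(defn authorized? [post] (get-in post [:user :admin?]))
(defn with-friends [post] (assoc post :friends []))

(def posts [{:id 10 :user-id 1} {:id 11 :user-id 2}])

;; Two-argument into: pours an already-built seq into the vector.
(def seq-result
  (->> posts (map with-user) (filter authorized?) (map with-friends) (into [])))

;; Three-argument into: applies the transducer on the way in.
(def xf-result
  (into [] (comp (map with-user) (filter authorized?) (map with-friends)) posts))

(= seq-result xf-result) ;; => true
```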
Comment by kccqzy 5 hours ago
Comment by eduction 5 hours ago
There are some additional inefficiencies in terms of context capturing at each lazy transformation point. The problem gets worse outside of a tidy immediate set of transformations like you’ll see in any example.
This article gives a good overview of the inefficiencies, search on “thunk” for tldr. https://clojure-goes-fast.com/blog/clojures-deadly-sin/ (I don’t agree with its near condemnation of the whole lazy pattern (laziness is quite useful - we can complain about it because we have it, it would suck if we didn’t).)
Comment by kccqzy 4 hours ago
I liked using lazy sequences because it’s more amenable to breaking larger functions into smaller ones and decreases coupling. One part of my program uses map, and a distant part of it uses filter on the result of the map. With transducers it seems like the way to do it is eductions, but I avoided it because each time it is used it reevaluates each item, so it’s sacrificing time for less space, which is not usually what I want.
I should add that I almost always write my code with lazy sequences first because it’s intuitive. Then maybe one time out of five I re-read my code after it’s done and realize I could refactor it to use transduce. I don’t think I’ve ever used eduction at all.
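The re-evaluation cost of eduction mentioned above can be seen with a call counter. This is only a sketch; `calls` and `counted-inc` are names made up for the demonstration.

```clojure
(def calls (atom 0))
(defn counted-inc [x] (swap! calls inc) (inc x))

;; An eduction re-runs its transformation every time it is reduced.
(def ed (eduction (map counted-inc) [1 2 3]))
(def ed-sums [(reduce + 0 ed) (reduce + 0 ed)]) ; [9 9]
(def ed-calls @calls)                           ; 6

;; A lazy seq caches what it has realized.
(reset! calls 0)
(def lz (map counted-inc [1 2 3]))
(def lz-sums [(reduce + 0 lz) (reduce + 0 lz)]) ; [9 9]
(def lz-calls @calls)                           ; 3
```

So the trade really is time for space: the eduction holds no realized results, and pays for that on every traversal.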
Comment by eduction 2 hours ago
Lazy sequences can be a good fit for a lot of use cases. For example, I have some scenarios where I'm selecting from a web page DOM and most of the time I only want the first match but sometimes I want them all - laziness is great there. Or walking directories in a certain order, and the number of items they contain varies, so I don't know how many I'll need to walk but I know it's usually a small fraction of the total. Laziness is great there.
This can still work with transducers - you can either pass a lazy thing in as the coll to an eager transducing context (maybe with a "take n" along the way) or use the "sequence" transducing context which is lazy.
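A minimal sketch of both options, using an infinite `(range)` as the lazy input. The transducer itself is an arbitrary example.

```clojure
(def xf (comp (filter odd?) (map #(* % 10))))

;; Eager context over a lazy, infinite source: (take 3) stops the reduction early.
(into [] (comp xf (take 3)) (range))
;; => [10 30 50]

;; Lazy context: sequence only does work as elements are consumed.
(def lazy-out (sequence xf (range)))
(first lazy-out)   ;; => 10
(take 2 lazy-out)  ;; => (10 30)
```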
I tend to reach for transducers in places in my code where I'm combining multiple collection transformations, usually with literal map/filter/take/whatever right there in the code. Easy wins.
Recently I've started building more functions that return either transducers or eductions (depending on whether I want to "set" / couple in the base collection, which is what eduction is good for) so I can compose disparate functions at different points in the code and combine them efficiently. I did this in the context of a web pipeline, where I was chaining a request through different functions to come up with a response. Passing an eduction along, I could just nest it inside other eductions when I wanted to add transducers, then realize the whole thing at the end with an into and render.
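Roughly what that pattern might look like, with entirely hypothetical stage names (`visible-xf`, `titles-xf`, `load-items`, `add-titles`): some functions return bare transducers, others return an eduction that already has the base collection baked in, and nothing is realized until the final `into`.

```clojure
;; Functions that return transducers (no collection attached yet).
(defn visible-xf [] (remove :hidden?))
(defn titles-xf [] (map :title))

;; Function that returns an eduction: the base collection is now fixed.
(defn load-items [items]
  (eduction (visible-xf) items))

;; An eduction is reducible, so it can be the source of another eduction.
(defn add-titles [ed]
  (eduction (titles-xf) ed))

(def items [{:title "a"} {:title "b" :hidden? true} {:title "c"} {:title "d"}])

;; Everything runs here, in one pass, when the result is realized.
(def response (into [] (take 2) (add-titles (load-items items))))
;; => ["a" "c"]
```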
Mentally it took me some time to wrap my head around transducers and when and how to use them, so I'm still figuring it out, but I could see myself ending up using them for most things. Rich Hickey, who created Clojure, has said if he had thought of them near the beginning he'd have built the whole language around them. But I don't worry about it too much, I mostly just want to get sh-t done and I use them when I can see the opportunity to do so.
Comment by eduction 5 hours ago
Comment by fud101 4 hours ago
Comment by jwr 4 hours ago
For example, transducers decouple the collection type from data-processing functions. So you can write (into #{} ...) (a set), (into [] ...) (a vector) or (into {} ...) (a map) — and you don't have to modify the functions that process your data, or convert a collection at the end. The functions don't care about your target data structure, or the source data structure. They only care about what they process.
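A short sketch of that decoupling: one transducer, three target collections, and two different source types. The data and the transducer are invented for the example.

```clojure
;; Keeps pairs with a positive value and doubles the value.
(def xf (comp (filter (fn [[_ v]] (pos? v)))
              (map (fn [[k v]] [k (* 2 v)]))))

(def src [[:a 1] [:b -2] [:c 3]])

(into [] xf src)   ;; => [[:a 2] [:c 6]]
(into #{} xf src)  ;; => #{[:a 2] [:c 6]}
(into {} xf src)   ;; => {:a 2, :c 6}

;; The source can change too; here it is a map instead of a vector of pairs.
(into {} xf {:a 1 :b -2 :c 3})  ;; => {:a 2, :c 6}
```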
The fact that no intermediate structures have to be created is an additional nicety, not really an optimization.
It is true that for simple examples the (-> ...) is easier to read and understand. But you get used to the (into) syntax quickly, and you can do so much more this way (composable pipelines built on demand!).
Comment by eduction 1 hour ago
To take your example, there isn't much abstraction difference between (into #{} (map inc ids)) vs (into #{} (map inc) ids), nor is there a flexibility difference. The non transducer version has the exact same benefit of allowing specification of an arbitrary destination coll and accepting just as wide range of things as the source (any seqable). Whether in a transducer or not, inc doesn't care about where its argument is coming from or going. The only difference between those two invocations is performance.
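The two invocations can be checked directly; `ids` here is just sample data.

```clojure
(def ids [1 2 3 3])

(def via-seq (into #{} (map inc ids)))   ; builds a lazy seq, then pours it into the set
(def via-xf  (into #{} (map inc) ids))   ; increments each id on its way into the set

(= via-seq via-xf) ;; => true, both are #{2 3 4}
```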
Functions already provide a ton of abstractability and the programmer will rightly ask, "why should I bother with transducers instead of just using functions?" (aka other, arbitrary functions not of the particular transducer shape). The answer is usually going to be performance.
For a literal core async pipeline, of course, there is no replacing transducers because they are built to be used there, and there is a big abstraction benefit to being able to just hand in a transducer to the pipeline or chan vs building a function that reads from one channel, transforms, and puts on another channel. I never had the impression these pipelines were widely used, but I'd love to be wrong!
Comment by waffletower 4 hours ago
Comment by waffletower 4 hours ago
Comment by mannycalavera42 7 hours ago
Comment by faraway9911 6 hours ago
Comment by instig007 5 hours ago
Comment by Maxatar 5 hours ago
It may be true in this particular case, but in my admittedly brief experience using Haskell you absolutely end up having to remember a hell of a lot of useless terminology for incredibly trivial things.
Comment by tombert 5 hours ago
I used to think it was cute that you could make custom operators in Haskell but as I've worked more with the language, I wish the community would just accept that "words" are actually a pretty useful tool.
Comment by eduction 5 hours ago
Comment by instig007 4 hours ago
> Clojure had foldables, called reducers, this was generalized further when core.async came along - transducers can be attached to core async channels and also used in places where reducers were used.
Ok, you mean there's a distinction between foldables and the effectful and/or infinite streams, so there's natural divide between them in terms of interfaces such as (for instance) `Foldable f` and `Stream f e` where `e` is the effect context. It's a fair distinction, however, I guess my overall point is that they all have applicability within the same kind of folding algorithms that don't need a separate notion of "a composing object that's called a transducer" if you hop your Clojure practice onto Haskell runtime where transformations are lazy by default.
Comment by iLemming 4 hours ago
Oh, my favorite part of the orange site, that's why we come here, that's the 'meat of HN' - language tribalism with a technical veneer. Congratulations, not only did you say something as lame as: "French doesn't need the subjunctive mood because German has word order rules that already express uncertainty", but you're also factually incorrect.
Haskell's laziness gives you fusion-like memory behavior on lists for free. But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.
Transducers exist because Clojure is strict, has a rich collection library, and needed a composable abstraction over reducing processes that works uniformly across collections, channels, streams, and anything else that can be expressed as a step function. They're a solution to a specific problem in a specific context.
Haskell's laziness exists because the language chose non-strict semantics as a foundational design decision, with entirely different consequences - both positive (fusion, elegant expression of infinite structures) and negative (space leaks, reasoning difficulty about resource usage).
Comment by instig007 4 hours ago
Haskell laziness & fusion isn't limited to lists, you can fuse any lawful composition of functions applied over data with the required lawful instances used for the said composition. There's no difference to what transducers are designed for.
> But transducers solve a broader problem - portable, composable, context-independent transformations over arbitrary reducing processes - and that you don't get for free in Haskell either.
Transducers don't solve a broader problem, it's the same problem of reducing complexities of your algorithms by eliminating transient data representations. If you think otherwise, I invite you to provide a practical example of the broader scope, especially the part about "context-independent transformations" that would be different to what Haskell provides you without that separate notion.
> and negative (space leaks, reasoning difficulty about resource usage).
which is mostly FUD spread by internet crowd who don't know the basics of call-by-need semantics, such as the places you don't bind your intermediate evaluations at, and what language constructs implicitly force evaluations for you.
Comment by iLemming 4 hours ago
each of those requires manually written rewrite rules or specific library support. It's not a universal property that falls out of laziness - it's careful engineering per data type. Transducers work over any reducing function by construction, not by optimization rules that may or may not fire.
> it's the same problem
It is not. Take a transducer like `(comp (filter odd?) (map inc) (take 5))`. You can apply this to a vector, a lazy seq, a core.async channel, or a custom step function you wrote five minutes ago. The transformation is defined once, independent of source and destination. In Haskell, fusing over a list is one thing. Applying that same composed transformation to a conduit, a streaming pipeline, an io-streams source, and a pure fold requires different code or different typeclass machinery for each. You can absolutely build this abstraction in Haskell (the foldl library gets close), but it's not free - it's a library with design choices, just like transducers are.
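The Clojure half of that claim is easy to try. Below, the transducer from the comment is applied to a vector, an infinite lazy seq, and a custom reducing function; `max-step` is a made-up example of the latter. A core.async channel would need the external library, so it is left out of this sketch.

```clojure
(def xf (comp (filter odd?) (map inc) (take 5)))

;; A vector as the source, a vector as the destination.
(into [] xf [1 2 3 4 5 6 7 8 9 10 11])
;; => [2 4 6 8 10]

;; An infinite lazy seq as the source, a lazy seq as the result.
(sequence xf (iterate inc 1))
;; => (2 4 6 8 10)

;; A custom step function: tracks the maximum instead of building a collection.
(defn max-step
  ([] 0)                     ; initial value
  ([acc] acc)                ; completion
  ([acc x] (max acc x)))     ; step

(transduce xf max-step (range 100))
;; => 10
```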
Your third claim is basically the "skill issue" defense. Two Haskell Simons - Marlow, and Jones, and also Edward Kmett have all written and spoken about the difficulty of reasoning about space behavior in lazy Haskell. If the people who build the compiler and its core libraries acknowledge it as a real trade-off, dismissing it as FUD from people who "don't know the basics" is not an argument. It's gatekeeping.
Come on, how can you fail to see the difference between: "Haskell can express similar things" with "Haskell gives you this for free"?
Comment by instig007 2 hours ago
> It is not. Take a transducer like `(comp (filter odd?) (map inc) (take 5))`. You can apply this to a vector, a lazy seq, a core.async channel, or a custom step function you wrote five minutes ago. In Haskell, fusing over a list is one thing. Applying that same composed transformation to a conduit, a streaming pipeline, an io-streams source, and a pure fold requires different code or different typeclass machinery for each.
You can do that only because Clojure doesn't care whether the underlying iterable is to be processed by a side-effectful evaluation. That doesn't negate the fact that the underlying evaluation has a useless notion of "transducer". I said "fuse" in my previous comment to demonstrate that further comptime optimisations are possible that eliminate some transient steps altogether. If you don't need that you can just rely on generic lazy composition of functions that you define once over type classes' constraints.
`IsList` + `OverloadedLists` already exist. Had Haskell had a single type class for all iterable implicitly side-effectful data, you would have got the same singly-written algorithm without a single notion of a transducer. Let that sink in: it's not the transducer that's useful, it's the differentiation between pure and side-effectful evaluations that allows your compiler to perform even better optimisations with out-of-order evaluations of pure stuff, as well as eliminating parts of inner steps within the composed step function, as opposed to focusing just on the reducing step-function during the composition. It's not a useful abstraction to have if you care about better precision and advanced optimisations coming from the ability to distinguish pure stuff from non-pure stuff.
Haskell aside, if your goal is to just compose reusable algorithms, a call-by-need runtime + currying + pointfree notation get you covered, you don't need a notion of transducers that exist on their own (outside of the notion of foldable interfaces) to be able to claim exactly the same benefits.
> Two Haskell Simons - Marlow, and Jones, and also Edward Kmett have all written and spoken about the difficulty of reasoning about space behavior in lazy Haskell.
There's a difference between what the people said in the past, and the things the crowd claims the people meant about laziness and space leaks. We can go over individual statements and see if they hold the same "negative" meaning that you say is there.