Rust's Block Pattern
Posted by zdw 1 day ago
Comments
Comment by koakuma-chan 1 day ago
https://doc.rust-lang.org/beta/unstable-book/language-featur...
Comment by loeg 1 day ago
let x = (|| -> Result<i32, std::num::ParseIntError> {
    Ok("1".parse::<i32>()?
        + "2".parse::<i32>()?
        + "3".parse::<i32>()?)
})();
Comment by rendaw 20 hours ago
Comment by ahartmetz 19 hours ago
Comment by dzaima 15 hours ago
Comment by skribanto 10 hours ago
Comment by schneems 19 hours ago
Comment by pflanze 10 hours ago
This works fine: https://play.rust-lang.org/?version=stable&mode=debug&editio...
Comment by saghm 21 hours ago
Comment by ahartmetz 19 hours ago
Comment by loeg 20 hours ago
Comment by Sytten 1 day ago
Comment by mbrubeck 1 day ago
Comment by mmastrac 1 day ago
Comment by njdiwigud 1 day ago
Comment by stouset 20 hours ago
Comment by masklinn 17 hours ago
Comment by lunar_mycroft 17 hours ago
Comment by koakuma-chan 1 day ago
You only live once.
Comment by dwattttt 1 day ago
Comment by JoshTriplett 23 hours ago
Comment by valcron1000 1 day ago
Comment by tayo42 1 day ago
Comment by bobbylarrybobby 1 day ago
Comment by mwcz 1 day ago
Comment by jeroenhd 1 day ago
The closest thing I can think of that will let you return a result from within a separate scope using a set of foo()? calls would be a lambda function that's called immediately, but that has its own problems when it comes to moving and it probably doesn't compile to very fast code either. Something like https://play.rust-lang.org/?version=stable&mode=debug&editio...
Comment by koakuma-chan 1 day ago
Comment by oniony 1 day ago
Comment by satvikpendem 16 hours ago
Comment by emtel 1 day ago
Comment by nemo1618 21 hours ago
let foo: &[SomeType] = {
    let mut foo = vec![];
    // ... initialize foo ...
    &foo
};
This doesn't work: the memory is owned by the Vec, whose lifetime is tied to the block, so the slice is invalid outside of that block. To be fair, it's probably best to just make foo a Vec, and turn it into a slice where needed.
Comment by saghm 21 hours ago
Comment by adrianN 18 hours ago
Comment by janquo 13 hours ago
https://doc.rust-lang.org/beta/unstable-book/language-featur...
AFAIU it essentially creates a variable in inner scope but defers drop to the outer scope so that you can return the reference
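On stable Rust, one way to get a similar effect is to declare the owning variable in the outer scope and only initialize it inside the block. A minimal sketch, assuming a `Vec<i32>` and the made-up names `storage` and `build_and_sum`:

```rust
/// Builds a vector inside a block but keeps it alive in the outer scope,
/// so the slice borrowed from it stays valid after the block ends.
pub fn build_and_sum() -> i32 {
    // Declared here, initialized later: the Vec is owned by the outer scope.
    let storage: Vec<i32>;

    let foo: &[i32] = {
        let mut tmp = vec![];
        tmp.push(1);
        tmp.push(2);
        tmp.push(3);
        storage = tmp; // move the finished Vec into the outer binding
        &storage[..] // borrow from the outer binding, not from `tmp`
    };

    foo.iter().sum()
}
```

The temporary `tmp` is still confined to the block; only the finished value escapes, and it is never mutable in the outer scope.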
Comment by bryanlarsen 1 day ago
- Drop does something, like close a file or release a lock, or
- x and y don't have Send and/or Sync, and you have an await point in the function or are doing multi-threaded stuff
This is why you should almost always use std::sync::Mutex rather than tokio::sync::Mutex. The guard returned by std's Mutex isn't Send, so the compiler will complain if you hold it across an await in a task that has to be Send. Usually you don't want mutexes held across an await.
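A block is one way to bound how long a lock guard lives. A minimal sketch using `std::sync::Mutex`; the function name `bump` is made up for illustration, and no async runtime is involved:

```rust
use std::sync::Mutex;

/// Increments the counter while holding the lock only for the duration
/// of the inner block, then reports whether the lock is free afterwards.
pub fn bump(counter: &Mutex<u32>) -> (u32, bool) {
    let new_value = {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        *guard
    }; // `guard` is dropped here, which releases the lock

    // If the guard were still alive, `try_lock` would fail.
    let lock_is_free = counter.try_lock().is_ok();
    (new_value, lock_is_free)
}
```

In async code the same shape keeps the guard from living across a later await point, since it is gone by the time the block ends.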
Comment by bryanlarsen 1 day ago
Comment by tstenner 17 hours ago
Comment by defen 1 day ago
Comment by loeg 1 day ago
Comment by JDye 23 hours ago
A lot of the time it looks like this:
let config = {
    let config = get_config_bytes();
    let mut config = Config::from(config);
    config.do_something_mut();
    config.do_another_mut();
    config
};
Comment by bobbylarrybobby 1 day ago
let mut data = foo(); data.mutate(); let data = data;
May be preferable for short snippets where adding braces, the yielded expression, and indentation is more noise than it's worth.
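For comparison, here are the two forms side by side. This is only a sketch; `sorted_with_block` and `sorted_with_rebinding` are illustrative names:

```rust
/// Block form: `v` is mutable only inside the block.
pub fn sorted_with_block(input: &[i32]) -> Vec<i32> {
    let data = {
        let mut v = input.to_vec();
        v.sort();
        v
    };
    data
}

/// Rebinding form: shadow the mutable binding with an immutable one.
pub fn sorted_with_rebinding(input: &[i32]) -> Vec<i32> {
    let mut data = input.to_vec();
    data.sort();
    let data = data; // from here on, `data` can no longer be mutated
    data
}
```

Both end with an immutable `data`; the block additionally hides any other temporaries used along the way.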
Comment by bryanlarsen 1 day ago
Comment by kibwen 11 hours ago
Comment by ziml77 1 day ago
That last example is probably my biggest use of it because I hate having variables being unnecessarily mutable.
Comment by ghosty141 1 day ago
Comment by nadinengland 1 day ago
I typically use closures to do this in other languages, but the syntax is always so cumbersome. You get the "dog balls" that Douglas Crockford always called them:
```
const config = (() => {
  const raw_data = ...
  ...
  return compiled;
})();

const result = config.whatever;
// carry on
return result;
```
Really wish blocks were expressions in more languages.
Comment by dwattttt 1 day ago
like
this
Comment by notpushkin 18 hours ago
{
  const x = 5;
  x + 5
}
// => 10
x
// => undefined
But I don’t see a way to get the result out of it. As soon as you try to use it in an expression, it will treat it as an object and fail to parse.
Comment by charleszw 1 day ago
Comment by paavohtl 15 hours ago
Comment by pwdisswordfishy 14 hours ago
(Not to be confused with do notation)
Comment by saghm 21 hours ago
// double the value of `x` until it's at least 10
while { x = x * 2; x < 10 } {}
This isn't something that often will end up being more readable compared to another way to express it (e.g. an unconditional `loop` with a manual `break`, or refactoring the body into a separate function to be called once before entering the loop), but it's a fun trick to show people sometimes.
Comment by Mawr 5 hours ago
func foo(cfg_file string) (parsed, error) {
    config := func() {
        return json.Parse(cfg_file)
    }
    return config.parsed
}
All of these are however poor solutions to the problem, because they're not true nested functions: they can access arbitrary variables defined outside their scope. Python at least restricts their modification, but Go doesn't. I'm guessing in Rust it's at least explicit in some way?
In any case, the real solution here is to simply allow proper nested functions that behave exactly like freestanding functions in that they can only access what's passed to them:
func foo(cfg_file string) (parsed, error) {
    func config(cfg_file) config {
        return json.Parse(cfg_file)
    }
    return config.parsed
}
This way you can actually reason about that block of code in isolation: the same effect as when calling a freestanding function, except this doesn't expose the nested function to callers outside the parent function, which is valuable.
Comment by esafak 1 day ago
Also in Kotlin, Scala, and nim.
Comment by IshKebab 12 hours ago
Comment by afdbcreid 1 hour ago
Comment by atq2119 1 day ago
Comment by the__alchemist 1 day ago
Comment by kibwen 1 day ago
Comment by atq2119 16 hours ago
Comment by kibwen 11 hours ago
Comment by tialaramex 23 hours ago
break 'label value;
... is something to be used very sparingly. I reckon I write a new one about once a year.
Very often if you think harder you realise you didn't want this, you should write say, a function (from which you can return) or actually you didn't want to break early at all. Not always, but often. If you write more "break 'label value" than just break then you are almost certainly Doing It Wrong™.
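For readers who have not seen the construct, here is a minimal sketch of a labeled block that yields a value through `break`. The label `'found` and the function name are made up:

```rust
/// Returns the first even number doubled, or -1 if there is none,
/// using a labeled block as an early exit.
pub fn first_even_doubled(values: &[i32]) -> i32 {
    let result = 'found: {
        for &v in values {
            if v % 2 == 0 {
                // Leaves the whole labeled block, not just the loop.
                break 'found v * 2;
            }
        }
        -1 // value of the block when nothing matched
    };
    result
}
```

As the comment above suggests, a case this small would arguably read better as a helper function with an ordinary `return`.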
Comment by the__alchemist 11 hours ago
Comment by lights0123 1 day ago
It's used all throughout the Linux kernel and useful for macros.
Comment by Rucadi 1 day ago
I use that with macros to return something akin to std::expected, while keeping the code on the happy path, as with exceptions.
Comment by aabdelhafez 1 day ago
Comment by zaphirplane 1 day ago
Comment by jeroenhd 1 day ago
Actually, Kotlin's with() and apply() are more powerful than what Rust can provide. Then again, Rust isn't designed with OO in mind, so you probably shouldn't use those patterns in Rust anyway.
Comment by saghm 21 hours ago
Comment by jeroenhd 11 hours ago
also: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
apply: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
let: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
with: https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
run (two overloads): https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std... and https://github.com/JetBrains/kotlin/blob/2.3.0/libraries/std...
These all heavily rely on Kotlin's ability to write an extension function for any class. When you write `with(x) { something() }` you're extending the type of `x` (be that int, List<String>, or SomeObject) with an anonymous method, and passing that as a second parameter.
Consider the signature here:
public inline fun <T, R> with(receiver: T, block: T.() -> R): R
The first object is a generic object T, which can be anything. The second is a member function of T that returns R, which again can be just about anything, as long as it operates on T and returns R.
Let does it kind of differently:
public inline fun <T, R> T.let(block: (T) -> R): R
This is an extension method that applies to every single class as T isn't restricted, so as long as this function is in scope (it's in the standard library so it will be), every single object will have a let() method. The only parameter, block, is a lambda that takes T and returns R.
So for instance:
val x = makeFoo()
with (x) {
    bar = 4
}
is syntactic sugar for something like:
fun Foo.anonymous() {
    this.bar = 4
}
val x = makeFoo()
with(x, Foo::anonymous)
You could absolutely write any of these yourself. For instance, consider this quick example I threw together: https://pl.kotl.in/S-pHgvxlX
The type inference is doing a lot of heavy lifting, i.e. taking a lambda and automatically turning it into an anonymous extension function, but it's nothing that you cannot do yourself. In fact, a wide range of libraries write what might look like macros in Kotlin by leveraging this and the fact you can define your own inline operators (i.e. https://pl.kotl.in/TZB0zA1Jr).
This isn't possible in many other languages because taking a generic type definition and letting it possibly apply to every single existing type is not exactly popular. That, combined with Kotlin's ability to extend nullable types (i.e. this = null), makes for a language system that wouldn't work in many other flexible languages.
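For what it's worth, Rust can approximate part of this with a blanket trait implementation. The sketch below is an assumption-laden illustration rather than an established idiom: `ScopeExt`, `let_`, `apply`, and `Foo` are hypothetical names, and there is no implicit `this` receiver as in Kotlin.

```rust
/// Hypothetical extension trait, loosely modeled on Kotlin's `let` and `apply`.
pub trait ScopeExt: Sized {
    /// Like Kotlin's `let`: pass the value to `block`, return the block's result.
    fn let_<R>(self, block: impl FnOnce(Self) -> R) -> R {
        block(self)
    }

    /// Like Kotlin's `apply`: let `block` mutate the value, then return the value.
    fn apply(mut self, block: impl FnOnce(&mut Self)) -> Self {
        block(&mut self);
        self
    }
}

// Blanket implementation: every sized type gets both methods
// wherever the trait is in scope.
impl<T> ScopeExt for T {}

/// Stand-in for the `Foo` in the Kotlin example above.
#[derive(Debug, Default, PartialEq)]
pub struct Foo {
    pub bar: i32,
}
```

With the trait in scope, `Foo::default().apply(|foo| foo.bar = 4)` plays the role of `with (x) { bar = 4 }`, except that the receiver has to be named explicitly inside the closure.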
Comment by saghm 9 hours ago
Comment by simon_void 1 day ago
Comment by ibgeek 1 day ago
Comment by pkulak 6 hours ago
Comment by Rucadi 1 day ago
Comment by jeroenhd 1 day ago
let config: Result<i32, i32> = {
    Ok(
        "1234".parse::<i32>().map_err(|_| -1)?
    )
};
would fail to compile, or worse: would return out of the entire method if the surrounding method had return type Result<_,i32>. On the other hand,
let config: Result<i32, i32> = (|| {
    Ok(
        "1234".parse::<i32>().map_err(|_| -1)?
    )
})();
runs just fine.
Hopefully try blocks will allow using ? inside of expression blocks in the future, though.
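The difference in where `?` returns to can be seen in a small stable-Rust sketch. The function names are made up; the behavior shown is the standard meaning of `?` in a block versus in a closure:

```rust
use std::num::ParseIntError;

/// `?` inside a plain block returns from the *enclosing function*,
/// so nothing after the block runs when parsing fails.
pub fn with_block(input: &str) -> Result<i32, ParseIntError> {
    let parsed: i32 = {
        input.parse::<i32>()? // on error, leaves `with_block` entirely
    };
    Ok(parsed + 1)
}

/// `?` inside an immediately invoked closure only returns from the closure,
/// so the caller still gets a `Result` it can inspect and recover from.
pub fn with_closure(input: &str) -> i32 {
    let parsed = (|| -> Result<i32, ParseIntError> {
        Ok(input.parse::<i32>()?)
    })();
    parsed.unwrap_or(0) + 1
}
```

In `with_closure` the error is contained and replaced by a default, which is the behavior try blocks are meant to offer without the closure.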
Comment by carstimon 1 day ago
Comment by knorker 16 hours ago
And the workarounds often make the pattern be a net loss in clarity.
Comment by etyp 1 day ago
Try this out, you can actually (technically) assign a variable to `continue` like:
let x = continue;
Funnily enough, one of the few things that are definitely always a statement are `let` statements! Except, you also have `let` expressions, which are technically different, so I guess that's not really a difference at all.
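The reason this type-checks is that `continue` is an expression of the never type `!`, which coerces to any other type. A slightly more practical sketch of the same idea (the function name is made up):

```rust
/// Sums only the even values, using `continue` in expression position.
pub fn sum_even(values: &[i32]) -> i32 {
    let mut total = 0;
    for &v in values {
        // The `else` branch has type `!`, so the whole `if` is an `i32`.
        let even = if v % 2 == 0 { v } else { continue };
        total += even;
    }
    total
}
```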
Comment by tialaramex 10 hours ago
Comment by AdieuToLogic 20 hours ago
> Here’s a little idiom that I haven’t really seen discussed anywhere, that I think makes Rust code much cleaner and more robust.
> I don’t know if there’s an actual name for this idiom; I’m calling it the “block pattern” for lack of a better word.
This idiom has been discussed and codified in various languages for many years. For example, Scala has supported the same thusly:
val foo: Int = {
  val one = 1
  val two = 2
  one + two
}
Java (the language) has also supported[0] similar semantics.
Good to see Rust supports this technique as well.
0 - https://docs.oracle.com/javase/tutorial/java/javaOO/initial....
Comment by skipants 1 day ago
It barely adds any functionality but it's useful for readability because of the same reasons in the OP.
It helps because I've been bitten by code that did this:
setup_a = some_stuff
setup_b = some_more_stuff
i_think_this_is_setup = even_more_stuff
the_thing = run_setup(setup_a, setup_b, i_think_this_is_setup)
That's all fine until later on, probably in some obscure loop, `i_think_this_is_setup` is used without you noticing.
Instead doing something like this tells the reader that it will be used again:
i_think_this_is_setup = even_more_stuff
the_thing = begin
  setup_a = some_stuff
  setup_b = some_more_stuff
  run_setup(setup_a, setup_b, i_think_this_is_setup)
end
I now don't mentally have to keep track of what `setup_a` or `setup_b` are anymore and, since the writer made a conscious effort not to put it in the block, you will take an extra look for it in the outer scope.
Comment by ramses0 21 hours ago
function abc() {
  let a = 1
  {
    let b = 2
  }
  console.log(typeof a)
  console.log(typeof b)
}
abc()
Used to do this occasionally for exactly the same reasons- don't leave dangling variables junking up your scope, and don't make weirdo functions with parameter passing that you'll only ever call once!
Comment by gleenn 1 day ago
let input = read_input();
let trimmed_input = input.trim();
let trimmed_uppercase_input = trimmed_input.to_uppercase();
...
The extra variable names are almost completely boilerplate and make it also annoying to reorder things.
In Clojure you can do
(-> (read-input) string/trim string/upcase)
And I find that so much more readable and refactorable.
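For method-style calls, Rust can get close to the threaded form by chaining. A small sketch; `read_input` here is a hypothetical stand-in that returns a fixed string:

```rust
/// Stand-in for the `read_input` call in the comment above (hypothetical).
fn read_input() -> String {
    String::from("  hello world \n")
}

/// Chains the steps without naming any intermediate value.
pub fn shout() -> String {
    read_input().trim().to_uppercase()
}
```

This only works when each step is a method on the previous result; free functions still need nesting or intermediate bindings.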
Comment by domlebo70 20 hours ago
export const run = <T>(f: () => T): T => {
  return f();
};
Comment by notpushkin 18 hours ago
Comment by deredede 15 hours ago
Comment by lenkite 1 day ago
The second example "erasure of mutability" makes more sense. But this effectively makes it a Rust-specific pattern.
Comment by dtdynasty 1 day ago
Comment by ngruhn 1 day ago
Comment by j16sdiz 17 hours ago
I think the author misunderstood something....
Comment by charlie-83 14 hours ago
Comment by rienbdj 16 hours ago
Comment by sonu27 16 hours ago
Comment by andrepd 1 day ago
Comment by 9029 18 hours ago
Comment by knorker 16 hours ago
Comment by hu3 20 hours ago
Comment by andrepd 15 hours ago
Comment by exDM69 14 hours ago
It is available as a language extension in Clang and GCC and widely used (e.g. by the Linux kernel).
Unfortunately it is not supported by the third major compiler out there so many projects can't or don't want to use it.
Comment by icar 18 hours ago
Comment by steveklabnik 8 hours ago
Comment by chrismorgan 18 hours ago
Much of the value of this block pattern is that it makes the scope of the intermediate variables clear, so that you have no doubt that you don’t need to keep them in mind outside that scope.
But it’s also about logical grouping of concepts. And that you can achieve with simple ad hoc indentation:
fn foo(cfg_file: &str) -> anyhow::Result<()> {
    // Load the configuration from the file.
        // Cached regular expression for stripping comments.
        static STRIP_COMMENTS: LazyLock<Regex> = LazyLock::new(|| {
            RegexBuilder::new(r"//.*").multi_line(true).build().expect("regex build failed")
        });
        // Load the raw bytes of the file.
        let raw_data = fs::read(cfg_file)?;
        // Convert to a string to the regex can work on it.
        let data_string = String::from_utf8(&raw_data)?;
        // Strip out all comments.
        let stripped_data = STRIP_COMMENTS.replace(&config_string, "");
        // Parse as JSON.
        let config = serde_json::from_str(&stripped_data)?;
    // Do some work based on this data.
    send_http_request(&config.url1)?;
    send_http_request(&config.url2)?;
    send_http_request(&config.url3)?;
    Ok(())
}
(Aside: that code is dreadful. None of the inner-level comments are useful, and should be deleted (one of them is even misleading). .multi_line(true) does nothing here (it only changes the meanings of ^ and $; see also .dot_matches_new_line(true)). There is no binding config_string (it was named data_string). String::from_utf8 doesn’t take a reference. fs::read_to_string should have been used instead of fs::read + String::from_utf8. Regex::replace_all was presumably intended.)
It might seem odd if you’re not used to it, but I’ve been finding it useful for grouping, especially in languages that aren’t expression-oriented. Tooling may be able to make it foldable, too.
I’ve been making a lightweight markup language for the last few years, and its structure (meaning things like heading levels, lists, &c.) has over time become almost entirely indentation-based. I find it really nice. (AsciiDoc is violently flat. reStructuredText is mostly indented but not with headings. Markdown is mostly flat with painfully bad and footgunny rules around indentation.)
—⁂—
A related issue. You frequently end up with multiple levels of indentation where you really only want one. A simple case I wrote yesterday in Svelte and was bothered by:
$effect(() => {
    if (loaded) {
        … lots of code …
    }
});
In some ancient code styles it might have been written like this instead:
$effect(() => { if (loaded) {
    … lots of code …
} });
Not the prettiest due to the extra mandatory curlies, but it’s fine, and the structure reasonable. In Rust it’s nicer:
effect(|| if loaded {
    … lots of code …
});
But rustfmt would insist on returning it to this disappointment:
effect(|| {
    if loaded {
        // … lots of code …
    }
});
Perhaps the biggest reason around normalising indentation and brace practice was bugs like the “goto fail” one. I think there’s a different path: make the curly braces mandatory (like Rust does), and have tooling check that matching braces are at the same level of indentation. Then the problem can’t occur. Once that’s taken care of, I really see no reason not to write things more compactly, when you decide it is nicer, which I find quite frequently compared with things like rustfmt.
I would like to see people experiment with indentation a bit more.
—⁂—
One related concept from Microsoft: regions. Cleanest in C♯, `#region …` / `#endregion` pragmas which can introduce code folding or outlining or whatever in IDEs.
Comment by dionian 22 hours ago
Comment by yearolinuxdsktp 1 day ago
In the example given, I would have preferred to extract to a method: what if I want to load the config from somewhere else? And perhaps the specifics of stripping comments could themselves have been extracted to a more aptly named post-processing method.
I see the argument that when extracted to a function, that you don’t need to go hunting for it. But if we look at the example with the block, I still see a bunch of detail about how to load the config, and then several lines using it. What’s more important in that context: the specifics of loading the config, or the specifics of how requests are formed using the loaded config?
The fact that you need to explain what’s happening with comments is a smell. Properly named variables and methods would obviate the need for the comments and would introduce semantic meaning thru names.
I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method. For example, you can write a block to capture a bunch of values for logging context; then you can call that block in every log line to get a logging context based on current method state. It totally beats extracting a logging context method that consumes many variables and is unlikely to be reused outside of the calling method, and yet you get delayed evaluation and a single point of definition for it.
So yes to the pattern, but needs a better example.
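In Rust, a callable block like the logging-context one described above would be a closure. A rough sketch, assuming log lines are simply collected into a `Vec<String>`; `process_batch` and `log_ctx` are illustrative names, not any library's API:

```rust
/// Processes a batch and returns the log lines it produced.
pub fn process_batch(user: &str, items: &[u32]) -> Vec<String> {
    let mut log = Vec::new();
    let total: u32 = items.iter().sum();

    // Single point of definition; the string is only built when called.
    let log_ctx = || format!("user={user} items={} total={total}", items.len());

    log.push(format!("[{}] starting", log_ctx()));
    if total > 100 {
        log.push(format!("[{}] total exceeds limit", log_ctx()));
    }
    log
}
```

One limitation of this sketch: the closure borrows the captured variables, so they cannot be mutated while `log_ctx` is still in use.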
Comment by ordu 19 hours ago
There are DRY and WET principles. We can argue which one of them is better, but to move something used exactly once to a method just due to an anxiety you can need it again seems to me a little bit too much. I move things into functions that are called once, but iff it makes my code clearer. It can happen when code is already complicated and long.
The block allows you to localize the code, and refactoring it into a separate function will be trivial. You need not check whether all the variables are temporary; you just see the block, copy/paste it, add a function header, and then add a function call at the place where the block was before. No thinking and no research is needed. Veni, vidi, vici.
> The fact that you need to explain what’s happening with comments is a smell.
It is an example for the article taken out of a context. You'd better comment it for the sake of your readers.
> I think blocks are useful when you are referencing a lot of local variables and also have fairly localized meaning within the method.
I do it each time I need a temporary variable. I hate variables that exist but are not used; they make it harder to read the code, because you need to track temporaries through all the code to confirm that they are temporaries. So even if I have just two local variables (not "a lot of") and one of them is temporary, I'd probably localize the temporary one even further into its own block. What really matters is code readability: if the function has just three lines, it doesn't matter, but it becomes really ugly if the lifetime of a variable overshoots its usefulness by 20 lines of dense code.
The other thing is mutability/immutability: you can drop mutability when returning a value from a block. Mutability makes reasoning harder, so dropping it when you don't need it anymore is a noble deed. It can and will reduce the complexity of reading the code. You'll thank yourself many times later, when faced with necessity to reread your own code.
There is the code and there is the process of devising the code. You cannot understand the former without reverse engineering the latter. So, when you write code, the more of your intentions are encoded somehow in your code, the easier it will be to read your code. If you create temporary variables just to parse config with the final goal to get the parsed config in a variable, then you'd better encode it. You can add comments, like "we need to parse config and for that we need three temporary variables", or you can localize those three temporary variables in a block.
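The copy/paste refactoring described above can be sketched as follows. The names and the comma-separated input format are assumptions made for the example:

```rust
/// Block form: `fields` and `sum` are temporaries that end with the block.
pub fn average_with_block(line: &str) -> f64 {
    let average = {
        let fields: Vec<f64> = line
            .split(',')
            .filter_map(|f| f.trim().parse::<f64>().ok())
            .collect();
        let sum: f64 = fields.iter().sum();
        if fields.is_empty() { 0.0 } else { sum / fields.len() as f64 }
    };
    average
}

/// The same block after extraction: body pasted unchanged, header added.
fn average(line: &str) -> f64 {
    let fields: Vec<f64> = line
        .split(',')
        .filter_map(|f| f.trim().parse::<f64>().ok())
        .collect();
    let sum: f64 = fields.iter().sum();
    if fields.is_empty() { 0.0 } else { sum / fields.len() as f64 }
}

/// Call site after the refactoring.
pub fn average_with_function(line: &str) -> f64 {
    average(line)
}
```

Because the block only reads `line` from the outer scope, its inputs become the function's parameter list with no further analysis.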
Comment by HackerThemAll 1 day ago
Comment by keybored 1 day ago
Voluntary use: I know this one. It’s a pattern now.