Use "\A \z", not "^ $" with Python regular expressions
Posted by todsacerdoti 1 day ago
Comments
Comment by flufluflufluffy 1 day ago
Comment by theamk 1 day ago
if not re.match('^[a-z0-9_]+$', user):
    raise SomeException("invalid username")
As written, the code above is incorrect: it will happily accept "john\n", which can cause all sorts of havoc down the line.
Comment by extraduder_ire 1 day ago
Comment by theamk 1 day ago
Yes, fullmatch() will help, and so will \Z. It's just that it is so easy to forget...
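To make the difference concrete, here is a minimal sketch comparing the three spellings on the "john\n" input from above. The helper names (valid_dollar, valid_fullmatch, valid_anchor_z) are made up for illustration; only re.match, re.fullmatch and \Z come from the standard library.

```python
import re

def valid_dollar(user: str) -> bool:
    # '$' also matches just before a single trailing newline,
    # so "john\n" slips through.
    return re.match(r'^[a-z0-9_]+$', user) is not None

def valid_fullmatch(user: str) -> bool:
    # fullmatch() requires the pattern to cover the entire string.
    return re.fullmatch(r'[a-z0-9_]+', user) is not None

def valid_anchor_z(user: str) -> bool:
    # '\Z' matches only at the very end of the string.
    return re.match(r'[a-z0-9_]+\Z', user) is not None

for candidate in ("john", "john\n"):
    print(repr(candidate),
          valid_dollar(candidate),
          valid_fullmatch(candidate),
          valid_anchor_z(candidate))
```

On "john" all three checks agree; on "john\n" only the '$' version still accepts the input.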
Comment by Joker_vD 1 day ago
Comment by seanwilson 1 day ago
Regex is one of those things where I have to look things up to remind myself what the symbols mean, and by the time I need this info again I've forgotten it all.
I can't think of anywhere else in general programming where we have something so terse and symbol heavy.
Comment by db48x 1 day ago
35.3.3 The ‘rx’ Structured Regexp Notation
------------------------------------------
As an alternative to the string-based syntax, Emacs provides the
structured ‘rx’ notation based on Lisp S-expressions. This notation is
usually easier to read, write and maintain than regexp strings, and can
be indented and commented freely. It requires a conversion into string
form since that is what regexp functions expect, but that conversion
typically takes place during byte-compilation rather than when the Lisp
code using the regexp is run.
Here is an ‘rx’ regexp that matches a block comment in the C
programming language:
  (rx "/*"                    ; Initial /*
      (zero-or-more
       (or (not "*")          ; Either non-*,
           (seq "*"           ; or * followed by
                (not "/"))))  ; non-/
      (one-or-more "*")       ; At least one star,
      "/")                    ; and the final /
or, using shorter synonyms and written more compactly,
  (rx "/*"
      (* (| (not "*")
            (: "*" (not "/"))))
      (+ "*") "/")
In conventional string syntax, it would be written
"/\\*\\(?:[^*]\\|\\*[^/]\\)*\\*+/"
Of course, it does have one disadvantage. As the manual says:
The ‘rx’ notation is mainly useful in Lisp code; it cannot be used in
most interactive situations where a regexp is requested, such as when
running ‘query-replace-regexp’ or in variable customization.
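For readers on the Python side of this thread, a rough analogue (not an equivalent of ‘rx’) is the re.VERBOSE flag, which lets a string regexp be indented and commented. The sketch below restates the same C block-comment pattern; the name C_BLOCK_COMMENT is made up for illustration.

```python
import re

# Same structure as the rx form: "/*", then any run of
# (non-star | star followed by non-slash), then one or more stars, then "/".
C_BLOCK_COMMENT = re.compile(r"""
    /\*                 # initial /*
    (?:
        [^*]            # either a non-* character,
      | \* [^/]         # or a * followed by a non-/
    )*
    \*+                 # at least one star,
    /                   # and the final /
""", re.VERBOSE)

source = "int x; /* first */ int y; /* second */"
print(C_BLOCK_COMMENT.findall(source))
```

In verbose mode, whitespace in the pattern is ignored and "#" starts a comment, so the layout carries no meaning of its own.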
Raku also has advanced the state of the art considerably.
Comment by zahlman 1 day ago
* running a regex not in multi-line mode
* on input that was presumably split from multiple lines, or within a line of multi-line input
* wherein I care whether the line in question is the last line of input without a trailing newline
* but I didn't check, or `.strip()` or anything
I can't say I recall ever being bitten by this.
And there is also nothing here to justify \A over ^.
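As a quick check of that last point, here is a small sketch, assuming Python's standard re module: without re.MULTILINE, ^ and \A behave the same, and they only diverge once the flag is set. The sample text is invented for the example.

```python
import re

text = "first\nsecond"

# Without MULTILINE, both anchors match only at the start of the string.
caret_default = re.search(r'^second', text)
slash_a_default = re.search(r'\Asecond', text)

# With MULTILINE, '^' also matches after each newline; '\A' does not.
caret_multiline = re.search(r'^second', text, re.MULTILINE)
slash_a_multiline = re.search(r'\Asecond', text, re.MULTILINE)

print(caret_default, slash_a_default)
print(caret_multiline is not None, slash_a_multiline)
```

So \A only buys something over ^ if the pattern might later be compiled with re.MULTILINE.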
Comment by eviks 1 day ago
Comment by tkocmathla 1 day ago
Comment by svilen_dobrev 1 day ago
And it is the same in Perl. From `man perlre`:
^ Match the beginning of the string (or line, if /m is used)
Comment by autoexec 1 day ago
Comment by edflsafoiewq 1 day ago
Comment by autoexec 1 day ago
$foo =~ /regex/
$result = $foo =~ /regex/
if ($foo =~ /regex/) {whatever;}
while (/regex/) {whatever;}
The captures ($1, $2, etc.) are global and usable wherever you need them.
In this particular case, the default is that $ matches the end of a string without a newline, but you can include it anytime you need to:
$foo =~ /regex$/ # end of string without newline
$foo =~ /regex$/m # end of string with newline
Comment by instig007 1 day ago
The Python ecosystem has several options, for instance: https://parsy.readthedocs.io/en/latest/tutorial.html
Comment by az09mugen 1 day ago
Comment by notpushkin 1 day ago
Comment by queenkjuul 1 day ago
https://www.reuters.com/world/us/evidence-contradicts-trump-...
Comment by tomhow 7 hours ago
Please don't follow people around the site to continue political arguments from unrelated threads.
Comment by zahlman 1 day ago
Comment by queenkjuul 20 hours ago
https://www.reuters.com/world/us/evidence-contradicts-trump-...