
For me, it's like writing 10824 as "ten thousand eight hundred and twenty four".

I'd rather read a real regex using the verbose flag to comment groups:

- it shows the real regex for quick scanning and for those familiar with the syntax

- it explains things to the people that are not familiar with it or if the regex is complicated

- it forces the writer to divide the regex into logical groups

- such a system would have comments anyway, to indicate what you are matching, such as "product code, date, color" for each part of the matching code.

- if you can't write or read regex and your job is programming, spending an afternoon learning them should be your next step. They are everywhere, no matter the tech stack: unix tools, IDE, 3rd party libs...

- there are plenty of regex tester UIs, which let me copy the regex, test various cases to see what it does, and tweak it

But I can see the value of such a lib for learning regexes.
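
To make the verbose-flag point concrete, here is a minimal sketch in Python using re.VERBOSE. The record format (product code, date, color joined by dashes) and the group names are made up for illustration; they only echo the "product code, date, color" example above.

```python
import re

# Hypothetical record format: <product code>-<date>-<color>,
# for example "AB1234-2021-03-15-red". Invented for this sketch.
RECORD = re.compile(
    r"""
    ^
    (?P<product>[A-Z]{2}\d{4})     # product code: two capitals, four digits
    -
    (?P<date>\d{4}-\d{2}-\d{2})    # date: YYYY-MM-DD shape only, not validated
    -
    (?P<color>[a-z]+)              # color: one lowercase word
    $
    """,
    re.VERBOSE,  # ignore whitespace in the pattern and allow # comments
)

match = RECORD.match("AB1234-2021-03-15-red")
print(match.groupdict())
# {'product': 'AB1234', 'date': '2021-03-15', 'color': 'red'}
```

The pattern is still ordinary regex syntax, so anyone who knows it can scan it, while the comments carry the intent for everyone else.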



I'll disagree here. While I have quibbles about the specific API used in this project, I like the idea of an imperative regex builder, especially if it can be type checked.

Every time I turn to regex, I waste time debugging which characters I forgot to escape or accidentally escaped when all the brackets and slashes blur together. I debug why some group isn't matching right because regex's semantic density makes it hard to tell where the group starts and ends. I turn to regex debuggers because they're necessary, but they're not great experiences, and at first glance I'd think a type checked regex builder could make debuggers unnecessary a lot of the time.

There's also a discoverability problem. I know non-capturing groups and negative lookbehind are a thing, but I always have to look them up because it's hard to remember the arcane syntax if I don't use them often. And my peers don't even know some of those things exist, so they struggle to solve easy problems. A library that my editor would offer autocomplete suggestions for would really help this.

I also think a regex builder would promote better organization - break the regex into parts, assign the parts to variables, and reuse portions of a regex. That's all possible with traditional regex, but I don't see folks doing it because it seems few folks know about verbose regex, building logic with string concatenation is discouraged in many situations, and if your language has a native regex data type, declining that in favor of string building feels weird. If all a regex builder did was reframe what developers feel is natural to do, that's beneficial.
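
As a rough sketch of the "break it into parts, assign the parts to variables, reuse them" idea, here is what it can look like in plain Python without any builder library. The helper names (literal, named) and the log-line format are hypothetical, chosen only for this example.

```python
import re

# Small named fragments, each readable (and testable) on its own.
YEAR = r"\d{4}"
MONTH = r"(?:0[1-9]|1[0-2])"
DAY = r"(?:0[1-9]|[12]\d|3[01])"
DATE = rf"{YEAR}-{MONTH}-{DAY}"


def literal(text: str) -> str:
    """Escape text so it matches itself, so nothing is left to forget."""
    return re.escape(text)


def named(name: str, pattern: str) -> str:
    """Wrap a fragment in a named capturing group."""
    return f"(?P<{name}>{pattern})"


# Hypothetical log line: "[2021-03-15] some message"
LOG_LINE = re.compile(
    literal("[") + named("date", DATE) + literal("]")
    + r"\s+"
    + named("message", r".*")
)

m = LOG_LINE.fullmatch("[2021-03-15] disk usage at 91%")
print(m.group("date"), "|", m.group("message"))
# 2021-03-15 | disk usage at 91%
```

Routing every literal through re.escape removes the "which characters did I forget to escape" class of bug, and DATE can be reused in any other pattern that needs it.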


Here is what I see.

You want an automated tool to build a regular expression for people who don't understand regular expressions. There is no shortage of ways that this is going to lead to disasters. Starting with the fact that far too many developers do not understand the difference between pattern matching and parsing, and will reach for the wrong tool with no idea what it is doing and why they can't get it to do what they want.

See https://stackoverflow.com/questions/1732348/regex-match-open... for more.


I'd like to clarify that my position isn't about "not understanding regular expressions". Sure, it could help people who don't. But even for people like me: I read the O'Reilly regex pocket book and other materials, I studied formal regular languages during college, built a basic lexer/parser for a senior project, I've written no shortage of simple and complicated regexes in application code in the workplace, and once in a conference lecture I was the first audience member to recognize and shout out when the speaker quizzed which commonplace file format the given gnarly full-page regex matched.

I'll never be up there with Brian Kernighan[0], but I know my way around regex at least as well as what I feel is reasonable to expect and accommodate from the average developer.

My position is that even with a background in regexes that's a lot deeper than just Googling and putzing on Regex101.com, traditional regex syntax is still a frustrating time sink that's hard to get correct without more trial-and-error than feels intrinsically necessary. The syntax provides zero opportunity to discover there's a more effective way to perform a task. I have trouble identifying a compelling value proposition for traditional syntax besides familiarity and natural serializability, and the fact that it gets the job done at all.

I don't believe traditional regex syntax is the optimal way to accomplish text pattern matching tasks in the workplace, and I'm open to other tooling that makes success simpler and more reliable. People misuse regexes all the time (examples like the one you linked are almost tropes at this point), but I don't think that's compelling justification on its own for preserving the status quo.

[0] https://www.cs.princeton.edu/courses/archive/spr09/cos333/be...


My only complaint about the regex syntax is that it does not allow you to separate things out with whitespace, or add comments about what the chunked units do. The x modifier fixes both.

What you traditionally see with a complex RE for a complex pattern is the same as what you traditionally see with someone writing complex SQL statements on a single line. Stop trying to treat it like a black box, and treat it as a programming language in its own right. Use whitespace, indentation, and comments (when necessary) to communicate intent as well as just to make it do its job.

Other than that, regular expressions say what they mean and mean what they say, very concisely and directly. Particularly the PCRE variants of the language. I consider that conciseness and directness a virtue.


For me, a regex usually has to be wrapped into a function, which I can then throw copious amounts of unit tests at. Regex is, IMHO, an easy-to-write, hard-to-read language, so I find it more fruitful to use tests to specify what task is being accomplished, so that - if it's easier - I can just rewrite the appropriate regex from scratch rather than trying to decipher how the old one is broken.

If the task is complex enough, regex might not even be the right tool for the job, and the function boundary provides a sensible encapsulation boundary.
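
A minimal sketch of that workflow in Python, assuming a made-up task (parsing a three-part version string). The function name parse_version and the test cases are illustrative only; the point is that the tests, not the pattern, are the specification.

```python
import re
from typing import Optional, Tuple

# The pattern is a private detail; callers only see parse_version().
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch), or None if text is not 'N.N.N'."""
    m = _VERSION.fullmatch(text)
    if m is None:
        return None
    major, minor, patch = (int(part) for part in m.groups())
    return (major, minor, patch)


def test_parse_version() -> None:
    # If the regex is ever rewritten from scratch, these cases
    # define what "still correct" means.
    assert parse_version("1.2.3") == (1, 2, 3)
    assert parse_version("10.0.42") == (10, 0, 42)
    assert parse_version("1.2") is None
    assert parse_version("v1.2.3") is None
    assert parse_version("1.2.3 ") is None


if __name__ == "__main__":
    test_parse_version()
    print("ok")
```

If the task later outgrows regex, the body of parse_version can be swapped for a real parser and the same tests keep applying.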


> if you can't write or read regex and your job is programming, spending an afternoon learning them should be your next step

I’m one of those super stubbornly bullheaded people who believes I can learn or do anything if I’m simply willing to devote the time/energy to it so I don’t say this lightly:

I am absolutely incapable of “reading” RegEx.

I use RegEx. I (conceptually) understand RegEx. I can write Regex quickly and effectively without much thought. But I can’t read it to save my life. In fact it’s so difficult for me that I struggle to believe there are people who actually can “read” it.

I can decipher it but it’ll take me a bit - more like solving a little puzzle in my head than reading and understanding a piece of code.

And judging by the way a lot of people talk about RegEx (including seasoned programmers) I can’t imagine I’m alone.


Does any representation deserve to be singled out as the "real regex"? I'd have picked the abstract syntax tree. A bunch of constructor calls are closer to that.

(Yes, the Perl regex syntax has some advantages, and I'm only objecting to what you're calling it. Though it's also true that a tree structure has advantages over a string: for instance, you don't have to parse it to manipulate it.)
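
To illustrate the tree idea, here is a toy sketch in Python: a tiny regex AST built from constructor calls, rendered to Python's re syntax, and rewritten without ever parsing a pattern string. The node types (Lit, Seq, Alt, Star) and the functions are invented for this example and cover only a small slice of regex.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    text: str      # literal text, escaped at render time


@dataclass(frozen=True)
class Seq:
    parts: tuple   # nodes matched one after another


@dataclass(frozen=True)
class Alt:
    options: tuple  # any one of these nodes


@dataclass(frozen=True)
class Star:
    inner: object  # zero or more repetitions of a node


def render(node) -> str:
    """Turn the tree into a pattern string for the re module."""
    if isinstance(node, Lit):
        return re.escape(node.text)
    if isinstance(node, Seq):
        return "".join(render(p) for p in node.parts)
    if isinstance(node, Alt):
        return "(?:" + "|".join(render(o) for o in node.options) + ")"
    if isinstance(node, Star):
        return "(?:" + render(node.inner) + ")*"
    raise TypeError(f"unknown node: {node!r}")


def replace_lit(node, old: str, new: str):
    """Return a copy of the tree with every Lit(old) swapped for Lit(new)."""
    if isinstance(node, Lit):
        return Lit(new) if node.text == old else node
    if isinstance(node, Seq):
        return Seq(tuple(replace_lit(p, old, new) for p in node.parts))
    if isinstance(node, Alt):
        return Alt(tuple(replace_lit(o, old, new) for o in node.options))
    if isinstance(node, Star):
        return Star(replace_lit(node.inner, old, new))
    raise TypeError(f"unknown node: {node!r}")


tree = Seq((Lit("v"), Star(Alt((Lit("a."), Lit("b"))))))
print(render(tree))                         # v(?:(?:a\.|b))*
print(render(replace_lit(tree, "b", "c")))  # v(?:(?:a\.|c))*
```

The rendered string has redundant grouping, which is the usual price of rendering mechanically, but the rewrite in replace_lit never has to worry about escaping or where a group ends.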


I think you are one hundred percent correct :)

But I can see it being useful if the idea is translated to NLP -> regex

a GPT3 to regex would be awesome


I could see it being awesome from a "This is cool" perspective, but I wouldn't trust it to actually work.

Just imagine it getting a negative swapped or something.


I think it would be useful to generate a regex - which would then get written down in the code to ensure it doesn't change? You could test it, ensure it works, then just use the output of the neural net...


Good point, that does actually sound quite useful.



