Skip to content

0.2 Syntax regarding .at() and .get() got a little less usable #338

Closed
@winks

Description

@winks

Hi,
not a huge problem but I just stumbled over this while bumping "regex = 0.1" to "0.2:

I was using a lot of this (IRC parsing code):

    let caps = re_priv.captures(line).unwrap();
    let sender = caps.at(1).unwrap()
    let msg = caps.at(2).unwrap_or("");

Now it looks like this (thanks #rust for the help)

    let caps = re_priv.captures(line).unwrap();
    let sender = caps.get(1).unwrap().as_str();
    let msg = caps.get(2).as_ref().map(regex::Match::as_str).unwrap_or("");
    // alternative:
    let msg = caps.get(2).map_or("", |m| m.as_str());

So my reasoning is now this:

  • In this use case I don't care about much error handling
  • this is inherently text, so no match/empty string can make sense
  • both of the two 0.2 versions are either verbose or not easily guessable (at least to a beginner)

So I don't really want to complain, but I think the syntax (for this use case) got less usable and more verbose, but I don't know if it's worth fixing or if .at() had any other problems.

Activity

BurntSushi

BurntSushi commented on Feb 12, 2017

@BurntSushi
Member

I'm not sure what to do about this. What do you think the alternative is? Adding more methods doesn't seem like a good idea, since that increases the number of choices a newcomer is faced with, which can be daunting.

Did you look at the documentation while writing code? Could you say something about why that didn't help you? In particular, take a look at the examples for captures, which shows yet another alternative:

let msg = caps[2];

In this use case I don't care about much error handling

The standard answer here is "this is what unwrap is for." But in this case, you can access captures by index. e.g., let caps = re.captures(line).unwrap(); println!("{}", caps[2]).

this is inherently text, so no match/empty string can make sense

But not always. The API must provide a way to differentiate between "no match" and "empty string match."

both of the two 0.2 versions are either verbose or not easily guessable (at least to a beginner)

The key change was the addition of the Match data type, which encapsulates both the positions and the text of any particular match. While it introduced a new data type (an increase in complexity) it also decreased the number of different accessor methods (a decrease in complexity).

BurntSushi

BurntSushi commented on Feb 12, 2017

@BurntSushi
Member

OK, after reading IRC and reflecting more, I see that you considered the &caps[2] syntax, but didn't want it because it panics if there's no match. In that case, you need to handle the "no match" case explicitly, so doing the explicit unwrap seems fine to me. I personally think either of these are pretty palatable:

let msg = caps.get(2).map(|m| m.as_str()).unwrap_or("");
let msg = caps.get(2).map_or("", |m| m.as_str());

Doing the eta reduction in this case tends to make it a bit more verbose because of the as_ref call.

winks

winks commented on Feb 12, 2017

@winks
Author

Thanks for the quick answer.

I hadn't thought about "no match" vs "empty string match", correct.

Maybe I'm biased as I hadn't known map_or and didn't think of using map. I'm just saying I think the unwrap_or("") as a proxy of "I don't care much about (exact) error handling" - i.e. I don't want unwrap to panic, but I know what I'm doing and "empty string" is fine - but I'm probably oversimplifying and this is a special use case that just sounds normal to me :)

I think adding this map_or example to the docs as a "hey, I just need the match or an empty string and don't want to futz around with Option(Match) in this case, just like the old .at(x).unwrap_or("") would be 100% fine to me.

TLDR: the use in 0.1 seemed very convenient while I think in 0.2 this is less nice of a shorthand. Nothing more.

BurntSushi

BurntSushi commented on Feb 12, 2017

@BurntSushi
Member

Improving the docs is something I can get on board with.

In general, I think I'm hesitant to add new methods that conveniences over a simple combinator or two, and would prefer to solve those issues for newcomers with better docs. However, I'd be willing to bend on that if there was compelling evidence that it was a big stumbling block for folks.

BurntSushi

BurntSushi commented on Feb 18, 2017

@BurntSushi
Member

I have a doc fix incoming for this.

@winks I think it's worth pointing out that in some number of cases, using indexing, e.g., &caps[3], might very well be sufficient. In particular, if you have a Captures, then you know you have a match, and if you know capture group 3 had to have matched if the entire regex matched, then &caps[3] will never panic. The only reason why you'd need to drop down to caps.get(3) is if capture group 3 was in an optional path of the regex and therefore may not have necessarily participated in the match.

added a commit that references this issue on Feb 18, 2017
9ae9418
winks

winks commented on Feb 18, 2017

@winks
Author

Those comments look awesome, I can't wish for more :) 👍

For completeness, this was my main example (sending !quit or !quit some_text to an irc bot):

    let event_cmd_quit = format!("!quit(\\s+)?(.*)?");
    let re_cmd_quit = Regex::new(&event_cmd_quit[..]).unwrap();
    let caps_cmd = re_cmd_quit.captures(msg).unwrap();
    let quit_msg = caps_cmd.get(2).map_or("", |m| m.as_str());

it has two possible matches in the end, but just the non-captured part is fine on its own. Seeing it again I should've probably used '(\s+(.+))?' - but I guess with "optional capture group at the end" it's always the same edge case that &caps[x] is not the best choice.

Anyway, this can be closed from my point of view - and thanks for your time.

added a commit that references this issue on Feb 18, 2017

Auto merge of #343 - BurntSushi:fixes, r=BurntSushi

7297f23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @winks@BurntSushi

      Issue actions

        0.2 Syntax regarding .at() and .get() got a little less usable · Issue #338 · rust-lang/regex