
Feature Request: BufferedReader function for reading without returning delimiter #11404

Closed
@WebeWizard

Description

@WebeWizard
Contributor

I keep running into situations where I need to read (with BufferedReader) from a list of comma-separated or newline-separated values and then store the result. Should I have to trim off the delimiter every time? I feel like this is a common enough use case to be included in libstd.

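//Reading until the newline and trimming off the newline chars.
//There is probably a better way to do this.
let tempvalue = lineIter.next().unwrap().to_str();
let length = tempvalue.len();
//Assumes a two-byte "\r\n" ending: with a bare "\n" (or a last line without an ending) this drops real characters.
let value = tempvalue.as_slice().slice_to( length - 2 );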
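For comparison with the snippet above, here is a minimal sketch in present-day Rust, assuming the current `std::io::BufRead` trait rather than the 2014 `BufferedReader` API: `lines()` yields each line without its trailing `\n` or `\r\n`, and `split(b',')` yields each field without the comma. The helper names `read_lines` and `read_fields` are illustrative only, not part of std.

```rust
use std::io::{self, BufRead, Cursor};

/// Illustrative helper: newline-separated values.
/// `BufRead::lines` already drops the trailing "\n" or "\r\n".
fn read_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    reader.lines().collect()
}

/// Illustrative helper: comma-separated values.
/// `BufRead::split` never includes the delimiter byte in a field.
fn read_fields<R: BufRead>(reader: R) -> io::Result<Vec<Vec<u8>>> {
    reader.split(b',').collect()
}

fn main() -> io::Result<()> {
    let lines = read_lines(Cursor::new("alpha\r\nbeta\ngamma"))?;
    assert_eq!(lines, vec!["alpha", "beta", "gamma"]);

    let fields = read_fields(Cursor::new("1,22,333"))?;
    assert_eq!(fields, vec![b"1".to_vec(), b"22".to_vec(), b"333".to_vec()]);
    Ok(())
}
```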

Activity

ghost commented on Jan 8, 2014

+1.

adrientetar commented on Jan 8, 2014
Contributor

I agree. I think this change should be applied at least to read_line(), which just reads a single line. Iterators may be a different story: maybe we want to keep the delimiter for the line iterator so that its total content is exactly equal to the originating input? I don't know.

The current behavior forces you to trim EOL characters when, for example, converting stdin input to a numeric type.

We probably need a variant of read_until() (read_line() is built on it, and .lines() is in turn built on read_line()) that pushes every byte except the delimiter byte you want to stop at.
I hope this proposal makes sense.
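A sketch of the proposed variant, written against today's `BufRead::read_until` (which, like the 2014 version, keeps the delimiter). The function name `read_until_excluding` is hypothetical. It returns the number of bytes consumed, delimiter included, so a return of 0 still means end of input.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper (not part of std): like `BufRead::read_until`, but the
/// delimiter byte is not left in `buf`. Returns the number of bytes consumed,
/// counting the delimiter, so `Ok(0)` still means end of input.
fn read_until_excluding<R: BufRead>(
    reader: &mut R,
    delim: u8,
    buf: &mut Vec<u8>,
) -> io::Result<usize> {
    let n = reader.read_until(delim, buf)?;
    // `read_until` stops right after the first delimiter, so if this call
    // appended bytes ending in `delim`, that last byte is the delimiter.
    if n > 0 && buf.last() == Some(&delim) {
        buf.pop();
    }
    Ok(n)
}

fn main() -> io::Result<()> {
    let mut reader = Cursor::new("10,20,30");
    let mut buf = Vec::new();
    let mut values = Vec::new();
    while read_until_excluding(&mut reader, b',', &mut buf)? > 0 {
        values.push(String::from_utf8_lossy(&buf).into_owned());
        buf.clear();
    }
    assert_eq!(values, vec!["10", "20", "30"]);
    Ok(())
}
```

Because the count includes the delimiter, an empty field (as in `a,,b`) returns 1 rather than 0 and is not mistaken for end of input.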

cc @alexcrichton

alexcrichton commented on Jan 8, 2014
Member

My initial reason for designing the read_until function this way was that I did not want to lose data. Without returning the delimiter, you have no way of knowing whether there actually was a delimiter (which may be useful sometimes).
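The information mentioned here can be illustrated with today's `BufRead::read_until`: because the delimiter stays in the buffer, the caller can tell a terminated record from a final one that simply hit end of input, and can still drop the delimiter cheaply with `pop`. This is a sketch; `records_with_terminator` is a hypothetical helper name.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: reads '\n'-terminated records and reports, for each
/// one, whether it actually ended in '\n'. This works only because
/// `read_until` keeps the delimiter in the buffer.
fn records_with_terminator<R: BufRead>(mut reader: R) -> io::Result<Vec<(Vec<u8>, bool)>> {
    let mut out = Vec::new();
    loop {
        let mut buf = Vec::new();
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break; // end of input
        }
        let terminated = buf.last() == Some(&b'\n');
        if terminated {
            buf.pop(); // strip the delimiter after recording that it was there
        }
        out.push((buf, terminated));
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let recs = records_with_terminator(Cursor::new("first\nsecond"))?;
    assert_eq!(recs[0], (b"first".to_vec(), true));
    assert_eq!(recs[1], (b"second".to_vec(), false));
    Ok(())
}
```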

That being said, this is a convenience method, so correctness/completeness may not be paramount. I think I based this on Go's interface, but I would be curious about what other languages do as well.

adrientetar commented on Jan 30, 2014
Contributor

@alexcrichton We could just return an Option with None if the delimiter wasn't found.

steveklabnik/rust_for_rubyists#48 is related: with the current read_line, callers must also deal with Windows line endings:

let num = from_str::<int>(input.trim_right_chars(& &['\n', '\r']));
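In present-day Rust the line above would be spelled roughly as follows, assuming `str::trim_end_matches` (the closest modern equivalent of `trim_right_chars`) and `str::parse`. `parse_line_as_int` is a hypothetical helper, and `i64` stands in for the old `int` type.

```rust
use std::io::{self, BufRead, Cursor};
use std::num::ParseIntError;

/// Hypothetical helper: drop any trailing '\n' / '\r' characters, then parse.
fn parse_line_as_int(raw: &str) -> Result<i64, ParseIntError> {
    raw.trim_end_matches(&['\n', '\r'][..]).parse::<i64>()
}

fn main() -> io::Result<()> {
    // `read_line` keeps the line ending, so both "\r\n" and "\n" show up here.
    let mut input = Cursor::new("42\r\n-7\n");
    let mut line = String::new();
    let mut nums = Vec::new();
    while input.read_line(&mut line)? > 0 {
        nums.push(parse_line_as_int(&line).expect("each line should hold an integer"));
        line.clear();
    }
    assert_eq!(nums, vec![42, -7]);
    Ok(())
}
```

Without the trim, `"42\r\n".parse::<i64>()` returns an error, which is exactly the friction described in this comment.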

davbo commented on Feb 4, 2014
Contributor

Since @alexcrichton seemed interested in what other languages do: in the Python world this would typically be handled by reading the entire file into a string and calling splitlines(). This takes advantage of Python's TextIO "universal newlines" feature, in which the file I/O layer hides the different kinds of newlines from the user: every '\n', '\r', and '\r\n' is returned as '\n'. This was introduced in PEP 3116.

This would be similar to using Rust's AnyLineIterator.
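To make the "universal newlines" idea concrete, here is a small sketch that treats `\r\n`, `\n`, and a bare `\r` as line breaks, the three cases listed above. It is not a reconstruction of the historical `AnyLineIterator`; `universal_lines` is a hypothetical name, and unlike Python's `splitlines()` it ignores other separators such as form feed. Today's `str::lines()` handles `\n` and `\r\n` but not a lone `\r`.

```rust
/// Hypothetical helper: split on "\r\n", "\n", or a bare "\r".
/// A trailing line break does not produce an extra empty line.
fn universal_lines(text: &str) -> Vec<&str> {
    let bytes = text.as_bytes();
    let mut lines = Vec::new();
    let (mut start, mut i) = (0, 0);
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&text[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&text[start..i]);
                // Treat "\r\n" as one break rather than two.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // '\n' and '\r' are ASCII, so every slice index above is a char boundary.
    if start < bytes.len() {
        lines.push(&text[start..]);
    }
    lines
}

fn main() {
    assert_eq!(universal_lines("a\r\nb\rc\nd"), vec!["a", "b", "c", "d"]);
    assert_eq!(universal_lines("a\n\nb\n"), vec!["a", "", "b"]);
}
```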

Of course, reading the whole file in as a string isn't always a great idea. Generally I'd guess (as with @WebeWizard here) you'd be dealing with CSVs. In Python's case this is handled by a separate library. I did see one Rust CSV library which looked to be struggling slightly with newlines itself.

I wonder whether Rust needs higher-level file I/O libraries (such as csv), or whether extending BufferedReader as suggested here is a good idea in the meantime. It's also worth considering whether introducing something like "universal newlines" would be easier now than later.

arjantop commented on Feb 19, 2014
Contributor

@alexcrichton The other point is efficiency. If you need an owned pointer, you have to take a slice without the delimiter and then convert that to an owned value. In Go you can just slice it and you are done.

sfackler commented on Feb 19, 2014
Member

Why do you have to convert to owned?

arjantop commented on Feb 19, 2014
Contributor

@sfackler So I can send it over a channel, for example.
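A sketch of the channel case in present-day Rust: each line is read into its own `String`, the line ending is removed in place with `String::truncate` (which shortens the string without reallocating), and the owned value is sent over a `std::sync::mpsc` channel. `send_lines` is a hypothetical helper; the point is that an owned, delimiter-free string need not be a separate copy.

```rust
use std::io::{self, BufRead, Cursor};
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: read lines, strip trailing '\n' / '\r' in place,
/// and send each owned `String` to the receiver.
fn send_lines<R: BufRead>(mut reader: R, tx: mpsc::Sender<String>) -> io::Result<()> {
    loop {
        let mut line = String::new();
        if reader.read_line(&mut line)? == 0 {
            return Ok(()); // end of input; dropping `tx` closes the channel
        }
        let content_len = line.trim_end_matches(&['\n', '\r'][..]).len();
        line.truncate(content_len); // no reallocation, same buffer
        if tx.send(line).is_err() {
            return Ok(()); // receiver hung up
        }
    }
}

fn main() -> io::Result<()> {
    let (tx, rx) = mpsc::channel();
    let reader = Cursor::new("one\r\ntwo\n");
    let producer = thread::spawn(move || send_lines(reader, tx));
    let received: Vec<String> = rx.iter().collect();
    producer.join().expect("producer thread panicked")?;
    assert_eq!(received, vec!["one", "two"]);
    Ok(())
}
```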

mneumann commented on Feb 19, 2014
Contributor

Ruby also keeps the newline characters intact:

"abc\ndef".lines # => ["abc\n", "def"]
STDIN.readline # => "the text you enter\n"

But in Ruby you can easily chop them off using String#chomp:

"abc\n".chomp # => "abc"
"abc\r\n".chomp # => "abc"
"abc".chomp # => "abc"
"abc\n\n".chomp # => "abc\n" -- Only one newline is chomped off!

I think that we should introduce a convenience function like Ruby's String#chomp.
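A minimal Rust analogue of `String#chomp`, returning a slice as @mneumann suggests later in the thread. `chomp` is a hypothetical helper, not a std function; it uses `str::strip_suffix`, and the assertions mirror the Ruby examples above.

```rust
/// Hypothetical `chomp`, modelled on Ruby's String#chomp with no argument:
/// removes at most one trailing "\r\n", "\n", or "\r" and returns a slice.
fn chomp(s: &str) -> &str {
    s.strip_suffix("\r\n")
        .or_else(|| s.strip_suffix('\n'))
        .or_else(|| s.strip_suffix('\r'))
        .unwrap_or(s)
}

fn main() {
    assert_eq!(chomp("abc\n"), "abc");
    assert_eq!(chomp("abc\r\n"), "abc");
    assert_eq!(chomp("abc"), "abc");
    assert_eq!(chomp("abc\n\n"), "abc\n"); // only one line ending is removed
}
```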

mneumann commented on Feb 19, 2014
Contributor

But I would not do this directly in the reader.

sfackler commented on Feb 19, 2014
Member

The trim, trim_right, and trim_left functions already exist, but return slices.

sfackler commented on Feb 19, 2014
Member

We could make a variant of trim_right that did an in-place modification of an owned string, but I'd shy away from doing the same for trim and trim_left since that'll be a pretty expensive operation compared to slicing.
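The cost difference described here can be sketched with today's `String` API: trimming the end in place only needs `truncate`, whereas trimming the start in place needs `drain`, which shifts the remaining bytes to the front. Both function names are hypothetical; `trim_end` and `trim_start` are the modern names of `trim_right` and `trim_left`.

```rust
/// Hypothetical in-place `trim_end` for an owned `String`:
/// `truncate` only shortens the length; no bytes are copied.
fn trim_end_in_place(s: &mut String) {
    let new_len = s.trim_end().len();
    s.truncate(new_len);
}

/// Hypothetical in-place `trim_start`: `drain` must move every remaining
/// byte to the front, which is why it costs more than slicing.
fn trim_start_in_place(s: &mut String) {
    let removed = s.len() - s.trim_start().len();
    s.drain(..removed);
}

fn main() {
    let mut a = String::from("value \t\n");
    let cap = a.capacity();
    trim_end_in_place(&mut a);
    assert_eq!(a, "value");
    assert_eq!(a.capacity(), cap); // same allocation as before

    let mut b = String::from("  value");
    trim_start_in_place(&mut b);
    assert_eq!(b, "value");
}
```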

mneumann commented on Feb 19, 2014
Contributor

@sfackler: Chopping off the newline character is so common that there should be a utility function for this purpose. I expect chomp to also return a slice.

sfackler commented on Feb 19, 2014
Member

mneumann commented on Feb 19, 2014
Contributor

@sfackler: Yes, but they also trim whitespace, unless you write input.trim_right_chars(& &['\n', '\r']), which is very verbose and would chop off every trailing newline character, however many there are and in whichever order ("\r\n" or "\n\r"). Of course the latter cannot happen when using read_line, but I still prefer a specialized "strip the newline off" method.
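The difference described here can be checked with present-day string methods, assuming `trim_end`, `trim_end_matches`, and `strip_suffix`; `compare_trims` is a hypothetical helper that just runs all three on the same input.

```rust
/// Hypothetical helper returning three ways to drop a line ending:
/// (trim_end, trim_end_matches on '\n'/'\r', strip exactly one "\r\n" or "\n").
fn compare_trims(s: &str) -> (&str, &str, &str) {
    // Also removes meaningful trailing spaces and tabs.
    let whitespace = s.trim_end();
    // Removes every trailing '\n' and '\r', in any count and any order.
    let all_newlines = s.trim_end_matches(&['\n', '\r'][..]);
    // Removes a single line ending and nothing else.
    let one_ending = s
        .strip_suffix("\r\n")
        .or_else(|| s.strip_suffix('\n'))
        .unwrap_or(s);
    (whitespace, all_newlines, one_ending)
}

fn main() {
    assert_eq!(compare_trims("value \r\n"), ("value", "value ", "value "));
    assert_eq!(compare_trims("value\n\r\n"), ("value", "value", "value\n"));
    assert_eq!(compare_trims("value\n\r"), ("value", "value", "value\n\r"));
}
```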

6 remaining items


Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
Participants: @steveklabnik @mneumann @arjantop @alexcrichton @davbo


Feature Request: BufferedReader function for reading without returning delimiter · Issue #11404 · rust-lang/rust