Closed
Description
I keep running into situations where I need to read (with BufferedReader) from a list of comma-separated or newline-separated values and then store the result. Should I have to trim off the delimiter every time? I feel like this is a common enough use case to be included in libstd.
//reading until newline and trimming off the newline chars.
//probably a better way to do this.
let tempvalue = lineIter.next().unwrap().to_str();
let length = tempvalue.len();
let value = tempvalue.as_slice().slice_to( length - 2 );
Activity
ghost commented on Jan 8, 2014
+1.
adrientetar commented on Jan 8, 2014
I agree. I think that this change should be applied at least to read_line(), which just reads a single line. (Iterators can be a different story eventually: maybe we want to keep the delimiter for the line iterator so that the total of its content is exactly equal to the originating input? I don't know.) The current behavior forces you to trim EOL characters when, for example, parsing stdin input into a numeric type.
We probably need a variant of the read_until() function (that's what read_line() is made of, and .lines() is itself made of the former) that pushes everything but the byte you want to stop at.
Hope that this proposal makes sense.
cc @alexcrichton
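A minimal sketch of the variant proposed above, written against present-day Rust (std::io::BufRead) rather than the 2014 BufferedReader API. The name read_until_excluding is hypothetical and is not part of the standard library; it wraps the existing read_until and drops the delimiter before returning.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper (not in std): reads up to `delim` and appends
/// everything except the delimiter itself to `buf`.
/// Returns the number of bytes consumed from the reader, delimiter included,
/// so a return value of 0 still means end of input.
fn read_until_excluding<R: BufRead>(
    reader: &mut R,
    delim: u8,
    buf: &mut Vec<u8>,
) -> io::Result<usize> {
    let consumed = reader.read_until(delim, buf)?;
    // read_until stops at the first delimiter, so if anything was read and
    // the last byte equals `delim`, that byte is the delimiter.
    if consumed > 0 && buf.last() == Some(&delim) {
        buf.pop();
    }
    Ok(consumed)
}
```

Calling it in a loop with b',' on the input 1,2,3 yields the fields 1, 2 and 3 with no trailing commas. Note that this version gives the caller no way to tell whether the last field was terminated.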
alexcrichton commented on Jan 8, 2014
My initial thoughts in designing the read_until function this way were that I did not want to lose data here and there. Without returning the delimiter, you have no way of knowing whether there actually was a delimiter or not (which may be useful sometimes).
That being said, this is a convenience method, so correctness/completeness may not be paramount. I think I based this off Go's interface, but I would also be curious about what other languages do as well.
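One way to strip the delimiter without losing that information is to report it separately. The sketch below uses present-day Rust; the Record type and the next_record function are hypothetical names chosen for illustration, not an existing or proposed std API.

```rust
use std::io::{self, BufRead};

/// Hypothetical return type: the record without its delimiter, plus a flag
/// saying whether a delimiter actually followed it in the input.
#[derive(Debug, PartialEq)]
struct Record {
    bytes: Vec<u8>,
    terminated: bool,
}

/// Hypothetical helper: returns Ok(None) at end of input.
fn next_record<R: BufRead>(reader: &mut R, delim: u8) -> io::Result<Option<Record>> {
    let mut bytes = Vec::new();
    if reader.read_until(delim, &mut bytes)? == 0 {
        return Ok(None);
    }
    // The delimiter can only appear as the last byte of what was read.
    let terminated = bytes.last() == Some(&delim);
    if terminated {
        bytes.pop();
    }
    Ok(Some(Record { bytes, terminated }))
}
```

For the input "a\nb" this produces "a" with terminated set and then "b" with it cleared, so a caller can still detect a final line that lacks a newline.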
adrientetar commented on Jan 30, 2014
@alexcrichton We could just return an Option with None if the delimiter wasn't found.
steveklabnik/rust_for_rubyists#48 is related; the current read_line must also deal with Windows line endings:
davbo commented on Feb 4, 2014
Since @alexcrichton seemed interested in what other languages do: in the Python world this would typically be handled by reading the entire file into a string and calling splitlines. This takes advantage of an underlying Python TextIO feature, "universal newlines", in which the file I/O layer hides the different types of newlines from the user; all instances of '\n', '\r' and '\r\n' are returned as '\n'. This was introduced in PEP 3116.
This would be similar to using Rust's AnyLineIterator.
Of course, reading the whole file in as a string isn't always a great idea. Generally I'd guess (as with @WebeWizard here) you'd be dealing with CSVs. In Python's case this is handled by a separate library. I did see one Rust CSV library which looked to be struggling slightly with newlines itself.
I wonder if Rust needs higher level File IO libraries (such as csv) or if extending the BufferedReader as suggested here is a good idea for the meantime? It's also worth considering if introducing something like "universal newlines" could be easier now than later.
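To make the "universal newlines" idea concrete, here is a rough sketch in present-day Rust. It works on a whole in-memory string, whereas Python applies the translation inside its I/O layer while reading, so this only illustrates the mapping itself. The function name normalize_newlines is hypothetical.

```rust
/// Hypothetical helper: rewrites "\r\n" and lone "\r" as "\n",
/// leaving every other character untouched.
fn normalize_newlines(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' {
            // Swallow the '\n' of a "\r\n" pair so the pair yields one newline.
            if chars.peek() == Some(&'\n') {
                chars.next();
            }
            out.push('\n');
        } else {
            out.push(c);
        }
    }
    out
}
```

After normalization, splitting on '\n' behaves the same regardless of which platform produced the file.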
arjantop commented on Feb 19, 2014
@alexcrichton The other point is efficiency. If you need an owned pointer you have to take a slice without the delimiter and then convert that to owned. In Go you can just slice it and you are done.
sfackler commented on Feb 19, 2014
Why do you have to convert to owned?
arjantop commented on Feb 19, 2014
@sfackler So I can send it to a channel, for example.
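For illustration, this is roughly how the owned-string case looks in present-day Rust, where read_line appends into a String the caller already owns, so the line ending can be removed in place before the value is moved into a channel. The 2014 API discussed in this thread worked differently, and send_line is a hypothetical helper name.

```rust
use std::io::{self, BufRead};
use std::sync::mpsc::Sender;

/// Hypothetical helper: reads one line, drops a trailing "\n" or "\r\n"
/// in place (no second allocation), and sends the owned String.
/// Returns Ok(false) at end of input or if the receiver has hung up.
fn send_line<R: BufRead>(reader: &mut R, tx: &Sender<String>) -> io::Result<bool> {
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        return Ok(false);
    }
    if line.ends_with('\n') {
        line.pop();
        if line.ends_with('\r') {
            line.pop();
        }
    }
    Ok(tx.send(line).is_ok())
}
```

Because only the end of the buffer is shortened, nothing is copied; trimming from the front of an owned string would require shifting the remaining bytes.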
mneumann commented on Feb 19, 2014
Ruby also keeps the newline characters intact:
But in Ruby you can easily chop them off using String#chomp:
I think that we should introduce a convenience function like Ruby's String#chomp.
mneumann commented on Feb 19, 2014
But I would not do this directly in the reader.
sfackler commented on Feb 19, 2014
The trim, trim_right, and trim_left functions already exist, but return slices.
sfackler commented on Feb 19, 2014
We could make a variant of trim_right that did an in-place modification of an owned string, but I'd shy away from doing the same for trim and trim_left since that'll be a pretty expensive operation compared to slicing.
mneumann commented on Feb 19, 2014
@sfackler: Chopping off the newline character is so common that there should be a utility function for this purpose. I expect chomp to also return a slice.
sfackler commented on Feb 19, 2014
@mneumann right, that's what the trim* family does: http://static.rust-lang.org/doc/master/std/str/trait.StrSlice.html#tymethod.trim
mneumann commented on Feb 19, 2014
@sfackler: Yes, but they also trim whitespace, unless you want to write input.trim_right_chars(& &['\n', '\r']), which is very verbose and would chop off as many newline characters as there are, regardless of their order ("\r\n" or "\n\r"). Of course the latter cannot happen when using read_line, but I would still prefer a specialized "strip the newline off" method.
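The difference between the two approaches can be sketched in present-day Rust, where trim_right_chars has since become trim_end_matches. Both function names below are hypothetical. This chomp is only loosely modelled on Ruby's: it removes at most one trailing "\r\n" or "\n" and, unlike Ruby's String#chomp, leaves a lone trailing "\r" alone.

```rust
/// Hypothetical `chomp`: removes at most one trailing line ending
/// ("\r\n" or "\n") and returns a slice of the input.
fn chomp(s: &str) -> &str {
    s.strip_suffix("\r\n")
        .or_else(|| s.strip_suffix('\n'))
        .unwrap_or(s)
}

/// The broader approach: strips every trailing '\n' and '\r',
/// however many there are and in whatever order they appear.
fn trim_newline_chars(s: &str) -> &str {
    s.trim_end_matches(|c: char| c == '\n' || c == '\r')
}
```

With these definitions, chomp keeps trailing spaces and any earlier blank lines, while trim_newline_chars collapses a run such as "\n\r\n" entirely.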