URI#display_uri raises ArgumentError: invalid byte sequence in UTF-8 #224

@roback

Addressable::URI#display_uri raises ArgumentError when called on the URL http://example.com%C2. The same happens for http://%D5.example.com.

I get the same error both with and without native IDNA support (the second trace below goes through idna/native.rb):

> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `gsub'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:432:in `unencode'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:530:in `normalize_component'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1079:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1
> Addressable::URI.parse("http://example.com%C2").display_uri
ArgumentError: invalid byte sequence in UTF-8
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `split'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/idna/native.rb:36:in `to_ascii'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1072:in `normalized_host'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:1177:in `normalized_authority'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2078:in `normalize'
    from /.gem/ruby/2.2.3/gems/addressable-2.4.0/lib/addressable/uri.rb:2103:in `display_uri'
    from (irb):1

The cause seems to be that Addressable::URI.unencode, when called on the URLs above, returns a string containing a byte sequence that is not valid UTF-8, which Ruby string methods such as split then reject:

require "addressable/uri"

url = Addressable::URI.unencode("http://%D5.example.com")
# => "http://\xD5.example.com"
url.split(".")
# ArgumentError: invalid byte sequence in UTF-8
#     from (irb):10:in `split'
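
To make the diagnosis above concrete, here is a minimal stdlib-only sketch (no Addressable required). It assumes the problem is that %C2 and %D5 each decode to a lone UTF-8 lead byte with no continuation byte after it. The helper names `decodes_to_valid_utf8?` and `safe_split` are hypothetical and are not part of Addressable; scrubbing with "?" is only one possible workaround, not necessarily what the library should do.

```ruby
# Hypothetical helper: percent-decode on raw bytes, then ask Ruby whether
# the resulting byte sequence is valid UTF-8.
def decodes_to_valid_utf8?(str)
  bytes = str.b.gsub(/%(\h\h)/) { $1.hex.chr }
  bytes.force_encoding(Encoding::UTF_8).valid_encoding?
end

decodes_to_valid_utf8?("http://example.com%C2")      # lone lead byte at the end
decodes_to_valid_utf8?("http://%D5.example.com")     # lead byte followed by "."
decodes_to_valid_utf8?("http://example.com/%C3%A9")  # complete two-byte sequence

# The string from the report above, written as a literal.
decoded = "http://\xD5.example.com"

# Hypothetical workaround: replace invalid bytes before calling string
# methods that may raise on invalid UTF-8.
def safe_split(str, separator)
  cleaned = str.valid_encoding? ? str : str.scrub("?")
  cleaned.split(separator)
end

parts = safe_split(decoded, ".")
```

Checking `valid_encoding?` first leaves well-formed strings untouched, so only the malformed input is altered by `scrub`.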
