Skip to content

proposal: compress/gzip: add distinct error for trailing garbage data #67986

Closed
@rgmz

Description

@rgmz

Proposal Details

Context

When using the compress/gzip package to decompress gzip files, receiving a gzip: invalid header error can indicate two distinct possibilities.

  1. The file is not a valid gzip file.

    Example test case (click to expand)
    import (
        "compress/gzip"
        "io"
        "os"
        "testing"
    )
    
    func TestNotAValidGzipFile(t *testing.T) {
        f, err := os.Open("/tmp/python-3.12.3-amd64.exe") // Arbitrary example, any non-gzip file will suffice.
        if err != nil {
    	    t.Fatal(err)
        }
        defer f.Close()
    
        rc, err := gzip.NewReader(f)
        if err != nil {
    	    t.Fatal(err)
        }
        defer rc.Close()
    
        _, err = io.ReadAll(rc)
        if err != nil {
    	    t.Fatal(err)
        }
    }
    
    // === RUN   TestNotAValidGzipFile
    //  gzip_test.go:19: gzip: invalid header
  2. The file is a valid gzip file, but contains invalid trailing data.

    Example test case (click to expand)
    func TestValidGzipFileWithTrailingData(t *testing.T) {
      // Reproducer file. There are many examples of this.
      // https://github.com/udacity/self-driving-car-sim/blob/4b1f739ebda9ed4920fe895ee3677bd4ccb79218/Assets/Standard%20Assets/Environment/SpeedTree/Conifer/Conifer_Desktop.spm
      f, err := os.Open("/tmp/Conifer_Desktop.spm")
      if err != nil {
        t.Fatal(err)
      }
      defer f.Close()
      
      rc, err := gzip.NewReader(f)
      if err != nil {
        t.Fatal(err)
      }
      defer rc.Close()
      
      _, err = io.ReadAll(rc)
      if err != nil {
        t.Fatal(err)
      }
    }
    
    // === RUN   TestValidGzipFileWithTrailingData
    //  gzip_test.go:19: gzip: invalid header

Scenario 2 can be especially confusing because Go's implementation of compress/gzip rejects invalid trailing data, while many popular applications and languages do not. Hence, the ambiguity of this error can lead people to believe that Go is rejecting a valid gzip file.

$ file -i Conifer_Desktop.spm
Conifer_Desktop.spm: application/gzip; charset=binary
$ gzip -S .spm -d /tmp/Conifer_Desktop.spm
gzip: stdin: decompression OK, trailing garbage ignored. 

Side-note: #47809 (comment) explains why this choice was made and how to handle invalid trailing data using gzip.Reader.Multistream.

Proposal Details

I propose adding a distinct error for when subsequent inputs in a data stream don't have valid headers (i.e., trailing garbage). This would provide clarity for users who many not realize that they need to disable gzip.Reader.Multistream. Additionally, it could be used to programmatically call Multistream(false) for specific files with trailing garbage.

I will leave the specific error message and behave for future discussion, should there be interest in implementing this proposal.


Caveat: I am not familiar with the implementation of gzip.Reader. This change may not be possible or desirable due to technical limitations.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    Incoming

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions