Encode 32-bit integers in arrays #266

Closed
@sgraf812

Description

There are multiple issues related to running into the 16-bit limit of the tables encoded by happy:

I don't see the appeal in having 16-bit array entries; let's double them to 32 bits.

  • I had a look at the Parser.hs for GHC. Just crudely counting the width of the lines ending in "#, I count about 1M characters. Since each byte is encoded as a four-character escape like \xFF, that's at most 250KB of tables; doubling that to 500KB won't hurt.
  • For the big reproducer of #93 ("Code generated is unpredictably incorrect"), we have roughly 2.5M characters. That means we are at 1.25MB after doubling; still not too bad.
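To make the trade-off concrete, here is a minimal sketch of what widening the entries means for the decoder. The function names and the little-endian byte order are assumptions for illustration; happy's real generated code reads its tables from string/Addr# literals rather than lists:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16, Word32)

-- Current scheme (sketch): two bytes per entry, so any state or
-- action number above 65535 cannot be represented.
readEntry16 :: [Word8] -> Int -> Word16
readEntry16 bytes i =
  let lo = fromIntegral (bytes !! (2 * i))
      hi = fromIntegral (bytes !! (2 * i + 1))
  in lo .|. (hi `shiftL` 8)

-- Proposed scheme (sketch): four bytes per entry. The table doubles
-- in size, but the 16-bit cap on entries goes away.
readEntry32 :: [Word8] -> Int -> Word32
readEntry32 bytes i =
  foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0
        [ bytes !! (4 * i + k) | k <- [0 .. 3] ]
```

For example, the entry 0x12345678 serialises to the four escapes \x78\x56\x34\x12 in the table string, exactly twice the two escapes a 16-bit entry needs, which is where the "doubling" estimates above come from.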

More seriously, I tried bloaty on GHC's parser:

nix run nixpkgs#bloaty -- _validate/stage1/compiler/build/GHC/Parser.o
    FILE SIZE        VM SIZE
 --------------  --------------
  38.4%  1.14Mi  79.0%  1.14Mi    .text
  32.6%   991Ki   0.0%       0    .rela.text
  11.8%   360Ki   0.0%       0    .rela.data
   5.9%   179Ki  12.1%   179Ki    .data
   4.7%   142Ki   0.0%       0    .strtab
   3.8%   116Ki   7.9%   116Ki    .rodata
   2.0%  62.0Ki   0.0%       0    .symtab
   0.5%  14.3Ki   1.0%  14.2Ki    .rodata.str
   0.2%  6.95Ki   0.0%       0    .rela.rodata
   0.0%     192   0.0%       0    [ELF Headers]
   0.0%     187   0.0%       0    .shstrtab
   0.0%     146   0.0%       0    .comment
   0.0%     112   0.0%      48    .note.gnu.property
   0.0%      23   0.0%       0    [Unmapped]
 100.0%  2.97Mi 100.0%  1.44Mi    TOTAL

and then on the repro for #93 (Edit: it turns out that the linked executable of course contains the whole RTS; the tables are all in .rodata, contributing just 380KB):

nix run nixpkgs#bloaty -- tests/issue93
    FILE SIZE        VM SIZE
 --------------  --------------
  31.6%  4.24Mi  69.4%  4.24Mi    .text
  17.4%  2.33Mi   0.0%       0    .strtab
  15.5%  2.07Mi   0.0%       0    .debug_info
  10.4%  1.40Mi  22.9%  1.40Mi    .data
   8.2%  1.10Mi   0.0%       0    .debug_loc
   6.0%   827Ki   0.0%       0    .symtab
   3.2%   444Ki   0.0%       0    .debug_line
   2.8%   381Ki   6.1%   380Ki    .rodata
   2.7%   376Ki   0.0%       0    .debug_ranges
   0.9%   119Ki   0.0%       0    .debug_abbrev
   0.6%  76.1Ki   0.0%       0    .debug_str
   0.4%  49.2Ki   0.8%  49.2Ki    .eh_frame
   0.0%       0   0.3%  18.4Ki    .bss
   0.1%  10.2Ki   0.2%  10.1Ki    .eh_frame_hdr
   0.1%  7.49Ki   0.1%  5.27Ki    [23 Others]
   0.0%  5.76Ki   0.1%  5.70Ki    .dynsym
   0.0%  5.52Ki   0.1%  5.46Ki    .rela.plt
   0.0%  4.91Ki   0.0%       0    .debug_aranges
   0.0%  4.12Ki   0.0%       0    .debug-ghc-link-info
   0.0%  3.72Ki   0.1%  3.66Ki    .plt
   0.0%  2.68Ki   0.0%  2.62Ki    .dynstr
 100.0%  13.4Mi 100.0%  6.11Mi    TOTAL

So actually it's a bit larger than anticipated; I wonder why that is but I'm not going to investigate.

Anyway, I think it's far more important to guarantee that large parsers can be generated correctly rather than to generate them incorrectly in the smallest space possible.
