Encode 32-bit integers in arrays #266

Closed
@sgraf812

Description

There are multiple issues related to running into the 16-bit limit of the tables encoded by happy:

I don't see the appeal in having 16-bit array entries; let's double them to 32 bits.

  • I had a look at the Parser.hs for GHC. Just crudely counting the width of the lines ending in "#, I count about 1M characters. Since each byte is encoded as a four-character escape like \xFF, that's at most 250KB of tables; doubling that to 500KB won't hurt.
  • For the big reproducer of #93 ("Code generated is unpredictably incorrect"), we have roughly 2.5M characters. That means we are at 1.25MB after doubling; still not too bad.
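To make the trade-off concrete, here is a minimal sketch of what widening the entries means for the decoder. The function names and the little-endian byte order are assumptions for illustration; happy's real generated code reads its tables from string/Addr# literals rather than lists:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16, Word32)

-- Current scheme (sketch): two bytes per entry, so any state or
-- action number above 65535 cannot be represented.
readEntry16 :: [Word8] -> Int -> Word16
readEntry16 bytes i =
  let lo = fromIntegral (bytes !! (2 * i))
      hi = fromIntegral (bytes !! (2 * i + 1))
  in lo .|. (hi `shiftL` 8)

-- Proposed scheme (sketch): four bytes per entry. The table doubles
-- in size, but the 16-bit cap on entries goes away.
readEntry32 :: [Word8] -> Int -> Word32
readEntry32 bytes i =
  foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0
        [ bytes !! (4 * i + k) | k <- [0 .. 3] ]
```

For example, the entry 0x12345678 serialises to the four escapes \x78\x56\x34\x12 in the table string, exactly twice the two escapes a 16-bit entry needs, which is where the "doubling" estimates above come from.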

More seriously, I tried bloaty on GHC's parser:

nix run nixpkgs#bloaty -- _validate/stage1/compiler/build/GHC/Parser.o
    FILE SIZE        VM SIZE
 --------------  --------------
  38.4%  1.14Mi  79.0%  1.14Mi    .text
  32.6%   991Ki   0.0%       0    .rela.text
  11.8%   360Ki   0.0%       0    .rela.data
   5.9%   179Ki  12.1%   179Ki    .data
   4.7%   142Ki   0.0%       0    .strtab
   3.8%   116Ki   7.9%   116Ki    .rodata
   2.0%  62.0Ki   0.0%       0    .symtab
   0.5%  14.3Ki   1.0%  14.2Ki    .rodata.str
   0.2%  6.95Ki   0.0%       0    .rela.rodata
   0.0%     192   0.0%       0    [ELF Headers]
   0.0%     187   0.0%       0    .shstrtab
   0.0%     146   0.0%       0    .comment
   0.0%     112   0.0%      48    .note.gnu.property
   0.0%      23   0.0%       0    [Unmapped]
 100.0%  2.97Mi 100.0%  1.44Mi    TOTAL

and then on the repro for #93 (Edit: it turns out that the linked executable of course contains the whole RTS; the tables are all in .rodata, contributing just 380KB):

nix run nixpkgs#bloaty -- tests/issue93
    FILE SIZE        VM SIZE
 --------------  --------------
  31.6%  4.24Mi  69.4%  4.24Mi    .text
  17.4%  2.33Mi   0.0%       0    .strtab
  15.5%  2.07Mi   0.0%       0    .debug_info
  10.4%  1.40Mi  22.9%  1.40Mi    .data
   8.2%  1.10Mi   0.0%       0    .debug_loc
   6.0%   827Ki   0.0%       0    .symtab
   3.2%   444Ki   0.0%       0    .debug_line
   2.8%   381Ki   6.1%   380Ki    .rodata
   2.7%   376Ki   0.0%       0    .debug_ranges
   0.9%   119Ki   0.0%       0    .debug_abbrev
   0.6%  76.1Ki   0.0%       0    .debug_str
   0.4%  49.2Ki   0.8%  49.2Ki    .eh_frame
   0.0%       0   0.3%  18.4Ki    .bss
   0.1%  10.2Ki   0.2%  10.1Ki    .eh_frame_hdr
   0.1%  7.49Ki   0.1%  5.27Ki    [23 Others]
   0.0%  5.76Ki   0.1%  5.70Ki    .dynsym
   0.0%  5.52Ki   0.1%  5.46Ki    .rela.plt
   0.0%  4.91Ki   0.0%       0    .debug_aranges
   0.0%  4.12Ki   0.0%       0    .debug-ghc-link-info
   0.0%  3.72Ki   0.1%  3.66Ki    .plt
   0.0%  2.68Ki   0.0%  2.62Ki    .dynstr
 100.0%  13.4Mi 100.0%  6.11Mi    TOTAL

So actually it's a bit larger than anticipated; I wonder why that is but I'm not going to investigate.

Anyway, I think it's far more important to guarantee that large parsers can be generated correctly rather than to generate them incorrectly in the smallest space possible.
