There are multiple issues related to running into the 16-bit limit of the tables encoded by happy:
- #93 (Code generated is unpredictably incorrect): debugging this must have been a horrible experience. Kudos, @harpocrates
- #102 (Support larger offset tables): a workaround that doesn't scale beyond parsers double the current size
- #199: Debugging "grammar does not fit in 16-bit representation that is used with '--ghc'"?
I don't see the appeal in having 16-bit array entries; let's double them to 32 bits.
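For intuition, here is a minimal, self-contained sketch of the failure mode. This is illustrative only, not happy's actual generated code: the names `index16`/`index32` are made up, but the idea is the same, table entries are packed into a string literal and decoded by index, and a signed 16-bit entry silently wraps once a value exceeds that range:

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Int (Int16, Int32)

-- Read entry i from a table packed as 16-bit little-endian values,
-- sign-extending through Int16 the way a 16-bit decoder would.
index16 :: String -> Int -> Int
index16 tbl i = fromIntegral (fromIntegral w :: Int16)
  where
    b k = ord (tbl !! (2 * i + k))
    w   = b 0 .|. (b 1 `shiftL` 8)

-- The same table widened to 32-bit entries: twice the space, no wraparound.
index32 :: String -> Int -> Int
index32 tbl i = fromIntegral (fromIntegral w :: Int32)
  where
    b k = ord (tbl !! (4 * i + k))
    w   = b 0 .|. (b 1 `shiftL` 8) .|. (b 2 `shiftL` 16) .|. (b 3 `shiftL` 24)

main :: IO ()
main = do
  -- 40000 (0x9c40) does not fit in a signed 16-bit entry: it wraps to 40000 - 65536.
  let t16 = ['\x40', '\x9c']
  print (index16 t16 0)                    -- -25536
  let t32 = ['\x40', '\x9c', '\x00', '\x00']
  print (index32 t32 0)                    -- 40000
```

The wrapped offset then silently indexes the wrong table slot, which is exactly the "unpredictably incorrect" behaviour of #93.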
- I had a look at the Parser.hs for GHC. Just crudely counting the width of the lines ending in `"#`, I count about 1M characters. That's at most 250KB of tables (`\xFF` encodes one byte); doubling that to 500KB won't hurt.
- For the big reproducer of #93, we have roughly 2.5M characters. That means we are at 1.25MB after doubling; still not too bad.
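The back-of-the-envelope arithmetic in the bullets above can be spelled out, assuming four source characters per escaped byte (`\xFF` is four characters encoding one byte):

```haskell
-- Size estimates for the escaped table literals: characters / 4 gives bytes,
-- and widening entries from 16 to 32 bits doubles that.
main :: IO ()
main = do
  let bytesOf chars = chars `div` 4        -- each byte is a 4-char escape like "\xFF"
  print (bytesOf 1000000)                  -- GHC's Parser.hs: 250000 bytes, ~250KB
  print (2 * bytesOf 1000000)              -- after doubling entry width: ~500KB
  print (2 * bytesOf 2500000)              -- the #93 reproducer: 1250000 bytes, ~1.25MB
```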
More seriously, I tried bloaty
on GHC's parser:
```
$ nix run nixpkgs#bloaty -- _validate/stage1/compiler/build/GHC/Parser.o
    FILE SIZE        VM SIZE
 --------------  --------------
  38.4%  1.14Mi  79.0%  1.14Mi    .text
  32.6%   991Ki   0.0%       0    .rela.text
  11.8%   360Ki   0.0%       0    .rela.data
   5.9%   179Ki  12.1%   179Ki    .data
   4.7%   142Ki   0.0%       0    .strtab
   3.8%   116Ki   7.9%   116Ki    .rodata
   2.0%  62.0Ki   0.0%       0    .symtab
   0.5%  14.3Ki   1.0%  14.2Ki    .rodata.str
   0.2%  6.95Ki   0.0%       0    .rela.rodata
   0.0%     192   0.0%       0    [ELF Headers]
   0.0%     187   0.0%       0    .shstrtab
   0.0%     146   0.0%       0    .comment
   0.0%     112   0.0%      48    .note.gnu.property
   0.0%      23   0.0%       0    [Unmapped]
 100.0%  2.97Mi 100.0%  1.44Mi    TOTAL
```
and then on the repro for #93 (Edit: it turns out that the linked executable contains the whole RTS of course; and the tables are all in .rodata contributing just 380KB):
```
$ nix run nixpkgs#bloaty -- tests/issue93
    FILE SIZE        VM SIZE
 --------------  --------------
  31.6%  4.24Mi  69.4%  4.24Mi    .text
  17.4%  2.33Mi   0.0%       0    .strtab
  15.5%  2.07Mi   0.0%       0    .debug_info
  10.4%  1.40Mi  22.9%  1.40Mi    .data
   8.2%  1.10Mi   0.0%       0    .debug_loc
   6.0%   827Ki   0.0%       0    .symtab
   3.2%   444Ki   0.0%       0    .debug_line
   2.8%   381Ki   6.1%   380Ki    .rodata
   2.7%   376Ki   0.0%       0    .debug_ranges
   0.9%   119Ki   0.0%       0    .debug_abbrev
   0.6%  76.1Ki   0.0%       0    .debug_str
   0.4%  49.2Ki   0.8%  49.2Ki    .eh_frame
   0.0%       0   0.3%  18.4Ki    .bss
   0.1%  10.2Ki   0.2%  10.1Ki    .eh_frame_hdr
   0.1%  7.49Ki   0.1%  5.27Ki    [23 Others]
   0.0%  5.76Ki   0.1%  5.70Ki    .dynsym
   0.0%  5.52Ki   0.1%  5.46Ki    .rela.plt
   0.0%  4.91Ki   0.0%       0    .debug_aranges
   0.0%  4.12Ki   0.0%       0    .debug-ghc-link-info
   0.0%  3.72Ki   0.1%  3.66Ki    .plt
   0.0%  2.68Ki   0.0%  2.62Ki    .dynstr
 100.0%  13.4Mi 100.0%  6.11Mi    TOTAL
```
So actually it's a bit larger than anticipated; I wonder why that is but I'm not going to investigate.
Anyway, I think it is far more important to guarantee that large parsers are generated correctly than to generate them incorrectly in the smallest space possible.