Activity
earlephilhower commented on Jan 23, 2018
igrr/newlib-xtensa#5
romansavrulin commented on Mar 9, 2018
@earlephilhower Great improvement! Did I get it right that after igrr/newlib-xtensa#5 settles, there will be no need to use F() helpers and the special str_P functions anymore? const char * will go directly to flash and the str functions will handle them automatically?
igrr commented on Mar 9, 2018
char * will not go directly to flash, but functions like printf will support PROGMEM strings as format arguments and string arguments.
romansavrulin commented on Mar 9, 2018
@igrr I've always wondered: what's the reason for keeping const char * in RAM? It is not intended to change anyway. To me, everything that flash strings bring can be achieved with const char * on other platforms.
igrr commented on Mar 9, 2018
The problem is that the ESP8266 memory architecture does not support mapping flash into dport (data memory space). It can be mapped only into instruction memory space, and for instruction memory only 4-byte-aligned access is allowed. So while we can put a const char* string into flash, it will not be possible to address it bytewise, which is what most of the existing code does. Therefore we have to place it into RAM.
edit: some projects built around the esp8266 (like esp-open-rtos) feature an "unaligned access exception handler", which is a software workaround for this limitation, at the cost of performance. The handler gets invoked whenever code tries to access instruction memory as if it were byte addressable. Then it does the equivalent of what pgm_read_byte does, and returns from the exception. From the application's perspective, the byte gets read successfully, but this costs more CPU cycles.
romansavrulin commented on Mar 9, 2018
@igrr I completely understand these architectural factors, but can it be done transparently in libc, like igrr/newlib-xtensa#5 did with strcpy?
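To make the limitation discussed above concrete, here is a minimal host-side sketch of reading one byte when only aligned 32-bit loads are allowed, which is roughly what pgm_read_byte has to do. The array fake_flash and the function flash_read_byte are hypothetical names invented for this illustration, not part of the ESP8266 core; the sketch assumes little-endian byte order within each word, as on the ESP8266.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for flash: visible only as aligned 32-bit words. */
static const uint32_t fake_flash[] = {
    0x6c6c6548u, /* 'H' 'e' 'l' 'l', lowest byte first (little-endian) */
    0x0000006fu, /* 'o' '\0' */
};

/* Read the byte at byte offset `off` with one aligned word load,
   followed by a shift and a mask. Illustrative name and signature. */
static uint8_t flash_read_byte(const uint32_t *flash, size_t off)
{
    uint32_t word = flash[off / 4];            /* aligned 32-bit load */
    unsigned shift = (unsigned)(off % 4) * 8u; /* select the byte lane */
    return (uint8_t)((word >> shift) & 0xffu);
}
```

Calling flash_read_byte(fake_flash, 1) picks the second byte of the first word. Ordinary code that dereferences a char pointer performs none of these steps, which is why such a pointer cannot simply point into flash.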
igrr commented on Mar 9, 2018
The problem is that igrr/newlib-xtensa#5 is only transparent as long as the code uses printf and does not read the string byte by byte.
A fully transparent workaround is to force the compiler to use 32-bit loads only, even when accessing 1-byte quantities. This was implemented in jcmvbkbc/gcc-xtensa@6b0c9f9, but was never included into the mainline version.
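As a rough illustration of why a library routine can be made flash-aware while arbitrary byte-by-byte code cannot, the sketch below copies a string out of word-only memory into a RAM buffer using nothing but aligned 32-bit loads. The names demo_words and flash_copy_string are hypothetical and are not taken from newlib or the ESP8266 core; little-endian byte order within each word is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word-only "flash" holding the string "Hello". */
static const uint32_t demo_words[] = { 0x6c6c6548u, 0x0000006fu };

/* Copy a NUL-terminated string starting at byte offset `off` into `dst`,
   writing at most `n` bytes including the terminator.
   Returns the number of characters copied, excluding the terminator. */
static size_t flash_copy_string(char *dst, const uint32_t *flash,
                                size_t off, size_t n)
{
    size_t i = 0;
    if (n == 0) {
        return 0;
    }
    while (i + 1 < n) {
        uint32_t word = flash[(off + i) / 4];             /* aligned load */
        unsigned shift = (unsigned)((off + i) % 4) * 8u;  /* byte lane */
        char c = (char)((word >> shift) & 0xffu);
        if (c == '\0') {
            break;
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

A routine written this way works on flash-resident strings because every load it issues is word-aligned. Code that walks the same string with a plain char pointer issues byte loads instead, so it still needs either the exception handler or a compiler that emits 32-bit loads everywhere.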
igrr commented on Mar 9, 2018
For a discussion of different approaches to this problem (unaligned exception handler, mforce-l32), please have a look at SuperHouse/esp-open-rtos#11.
romansavrulin commented on Mar 9, 2018
Thank you for the links. So why do you think none of this got mainlined?
earlephilhower commented on Mar 9, 2018
My $0.02...
Reading bytes by exception handler isn't just slow, it's glacially slow. One "ld" turns into an exception trap, stack push, context switch, ~60 instructions of fixup code to figure out what insn failed and fudge the stored registers to make it look like it passed, context switch and complete stack pop. Repeat that for every single byte of every single string. You can look at the assembly code in the Linux kernel for the LXR architecture that does this, it's amazing...
Using only 32-bit accesses even for bytes adds (IIRC) 3 insns to every single non-32bit memory read to shift and mask. Not just PROGMEM accesses, everything has to be done this way. That's pretty painful, too, for the case where most people won't ever benefit from it. You'd also need the OS routines burned in ROM and in the SDKs to be rebuilt with this, as OTW you'd just fault in them when they tried to access PROGMEM stored parameters.
earlephilhower commented on Nov 9, 2018
#4160 will fix the original issue here (%S format), but the latest SDK, as a silent update, added a misaligned access exception handler that makes byte-wise PROGMEM accesses work (albeit kind of slowly).
earlephilhower commented on Dec 3, 2018
Closed via #5376