-
Notifications
You must be signed in to change notification settings - Fork 7.9k
pdo_odbc: Don't fetch 256 byte blocks for long columns #10809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
I'm marking this as draft because it is a different approach, though one that might be less troublesome. The reason why PDO_ODBC fetched 256 byte blocks seems a little unclear, so I went digging and found 43c1a1b, which references #33533 - that does seem to imply this could create problems instead for drivers that instead declare something like 2 GB available, though that seem like it'd be a problem regardless? |
The other angle is why isn't it just binding all the time; procedural ODBC seems to do this, or at least a lot less conservatively. |
FWIW a user reported to me that they tried this PR and it resolved the original issue for them. |
I was looking around on Sunday night to see if there were any issues that I might be able to help with, and I found this PR. Is it correct to understand that the fact that it has not been merged yet means that it is unresolved? If you need a test run or review, I'm ready to help. |
When I and a client tested it, it did seem to resolve the issue (see issue linked in description), but I'm a bit worried about edge cases and how well this behaves with other drivers outside of Db2i. |
First, I tried it with SQL Server. Test code
ResultsSQL Server + FreeTDSmaster branch
Natty's branch
|
FYI This test code works fine on either branch:
|
I don't expect the issue this patch was written to affect MSSQL (it was a driver bug w/ character conversion that PHP behaviour exacerbated), but the regression there is worrying. Do you know which allocation failed, and how? I'm suspecting it's the string realloc, with somehow a zero length being returned? |
It looks like passing a negative value to L689 (edit) For "こんにちは", the value of |
I'm guessing a negative value is one of those sentinel values that ODBC allows it to return return:
|
Probably |
I was just reading the same page.
It seems like all the data is already included, so I feel like I can count the number of bytes. |
Yeah, I think it would be possible to do an estimation on our side. Keep in mind we can just try again with whatever buffer size we like, and we can just stitch it together at the end, because the final call will always return the right length:
|
No, my plan is no good because it only takes 256 bytes. Counting length in php looks good. However, I overlooked the fact that not all strings could be retrieved the first time. |
Maybe if we get NO_TOTAL, we can fall back to using a fixed size, i.e. the 256 from before? |
Yes, I think it's possible. Indeed, SQL Server did not seem to have the same problems as db2 for i, so it seems that there will be no problem with fallback. |
But maybe we should wait a little. The procedural version of |
Both bound fetches and direct results (w/ odbc_result) are stringy like PDO_ODBC is, and direct fetches have similar-ish logic. The loop there is pretty similar; though they do handle the NULL_DATA and NO_TOTAL cases, but errors out on the latter (which is better than crashing, at least). |
@NattyNarwhal
|
take3 When |
Take2 has failed the test. I think take3 is smarter. |
By the way, no matter how I try, I cannot insert multibyte characters into db2 of PUB400.COM from php. |
I agree take3 looks better; erroring on NO_TOTAL is annoying for any conversion cases (so, pretty much any case with drivers converting encodings from DB to client representation).
How are you inserting/what's your table specification? You might want to specify the CCSID (encoding) of the column, to force it into a. I often annotate |
Did take the time to test - NO_TOTAL cases are handled so that it doesn't fail. (In the case of the original bug, NO_TOTAL triggers it because the 256-byte buffer fallback isn't enough, but it means for most cases, the driver option to suppress NO_TOTAL is unneeded.) Another option is instead of using 256 as the fixed size, get the size of the column instead for the initial/no total cases. I bet in most cases, there's a good chance the described column length is sufficient for a single round. While you could call diff --git a/ext/pdo_odbc/odbc_stmt.c b/ext/pdo_odbc/odbc_stmt.c
index e2bdc95ef3..3e491ebc16 100644
--- a/ext/pdo_odbc/odbc_stmt.c
+++ b/ext/pdo_odbc/odbc_stmt.c
@@ -651,6 +651,8 @@ static int odbc_stmt_get_col(pdo_stmt_t *stmt, int colno, zval *result, enum pdo
pdo_odbc_stmt *S = (pdo_odbc_stmt*)stmt->driver_data;
pdo_odbc_column *C = &S->cols[colno];
+fprintf(stderr, "col #%d (%s) size??? %zd\n", colno, ZSTR_VAL(stmt->columns[colno].name), stmt->columns[colno].maxlen);
+
/* if it is a column containing "long" data, perform late binding now */
if (C->is_long) {
SQLLEN orig_fetched_len = SQL_NULL_DATA; and it'll output:
Changing it to not use an initial 256-byte buffer and using a dynamic one instead though requires more thought though:
It might be worth bumping the fallback buffer size to a larger one - something in the kilobytes or megabytes. |
Thank you. I simply inserted to var_char(2000). All multi-byte characters are deleted and inserted. I'll try your advice.
I see. Am I correct in understanding that in most use cases, if the block size is larger, it can be fetched in one round? That makes sense. Btw, I was thinking of another method. In the first place, I was wondering why db2 characters become garbled when separated by 256 bytes, and I wondered where on earth they were encoded. This is completely a random idea and I haven't tried it yet, but I thought that using |
I'm using duckdb-odbc, see https://duckdb.org/docs/installation/?version=latest&environment=odbc&installer=binary&platform=linux |
I'm not sure if you'd be encountering the same issue I had that drove me to write the PR. That said, if you could try the PR and see if it works for you (it should rebase onto current master fine), that'd be helpful. If it solves your issue, I'd have some confidence this is the right approach in general. |
I installed (This is probably something only Natty would know. ↓) First, like db2,
Suppose we repeat the loop and This last loop conflicts with the This means that the fetch loop starts again and never ends. I think there is a problem with @NattyNarwhal |
By the way, unrelated to the above topic, I suddenly thought of something, how about making it possible to set If you specify It may be a good option for users who only handle small amounts of data that do not require buffers. |
Your explanation as to why is fails makes sense; DuckDB shouldn't be returning more data if we hit theoretical end of buffer and try to read more. It's been a while since I looked at this, but maybe we need to explicitly break the loop when we get less data. I think the logic as to why it did this before was it'd hit the else block when it got SQL_NO_DATA for another attempt (which might help if the driver was misleading about there actually being stuff at the end for encoding conversion cases?), but if this driver doesn't do that... The other proposal I'm a bit mixed on. I see the purpose, but it's another knob to have to worry about as a user (what does it do, does changing it help me, if I should change it, shouldn't it be at a better default?), and PHP tends to shy away from that recently (i.e. less emphasis on INI options, it makes testing more complex). However, if we can't find a reasonable default (be it fixed buffer or based on column size, whatever), there might not be a better option. |
Thanks for checking.
There is such a case... odbc bugs are really troublesome. If that's the case, maybe I mean |
Yeah, if DuckDB doesn't fix this upstream (and we point users to update their driver if they report this), we'll have to work around this. Point, I didn't know pdo_mysql did this. If there's precedent, it's probably fine to do this (and maybe borrow the buffer size they use by default?). |
Agree, I think it would be a good idea to first hope that the bug will be fixed upstream, and if it isn't fixed, then think of a workaround again. Thanks, borrowing the default value is a good idea. In the case of MySQL, a different attribute value ( |
That makes sense to me; sentinel value doesn't matter per se. Figuring out an ideal default is the trickiest part, but it sounds like pdo_mysql's ~1 MB default might be reasonable since it does something similar. My guess is most columns are smaller than that, and the ones larger than that are probably going to be things like |
Yeah, your view is probably correct. Do you have time to add these changes? It took some time, but I think we were able to find specifications that we were fully satisfied with. |
DuckDB folks are normally very eager to fix valid issues, I would recommend filing an issue at: https://github.com/duckdb/duckdb/tree/main/tools/odbc |
I've got it. I'll report it later. |
@NattyNarwhal I think it's still a good idea to borrow the default value, but it may not be necessary to allow the buffer size to be set by the user. |
I'm afraid there is no general solution. If drivers don't properly report Increasing the buffer size appears to be a reasonable mitigation, and I don't really see the reason to even have a separate buffer. The relevant part of the |
I think we should pursue this, since it might actually improve performance, and at least would mitigate the issues with drivers not properly counting the available data length. I would suggest to have a buffer size that just fits into a ZendMM page (4KiBi); if we're storing in a |
Do you think a variable buffer size would be better? We were experimenting with that approach last time me and Saki touched the PR. I agree that the current size of 256 is too small, a bigger size alone would probably work around a lot of driver issues, so it's worth updating that initial buffer size at least. (There are other criteria like i.e. MTUs and DB protocol overhead over the network, but that seems really hard to predict when localhost is a very common case.) |
Well, I'm still interested in this too. |
@SakiTakamachi How do you feel about merging this for PHP 8.5? I'm still a little hesitant as previously mentioned, but it does seem to resolve issues (i.e. the one I originally diagnosed this for w/ DBCS, and fleadram's here), but those experiences are specific to Db2i (which I'm more familiar with). |
Since there is a mention of the buffer size, I personally think we can merge it into 8.5 once that part is fixed. |
Fetching 256 byte blocks can confuse some drivers with conversion routines. That, and it seems to me the round trips to and from a database could be a major performance impact. Instead, we try to fetch all at once, and continue fetching if a driver somehow has more for us. This has been tested with a problematic case with the Db2i driver with stateful MBCS encodings. See phpGH-10733 for discussion about this and issues it can resolve.
change to separate by 256 bytes, when C->fetched_len == SQL_NO_TOTAL changed from 256 byte to 2048 byte buf block.
Could be configurable maybe, but best to avoid magic numbers even for a compile-time constant.
Change recommended by Christoph. Probably a little better performance wise I have to guess.
a97752e
to
5dc5975
Compare
Fetching 256 byte blocks can confuse some drivers with conversion routines. That, and it seems to me the round trips to and from a database could be a major performance impact.
Instead, we try to fetch all at once, and continue fetching if a driver somehow has more for us.
This has been tested with a problematic case with the Db2i driver with stateful MBCS encodings.
See GH-10733 for discussion about this and issues it can resolve.