Hi! I'm trying to make some unofficial Rust bindings for libunit.a
(see unit-rs), and I have some questions about the C API.
Right now my bindings are pretty bad. It seems there are several things I got wrong, and I may have several misconceptions that are making my bindings a lot more restrictive than they should be. Still, I am really impressed with the technical aspects of Unit and how it works, it has been really fun to work with, and I'd like to rework my bindings to better match the API's capabilities.
If anyone answers any of these questions, I'd like to improve the descriptions in the `nxt_unit.h` header. Would a PR for that be accepted?
The questions:
Multi-threading and thread local storage
Assuming that contexts and requests are only accessed with a locked mutex, can Unit's C API functions be called from a different thread than the thread which created the context/request? In other words, do the context/request objects rely on thread-specific things like variables in thread-local-storage?
More specifically...
- `nxt_unit_init()` returns a context object that must then be destroyed with `nxt_unit_done()`. Can `nxt_unit_done()` be called from a different thread?
- `nxt_unit_ctx_alloc()` creates a secondary context based on the main context. Can `nxt_unit_done()` be called from a different thread than the one which created the context?
- Can `nxt_unit_run()` be called on a different thread than the one which created the context?
- The `request_handler()` callback will be called on the thread that runs `nxt_unit_run()`, and it will be given a request object. Can functions that use this request object (such as `nxt_unit_response_send()`, `nxt_unit_response_buf_alloc()`, etc.) be called on a different thread than the one which received the request object?
- If I get a request from `nxt_unit_dequeue_request()`, can I send that request to a different thread and call API functions on it there?
Request body streaming
- From my experiments, Unit supports a maximum body size of 8 MB, buffers the whole body, and then calls the `data_handler()` callback at most once. Is that correct, or should I expect it to be called multiple times for slow-writing clients?
- Also from my experiments, if `data_handler()` is going to be called, then before that, in `request_handler()`, `nxt_unit_request_read()` always returns 0 bytes. Is that always the case? Does `nxt_unit_request_read()` always return all or nothing, or can I expect partial results?
- I don't see blocking/non-blocking variants for `nxt_unit_request_read()`. Can I safely assume that `nxt_unit_request_read()` is always non-blocking?
- Is the `NXT_UNIT_AGAIN` error code related in any way to the above?
- Is the `nxt_unit_app_test.c` example incorrect for requests with large request bodies?
Clean shutdown
- Let's say a thread wants to quit (e.g. it experienced a fatal error). Is my only option to `exit()` the process? Is there any way to trigger a graceful shutdown of the process, so that all other threads can finish whatever requests they are handling and then be given a `QUIT` message?
- Also, what happens if `nxt_unit_done()` is called on the main context while there are still secondary contexts created from it? Will they shut down cleanly, or is this undefined behavior?
- Does the main context have to live for at least as long as the contexts spawned from it, or can it be `done()`'d earlier?
Request response buffers
- Can `nxt_unit_response_buf_alloc()` be called multiple times before sending one of the buffers? In other words, can multiple buffers exist at the same time?
- Can I send response buffers in reverse order?
- What is `nxt_unit_buf_next()` for? Does its result affect `nxt_unit_buf_send()` in any way?
- Is it safe to call `nxt_unit_request_done()` on a request before sending or deallocating all of its buffers? If yes, will the buffers be deallocated automatically?
- Since there is a non-blocking version of `nxt_unit_response_write()`, I assume `nxt_unit_response_write()` is the blocking variant. While it blocks, the entire thread is unavailable to process other requests. Is this vulnerable to slow-reading clients, or will the Unit server accept and buffer the whole response even if the client doesn't read it?
- Does `nxt_unit_buf_send()` block? If yes, is it susceptible to slow-reading clients? Does it ever return `NXT_UNIT_AGAIN`?
Misc questions
- When is the `close_handler()` callback called? Is it only for websockets?
- How do `nxt_unit_run()`, `nxt_unit_run_ctx()`, and `nxt_unit_run_shared()` differ?
- If I call `nxt_unit_malloc()` on one context, can I call `nxt_unit_free()` on a different context?
- What is `NXT_UNIT_AGAIN` for, and what returns it? Can I return or send it myself from anywhere?
tippexs commented on Jul 27, 2022
Hi @andreivasiliu – First, THANK YOU VERY MUCH for working on the initial Rust bindings, and sorry for the long delay! As far as I can see, you created the Rust bindings manually.
Did you try to auto-generate the Rust bindings from the header files? I have played around with this and would like to get your feedback on it.
Furthermore, there is a Scala implementation of the same Unit API (likewise in Go and NodeJS). Maybe we can find some answers to your questions by looking into this code. I would like to talk with @hongzhidao and @ac000 about your questions. Gentlemen, please feel free to pick a question and share your thoughts. I will do the same.
The Rust bindings, and the possibilities we will have with them, are a great step in the right direction toward wider adoption! Looking forward to seeing this issue grow and be filled with a ton of useful information.
andreivasiliu commented on Jul 27, 2022
They are created automatically, based on `nxt_unit.h` from unit-dev (see wrapper.h), and bindgen (see build.rs).
However, they are only used internally, since the generated bindings are very unsafe to use directly from Rust. The generated bindings use raw pointers that behave like C pointers; Rust code can only use these through `unsafe`, hence the need for a safe wrapper around them in order to turn them into APIs that match Rust's much stronger memory guarantees (lifetimes, thread safety, unwind safety, etc.).
Thank you very much!
tippexs commented on Jul 28, 2022
Got your point with the bindings, and sorry for missing it in my initial review of your repo. So the goal is clear: having a stable and reliable Rust wrapper around the C API bindings. I will have a chat with the other engineers to answer the questions you have just posted and come back with answers ASAP!
st33v3 commented on Feb 20, 2023
I think the goal here is to have the C API (libunit) documented better. Correct me if I'm wrong, but currently the only sources of information about the C API are the header file (nxt_unit.h) and a blog post about using Unit from assembly language (https://www.nginx.com/blog/nginx-unit-adds-assembly-language-support/). And, of course, the current language bindings, which are really hard to read.
Information about threading and the non-blocking/asynchronous/streaming mode of operation especially would be really appreciated.
ac000 commented on Feb 20, 2023
There is also nxt_unit_app_test.c, which shows how to use the C API and may answer some of the above questions.
lcrilly commented on Feb 20, 2023
And this blog post explains how another project created a Scala language module using libunit:
https://blog.indoorvivants.com/2022-03-05-twotm8-part-3-nginx-unit-and-fly.io-service
st33v3 commented on Feb 20, 2023
Thanks for pointing in that direction; the test cases could be useful. However, nxt_unit_app_test.c shows pretty basic usage. I'm more interested in a proper implementation of backpressure while reading the request body and writing the response body; all the test cases read or write the data only once.
For example, the Python binding implementation calls `nxt_unit_response_write_nb()` repeatedly until it returns 0 (meaning "try again later"). I tried to reproduce this, but without any luck. The function either writes the response data or returns 1 (`NXT_UNIT_ERROR`) and logs an error message. Zero is never returned, nor is `NXT_UNIT_AGAIN` (which I would expect). Interactions like this should be documented...
alejandro-colomar commented on May 20, 2023
I suggest you try it and see if it works. I wonder what benefit you'd get from doing that.
My guess is that you can probably do that. But I'm not sure, so you should try it.
Similarly to the above, probably yes.
Probably yes.
Probably yes.
For all of these questions, my guess is that as long as you have the context object, the thread from which you call things doesn't really matter. But again, I'm not sure, and you could try it.
I have only seen `data_handler()` being used in the Python code. I haven't investigated it. Maybe @ac000 knows something about it.
I need to investigate this function more. But from reading the source code, it seems `nxt_unit_request_read()` allows partial reads.
I think it is always non-blocking, yes.
Not really. This function returns the size of the read, which works similarly to read(2). It doesn't return NXT_UNIT_AGAIN.
In which sense is it incorrect?
In fact, you should rarely call exit(3), I think. You should call pthread_exit(3) or similar from each thread. Only if the entire app is in an inconsistent state should you kill the entire app.
I hope the cleanup will be correct. If not, it's a bug in Unit. You should be able to rely on it; otherwise, if a thread accidentally dies, the whole unitd could be compromised.
It probably needs to exist. I'd guess that if you call done() on the main context while any other context object is still alive, Unit should kill it, as mentioned right above.
Yes, that's supported.
I tested it here:
http://192.168.1.254/src/alx/alx/nginx/unit-c-app.git/commit/?id=c3b6231eca2f7ff9ddf2758922953ef2e97947e1
Yes, that's supported. Also tested there.
Don't know. It's only explicitly used in the Java module, and internally in nxt_unit.c. I didn't investigate that. If I learn it, I'll document it.
Yes. See for example nxt_unit_app_test.c, which calls it at the `fail:` label if anything failed, to clean up. It should safely end the request and all the resources attached to it.
It doesn't look like a blocking variant to me. Look at the source code yourself and judge:
N/A
It doesn't block. It doesn't even send the buffer, actually; Unit will just put the buffers in the queue for sending. If there's a lot of traffic, it may even end up merging several chunks into a single send. See this test: `Hello world!\n` was sent in the same call as the headers, the lines after it were in a separate buffer, and Unit merged them into a single chunk.
I don't know; sorry. Maybe @ac000?
No idea. I can see that they are slightly different in the implementation, but they're so complex that I can't tell the actual difference without deep investigation.
The commit logs that introduced them are silent about it, so no idea.
Technically yes, since these are just malloc(3) and free(3) wrappers, and the context is only used for logging. The log might then be confusing, but if you expect that one ctx mallocs and another one frees, you should be fine.
However, I wonder why you'd do that.
You can think of it as Unit's `EAGAIN`. It is returned, for example, by `nxt_unit_run()`.
Sure. It's just a number.
andreivasiliu commented on May 21, 2023
Many thanks!
Rust requires very strict memory and thread safety guarantees in its safe subset of the language. So in Rust, whenever wrapping foreign C code, the unsafe wrapping code (aka my bindings, in this case) must guarantee memory and thread safety under all circumstances; otherwise miscompilations can occur in the safe subset, as the Rust compiler makes more assumptions there thanks to those guarantees.
See the Sync, Send, and UnwindSafe markers for more details.
Ah, sorry, I meant whether I can use it as a meaningful return value from my callbacks, e.g. to tell Unit that it should call my callback again later because I'm not ready yet, or because Unit hasn't given me enough data yet. This is how nginx handlers work, if I remember correctly.
Using just pthread_exit() would be the cleanest, but then I would be permanently reducing the number of worker threads in the process.
But I was more interested in the case where there is inconsistent state; the most common reason to use multi-threading is to share a cache between threads, and if that gets corrupted, the entire process should exit. At that point, I have two options:
- `exit()`: this will bypass all of Rust's destructors/drop code, which I'd like to avoid, as it might, for example, leave disk databases in an unclean state.
- Somehow making `nxt_unit_run()` immediately return and give control back to my thread. I can't see any way to make Unit do that.

I see. So it is blocking, but only with regard to sending it between processes to Unit, via whatever mechanism that uses (pipe file descriptors and/or shared memory, I can't figure out which).
This is important in Rust when creating asynchronous functions (returning Future), which are required to do no blocking I/O.
This behavior also seems to be the same for reading request bodies; from my testing, by the time the app's request handler is called, the Unit server either has the entire request body data from the client, or none of it (i.e. it has just the header). I thought that reading the body data might hang until the client sends more data (which is not the case), or send NGX_AGAIN (which is also not the case).
This also seems to mean that streaming body data from clients is impossible to do with a Unit app. The Unit server will either wait and buffer the entire body data from the client, or give up when the body exceeds 8MB (in which case the client gets a "Request too big" error, and the app never gets anything).