Constrained generation support #235
Description
This is an open-ended issue; I expect there will be more than one solution to this.
There have been several existing solutions for constraining the output of LLM generation:
- https://twitter.com/GrantSlatton/status/1657559506069463040
- https://github.com/microsoft/guidance
- https://github.com/piercefreeman/gpt-json
- https://github.com/1rgs/jsonformer
- https://lmql.ai/
The idea's pretty simple: the user supplies some kind of schema, and generation is then forced to match that schema by only sampling/feeding tokens that fit it. jsonformer is a good place to look for this: it feeds in the JSON structure up to the point where the LLM should generate something, and then samples only the tokens that would be valid in that context.
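To make the mechanism concrete, here's a minimal standalone sketch of that constrained-sampling step. It isn't tied to llm's actual sampler API; the vocabulary, logits, and the `is_valid_continuation` predicate are all stand-ins for whatever a real schema/grammar checker would provide:

```rust
/// Stand-in for a real schema checker: given what's been generated so far,
/// can appending `token_text` still lead to schema-valid output?
/// (Toy constraint: we're inside a JSON number, so only digits are allowed.)
fn is_valid_continuation(_generated_so_far: &str, token_text: &str) -> bool {
    !token_text.is_empty() && token_text.chars().all(|c| c.is_ascii_digit())
}

/// Greedily pick the highest-logit token that the schema allows.
/// `vocab` maps token ids to their text form.
fn sample_constrained(logits: &[f32], vocab: &[&str], generated_so_far: &str) -> Option<usize> {
    logits
        .iter()
        .enumerate()
        .filter(|&(id, _)| is_valid_continuation(generated_so_far, vocab[id]))
        .max_by(|&(_, a), &(_, b)| a.total_cmp(b))
        .map(|(id, _)| id)
}

fn main() {
    let vocab = ["4", "2", "x", "\"", "}"];
    let logits = [0.1, 0.3, 0.9, 0.2, 0.5]; // "x" scores highest but is invalid here
    let picked = sample_constrained(&logits, &vocab, "{\"count\": ");
    assert_eq!(picked, Some(1)); // best *valid* token is "2"
    println!("picked: {:?}", picked.map(|id| vocab[id]));
}
```

A real implementation would apply this as a mask over the model's logits each step (and probably still sample from the masked distribution rather than taking the argmax), but the shape of the problem is the same.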
Given that there are many potential ways to solve this problem, and many potential output formats, I'm not sure we should bake in one particular solution. My feeling is that we should offer additional crates for this kind of work, but not bake it into llm specifically.
An example might be an `llm-json` crate, which extends `InferenceSession` with a trait that takes any `serde`-able type and produces structured output:
```rust
#[derive(Serialize, Deserialize)]
struct Steps {
    steps: Vec<String>,
}

// `paragraph` holds the input text; `[[SCHEMA]]` would be substituted by the
// crate with the JSON schema derived from `Steps`.
let steps = session.infer_json::<Steps>(
    /* ... */,
    format!("The following paragraph describes a process.\n{paragraph}\nPlease transcode it to JSON using the following schema: [[SCHEMA]]"),
)?;
dbg!(steps.steps);
```
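The trait itself might look something like this. Everything here is hypothetical: the stub types stand in for llm's real session/parameter types, and pulling in schemars to derive a JSON Schema from the target type is just one possible choice:

```rust
use serde::de::DeserializeOwned;

// Stand-ins so the sketch is self-contained; the real trait would be
// implemented for llm's actual `InferenceSession`.
pub struct InferenceSession;
pub struct InferenceParameters;
#[derive(Debug)]
pub enum InferJsonError {
    Inference(String),
    Deserialize(serde_json::Error),
}

/// Hypothetical extension trait: derive a JSON Schema for `T`, splice it into
/// the prompt (e.g. replacing `[[SCHEMA]]`), run generation constrained so
/// that only schema-valid JSON can be emitted, then deserialize into `T`.
pub trait InferJson {
    fn infer_json<T: DeserializeOwned + schemars::JsonSchema>(
        &mut self,
        params: &InferenceParameters,
        prompt_template: String,
    ) -> Result<T, InferJsonError>;
}
```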
This could also potentially live in `llm-chain` (and might be a better fit there), but I'm not sure if their abstraction allows for controlled sampling like this. We'd need to chat with them.