More powerful session API #495

Open
Mathnerd314 opened this issue Sep 2, 2023 · 1 comment

@Mathnerd314

This is based on a Discord discussion between me and @borzunov in the webui channel; I'm filing it here so it doesn't get lost.

So consider a simple program using sessions:

with model.inference_session(max_length=500) as sess:
    output1 = model.generate('[Input 1]', max_new_tokens=2, session=sess, do_sample=True, temperature=0.9, top_p=0.6)
    output2 = model.generate('[Input 2]', max_new_tokens=2, session=sess, do_sample=True, temperature=0.9, top_p=0.6)

Currently, the session API keeps history: the first call generates [Output 1], and the second call then generates starting from [Input 1][Output 1][Input 2]. Compared to the usual transformers API this is quite restrictive; it is really only useful for chat-like applications where you can never go back and edit anything.

The API would be more powerful if the .generate() calls instead acted as though they were unrelated/independent, with the session managing the reuse logic internally. For example, to get the old behavior you would call model.generate("[Input 1][Output 1][Input 2]") in the second call, but if you didn't want history you could still do model.generate("[Input 2]"). It's fairly cheap to process a buffer of tokens in Python and analyze it for potential reuse.
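
As a hypothetical sketch of what this could look like from the caller's side (the call pattern below is an assumption about the proposed semantics, not an existing Petals API):

# Hypothetical usage under the proposed semantics: every generate() call
# passes the full prompt it wants, and the session transparently reuses
# whatever cached computation matches it.
with model.inference_session(max_length=500) as sess:
    output1 = model.generate('[Input 1]', max_new_tokens=2, session=sess)
    # Old chat-style behavior: spell out the history explicitly; the
    # session recognizes '[Input 1]' + output1 as already processed.
    output2 = model.generate('[Input 1]' + output1 + '[Input 2]',
                             max_new_tokens=2, session=sess)
    # Independent prompt: no implicit history is prepended.
    output3 = model.generate('[Input 2]', max_new_tokens=2, session=sess)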

As for the reuse logic, I have sketched a little algorithm that I think will work in most cases.

def do_generation(old, new):
    # Find the first position in the cached token buffer that matches
    # the start of the new prompt.
    start = 0
    while start < len(old) and old[start] != new[0]:
        start += 1
    # Extend the match while both buffers still agree.
    length = 0
    while (start + length < len(old) and length < len(new)
           and old[start + length] == new[length]):
        length += 1
    reuse_inference(old[start:start + length])  # reuse cached blocks for the match
    reprocess_fresh(new[length:])               # run fresh inference on the rest

This supports three main use cases (demonstrated in the sketch after this list):

  • prefix - similar to the old behavior: doing a then ab will reuse the cached blocks for a
  • prefix with a different suffix - doing ab then ac will reuse a
  • rolling - doing abc then bcd will drop a and reuse bc, which matters once long prompts exceed the context length
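
As a quick sanity check of the matching step on those three cases (find_reuse is a hypothetical helper that returns the (start, length) span the algorithm above would reuse; it is not part of Petals):

def find_reuse(old, new):
    # Hypothetical helper mirroring do_generation's matching step.
    if not old or not new:
        return 0, 0
    start = 0
    while start < len(old) and old[start] != new[0]:
        start += 1
    length = 0
    while (start + length < len(old) and length < len(new)
           and old[start + length] == new[length]):
        length += 1
    return start, length

assert find_reuse(list('a'), list('ab')) == (0, 1)     # prefix: reuse a
assert find_reuse(list('ab'), list('ac')) == (0, 1)    # different suffix: reuse a
assert find_reuse(list('abc'), list('bcd')) == (1, 2)  # rolling: drop a, reuse bc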

Per Alexander B., this would be fairly easy to implement in Petals, but it has not been implemented yet.

@borzunov
Collaborator

Hi @Mathnerd314,

Your suggestions sound reasonable. We'll start with an option to slice an inference session (reuse_inference(old[start:end])) - I hope to add it in one of the next releases.
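
A purely hypothetical sketch of what session slicing might look like from the caller's side (the slicing syntax is an assumption, not a committed Petals API):

# Hypothetical: keep only part of the session's cached state and
# continue generating from the sliced prefix.
with model.inference_session(max_length=500) as sess:
    model.generate('[Input 1]', max_new_tokens=2, session=sess)
    sess = sess[2:6]  # hypothetical slice: keep cached tokens 2..5 only
    model.generate('[Input 2]', max_new_tokens=2, session=sess)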
