Issue: filtering inputs/outputs should happen after serialization #677
Comments
Actually, the filtering is more broken than I thought: it looks like modifications I make to the input/output dict are reflected in the actual objects I use in my code. So if I delete a key in the langsmith inputs dict, it's no longer present when I try to access it in my code? |
You're entirely in control of not mutating your values. The output is supposed to be whatever you want sent over, so you'd do:
It's meant to be treated functionally. Mutating stuff that's passed by reference is often kinda wonky in general. If we did it after serialization, it would be a lot harder to reason about because you don't get to use your own types. |
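As an illustration of the functional style described above, a filter can build and return a new dict instead of deleting keys in place. This is only a minimal sketch: the names `hide_secrets` and `SECRET_KEYS` are hypothetical, and the copy is shallow, so nested values are still shared with the caller.

```python
from typing import Any, Dict

# Hypothetical set of keys that should never be sent over.
SECRET_KEYS = {"a_very_secret_number"}


def hide_secrets(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict without the secret keys; the input dict is not modified."""
    return {key: value for key, value in payload.items() if key not in SECRET_KEYS}


original = {"question": "hi", "a_very_secret_number": 42}
filtered = hide_secrets(original)
```

A function of this shape could then be passed as `hide_inputs` / `hide_outputs`, as in the issue description. Because it never touches `original`, the calling code keeps seeing every key.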
So there's two issues here (which I jumbled together, my bad).
1. Filtering should happen on the serialized objects (my opinion): I still think it would be much easier to implement filtering on the final "serialized" dicts representing the objects being sent to langsmith - makes it a lot easier to just traverse the dicts to look for keys we want to remove. The example you showed above feels a bit hacky - then I might have some objects serialized one way (e.g.
2. Mutable inputs: A few comments:
|
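The "traverse the dicts to look for keys we want to remove" idea from point 1 could look roughly like the sketch below. It assumes the payload has already been reduced to plain dicts and lists; the function name `scrub` and the sample payload are hypothetical.

```python
from typing import Any, FrozenSet


def scrub(value: Any, keys_to_remove: FrozenSet[str]) -> Any:
    """Recursively rebuild dicts/lists, dropping any dict entry whose key is listed."""
    if isinstance(value, dict):
        return {
            key: scrub(item, keys_to_remove)
            for key, item in value.items()
            if key not in keys_to_remove
        }
    if isinstance(value, list):
        return [scrub(item, keys_to_remove) for item in value]
    # Scalars and anything else are returned unchanged.
    return value


serialized = {
    "user": {"name": "alice", "a_very_secret_number": 42},
    "history": [{"a_very_secret_number": 7, "text": "hello"}],
}
cleaned = scrub(serialized, frozenset({"a_very_secret_number"}))
```

The traversal builds new containers at every level, so the serialized input is left intact.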
Hmm - while I see there's friction, I still don't agree with the suggested improvements unless you have a good example. Maybe a good middle ground would be if we expose our serializer? Then you could always do
For (1), processing it after we serialize it feels like an even worse UX. Take a standard python object, for example. If your function uses an object, you could define a serializer on the object:
So then you could do this:
Otherwise, you'd get our best effort. In this case it looks OK:
But in other cases it looks weird, like our default serialization for the literal
For (2), we ourselves have to copy your object to serialize it |
Sorry to let this hang, had to put out a few other fires. (1) Indeed, I had resorted to just hijacking your private serializing logic. Maybe supporting that officially would be good. And ideally, you would allow passing custom serializers for specific classes - (yet) another issue I'm facing with this is that some classes are not serializable, and there's no easy way to tell langsmith to serialize them a specific way. For instance, I get tons of
In this case I think they just get serialized using
I'm a bit confused by your
For (2): I think if you take care of copying the object, you could do it in the right place with a lock that prevents other threads from accessing the object while it's being copied? Not really sure of the best way here, it might have some performance impacts. |
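To make the request in (1) concrete, per-class custom serializers could be modeled as a registry consulted from the `default` hook of the standard library's `json.dumps`. This is a sketch of the general idea only, not langsmith's API: `FakeRef`, `SERIALIZERS`, and `to_jsonable` are hypothetical names, and `FakeRef` merely stands in for any class that `json` cannot encode on its own.

```python
import json
from typing import Any, Callable, Dict


class FakeRef:
    """Hypothetical stand-in for a class that json cannot encode by default."""

    def __init__(self, collection: str, id_: str) -> None:
        self.collection = collection
        self.id = id_


# Hypothetical registry mapping a class to the function that serializes it.
SERIALIZERS: Dict[type, Callable[[Any], Any]] = {
    FakeRef: lambda ref: {"collection": ref.collection, "id": ref.id},
}


def to_jsonable(obj: Any) -> Any:
    """json.dumps `default` hook: use a registered serializer, else fall back to repr()."""
    for cls, serialize in SERIALIZERS.items():
        if isinstance(obj, cls):
            return serialize(obj)
    return repr(obj)


encoded = json.dumps({"ref": FakeRef("users", "abc123")}, default=to_jsonable)
```

`json.dumps` only calls the hook for objects it cannot encode itself, so built-in types are unaffected by the registry.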
Hi @ldorigo, thanks for pushing on this! In the most recent version I added additional copying of inputs/outputs before the hide_* methods are called - the values would look like the pre-serialized dictionaries without any of the langsmith serialization logic applied. If you run into atomicity issues then I can roll those changes back. We attempt a deep copy in most cases. However, there are some situations/payloads (locks, generators, playwright browser handles, etc.) that can't be deepcopied, so we resort to a depth-restricted recursive copy. Will keep this issue open until we confirm things are working as intended. |
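The "deep copy, then depth-restricted recursive copy" strategy described above might be sketched as follows. This is an assumption about the general shape, not the SDK's actual implementation; `copy_with_fallback`, `_limited_copy`, and the `max_depth` default are hypothetical.

```python
import copy
import threading
from typing import Any


def _limited_copy(value: Any, depth: int) -> Any:
    """Copy dict/list/tuple containers down to `depth` levels; share everything else."""
    if depth <= 0:
        return value
    if isinstance(value, dict):
        return {key: _limited_copy(item, depth - 1) for key, item in value.items()}
    if isinstance(value, list):
        return [_limited_copy(item, depth - 1) for item in value]
    if isinstance(value, tuple):
        return tuple(_limited_copy(item, depth - 1) for item in value)
    # Locks, generators, handles, etc. are returned by reference.
    return value


def copy_with_fallback(value: Any, max_depth: int = 4) -> Any:
    """Try a full deep copy; if that fails, fall back to the depth-limited copy."""
    try:
        return copy.deepcopy(value)
    except Exception:
        return _limited_copy(value, max_depth)


lock = threading.Lock()  # locks cannot be deep-copied
payload = {"lock": lock, "nested": {"items": [1, 2]}}
copied = copy_with_fallback(payload)
```

With this fallback the containers are fresh, so a filter can delete keys safely, while uncopyable leaves such as the lock remain shared with the caller.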
Ah fun - the DBRef objects are coming from a MongoDB integration I take it? |
One other thing we could look into doing is some API for letting you annotate or process how to serialize things on a function/runnable object level. For the SDK I was thinking something like this: #739. But for langchain core it would probably have to look a bit different. |
Issue you'd like to raise.
Currently when running the client as LangSmithClient(hide_inputs=myfilter, hide_outputs=myfilter), the myfilter function is called before the objects are serialized for sending to langsmith.
Suppose we have e.g. an object like:
I would like to remove a_very_secret_number, but I can't do it while the object is not serialized - Pydantic would throw an error.
Suggestion:
No response