-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat] get vizro ai customized text output #488
base: main
Are you sure you want to change the base?
Conversation
I like it! I think this should have the exact same API as the A couple of ideas for suggestions:
Other than that, think this is great 💪 let's do it Will do a final review once you finalised it, thanks! |
Thank you!
I am leaning towards including output in the name, probably get_plot_output_dict if it is not too long?
Here we are using INFO instead of WARNING, it would be nice to notify user they haven't turned on explain and won't be able to get business insights and code explanation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr: this doesn't feel right to me, but maybe it's the best solution for now. I don't have much context on it so happy to go with what you guys think is best. We can always change in the future since I imagine the API here may well change anyway in a breaking way when we add the dashboard functionality.
I'm away till Wednesday next week but happy to discuss more then or if you want to move ahead with this before I'm back that's totally fine too.
Sorry it took so long to review this but I wanted to give it a proper look!
To be honest I am not a big fan of the change, but also not a big fan of the current alternatives. I don't have much context on what we want to achieve here or why we need to return these additional objects, but on the face of it this feels weird to me. Maybe it's ok as a temporary solution but I don't think it's right in the long term. Removing it in future would be a breaking change, but we can be much more relaxed about on VizroAI than on Vizro. So basically I don't have a strong objection if you guys think it's the right thing to do, but it doesn't feel right to me as a long term solution.
Things that feel wrong
-
API: returning a dictionary for something that is structured data with fixed fields (see e.g. https://www.attrs.org/en/stable/why.html#dicts, https://stackoverflow.com/questions/354883/alternatives-for-returning-multiple-values-from-a-python-function). Here I think the only sensible options are namedtuple or dataclass. And out of those you can't really go wrong with dataclass I think.
-
API: Two functions with the same signature that do something basically identical but one returns a subset of the other. Sometimes this makes sense, but here I think it does not (sorry @maxschulz-COL. And also @Anna-Xiong because I said I basically agreed with Max on this before, but that was before I had read through more of the details).
-
Implementation (so less important than the above two points): this is a case where the code could be much more DRY. There's two functions that do something basically identical but are copy and pasted. If
plot
is effectively a "shortcut wrapper" forget_plot_outputs
then probably it should be implemented asdef plot(...): return get_plot_outputs(...)["fig_object"]
.
What would feel better?
In an ideal world where plot
didn't already exist and return go.Figure
then probably something like this:
@dataclass
class Plot:
# I'm just taking the same property names as you use now, even if they could maybe be improved a bit
business_insights: str
code_explanation: str
code_string: str
fig: go.Figure
class VizroAI:
def plot(...) -> Plot:
...
# use as:
plot = VizroAI().plot()
print(plot.business_insights)
vm.Graph(figure=plot.figure)
Then the user has access to everything they need, can choose what to take from the output object, etc. without needing nearly-duplicated methods.
It might be worth thinking about how things will evolve in future, e.g. if there's a new dashboard
method. What would the different patterns look like then?
# Following pattern proposed here - potentially many duplicated methods
class VizroAI:
def plot(...):
...
def get_plot_outputs(...):
...
def dashboard(...):
...
def get_dashboard_outputs(...):
...
# Following my above patten
class VizroAI:
def plot(...) -> Plot:
...
def dashboard(...) -> Dashboard: # maybe a bad name for class since will be confused with vizro.models.Dashboard, but you get the idea
# Here it means a dataclass containing a vm.Dashboard object and maybe other things like the code string
...
For now
In the non-ideal world we live in where plot
does already exist and return go.Figure
and that was a change made recently (? I think) then we should probably not do the above since it's a breaking change. What would then be the options?
# Same as here but don't use dict as return type and refactor to be more DRY
class VizroAI:
def plot() -> go.Figure:
# ultimately if this is a higher-level entry point than get_plot_outputs, it probably shouldn't have as many arguments as get_plot_outputs. But you can't remove arguments for now without it being breaking
return self.get_plot_outputs.figure
def get_plot_outputs() -> Plot:
...
# The alternative Anna mentioned before where a flag determines the return type
class VizroAI:
def plot(verbose: bool) -> Union[go.Figure, Plot]: # rename verbose to something that makes sense
...
tbh I don't much like either of these (variable return types are generally awkward, though in this case it's not a huge problem, just a bit of a code smell) but would decide based on what arguments to plot
vs. get_plot_outputs
would be:
- if they should remain the same signature then probably do the
Union
return type and oneplot
method - if
get_plot_outputs
might in future have more arguments then probably do as two methods and leaveplot
as the "shortcut wrapper"
Great review from Antony, and on a more thorough review I agree with some but not all points. Let me try to summarize: Regarding what would feel betterI disagree that we should return a Regarding the things that feel wrongRegarding 1Yes, I 100% agree and I am annoyed I didn't think of it before given that this is exactly what is done in DS tools. So for the component collection, yes 100% dataclass. Regarding 2Does not feel great, but I think here it makes sense, because Thus, the second function with the same function signature was intended to return the same (not a subset) things, but in a way that is simple to further use (e.g. if you would like to just have the code string, or if you would like to return the insights separately in a chatbot). Here is the 4 combinations, and what I think it should do: In Jupter:
not in jupyter:
If the above is ensured, we just have to find a way to deliver all output to the user in an alternative way (which as said in 1 should definitely be a
Regarding 3Yes, I think that would be wise, but I felt it was not the major focus for now What is my actual preferred optionI have tried to show above that
Long story short - we simply cannot tell. Thus I prefer to have a single function with a variable return type that is clear to maintain: it returns either the fig with all other components shown alongside OR it returns a dataclass with all the components as attributes. That is also easy to explain in docs, and easy to use. If |
Great points @maxschulz-COL! You are absolutely right that it's good to automatically show the figure in a Jupyter notebook, which I hadn't thought of before. That would actually still be possible with my "ideal" solution above since you could delegate the relevant methods to the underlying figure:
This is very similar to one of the prototypes I had for our We could actually also follow the subclassing approach like we do with Assuming we don't like that then I agree with your solution 1 here that we have a single
My assumption here is that in 99% of cases
|
Description
Add a separate call to return all possible outputs in a dictionary.
def get_plot_outputs( self, df: pd.DataFrame, user_input: str, explain: bool = False, max_debug_retry: int = 3, )
Pending:
will be deleted afterwards.
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":