[Feat] get vizro ai customized text output #488

Open
wants to merge 13 commits into
base: main

Conversation

Anna-Xiong
Contributor

@Anna-Xiong Anna-Xiong commented May 16, 2024

Description

Add a separate call to return all possible outputs in a dictionary.

def get_plot_outputs(
    self,
    df: pd.DataFrame,
    user_input: str,
    explain: bool = False,
    max_debug_retry: int = 3,
)

  • if explain is False, it will return this dictionary and give info logging
"Flag explain is set to False. business_insights and code_explanation will not be included in the output dictionary"
{
    "code_string": code_string,
    "fig": fig_object,
}
  • if explain is True, it will return this dictionary
{
    "business_insights": business_insights,
    "code_explanation": code_explanation,
    "code_string": code_string,
    "fig": fig_object,
}
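
To illustrate the two return shapes, here is a minimal sketch, not the actual implementation. `_generate` is a hypothetical stand-in for the LLM pipeline, the `df` and `max_debug_retry` arguments are left out for brevity, and a plain string stands in for the figure object.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def _generate(user_input: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the real LLM pipeline; returns every component."""
    return {
        "business_insights": f"Insights for: {user_input}",
        "code_explanation": "Explanation of the generated code",
        "code_string": "import plotly.express as px",
        "fig": "<figure placeholder>",  # would be a go.Figure in VizroAI
    }


def get_plot_outputs(user_input: str, explain: bool = False) -> Dict[str, Any]:
    """Return all outputs, dropping the text explanations unless explain=True."""
    outputs = _generate(user_input)
    if not explain:
        logger.info(
            "Flag explain is set to False. business_insights and code_explanation "
            "will not be included in the output dictionary"
        )
        # Keep only the keys that are always returned.
        return {key: outputs[key] for key in ("code_string", "fig")}
    return outputs
```

With `explain=False` the caller gets two keys; with `explain=True` the caller gets all four.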

Pending:

  • The integration test needs to be updated. Another PR will address the integration tests, after which this temporary code will be deleted:
        # TODO Tentative for integration test, will be updated/removed for new tests
        if self._return_all_text:
            return output_dict

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

    • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
    • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorized to submit this contribution on behalf of the original creator(s) or their licensees.
    • I certify that the use of this contribution as authorized by the Apache 2.0 license does not violate the intellectual property rights of anyone else.
    • I have not referenced individuals, products or companies in any commits, directly or indirectly.
    • I have not added data or restricted code in any commits, directly or indirectly.

@Anna-Xiong Anna-Xiong changed the title [Feature] get vizro ai customized text output [Feat] get vizro ai customized text output May 16, 2024
@maxschulz-COL
Contributor

I like it! I think this should have the exact same API as the plot function, and it does!

A couple of ideas for suggestions:

  • maybe tune the name so we know the return type: get_plot_dict?
  • Warning is nice, but I think not even necessary, could even skip it

Other than that, think this is great 💪 let's do it

Will do a final review once you finalised it, thanks!

@Anna-Xiong
Contributor Author

Anna-Xiong commented May 23, 2024

Thank you!

  • maybe tune the name so we know the return type: get_plot_dict?

I am leaning towards including output in the name, probably get_plot_output_dict if it is not too long?

  • Warning is nice, but I think not even necessary, could even skip it

Here we are using INFO instead of WARNING. It would be nice to notify users that they haven't turned on explain and so won't get the business insights and code explanation.

Contributor

@antonymilne antonymilne left a comment


tl;dr: this doesn't feel right to me, but maybe it's the best solution for now. I don't have much context on it so happy to go with what you guys think is best. We can always change in the future since I imagine the API here may well change anyway in a breaking way when we add the dashboard functionality.

I'm away till Wednesday next week but happy to discuss more then or if you want to move ahead with this before I'm back that's totally fine too.


Sorry it took so long to review this but I wanted to give it a proper look!

To be honest I am not a big fan of the change, but also not a big fan of the current alternatives. I don't have much context on what we want to achieve here or why we need to return these additional objects, but on the face of it this feels weird to me. Maybe it's ok as a temporary solution but I don't think it's right in the long term. Removing it in future would be a breaking change, but we can be much more relaxed about that on VizroAI than on Vizro. So basically I don't have a strong objection if you guys think it's the right thing to do, but it doesn't feel right to me as a long term solution.

Things that feel wrong

  1. API: returning a dictionary for something that is structured data with fixed fields (see e.g. https://www.attrs.org/en/stable/why.html#dicts, https://stackoverflow.com/questions/354883/alternatives-for-returning-multiple-values-from-a-python-function). Here I think the only sensible options are namedtuple or dataclass. And out of those you can't really go wrong with dataclass I think.

  2. API: Two functions with the same signature that do something basically identical but one returns a subset of the other. Sometimes this makes sense, but here I think it does not (sorry @maxschulz-COL. And also @Anna-Xiong because I said I basically agreed with Max on this before, but that was before I had read through more of the details).

  3. Implementation (so less important than the above two points): this is a case where the code could be much more DRY. There are two functions that do basically the same thing but are copied and pasted. If plot is effectively a "shortcut wrapper" for get_plot_outputs then it should probably be implemented as def plot(...): return get_plot_outputs(...)["fig"].
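
A rough sketch of the "shortcut wrapper" idea from point 3, assuming the dictionary return type and the "fig" key from the PR description. `VizroAISketch` and its placeholder values are hypothetical and only show the delegation, not the real class.

```python
from typing import Any, Dict


class VizroAISketch:
    """Hypothetical stand-in for VizroAI, used only to show the delegation."""

    def get_plot_outputs(self, user_input: str, explain: bool = False) -> Dict[str, Any]:
        # The single place where the real work would happen.
        outputs: Dict[str, Any] = {
            "code_string": f"# code for: {user_input}",
            "fig": "<figure placeholder>",  # would be a go.Figure in VizroAI
        }
        if explain:
            outputs["business_insights"] = "insights text"
            outputs["code_explanation"] = "explanation text"
        return outputs

    def plot(self, user_input: str, explain: bool = False) -> Any:
        # Shortcut wrapper: same arguments, but only the figure is returned.
        return self.get_plot_outputs(user_input, explain)["fig"]
```

Any change to the generation logic then only has to be made in get_plot_outputs.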

What would feel better?

In an ideal world where plot didn't already exist and return go.Figure then probably something like this:

@dataclass
class Plot:
    # I'm just taking the same property names as you use now, even if they could maybe be improved a bit
    business_insights: str
    code_explanation: str
    code_string: str
    fig: go.Figure

class VizroAI:
    def plot(...) -> Plot:
        ...

# use as:
plot = VizroAI().plot()
print(plot.business_insights)
vm.Graph(figure=plot.fig)

Then the user has access to everything they need, can choose what to take from the output object, etc. without needing nearly-duplicated methods.

It might be worth thinking about how things will evolve in future, e.g. if there's a new dashboard method. What would the different patterns look like then?

# Following pattern proposed here - potentially many duplicated methods
class VizroAI:
    def plot(...):
        ...

    def get_plot_outputs(...):
        ...

    def dashboard(...):
        ...

    def get_dashboard_outputs(...):
        ...

# Following my above pattern
class VizroAI:
    def plot(...) -> Plot:
        ...

    def dashboard(...) -> Dashboard: # maybe a bad name for class since will be confused with vizro.models.Dashboard, but you get the idea
    # Here it means a dataclass containing a vm.Dashboard object and maybe other things like the code string
        ...

For now

In the non-ideal world we live in where plot does already exist and return go.Figure and that was a change made recently (? I think) then we should probably not do the above since it's a breaking change. What would then be the options?

# Same as here but don't use dict as return type and refactor to be more DRY
class VizroAI:
    def plot(self, ...) -> go.Figure:
        # ultimately if this is a higher-level entry point than get_plot_outputs, it probably shouldn't have as many arguments as get_plot_outputs. But you can't remove arguments for now without it being breaking
        return self.get_plot_outputs(...).fig

    def get_plot_outputs(self, ...) -> Plot:
        ...

# The alternative Anna mentioned before where a flag determines the return type
class VizroAI:
    def plot(self, verbose: bool) -> Union[go.Figure, Plot]: # rename verbose to something that makes sense
        ...

tbh I don't much like either of these (variable return types are generally awkward, though in this case it's not a huge problem, just a bit of a code smell) but would decide based on what the arguments to plot vs. get_plot_outputs would be:

  • if they should remain the same signature then probably do the Union return type and one plot method
  • if get_plot_outputs might in future have more arguments then probably do as two methods and leave plot as the "shortcut wrapper"
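
If the single-method option with a Union return type is chosen, one way to soften the code smell is to let type checkers resolve the return type from the flag with `typing.overload` and `Literal`. This is only a sketch under assumptions: `Figure` is a hypothetical stand-in for go.Figure, `VizroAISketch` is not the real class, and `verbose` is the placeholder flag name from above.

```python
from dataclasses import dataclass
from typing import Literal, Union, overload


class Figure:
    """Hypothetical stand-in for go.Figure."""


@dataclass
class Plot:
    business_insights: str
    code_explanation: str
    code_string: str
    fig: Figure


class VizroAISketch:
    """Hypothetical stand-in for VizroAI."""

    @overload
    def plot(self, user_input: str, verbose: Literal[False] = ...) -> Figure: ...

    @overload
    def plot(self, user_input: str, verbose: Literal[True]) -> Plot: ...

    def plot(self, user_input: str, verbose: bool = False) -> Union[Figure, Plot]:
        # Build every component once, then decide what to hand back.
        result = Plot(
            business_insights=f"Insights for: {user_input}",
            code_explanation="Explanation of the generated code",
            code_string="# generated code",
            fig=Figure(),
        )
        return result if verbose else result.fig
```

A type checker can then infer Figure for plot("q") and Plot for plot("q", verbose=True), so callers do not need isinstance checks when the flag is a literal.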

@maxschulz-COL
Contributor

Great review from Antony, and on a more thorough review I agree with some but not all points. Let me try to summarize:

Regarding what would feel better

I disagree that we should return a dataclass for plot - a core idea for simplicity is that plot returns a figure, and in jupyter that figure is automatically displayed.

Regarding the things that feel wrong

Regarding 1

Yes, I 100% agree and I am annoyed I didn't think of it before given that this is exactly what is done in DS tools. So for the component collection, yes 100% dataclass.

Regarding 2

Does not feel great, but I think here it makes sense, because .plot is a very special function... It returns a fig object for simplicity (and that should remain as is, as in jupyter it shows the fig automatically and it is the core functionality), but in a jupyter environment, it also shows the insights and code snippet above the fig. That is potentially what is most confusing, but at the same time probably the best way to handle a complicated situation.

Thus, the second function with the same function signature was intended to return the same (not a subset) things, but in a way that is simple to further use (e.g. if you would like to just have the code string, or if you would like to return the insights separately in a chatbot).

Here are the four combinations, and what I think each should do:

In Jupyter:

  • plot(...,explain=False) -> Return fig object, that is automatically displayed
  • plot(...,explain=True) -> Return fig object, that is automatically displayed, with explanation displayed above, not altering the return type

Not in Jupyter:

  • plot(...,explain=False) -> Return fig object, that can be displayed if the user chooses fig.show() themselves
  • plot(...,explain=True) -> Return fig object, that can be displayed if the user chooses fig.show() themselves. Print the explanation to the terminal (this is new, and I think this would make .plot more consistent)
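
The four combinations can be sketched roughly as follows. This is an illustration under assumptions, not the real .plot: `Figure` is a hypothetical stand-in for go.Figure, the `jupyter` flag and the `out` callback are invented here so the environment can be simulated, and the "[markdown]" prefix merely marks where rich display would happen.

```python
from typing import Callable


class Figure:
    """Hypothetical stand-in for go.Figure."""


def plot(
    user_input: str,
    explain: bool = False,
    *,
    jupyter: bool = False,  # assumption: the real code would detect this itself
    out: Callable[[str], None] = print,  # assumption: where explanations are shown
) -> Figure:
    fig = Figure()
    if explain:
        text = f"Insights and code explanation for: {user_input}"
        # Jupyter: rich display above the figure; terminal: plain print.
        out(f"[markdown] {text}" if jupyter else text)
    # The return type never depends on explain or on the environment.
    return fig
```

In all four cases the caller receives the figure; only the side effect of showing the explanation changes.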

If the above is ensured, we just have to find a way to deliver all output to the user in an alternative way (which, as said in 1, should definitely be a dataclass). I think there are broadly three options:

  1. Variable return type based on a flag, e.g. .plot(...,return_components = True)
  2. Alternative function (current proposed solution)
  3. Alternative function but private method, so no public API (somewhat current solution)
  4. (Scrap this idea, and just return components - A's preferred solution, which I disagree with)

Regarding 3

Yes, I think that would be wise, but I felt it was not the major focus for now

What is my actual preferred option

I have tried to show above that plot is not a subset of get_plot_outputs, but a sister function. That would suggest (I think) a variable return type. But it makes sense as you say to look into the future:

  • will there be additional things to return that do not make sense to be part of .plot
  • will there be situations, where plot interacts with other parts of VizroAI, will it be a burden to sync with a potential different function (thinking of iteration, usage in .dashboard, etc)
  • do we need to think about iteration, will that be part of the same function?

Long story short - we simply cannot tell. Thus I prefer to have a single function with a variable return type that is clear to maintain: it returns either the fig with all other components shown alongside OR it returns a dataclass with all the components as attributes.

That is also easy to explain in docs, and easy to use. If plot becomes involved in other parts of the program, at least we only have to think about plot and not something else.

@antonymilne
Contributor

Great points @maxschulz-COL! You are absolutely right that it's good to automatically show the figure in a Jupyter notebook, which I hadn't thought of before. That would actually still be possible with my "ideal" solution above since you could delegate the relevant methods to the underlying figure:

@dataclass
class Plot:
    # I'm just taking the same property names as you use now, even if they could maybe be improved a bit
    business_insights: str
    code_explanation: str
    code_string: str
    fig: go.Figure

    # Whatever methods are needed
    def _repr_html_(...):
        return self.fig._repr_html_(...)

This is very similar to one of the prototypes I had for our _DashboardReadyFigure which had similar requirements of needing to behave like a go.Figure in Jupyter while requiring extra behaviour for use in a dashboard. There I went for inheritance over composition in the end. One of the reasons for this is it turned out that while it's possible to do the above it's actually a bit painful to implement. And given we have the constraint of needing to return a go.Figure already given the change in vizro-ai 0.2.0, let's not do this.
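
For reference, a self-contained sketch of the composition-with-delegation idea. `Figure` is a hypothetical stand-in for go.Figure, and a real figure may need more display hooks forwarded than the single method shown, which is part of what makes this approach painful.

```python
from dataclasses import dataclass


class Figure:
    """Hypothetical stand-in for go.Figure with a rich HTML representation."""

    def _repr_html_(self) -> str:
        return "<div>figure</div>"


@dataclass
class Plot:
    business_insights: str
    code_explanation: str
    code_string: str
    fig: Figure

    def _repr_html_(self) -> str:
        # Jupyter looks this method up on the displayed object, so forwarding
        # it makes a Plot render like its underlying figure.
        return self.fig._repr_html_()
```

Every further figure method that users expect (for example show) would need the same kind of forwarding, which is where the inheritance approach saves work.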

We could actually also follow the subclassing approach like we do with _DashboardReadyFigure and do class Plot(go.Figure) and put a business_insights property into that class to get the best of both worlds where you get something that behaves like go.Figure but also exposes additional properties. This is pretty ugly and also a little awkward to implement so I don't think I'm a fan of it but might work ok here 🤔

Assuming we don't like that then I agree with your solution 1 here that we have a single plot function but with an argument that alters the return type. But I find the existence of both explain and return_components a bit confusing and wonder if we could just use the explain argument for both?

  1. plot(..., explain=False): return fig object and don't show any explanation, just like it does now.
  2. plot(..., explain=True): return dataclass (different from now). If in Jupyter then display explanation on screen (as done now) and also fig on screen (needed since we no longer directly return the figure object). If outside Jupyter then personally I don't care whether we show the explanation or not, since it's available in the return dataclass anyway.

My assumption here is that in 99% of cases explain is basically equivalent to return_components. The only cases that this solution renders impossible would be:

  • explain=True and return_components=False: you want to see the explanation but don't want to use it further. With my above solution you set explain=True and get extra components in your return object and you can just ignore those if you like so this seems fine since the graph is still displayed
  • explain=False and return_components=True: again just set explain=True. The only difference here is the explanation gets shown (at least in Jupyter notebooks) when maybe you don't want that, but that seems ok to me also.

3 participants