Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joining dataframes #16

Open
dave-killough opened this issue Apr 20, 2023 · 3 comments
Open

Joining dataframes #16

dave-killough opened this issue Apr 20, 2023 · 3 comments

Comments

@dave-killough
Copy link

Nice. I've been doing similar things with SQL sources. I'm curious why you attach the ask to a specific dataframe. Can multiple dataframes be considered in one request? Thanks

@bluecoconut
Copy link
Contributor

Hey @dave-killough ,

I attached to a specific dataframe because it was the easiest to "gather" the local context. In a python kernel, just doing sketch.ask(...) and inspecting the locals() in the same parent as the call is possible but potentially leading to lots of mistaken variables and too many dataframes. Also, this is somewhat rough in jupyter kernel style sessions, where navigating the call stack and finding other variables seemed to be a bit difficult (though if you have a trick here, this might make this really easy).

In many jupyter notebooks, people often re-use the same dataframe as well (editing in place), so the clarity of the answer I felt should be contextualized to a single dataframe.

One option for multiple joining I considered was something like

df.sketch.howto("question", extra_contextual_dfs=[*dfs])

but that didn't feel "easy to type" / "native".

Do you have any ideas or preferences for what would make a good API?

@dave-killough
Copy link
Author

Thanks for the explanation. Specifying the dataframes to include is important. There are often many working dataframes created in a notebook that should be excluded from end-result queries. It seems useful to specify the dataframes once, and then have a higher level ask function that can connect the dataframes. ChatGPT is quite good at modeling table relationships from just schema, and we're getting effective SQL joins as a result. I expect it could work similarly with dataframe merges.

Maybe something like this:

sketch.dataset([client_df, purchase_df, contact_df])
sketch.ask("who bought the most last year?")

@dnk8n
Copy link

dnk8n commented Apr 27, 2023

Asking questions across multiple dataframes would be very interesting. I was hoping to do something like this:

import pandas as pd
import sketch
from sqlalchemy import create_engine

# Define a class to store views
class Views:
    def __init__(self, schema_name, db_url):
        self.views = {}
        self.history = []
        self.engine = create_engine(db_url)
        self.schema_name = schema_name

        # Get all views in the schema
        views = pd.read_sql_query(f"SELECT table_name FROM information_schema.views WHERE table_schema = '{schema_name}'", con=self.engine)

        # Loop through each view and read it into a Pandas dataframe
        for view in views['table_name']:
            df = pd.read_sql_query(f"SELECT * FROM {schema_name}.{view}", con=self.engine)
            self.add_view(view, df)

    def add_view(self, name, df):
        self.views[name] = df

    def ask(self, question):
        answer = sketch.ask(question, self.views.items())
        self.history.append((question, answer))

# Connect to Postgres
db_url = 'postgresql://user:password@localhost:5432/mydatabase'
schema_name = 'my_schema'

# Create an instance of Views
my_views = Views(schema_name, db_url)

# Ask questions of all views and store history
my_views.ask('What are the top problematic projects with respect to planned vs actual time tracked?')
my_views.ask('Which activities appear to require upskilling?')

# Interrogate my_views.history for example

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants