
SecGPT - LlamaIndex Integration #13127

Open
Yuhao-W wants to merge 13 commits into main
Conversation

@Yuhao-W commented Apr 26, 2024

Description

SecGPT is an LLM-based system that secures the execution of LLM apps via isolation. The key idea behind SecGPT is to isolate the execution of apps and to allow interaction between apps and the system only through well-defined interfaces with user permission. SecGPT can defend against multiple types of attacks, including app compromise, data stealing, inadvertent data exposure, and uncontrolled system alteration. We build SecGPT on LlamaIndex because it supports a range of LLMs and apps and can easily be extended to include more. We implement SecGPT as a personal assistant chatbot that users communicate with through text messages.

New Package?

Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

  • [ ] Yes
  • [x] No

Version Bump?

Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

  • [ ] Yes
  • [x] No

Type of Change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [ ] This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • [ ] Added new unit/integration tests
  • [x] Added new notebook (that tests end-to-end)
  • [ ] I stared at the code and made sure it makes sense

Suggested Checklist:

  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added Google Colab support for the newly added notebooks.
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [ ] New and existing unit tests pass locally with my changes
  • [ ] I ran make format; make lint to appease the lint gods

@dosubot (bot) added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) on Apr 26, 2024

@nerdai self-requested a review on April 27, 2024 05:33
@nerdai (Contributor) left a comment:

Hey @Yuhao-W,

I notice that we haven't used the standard process for creating a llama pack here. Could you follow the instructions at the link provided to get it into the standard format? In particular, we use poetry as the package dependency manager as well as for building our Python packages. For convenience, we have a CLI tool that helps you create these packs:

https://github.com/run-llama/llama_index/blob/main/CONTRIBUTING.md#2--contribute-a-pack-reader-tool-or-dataset-formerly-from-llama-hub

@nerdai (Contributor) commented Apr 29, 2024

@Yuhao-W I submitted a PR to your fork's main branch. It brings in the necessary pants BUILD files to pass our checks.

llm-platform-security#1

@nerdai (Contributor) commented Apr 29, 2024

@Yuhao-W looks like lint/fmt checks are failing. Can you please run:

make lint and make format then commit and push?

@Yuhao-W (Author) commented Apr 29, 2024

> @Yuhao-W looks like lint/fmt checks are failing. Can you please run:
>
> make lint and make format then commit and push?

@nerdai Thanks, Andrei. I just fixed this.

@Yuhao-W (Author) commented May 1, 2024

@nerdai Hi, Andrei. I see that some checks failed. Is there anything that needs to be changed?

@nerdai (Contributor) commented May 2, 2024

> @nerdai Hi, Andrei. I see that some checks failed. Is there anything that needs to be changed?

Hey @Yuhao-W sorry for the troubles. I took a look at the logs and couldn't find anything. Tagging @logan-markewich who is quite good at figuring out this stuff when it seems like all is lost. lol

@Yuhao-W (Author) commented May 3, 2024

I think I figured out the errors and updated the package, mainly by including dependency information in the pyproject.toml file under our package path. I also set up a unit test environment and ran it locally; the unit tests passed on my end. @nerdai and/or @logan-markewich, I would appreciate it if you could review the changes!

@nerdai (Contributor) commented May 6, 2024

> I think I figured out the errors and updated the package, mainly by including dependency information in the pyproject.toml file under our package path. I also set up a unit test environment and ran it locally; the unit tests passed on my end. @nerdai and/or @logan-markewich, I would appreciate it if you could review the changes!

Thanks, let's run the checks and see what happens!

@Yuhao-W (Author) commented May 6, 2024

> I think I figured out the errors and updated the package, mainly by including dependency information in the pyproject.toml file under our package path. I also set up a unit test environment and ran it locally; the unit tests passed on my end. @nerdai and/or @logan-markewich, I would appreciate it if you could review the changes!
>
> Thanks, let's run the checks and see what happens!

Thanks Andrei, it failed again :( this time because the requirements.txt file was named requirements.tx.

I have made a new commit. We may still see errors after this, so I would appreciate a deeper look if it fails. Thank you!

@nerdai (Contributor) commented May 7, 2024

@logan-markewich we're still running into some errors here. Perhaps we need to add a dependency in pants? This is the error we're seeing in the tests:

llama-index-packs/llama-index-packs-secgpt/llama_index/packs/secgpt/sandbox.py:6: in <module>
    import tldextract
E   ModuleNotFoundError: No module named 'tldextract'

But tldextract is indeed included in the pyproject.toml as a dep for the project.

(CC @Yuhao-W)
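For reference, a hypothetical sketch of a pants BUILD file that pulls third-party deps such as tldextract from pyproject.toml. This is an assumption about the repo's pants setup for illustration, not the actual file:

# BUILD (hypothetical sketch; target names and layout are assumptions)
poetry_requirements(
    name="poetry",  # exposes the deps declared in pyproject.toml, e.g. tldextract
)

python_sources()  # dependency inference maps `import tldextract` to the target above

If the same deps are also listed in a requirements.txt picked up by python_requirements, pants can end up with two competing providers for the same module, which can break dependency inference.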

@nerdai (Contributor) commented May 11, 2024

@Yuhao-W got the checks to pass 🥳. I needed to remove the requirements.txt file, as having deps listed in both requirements.txt and pyproject.toml was tripping up pants.

@nerdai commented on the notebook via ReviewNB, May 11, 2024:

I'm a bit confused as to how these tools actually provide the fare price of both ride-sharing apps? Is this purely for illustration? In other words, is this notebook not actually functional?



@nerdai commented on the notebook via ReviewNB, May 11, 2024:

Line #4.    # A benign ride-sharing app - quick_ride

This one isn't benign, is it?



@nerdai commented on the notebook via ReviewNB, May 11, 2024:

Line #20.    # A malicious ride-sharing app - metro hail

I think it was mentioned in the writeup before this cell block that QuickRide was the malicious app.



@nerdai commented on the notebook via ReviewNB, May 11, 2024:

Is there a way to show the case when not using SecGPT or having these measures turned off? In other words, can we see the case when the attack is successful?



@nerdai (Contributor) left a comment:

@Yuhao-W Thanks for this contribution! I'm really excited about this :)

I left some comments on your PR. As another blanket comment, I do think your pack would greatly improve if you were able to include docstrings throughout your code (i.e., quick descriptions of functions/classes and their params/args).

from llama_index.core import ChatPromptTemplate


class HubPlanner:
@nerdai:

Since there is a prompt here, I think we should subclass PromptMixin:

https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/prompts/mixin.py
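A minimal sketch of what that could look like for HubPlanner; the prompt name and wiring here are illustrative, not the pack's actual code:

from llama_index.core import ChatPromptTemplate
from llama_index.core.prompts.mixin import PromptDictType, PromptMixin, PromptMixinType


class HubPlanner(PromptMixin):
    def __init__(self, planner_prompt: ChatPromptTemplate) -> None:
        self._planner_prompt = planner_prompt

    def _get_prompts(self) -> PromptDictType:
        # Expose the planner prompt so callers can inspect it via get_prompts().
        return {"planner_prompt": self._planner_prompt}

    def _get_prompt_modules(self) -> PromptMixinType:
        # No nested prompt-bearing modules in this sketch.
        return {}

    def _update_prompts(self, prompts_dict: PromptDictType) -> None:
        # Let callers swap the prompt via update_prompts().
        if "planner_prompt" in prompts_dict:
            self._planner_prompt = prompts_dict["planner_prompt"]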

lc_output_parser = JsonOutputParser()
self.output_parser = LangchainOutputParser(lc_output_parser)

self.query_engine = QueryPipeline(
@nerdai:

This may be a bit of a name clash with the llama-index ecosystem, as this is not really a QueryEngine but rather a QueryPipeline. If possible, I would suggest using a different name: query_pipeline.



class Message:
    def function_probe_request(self, spoke_id, function):
@nerdai:

Suggestion: maybe this should be a staticmethod?
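For illustration, a hedged sketch of the staticmethod version; the message fields are placeholders, not the pack's actual schema:

import json


class Message:
    @staticmethod
    def function_probe_request(spoke_id, function):
        # No instance state is needed, so the message can be built statically.
        return json.dumps(
            {
                "message_type": "function_probe_request",
                "spoke_id": spoke_id,
                "functionality": function,
            }
        )


# Callers can then do Message.function_probe_request(...) without instantiating.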

@nerdai:

similarly for all other funcs?

@nerdai:

Out of curiosity: have you considered using a Pydantic BaseModel to represent Message? You could then subclass a BaseMessage. Pydantic can be helpful for validation.
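A rough sketch of that idea, assuming illustrative field names (not the pack's actual schema):

from pydantic import BaseModel


class BaseMessage(BaseModel):
    message_type: str
    spoke_id: str


class FunctionProbeRequest(BaseMessage):
    message_type: str = "function_probe_request"
    functionality: str


# Pydantic validates the fields on construction and gives serialization for free.
msg = FunctionProbeRequest(spoke_id="spoke_1", functionality="send_email")
payload = msg.json()  # use msg.model_dump_json() on Pydantic v2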

verbose=verbose,
)

def chat(
@nerdai:

Since this is a light wrapper on our ReAct class, I think you can support stream_chat and its async versions as well.
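A minimal sketch of how the extra chat modes could delegate to the wrapped ReAct agent; the self._agent attribute name is an assumption, not the pack's actual code:

class Spoke:
    # ... existing __init__ that builds self._agent (a ReActAgent) ...

    def stream_chat(self, query, chat_history=None):
        # Streaming variant, mirroring chat().
        return self._agent.stream_chat(query, chat_history=chat_history)

    async def achat(self, query, chat_history=None):
        # Async variant.
        return await self._agent.achat(query, chat_history=chat_history)

    async def astream_chat(self, query, chat_history=None):
        # Async streaming variant.
        return await self._agent.astream_chat(query, chat_history=chat_history)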

Comment on lines +91 to +112
from llama_index.core.tools import FunctionTool


def add_numbers(x: int, y: int) -> int:
"""
Adds the two numbers together and returns the result.
"""
return x + y


if __name__ == "__main__":
llm = OpenAI(model="gpt-4-turbo", temperature=0.0, additional_kwargs={"seed": 0})
function_tool = FunctionTool.from_defaults(fn=add_numbers)
print(function_tool.metadata)
print(function_tool.metadata.get_parameters_dict())
spoke = Spoke(
tools=[function_tool],
collab_functions=["send_email", "draft_email", "read_email"],
llm=llm,
verbose=True,
)
spoke.chat("send a email to [email protected], subject: hello, body: hello world")
@nerdai:

This looks like it was used perhaps for testing while developing? I would suggest converting this into an actual unit test and mocking the LLM.

For inspiration, see here: https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/agent/llama-index-agent-introspective/tests/test_self_reflection.py
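As a hedged sketch of what such a test could look like (the import path for Spoke is an assumption; MockLLM only echoes prompts, so this is a smoke test of construction rather than of agent reasoning):

from llama_index.core.llms import MockLLM
from llama_index.core.tools import FunctionTool

from llama_index.packs.secgpt.spoke import Spoke  # assumed module path


def add_numbers(x: int, y: int) -> int:
    """Adds the two numbers together and returns the result."""
    return x + y


def test_spoke_accepts_function_tool():
    function_tool = FunctionTool.from_defaults(fn=add_numbers)
    spoke = Spoke(
        tools=[function_tool],
        collab_functions=["send_email", "draft_email", "read_email"],
        llm=MockLLM(max_tokens=32),
        verbose=False,
    )
    # A full chat() round-trip would need a scripted LLM that emits valid
    # ReAct steps; here we only verify that construction succeeds.
    assert spoke is not None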

    # Format and send the app request message to the hub
    def make_request(self, functionality: str, request: dict):
        # format the app request message
        app_request_message = Message().app_request(
@nerdai:

If these are staticmethods, then you should be able to do Message.app_request(...) instead.

Comment on lines +18 to +32
HubOperator = "Yuhao-W"
HubPlanner = "Yuhao-W"
Message = "Yuhao-W"
Socket = "Yuhao-W"
Spoke = "Yuhao-W"
SpokeOperator = "Yuhao-W"
SpokeOutputParser = "Yuhao-W"
TIMEOUT = "Yuhao-W"
ToolImporter = "Yuhao-W"
VanillaSpoke = "Yuhao-W"
create_function_placeholder = "Yuhao-W"
create_message_spoke_tool = "Yuhao-W"
drop_perms = "Yuhao-W"
get_user_consent = "Yuhao-W"
set_mem_limit = "Yuhao-W"
@nerdai:

All of these will show up on llamahub.ai, which I don't think is ideal. It probably only makes sense to have the main one be discoverable, i.e., Hub.

from .hub_operator import HubOperator


class Hub(BaseLlamaPack):
@nerdai:

The naming convention for packs is typically XXXPack. I would suggest renaming this to SecGPTPack and then just creating an alias for Hub, i.e., adding the line below to the end of this file:

Hub = SecGPTPack
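As a rough sketch of the suggested rename (constructor and methods omitted; only the rename and alias matter here):

from llama_index.core.llama_pack import BaseLlamaPack

from .hub_operator import HubOperator


class SecGPTPack(BaseLlamaPack):
    """SecGPT hub: isolates app execution and mediates interaction through the hub."""
    ...


# Keep the original name working as an alias.
Hub = SecGPTPack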

@Yuhao-W (Author) commented May 12, 2024

@nerdai Thanks for the feedback on the PR! I will address your comments and get back to you with an update soon.

Labels: size:XXL (This PR changes 1000+ lines, ignoring generated files.)
3 participants