
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"


Omarsmsm/webarena

 
 



WebArena: A Realistic Web Environment for Building Autonomous Agents

[Website] [Paper]

Overview

WebArena is a standalone, self-hostable web environment for building autonomous agents. It hosts websites from four popular categories, with functionality and data that mimic their real-world equivalents. To emulate how humans solve problems, WebArena also embeds tools and knowledge resources as independent websites. On top of this environment, WebArena introduces a benchmark for translating high-level, realistic natural-language commands into concrete web-based interactions. Each task comes with annotated programs that programmatically validate its functional correctness.
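WebArena's released checkers are not reproduced here. As a rough illustration of what validating functional correctness programmatically can look like, the sketch below scores an agent's final text answer against a task's reference. The `Task` fields (`exact_match`, `must_include`), the `normalize` and `evaluate` helpers, and the example task are all hypothetical. Real checks may also need to inspect the websites' final state (for example, whether an item was actually added to a cart), which this sketch does not cover.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    """Hypothetical task record: a natural-language intent plus reference checks."""
    intent: str
    exact_match: Optional[str] = None                      # answer must equal this
    must_include: List[str] = field(default_factory=list)  # answer must contain each item


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so formatting noise does not fail a task."""
    return " ".join(text.lower().split())


def evaluate(task: Task, answer: str) -> float:
    """Return 1.0 if the agent's final answer passes every check, else 0.0."""
    ans = normalize(answer)
    if task.exact_match is not None and ans != normalize(task.exact_match):
        return 0.0
    if any(normalize(item) not in ans for item in task.must_include):
        return 0.0
    return 1.0


# Made-up example task, not taken from the benchmark.
task = Task(intent="How many open issues does the project have?", exact_match="12")
print(evaluate(task, " 12 "))  # 1.0
print(evaluate(task, "13"))    # 0.0
```

Returning a binary score keeps the check strict: a task counts as solved only when every annotated condition holds.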

Note: This README is still under construction. Stay tuned!

Install

# Python 3.10+
conda create -n webarena python=3.10; conda activate webarena
pip install -r requirements.txt
playwright install
pip install -e .

# optional, dev only
pip install -e ".[dev]"
mypy --install-types --non-interactive browser_env
pip install pre-commit
pre-commit install

Preparation

  • Configure the URL of each website in env_config.
  • Run python scripts/generate_test_data.py to generate an individual config file for each test example in config_files.
  • Run bash prepare.sh to obtain the auto-login cookies for all websites.
  • Set your OpenAI API key: export OPENAI_API_KEY=your_key
  • Run python run.py --instruction_path agent/prompts/jsons/p_cot_id_actree_2s.json --test_start_idx 0 --test_end_idx 1 --model gpt-3.5-turbo --result_dir your_result_dir to run the first example with the GPT-3.5 reasoning agent. The trajectory is saved in your_result_dir/0.html.
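The preparation steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the repository's code: it assumes the raw tasks are a JSON list of dicts with a `task_id`, that site URLs appear as placeholders such as `__SHOPPING__` to be replaced by the URLs configured in env_config, that each task is written to `<task_id>.json`, and that `--test_start_idx`/`--test_end_idx` select a half-open range (so `0` and `1` select only example 0, consistent with the `0.html` output). The placeholder names, example URLs, and the helpers `fill_urls`, `generate_test_data`, and `select_configs` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def fill_urls(text: str, site_urls: dict[str, str]) -> str:
    """Replace each URL placeholder (e.g. "__SHOPPING__") with its configured URL."""
    for placeholder, url in site_urls.items():
        text = text.replace(placeholder, url)
    return text


def generate_test_data(raw_path: Path, out_dir: Path, site_urls: dict[str, str]) -> list[Path]:
    """Split a raw JSON list of tasks into one config file per task: <out_dir>/<task_id>.json."""
    tasks = json.loads(fill_urls(raw_path.read_text(), site_urls))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for task in tasks:
        path = out_dir / f"{task['task_id']}.json"
        path.write_text(json.dumps(task, indent=2))
        written.append(path)
    return written


def select_configs(config_dir: Path, start: int, end: int) -> list[Path]:
    """Pick the per-task configs in the half-open index range [start, end)."""
    return [config_dir / f"{i}.json" for i in range(start, end)]


if __name__ == "__main__":
    # Hypothetical URLs; in the real setup these would come from env_config.
    urls = {"__SHOPPING__": "http://localhost:8080", "__GITLAB__": "http://localhost:8081"}
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "test.raw.json"
        raw.write_text(json.dumps([
            {"task_id": 0, "start_url": "__SHOPPING__", "intent": "example task 0"},
            {"task_id": 1, "start_url": "__GITLAB__/explore", "intent": "example task 1"},
        ]))
        config_dir = Path(tmp) / "config_files"
        generate_test_data(raw, config_dir, urls)
        # Analogue of --test_start_idx 0 --test_end_idx 1: only task 0 is selected.
        for cfg in select_configs(config_dir, 0, 1):
            print(cfg.name, json.loads(cfg.read_text())["start_url"])  # 0.json http://localhost:8080
```

Writing one file per task keeps each run independent: a single example can be rerun or debugged by pointing at its own config without touching the rest of the test set.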
