Use FastInteractiveParser Subclass to Improve CFGFSM Performance #622

Closed
wants to merge 4 commits into from

@lapp0 lapp0 commented Feb 7, 2024

Improves performance on the lark_lark_self_grammar.lark.test-True benchmark case. (The -True suffix means the cache is already populated, simulating a second run.)

39 tokens / second -> 63 tokens / second

TODO:

Problem

Substantial time is spent in InteractiveParser.accepts() and InteractiveParser.feed_token() because they deepcopy Token objects along with all of their properties. That metadata is never mutated, so the copies are unnecessary.

(see https://github.com/lark-parser/lark/blob/master/lark/parsers/lalr_interactive_parser.py)

Solution

Create a global copy_memo (a class variable) that stores the Tokens that would otherwise have been copied. Instead of deepcopying a Token, retrieve it from copy_memo.
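A minimal sketch of the idea, using only the standard library. This is not the code from this PR and not lark's API: `Token`, `ParserState`, `remember`, and `fast_copy` are hypothetical stand-ins. It relies on the documented behaviour of `copy.deepcopy`, which returns the memo entry for any object whose `id` is already in the memo, so pre-seeding the memo with `id(token) -> token` makes the copy reuse the tokens.

```python
import copy


class Token:
    """Toy stand-in for a lexer token (hypothetical, not lark's Token)."""

    def __init__(self, type_, value):
        self.type = type_
        self.value = value


class ParserState:
    """Toy stand-in for parser state that holds a stack of tokens."""

    # Class-level memo shared by all instances: id(token) -> token.
    # Storing the token itself as the value keeps it alive, so its id stays valid.
    copy_memo = {}

    def __init__(self, value_stack):
        self.value_stack = value_stack

    @classmethod
    def remember(cls, token):
        """Register a token so later copies share it instead of cloning it."""
        cls.copy_memo[id(token)] = token
        return token

    def fast_copy(self):
        # deepcopy adds its own bookkeeping entries to the memo it is given,
        # so hand it a shallow copy and leave the shared memo untouched.
        memo = dict(self.copy_memo)
        return ParserState(copy.deepcopy(self.value_stack, memo))


a = ParserState.remember(Token("NAME", "x"))
b = ParserState.remember(Token("NUMBER", "1"))
state = ParserState([a, b])
clone = state.fast_copy()

assert clone.value_stack is not state.value_stack  # the container is a new list
assert clone.value_stack[0] is a                   # the tokens are shared
```

Sharing is only safe under the assumption stated above, namely that token metadata is never mutated after creation. The shallow copy of the memo costs time proportional to the number of registered tokens, which is one reason a real implementation might choose a different mechanism.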

New Benchmark

Additional Benchmark Details:
tests/benchmark/test_benchmark_cfg_generation.py::test_benchmark_cfg_generation[lark_lark_self_grammar.lark.test-True]:
	Tokens / Second: 63.455
	(Num Tokens: 2771, Time: 43.669 seconds)
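For reference, the throughput figure follows directly from the two numbers reported above. A small sketch of the arithmetic (the helper name `tokens_per_second` is hypothetical; the inputs are the reported token count and time, plus the rounded before/after rates from the description):

```python
def tokens_per_second(num_tokens, seconds):
    """Throughput as generated tokens divided by wall-clock seconds."""
    return num_tokens / seconds


rate = tokens_per_second(2771, 43.669)  # values from the benchmark details
speedup = 63 / 39                       # rounded rates quoted in the description

print(f"{rate:.3f} tokens / second, about {speedup:.1f}x the previous rate")
```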


----------------------------------------------------------------------- benchmark: 1 tests ----------------------------------------------------------------------
Name (time in s)                                                             Min      Max     Mean  StdDev   Median     IQR  Outliers     OPS  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_cfg_generation[lark_lark_self_grammar.lark.test-True]     43.6688  43.6688  43.6688  0.0000  43.6688  0.0000       0;0  0.0229       1           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Profile:

tests/benchmark/test_benchmark_cfg_generation.py::test_benchmark_cfg_generation[lark_lark_self_grammar.lark.test-True]
ncalls	tottime	percall	cumtime	percall	filename:lineno(function)
1	0.0001	0.0001	60.3215	60.3215	outlines/tests/benchmark/test_benchmark_cfg_generation.py:130(<lambda>)
1	3.0319	3.0319	60.3214	60.3214	outlines/tests/benchmark/test_benchmark_cfg_generation.py:51(run_until_eos)
2772	1.0395	0.0004	56.7751	0.0205	outlines/outlines/fsm/fsm.py:224(allowed_token_ids)
2455	0.0234	0.0000	41.4093	0.0169	outlines/outlines/models/transformers.py:177(__hash__)
2455	0.0057	0.0000	41.3844	0.0169	outlines/.myenv/lib/python3.11/site-packages/datasets/fingerprint.py:231(hash)
2455	0.0089	0.0000	41.3787	0.0169	outlines/.myenv/lib/python3.11/site-packages/datasets/fingerprint.py:227(hash_default)
2455	0.0172	0.0000	41.0469	0.0167	outlines/.myenv/lib/python3.11/site-packages/datasets/utils/py_utils.py:723(dumps)
2455	0.0209	0.0000	40.9838	0.0167	outlines/.myenv/lib/python3.11/site-packages/datasets/utils/py_utils.py:700(dump)
2455	0.0080	0.0000	40.9274	0.0167	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:416(dump)
2455	0.0192	0.0000	40.9099	0.0167	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:476(dump)
213585/2455	0.2292	0.0001	40.8604	0.0166	outlines/.myenv/lib/python3.11/site-packages/datasets/utils/py_utils.py:607(save)
213585/2455	0.1673	0.0001	40.6439	0.0166	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:365(save)
213585/2455	0.5156	0.0002	40.6011	0.0165	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:535(save)
7365/2455	0.0666	0.0000	40.5513	0.0165	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:621(save_reduce)
7365/2455	0.0349	0.0000	40.2919	0.0164	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1190(save_module_dict)
7365/2455	0.0191	0.0000	40.2614	0.0164	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:965(save_dict)
7365/2455	0.1108	0.0000	40.2449	0.0164	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:978(_batch_setitems)
34264	37.0693	0.0011	37.0693	0.0011	~:0(<method '__reduce_ex__' of 'object' objects>)
2455	0.0403	0.0000	5.7382	0.0023	outlines/.myenv/lib/python3.11/site-packages/lark/parsers/lalr_interactive_parser.py:47(exhaust_lexer)
194248	0.1183	0.0000	5.6979	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parsers/lalr_interactive_parser.py:35(iter_parse)
2454	4.6008	0.0019	4.6008	0.0019	outlines/outlines/fsm/fsm.py:274(<listcomp>)
216237	0.1141	0.0000	3.3792	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parsers/lalr_interactive_parser.py:28(feed_token)
216237	1.6973	0.0000	3.2650	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parsers/lalr_parser_state.py:67(feed_token)
2455	0.0714	0.0000	2.8354	0.0012	outlines/.myenv/lib/python3.11/site-packages/lark/parsers/lalr_interactive_parser.py:103(accepts)
194248	0.1507	0.0000	2.4708	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:661(lex)
159870	0.7756	0.0000	0.8556	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parse_tree_builder.py:145(__call__)
22095	0.6836	0.0000	0.9404	0.0000	outlines/.myenv/lib/python3.11/site-packages/packaging/version.py:186(__init__)
367363/9046	0.6215	0.0001	1.6694	0.0002	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/copy.py:128(deepcopy)
194248	0.6037	0.0000	2.2879	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:590(next_token)
5226	0.5900	0.0001	0.5930	0.0001	outlines/outlines/fsm/fsm.py:130(allowed_token_ids)
305361	0.4443	0.0000	0.4443	0.0000	~:0(<method 'match' of '_regex.Pattern' objects>)
304840	0.3112	0.0000	0.8131	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:387(match)
2455	0.3045	0.0001	0.3045	0.0001	~:0(<method 'update' of 'xxhash.xxh64' objects>)
287235	0.2748	0.0000	0.2748	0.0000	~:0(<method 'write' of '_io.BytesIO' objects>)
2771	0.2534	0.0001	0.4975	0.0002	outlines/outlines/fsm/fsm.py:309(next_state)
339	0.2529	0.0007	0.2529	0.0007	outlines/outlines/fsm/fsm.py:305(<listcomp>)
310724	0.2427	0.0000	0.3085	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:213(_future_new)
1150169	0.2348	0.0000	0.2348	0.0000	~:0(<method 'get' of 'dict' objects>)
304840	0.2094	0.0000	0.2773	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:292(feed)
310724	0.1970	0.0000	0.5054	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:202(__new__)
132570	0.1956	0.0000	0.3823	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:851(save_str)
149119/26899	0.1890	0.0000	2.4392	0.0001	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/copy.py:66(copy)
304840	0.1850	0.0000	1.0593	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:587(match)
2341037	0.1430	0.0000	0.1430	0.0000	~:0(<method 'append' of 'list' objects>)
1665962	0.1532	0.0000	0.1532	0.0000	~:0(<built-in method builtins.len>)
1419451	0.1299	0.0000	0.1299	0.0000	~:0(<built-in method builtins.id>)
1267085	0.1429	0.0000	0.1547	0.0000	~:0(<built-in method builtins.isinstance>)
695582	0.1128	0.0000	0.1128	0.0000	~:0(<built-in method builtins.getattr>)
555623	0.1356	0.0000	0.1871	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/grammar.py:124(__eq__)
489362	0.0588	0.0000	0.0588	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/grammar.py:121(__hash__)
412091	0.0879	0.0000	0.0879	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/tree.py:61(__init__)
407204	0.1605	0.0000	0.1995	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:265(__eq__)
370330	0.0373	0.0000	0.0373	0.0000	~:0(<built-in method builtins.issubclass>)
367363	0.1424	0.0000	0.1938	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/copy.py:243(_keep_alive)
359718	0.0771	0.0000	0.0771	0.0000	~:0(<built-in method __new__ of type object at 0x7f22343b1ba0>)
304840	0.0579	0.0000	0.0579	0.0000	~:0(<method 'group' of '_regex.Match' objects>)
304840	0.0539	0.0000	0.0611	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:581(scanner)
272505	0.0933	0.0000	0.1404	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:241(write)
255255	0.1052	0.0000	0.1312	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parse_tree_builder.py:20(__call__)
9820/7365	0.0130	0.0000	1.0791	0.0001	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:874(save_tuple)
94487	0.0674	0.0000	0.2409	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:262(__deepcopy__)
88380	0.0314	0.0000	0.0314	0.0000	outlines/.myenv/lib/python3.11/site-packages/packaging/version.py:205(<genexpr>)
8834	0.0019	0.0000	0.0019	0.0000	~:0(<method 'keys' of 'dict' objects>)
81447	0.0245	0.0000	0.0245	0.0000	~:0(<built-in method builtins.hasattr>)
7382	0.0032	0.0000	0.0035	0.0000	~:0(<method 'join' of 'str' objects>)
7365	0.0608	0.0000	0.5807	0.0001	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1730(save_type)
7365	0.0569	0.0000	0.3105	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:1056(save_global)
7365	0.0412	0.0000	0.1485	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1048(_locate_function)
7365	0.0247	0.0000	0.0255	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1186(_repr_dict)
7365	0.0218	0.0000	0.0762	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1018(_import_module)
7365	0.0120	0.0000	0.0206	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:1035(_getattribute)
7365	0.0113	0.0000	0.0126	0.0000	outlines/.myenv/lib/python3.11/site-packages/packaging/version.py:76(__lt__)
7365	0.0108	0.0000	0.0341	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:122(numpyufunc)
7365	0.0098	0.0000	0.0117	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:320(__missing__)
7365	0.0075	0.0000	0.0233	0.0000	~:0(<built-in method builtins.any>)
7365	0.0073	0.0000	0.0124	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:751(save_long)
7365	0.0069	0.0000	0.0089	0.0000	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/pickle.py:322(_getattribute)
7365	0.0067	0.0000	0.0193	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:124(numpydtype)
7365	0.0047	0.0000	0.0150	0.0000	outlines/.myenv/lib/python3.11/site-packages/dill/_dill.py:112(ndarraysubclassinstance)
7365	0.0044	0.0000	0.0044	0.0000	~:0(<method 'rpartition' of 'str' objects>)
136438/4515	0.1718	0.0000	1.5942	0.0004	/nix/store/qp5zys77biz7imbk6yy85q5pdv7qk84j-python3-3.11.6/lib/python3.11/copy.py:201(_deepcopy_list)
17	0.0006	0.0000	0.0063	0.0004	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:334(_create_unless)
136438/4515	0.1183	0.0000	1.6193	0.0004	outlines/.myenv/lib/python3.11/site-packages/lark/tree.py:206(__deepcopy__)
2771	0.0408	0.0000	0.0408	0.0000	~:0(<method 'decode' of 'tokenizers.Tokenizer' objects>)
2455	0.0325	0.0000	0.0597	0.0000	outlines/.myenv/lib/python3.11/site-packages/lark/parser_frontends.py:96(_make_lexer_thread)
2771	0.0299	0.0000	0.1307	0.0000	outlines/.myenv/lib/python3.11/site-packages/transformers/utils/generic.py:232(to_py_obj)
17	0.0001	0.0000	0.0073	0.0004	outlines/.myenv/lib/python3.11/site-packages/lark/lexer.py:568(_build_scanner)
17	0.0000	0.0000	0.0000	0.0000	~:0(<method 'values' of 'dict' objects>)
71	0.0000	0.0000	0.0000	0.0000	~:0(<method 'translate' of 'str' objects>)
220950	0.0279	0.0000	0.0279	0.0000	~:0(<method 'tell' of '_io.BytesIO' objects>)
28585	0.0150	0.0000	0.0150	0.0000	~:0(<method 'startswith' of 'str' objects>)
41735	0.0148	0.0000	0.0148	0.0000	~:0(<method 'split' of 'str' objects>)
22095	0.0651	0.0000	0.0651	0.0000	~:0(<method 'search' of 're.Pattern' objects>)
60769	0.0128	0.0000	0.0128	0.0000	~:0(<method 'rindex' of 'str' objects>)
27710	0.0029	0.0000	0.0029	0.0000	~:0(<method 'replace' of 'str' objects>)
2116	0.0003	0.0000	0.0003	0.0000	~:0(<method 'remove' of 'set' objects>)
10136	0.0021	0.0000	0.0021	0.0000	~:0(<method 'pop' of 'dict' objects>)
37052	0.0057	0.0000	0.0057	0.0000	~:0(<method 'items' of 'dict' objects>)
53600	0.0073	0.0000	0.0073	0.0000	~:0(<method 'isupper' of 'str' objects>)
2455	0.0015	0.0000	0.0015	0.0000	~:0(<method 'hexdigest' of 'xxhash.xxh64' objects>)
220950	0.0369	0.0000	0.0369	0.0000	~:0(<method 'group' of 're.Match' objects>)
2455	0.0010	0.0000	0.0010	0.0000	~:0(<method 'getvalue' of '_io.BytesIO' objects>)
4910	0.0040	0.0000	0.0040	0.0000	~:0(<method 'getbuffer' of '_io.BytesIO' objects>)
2771	0.0004	0.0000	0.0004	0.0000	~:0(<method 'extend' of 'list' objects>)
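The columns above (ncalls, tottime, percall, cumtime) match the format of Python's standard-library profiler. How this particular profile was collected is not stated here, so the following is only a sketch, assuming plain `cProfile` and `pstats`, of how a comparable table could be produced for any callable. The helper name `profile_call` is hypothetical.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by time spent inside each function itself and keep the top 20 rows.
    stats.sort_stats("tottime").print_stats(20)
    return result, stream.getvalue()


def busy(n):
    """Small workload so the report has something to show."""
    return sum(i * i for i in range(n))


result, report = profile_call(busy, 10_000)
print(report)
```

Sorting by `cumtime` instead of `tottime` surfaces the call chains that dominate the run, such as the `allowed_token_ids` and `__hash__` rows near the top of the table above.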

@lapp0 lapp0 mentioned this pull request Feb 7, 2024
3 tasks
outlines/fsm/fast_lark.py (review thread, outdated and resolved)
rlouf commented Mar 9, 2024

Closing as inactive. Feel free to reopen later.

@rlouf rlouf closed this Mar 9, 2024