Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A running time question #1014

Open
helloqirun opened this issue Jul 2, 2021 · 5 comments
Open

A running time question #1014

helloqirun opened this issue Jul 2, 2021 · 5 comments
Assignees

Comments

@helloqirun
Copy link

helloqirun commented Jul 2, 2021

Hi,

I have a quick question about the ddlog performance. Please see the following.

I have a .dl file that contains 4000+ lines. But the .dat file is fairly small. Please see the test files at https://github.com/helloqirun/ddlog_test

The timestamp is very small (0.21 seconds), as expected. However, the overall running time for dyck_cil is a bit long (~19 seconds).

$ ddlog -v
DDlog v0.41.0 (cadc3f1a66b03766049ab125ea5dddae19e537ba)
Copyright (c) 2019-2020 VMware, Inc. (MIT License)

$ time dyck_ddlog/target/release/dyck_cli < input.dat
Timestamp: 214690070

real	0m18.992s
user	0m24.924s
sys	0m7.078s

Is the long-running time for dyck_cil an expected behavior? Thanks!

@Kixiron Kixiron self-assigned this Jul 2, 2021
@ryzhyk
Copy link
Contributor

ryzhyk commented Jul 2, 2021

@helloqirun , thanks for reporting the issue. That the timestamp is small suggests that ddlog spends this entire time instantiating the enormous dataflow graph with thousands of rules and relations. In addition, 0.21s is a lot of time to perform a trivial computation, which likely means that differential dataflow also struggles with this huge graph at runtime.

I see that @Kixiron self-assigned this issue, so he may have a more detailed explanation soon.

Just curious: how did you come up with this program? It looks like auto-generated code :)

@helloqirun
Copy link
Author

helloqirun commented Jul 2, 2021

Thanks @ryzhyk and @Kixiron !
Yes, the .dl file is auto-generated. It essentially contains a Dyck language of 1600+ parentheses, defined by the grammar "S-> SS | (_i S )_i | empty". I was using souffle and switched to ddlog because it supports incremental updates.

@ryzhyk
Copy link
Contributor

ryzhyk commented Jul 2, 2021

I am not familiar with dyck, but here is a equivalent program that adds an idx column to A and B to model 1600 individual relations used in your code. This should be way more efficient.

typedef node = u32

input relation RA(idx: u16, s: node, t: node)
input relation RB(idx: u16, s: node, t: node)

input relation EMPTY(s: node, t: node)

output relation S(s: node, t: node)

S(x, y) :- EMPTY(x, y).
S(x, y) :- S(x,z), S(z,y).
S(x, z) :- RA(i, x, a), S(a, b), RB(i, b, z).

@Kixiron
Copy link
Contributor

Kixiron commented Jul 2, 2021

This is a really interesting case that I'm not fully done investigating so I'll update whenever I finish looking into it.
From what I can see now, this looks like a really interesting and strangely pathological case due to arrangements, there's hundreds of thousands of arrangement-related events produced by the code but very few timely events which is super interesting. Again, not finished looking into things so I'll update y'all when I have more info

@helloqirun
Copy link
Author

I am not familiar with dyck, but here is a equivalent program that adds an idx column to A and B to model 1600 individual relations used in your code. This should be way more efficient.

typedef node = u32

input relation RA(idx: u16, s: node, t: node)
input relation RB(idx: u16, s: node, t: node)

input relation EMPTY(s: node, t: node)

output relation S(s: node, t: node)

S(x, y) :- EMPTY(x, y).
S(x, y) :- S(x,z), S(z,y).
S(x, z) :- RA(i, x, a), S(a, b), RB(i, b, z).

Thanks. This one runs much faster!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants