Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support L2 cache hint #3470

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Conversation

manman-ren
Copy link
Collaborator

@manman-ren manman-ren commented Mar 26, 2024

This patch currently creates a cache policy for each load/store with eviction policy. It may be better to create cache policy globally for each function.

Fixes #3438

Copy link
Collaborator

@ThomasRaoux ThomasRaoux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure there is a test exercising this

Comment on lines 415 to 439
// Create cache policy here. Consider moving it to function scope.
// createpolicy.fractional.L2::policy.b64 xx 1.0
PTXBuilder ptxBuilder;
const bool hasL2EvictPolicy =
op.getEvict() == triton::EvictionPolicy::EVICT_FIRST ||
op.getEvict() == triton::EvictionPolicy::EVICT_LAST;
Value policyRet;
if (hasL2EvictPolicy) {
auto &policy =
ptxBuilder.create<>("createpolicy.fractional")
->o("L2::evict_first",
op.getEvict() == triton::EvictionPolicy::EVICT_FIRST)
.o("L2::evict_last",
op.getEvict() == triton::EvictionPolicy::EVICT_LAST)
.b(64);
const std::string writeConstraint = "=l";
// prepare asm operands
auto *dstOpr = ptxBuilder.newOperand(writeConstraint,
/*init=*/true);
std::string fractionStr = "1.0";
auto *fractionOpr = ptxBuilder.newConstantOperand(fractionStr);
policy(dstOpr, fractionOpr);
Type policyRetTy = rewriter.getI64Type();
policyRet = ptxBuilder.launch(rewriter, loc, policyRetTy);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like the this is the same exact code for load and store. Could you move it in a common helper?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick review! I will add test cases and refactor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you still working on the refactoring? Let me know when it's ready to review or you can move it into a draft until then

// CHECK: ld.global.L1::evict_last.L2::cache_hint.v4.b32
// CHECK: st.global.L1::evict_last.L2::cache_hint.v4.b32

#blocked = #triton_gpu.blocked<{sizePerThread = [4], threadsPerWarp = [32], warpsPerCTA = [4], order = [0]}>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you minimize this test and move it to tritongpu_to_llvm.mlir?

Comment on lines 415 to 439
// Create cache policy here. Consider moving it to function scope.
// createpolicy.fractional.L2::policy.b64 xx 1.0
PTXBuilder ptxBuilder;
const bool hasL2EvictPolicy =
op.getEvict() == triton::EvictionPolicy::EVICT_FIRST ||
op.getEvict() == triton::EvictionPolicy::EVICT_LAST;
Value policyRet;
if (hasL2EvictPolicy) {
auto &policy =
ptxBuilder.create<>("createpolicy.fractional")
->o("L2::evict_first",
op.getEvict() == triton::EvictionPolicy::EVICT_FIRST)
.o("L2::evict_last",
op.getEvict() == triton::EvictionPolicy::EVICT_LAST)
.b(64);
const std::string writeConstraint = "=l";
// prepare asm operands
auto *dstOpr = ptxBuilder.newOperand(writeConstraint,
/*init=*/true);
std::string fractionStr = "1.0";
auto *fractionOpr = ptxBuilder.newConstantOperand(fractionStr);
policy(dstOpr, fractionOpr);
Type policyRetTy = rewriter.getI64Type();
policyRet = ptxBuilder.launch(rewriter, loc, policyRetTy);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you still working on the refactoring? Let me know when it's ready to review or you can move it into a draft until then

@manman-ren
Copy link
Collaborator Author

Sorry, I have forgotten about the refactoring. Will convert to draft and do an internal review first.

@htyu
Copy link
Collaborator

htyu commented May 3, 2024

Thanks for working on this. LGTM.

I'm also curious about how this moves perf. Maybe kick off a Pytorch nightly perf run?

@manman-ren
Copy link
Collaborator Author

What is the easiest way to kick off pytorch perf test? Is it changing the pytorch pin to this hash, then start some job from the diff? @htyu

@htyu
Copy link
Collaborator

htyu commented May 8, 2024

What is the easiest way to kick off pytorch perf test? Is it changing the pytorch pin to this hash, then start some job from the diff? @htyu

Yes. Once you update pytorch pin with this one, push it to a personal branch in pytorch repro and submit a perf run on a branch by using:

https://github.com/pytorch/pytorch/actions/workflows/inductor-perf-test-nightly.yml

Then see the result ~12 hours later in:
https://hud.pytorch.org/benchmark/compilers

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support L2 cache hint/policy
3 participants