NoOp scheduler claims zero-dimensional non-trivial fusions #2241

Closed
jacobhinkle opened this issue May 14, 2024 · 1 comment
Labels
Segmentation Issues related to nvFuser Segmentation

Comments

@jacobhinkle
Collaborator

I'm not sure whether this is intended behavior, but it caused some confusion that led to reordering the ExpressionEvaluator scheduler before NoOp in #2209. I'm filing this issue because it is confusing for a scheduler named NoOp to claim a fusion that actually computes an op.

The following test "fails":

TEST_F(NVFuserTest, ZeroDimAdd) {
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  auto tv0 = makeSymbolicTensor(0, DataType::Float);
  auto tv1 = makeSymbolicTensor(0, DataType::Float);
  auto tv2 = add(tv0, tv1);

  fusion->addInput(tv0);
  fusion->addInput(tv1);
  fusion->addOutput(tv2);

  at::Tensor t0 = at::randn({}, at::kFloat).cuda();
  at::Tensor t1 = at::randn({}, at::kFloat).cuda();
  at::Tensor out_ref = t0 + t1;

  FusionExecutorCache fec(std::move(fusion));
  auto out = fec.runFusionWithInputs({t0, t1});

  const FusionKernelRuntime* runtime = fec.getMostRecentKernelRuntime();
  EXPECT_FALSE(runtime->isSegmented());
  const std::vector<FusionExecutor>& executors = runtime->executors();
  EXPECT_EQ(executors.size(), 1);
  // Verify that fusion compilation was not skipped.
  EXPECT_TRUE(executors.front().hasCompiledKernel());

  EXPECT_EQ(
      runtime->schedulerHeuristics()->heuristicsList().front()->heuristic(),
      ScheduleHeuristic::PointWise);
  /*
Expected equality of these values:
  runtime->schedulerHeuristics()->heuristicsList().front()->heuristic()
    Which is: no_op
  ScheduleHeuristic::PointWise
    Which is: pointwise
  */

  EXPECT_TRUE(at::allclose(out[0], out_ref));
}

Even though the no-op scheduler claims the segment, the fusion still compiles to a working kernel and the result is correct. Here is the kernel:

__global__ void nvfuser_no_op_f0_c1_r0_g0(Tensor<float, 0, 0> T0, Tensor<float, 0, 0> T1, Tensor<float, 0, 0> T2) {
  T2[0]
    = T0[0]
    + T1[0];
}

Segmenter logging doesn't seem to give any clues as to why NoOp accepts the fusion:

***Runtime***: Try to schedule fusion un-segmented:

Scheduler _expr_eval_ ***rejected*** because : Fusion must contain a single expression of type MatmulOp
***Accepted*** as: no_op
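
To make the ordering issue concrete, here is a minimal toy model of "first scheduler that accepts wins". It is only a sketch: `ToyFusion`, `ToyScheduler`, `toyPriorityList`, and `firstAccepting` are hypothetical names invented for illustration, not nvFuser types or functions, and the accept conditions are simplified guesses based on the log above. The position of pointwise after no_op is also an assumption, inferred from the test outcome.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a fusion; NOT an nvFuser type.
struct ToyFusion {
  bool single_matmul;         // the only expression is a MatmulOp
  bool all_outputs_zero_dim;  // every output tensor is zero-dimensional
};

// Hypothetical scheduler entry: a name plus an accept predicate.
struct ToyScheduler {
  std::string name;
  std::function<bool(const ToyFusion&)> can_schedule;
};

// Assumed priority order: expr_eval, then no_op, then pointwise.
std::vector<ToyScheduler> toyPriorityList() {
  return {
      {"expr_eval", [](const ToyFusion& f) { return f.single_matmul; }},
      {"no_op", [](const ToyFusion& f) { return f.all_outputs_zero_dim; }},
      {"pointwise", [](const ToyFusion&) { return true; }},
  };
}

// Returns the name of the first scheduler that accepts, or "" if none does.
std::string firstAccepting(
    const std::vector<ToyScheduler>& priority,
    const ToyFusion& fusion) {
  for (const ToyScheduler& s : priority) {
    if (s.can_schedule(fusion)) {
      return s.name;
    }
  }
  return "";
}
```

Under this model, a zero-dimensional add is rejected by expr_eval and then claimed by no_op before pointwise is ever consulted, which matches the log. The same fusion with a non-zero-dimensional output would fall through to pointwise.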
jacobhinkle added the Segmentation label May 14, 2024
jacobhinkle added a commit that referenced this issue May 30, 2024
@jacobhinkle
Collaborator Author

This is actually the intended behavior of the NoOp scheduler. We just need to be more careful to exclude ops that produce reductions to zero-dimensional outputs, like matmuls with all 1D inputs. I'll do that in #2236 and close this.
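
As a rough illustration of the rule described above, here is a toy predicate: accept when everything is zero-dimensional, but reject as soon as any op reduces down to a zero-dimensional output (for example a matmul whose inputs are all 1D). This is a sketch under stated assumptions, not the actual check in nvFuser or in #2236; `ToyOp` and `toyNoOpAccepts` are hypothetical names.

```cpp
#include <vector>

// Hypothetical description of one op in a fusion; NOT an nvFuser type.
struct ToyOp {
  int output_ndims;  // rank of the op's output tensor
  bool reduces;      // op contracts or reduces over some input axis
};

// Assumed rule: every op must produce a zero-dimensional output, and none
// of them may get there by reducing (e.g. a 1D x 1D matmul).
bool toyNoOpAccepts(const std::vector<ToyOp>& ops) {
  for (const ToyOp& op : ops) {
    if (op.output_ndims != 0) {
      return false;  // non-scalar output: leave it to another scheduler
    }
    if (op.reduces) {
      return false;  // reduction to a 0-D output: excluded from no-op
    }
  }
  return true;
}
```

With this predicate, the zero-dimensional add from the test is accepted, while a dot-product-style matmul of two 1D tensors is rejected even though its output is also zero-dimensional.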
