
[tabular] add infer throughput logging #4200

Merged
2 commits merged into autogluon:master on May 16, 2024

Conversation

@Innixma Innixma (Contributor) commented May 14, 2024

Issue #, if available:
Resolves #4162

Description of changes:

  • Add inference throughput logging. This implementation introduces no overhead: instead of calling predict separately, the throughput is computed from the same predictions already produced when the validation score is calculated (see the sketch after the log comparison below).
  • Code cleanup / de-duplication.
  • Added additional method documentation.

Mainline:

AutoGluon training complete, total runtime = 21.17s ... Best model: WeightedEnsemble_L2

This PR:

AutoGluon training complete, total runtime = 21.17s ... Best model: WeightedEnsemble_L2 | Estimated inference throughput: 23766.6 rows/s (1500 batch size)
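
For illustration only, here is a minimal sketch of the pattern described in the first bullet; it is not the actual AutoGluon code, and the names score_with_throughput and scorer are hypothetical. The single predict call is timed, and its output is then reused for scoring, so measuring throughput adds no extra inference pass.

```python
import time

def score_with_throughput(model, X_val, y_val, scorer):
    # Hypothetical sketch: time the one predict call whose output is also
    # used for scoring, so measuring throughput adds no extra inference pass.
    n_rows = len(X_val)
    start = time.time()
    y_pred = model.predict(X_val)      # single prediction pass
    predict_time = time.time() - start
    score = scorer(y_val, y_pred)      # score computed from the same predictions
    throughput = n_rows / predict_time if predict_time > 0 else float("inf")
    print(f"Estimated inference throughput: {throughput:.1f} rows/s ({n_rows} batch size)")
    return score, throughput
```

As a rough sanity check, predicting 1500 validation rows in about 0.063 s works out to roughly 23,800 rows/s, the same order of magnitude as the example log line above.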

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 1.1.1 Release milestone May 14, 2024
@yinweisu (Collaborator) commented:

Previous CI Run          Current CI Run
botocore==1.34.104       botocore==1.34.105
boto3==1.34.104          boto3==1.34.105

@yinweisu (Collaborator) commented: Previous CI Run | Current CI Run


Job PR-4200-9d2669d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4200/9d2669d/index.html

@rey-allan rey-allan (Collaborator) left a comment:

LGTM!

"""
y_pred_proba = self.predict_proba(X, **kwargs)
y_pred = get_pred_from_proba(y_pred_proba=y_pred_proba, problem_type=self.problem_type)
return y_pred

def predict_proba(self, X, normalize=None, **kwargs) -> np.ndarray:
def predict_proba(self, X, *, normalize: bool | None = None, record_time: bool = False, **kwargs) -> np.ndarray:
A Contributor commented:

Why do we not set record_time to True by default, so the time taken for prediction is always recorded?

@Innixma Innixma (Contributor, Author) replied:

Because calling time.time itself takes time, and we want predict to take as little time as possible for the user.
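
As a rough illustration of that trade-off (hypothetical method body, helper, and attribute names; not the actual AutoGluon implementation), the opt-in flag keeps the clock calls out of the default path:

```python
import time

def predict_proba(self, X, *, record_time: bool = False, **kwargs):
    # Hypothetical sketch: only touch the clock when the caller opts in,
    # so the default predict path pays no timing overhead.
    if not record_time:
        return self._predict_proba_internal(X, **kwargs)  # hypothetical helper name
    start = time.time()
    y_pred_proba = self._predict_proba_internal(X, **kwargs)
    self.predict_time = time.time() - start  # hypothetical attribute name
    return y_pred_proba
```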

@prateekdesai04 prateekdesai04 (Contributor) left a comment:

LGTM

@Innixma Innixma merged commit 6d7122f into autogluon:master May 16, 2024
29 checks passed
Development

Successfully merging this pull request may close these issues.

[tabular] Add logging of inference throughput of best model at end of fit (#4162)
4 participants