-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documented noisy aggregate functions in new page #22715
Conversation
Noisy Aggregate Functions | ||
========================= | ||
|
||
Noisy aggregate functions are aggregate functions that provide random, noisy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are aggregate functions --> are functions
mechanisms, it is important to note that neither the values returned by | ||
these functions nor the query results that incorporate these functions | ||
are differentially private in general. See `Limitations`_ below for more | ||
details. Users who wish to support a strong privacy guarantee should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd delete " Users who wish to support a strong privacy guarantee should
discuss with a suitable technical expert first."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to vote to leave this sentence, as we want to be abundantly clear about both the limitations and their practical implications.
|
||
.. note:: | ||
|
||
Unlike :func:`count`, this function will return ``NULL`` when the (true) count of ``x`` is 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will return --> returns
|
||
.. function:: noisy_count_gaussian(x, noise_scale[, random_seed]) -> bigint | ||
|
||
Counts the non-``NULL`` values in ``x`` and then adds random Gaussian noise with 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's x? (a column I assume)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I've renamed all these from x
to col
, which is hopefully more clear.
|
||
Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT x, ...)``, or with | ||
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will | ||
return more accurate results but at a larger computational cost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will return __> returns
.. function:: noisy_avg_gaussian(x, noise_scale[, random_seed]) -> double | ||
:noindex: | ||
|
||
Calculates the average (arithmetic mean) of all the input values in ``x`` and then adds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"adds a normally distributed random double value with 0 mean..."
|
||
.. note:: | ||
|
||
Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will return --> returns
.. note:: | ||
|
||
Unlike :func:`approx_set`, this function will return ``NULL`` when ``x`` is empty. | ||
If this behavior is undesirable, :func:`coalesce` with :func:`noisy_empty_approx_set_sfm`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sentence fragment
|
||
.. note:: | ||
|
||
Unlike :func:`approx_distinct`, this function will return ``NULL`` when ``x`` is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returns
Libraries and How to Fix It. In *Proceedings of the 2022 ACM SIGSAC Conference | ||
on Computer and Communications Security* (pp. 471-484). | ||
|
||
.. [Hehir2023] Hehir, J., Ting, D., & Cormode, G. (2023). Sketch-Flip-Merge: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've gone ahead and linked all the references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thanks! I had a suggestion I couldn't make inline so I'll mention it here.
Similar to the table of contents that I suggest adding to noisy.rst, aggregate.rst could benefit from a table of contents even more due to its length. Could you add a table of contents to aggregate.rst, the same as I added a comment in noisy.rst about?
If you do so, you could add a heading to aggregate.rst for Noisy Aggregate Functions, and underneath that header link to the new page with "See :doc:noisy
. " underneath it, to help the reader know that there are aggregate functions that are not on the Aggregate Functions page.
Screenshot of aggregate.rst in local build with screenshot.
Screenshot of local build for consideration.
Unlike :func:`count`, this function returns ``NULL`` when the (true) count of ``col`` is 0. | ||
|
||
Distinct counting can be performed using ``noisy_count_gaussian(DISTINCT col, ...)``, or with | ||
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` will | |
:func:`noisy_approx_distinct_sfm`. Generally speaking, :func:`noisy_count_gaussian` |
Thank you, @steveburnett! I appreciate the edits and reStructuredText pointers. I think I've hit your full wish list. Let me know if all looks good. 😄 |
Moved documentation for noisy_count_gaussian and related functions from the Aggregate Functions page to a new Noisy Aggregate Functions page. Also added documentation of the SFM sketch type and functions, such as noisy_approx_set_sfm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local build of docs, verified changes. New review of the documents as a whole, everything looks good. Thanks!
Description
Moved documentation for
noisy_count_gaussian
and related functions from the Aggregate Functions page to a new Noisy Aggregate Functions page. Also added documentation of the SFM sketch type and functions, such asnoisy_approx_set_sfm
.Motivation and Context
This is the documentation promised back in #21290.
Impact
Docs only
Test Plan
Docs build and render nicely.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.