Currently, the way common edge cases are handled by some aggregates is incorrect.
sum of zero elements returns NaN. Should return zero. This is particularly inconsistent as count of zero elements returns zero.
std of one element returns NaN. Should return zero.
std of zero elements should return NaN.
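The expected behaviour listed above can be sketched outside TypeDB. The following Python is only an illustration of the semantics being requested, not TypeDB's implementation; the function names agg_sum, agg_count, and agg_std are hypothetical, and agg_std assumes the population standard deviation.

```python
import math


def agg_sum(values):
    # The sum of an empty collection is the additive identity, 0.
    return sum(values)


def agg_count(values):
    # The count of an empty collection is 0.
    return len(values)


def agg_std(values):
    # Population standard deviation (divides by n, not n - 1).
    n = len(values)
    if n == 0:
        # Undefined for an empty collection.
        return math.nan
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)
```

Under these definitions, agg_sum and agg_count agree on empty input (both zero), a single element has zero spread, and only the empty case yields NaN.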
Based on a comment from @flyingsilverfin, it may be the case that std is using the sample rather than the population standard deviation. If so, the wrong statistic is being applied. Database queries necessarily operate under a closed-world assumption. When we ask for the standard deviation of a variable in query results, the results are the population, rather than a sample of some greater population (whatever that may be, if such a thing even exists in the context of the variable). Consider the following query.
match
$user isa user,
has gender "male",
has age $age;
get; std $age;
Are we asking for a) the standard deviation of ages of male users, or b) the standard deviation of ages of all human males? The answer is almost certainly (a), in which case the population is defined by the users selected by the query's constraints (in this case males). This means the population standard deviation is the applicable statistic. In fact, the sample standard deviation should only be applied when attempting to use the statistic as a predictor, and the population standard deviation is far more useful in the majority of cases. Of course, we should ideally offer both, as most mature languages do.
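To make the difference between the two statistics concrete, here is a small sketch using Python's standard statistics module, where pstdev is the population standard deviation and stdev is the sample standard deviation. The ages are made-up illustration values, not data from any database.

```python
import statistics

ages = [20, 30, 40]  # made-up values standing in for the query results

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(ages)

# Sample standard deviation: divides by n - 1, so it is always larger
# than the population value when the data are not all equal.
sample_sd = statistics.stdev(ages)

# With a single element the population statistic is well defined (zero),
# while the sample statistic is not, because n - 1 is zero.
single_population_sd = statistics.pstdev([42])
try:
    single_sample_sd = statistics.stdev([42])
except statistics.StatisticsError:
    single_sample_sd = None  # undefined for fewer than two data points
```

The single-element case is where the two diverge most visibly: the sample statistic is undefined there, which would be consistent with the NaN reported above if std is indeed computing the sample standard deviation.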
Environment
TypeDB distribution: Core
TypeDB version: 2.25.7
Environment: macOS
Client and version: Studio 2.25.0