Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEEDBACK] The data model could be simplified for options and attributes #716

Closed
eemeli opened this issue Mar 9, 2024 · 15 comments · Fixed by #800
Closed

[FEEDBACK] The data model could be simplified for options and attributes #716

eemeli opened this issue Mar 9, 2024 · 15 comments · Fixed by #800
Labels
data model Issues related with MF data Model Preview-Feedback Feedback gathered during the technical preview

Comments

@eemeli
Copy link
Collaborator

eemeli commented Mar 9, 2024

Parts of this issue have been broken out into their own issues #717 and #718 after this was initially posted.

While working on some Python code, I ended up needing to put together a pythonic representation of the message data model. While doing so, I encountered a few places where I could apply some simplifications to the data model with no loss of fidelity.

I think we should consider applying this change to how options and attributes are represented in the data model:

 interface FunctionAnnotation {
   type: "function";
   name: string;
-  options: Option[];
+  options: Map<string, Literal | VariableRef>;
 }

-interface Option { name: string, value: Literal | VariableRef }

 interface LiteralExpression { ...
 interface VariableExpression { ...
 interface FunctionExpression { ...
 interface UnsupportedExpression { ...
 interface Markup { ...
-  attributes: Attribute[];
+  attributes: Map<string, Literal | VariableRef | null>;
 }

-interface Attribute { name: string, value?: Literal | VariableRef }

As discussed most recently in #710, option names cannot be duplicated. This means that instead of an array of name+value wrapper objects, options could be represented by a 1:1 mapping. I'm using a JS Map here rather than an Object, as the former more explicitly does not include any prototype fields. In the JSON Schema, this could equivalently be represented by a JSON Object.

One explicit benefit of this change would be that messages coming from JSON interchange could not include any Duplicate Option Name errors.

The same applies for attributes as for options, but with the added note that values may be null because the attribute value is optional.

Separately, we probably should add a requirement for functions not to assign any meaning to option order, such that :func foo=42 bar=13 and :func bar=13 foo=42 would always be synonymous.

@eemeli eemeli added Preview-Feedback Feedback gathered during the technical preview data model Issues related with MF data Model labels Mar 9, 2024
@aphillips
Copy link
Member

aphillips commented Mar 9, 2024

Separately, we probably should add a requirement for functions not to assign any meaning to option order, such that :func foo=42 bar=13 and :func bar=13 foo=42 would always be synonymous.

I think we should explicitly say that this is so. It's not just DM that's affected. Translation tools or message authors shouldn't have to remember option orders.

--

Side note: while I don't expect "normal people" (don't read too much into 'normal') to file feedback in this format, could we, as WG members, try to keep separate topics in separate issues?

@eemeli
Copy link
Collaborator Author

eemeli commented Mar 9, 2024

could we, as WG members, try to keep separate topics in separate issues?

Fair point. I'll break this up into three, keeping the options here because that's what you already commented on.

@eemeli eemeli changed the title [FEEDBACK] The data model could be simplified in multiple places [FEEDBACK] The data model could be simplified for options and attributes Mar 9, 2024
@stasm
Copy link
Collaborator

stasm commented Mar 22, 2024

While I'm not opposed to this change, I'd like to note potential reasons to prefer arrays over maps:

  • With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.
  • When map iteration order is not deterministic (or even explicitly random), special care must be taken when testing, for example for expected emitted errors. Otherwise, tests may be flaky.
  • In lazy-evaluating implementations (IIUC), the order of options may influence the order of evaluations, e.g. :func foo=$foo bar=$bar, where $foo and $bar are locals which depend on each other.

@aphillips
Copy link
Member

With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.

I think this is an interesting problem. A message that passes through the data model and is reserialized might have declarations or variants in a different order. This could negatively affect leverage in naive translation memory tools that look only at the value of the message string.

In lazy-evaluating implementations (IIUC), the order of options may influence the order of evaluations, e.g. :func foo=$foo bar=$bar, where $foo and $bar are locals which depend on each other.

We guard against this:

A declaration MUST NOT bind a variable that appears as a variable anywhere within a previous declaration.

@eemeli
Copy link
Collaborator Author

eemeli commented Mar 22, 2024

With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.

On the other hand, our current approach is rather prone to producing false negative results for comparisons, as the annotations :func a=13 b=42 and :func b=42 a=13 would not be considered equivalent by a naive implementation.

@catamorphism
Copy link
Collaborator

catamorphism commented Mar 22, 2024

A different approach:

interface NoOperand {
  type: "no-operand"
}

interface Option {
  name: string;
  value: Literal | VariableRef | NoOperand;
}

Then, the value type can be named:

interface Operand {
  type: "operand";
  value: Literal | VariableRef | NoOperand;
}

interface Option {
  name: string;
  value: Operand;
}

(Keeping NoOperand instead of making the Operand have a value? field, for clarity, but maybe those are equivalent changes.)

Then, the same type can be used for both option maps and attribute maps:

Map<string, Operand>

This has the disadvantage that it allows constructing an option map that maps some option name onto an operand with no value. But it has the advantage of making option maps and attribute maps uniform. (This is what the data model in my ICU4C implementation does.)

@eemeli
Copy link
Collaborator Author

eemeli commented Mar 22, 2024

What's the benefit of allowing for an option with no value to be represented in the data model?

@catamorphism
Copy link
Collaborator

What's the benefit of allowing for an option with no value to be represented in the data model?

I was going to say that there's no direct benefit, but the indirect benefit is that having a single name for operands -- which can be the right-hand side of an option, the right-hand side of an attribute, or the subject of an expression -- makes the data model easier to read and understand.

However, thinking about it, this would mean it would be possible to construct a data model that can't be serialized to a syntactically correct message (e.g. {:foo x=} or {:foo x} would be a syntax error), so I withdraw that suggestion. (In my implementation, the API prevents that.)

It still seems like a bad code smell to me to repeat Literal | VariableRef without naming it as a type, but maybe that's intrinsic to allowing attributes with optional values.

@mihnita
Copy link
Collaborator

mihnita commented Mar 22, 2024

I'm fine with a map, and I think it is a good idea.
We can declare it to be a map the preserves order.

The data model is to describe how the data looks like, and hint on how it works.

See my comment on "map as an abstract data type" here #718 (comment)

See my old data model:

// The order matters.
// So we need a "special map" that keeps the insertion order.
export type OrderedMap<K, V> = Map<K, V>;

https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/ts_mihai/src/imessageformat.ts#L34

When one implements the data model in a certain programming language, they must use an order preserving map if available (for example LinkedHashMap in Java), or implement something that works the same.

For example protobuffer v2 use a list of pairs:
https://protobuf.dev/programming-guides/encoding/#maps

It is still a map, conceptually.

@mihnita
Copy link
Collaborator

mihnita commented Mar 22, 2024

So I would propose changing Eemeli's proposal to something like this:

export type OrderPreservingMap<K, V> = Map<K, V>;
...
+  options: OrderPreservingMap<string, Literal | VariableRef>;
...
+  attributes: OrderPreservingMap<string, Literal | VariableRef | null>;

We can also define Literal | VariableRef as a type:

export type OrderPreservingMap<K, V> = Map<K, V>;
export type LiteralOrVariableRef = Literal | VariableRef
...
+  options: OrderPreservingMap<string, LiteralOrVariableRef>;
...
+  attributes: OrderPreservingMap<string, LiteralOrVariableRef | null>;

@eemeli
Copy link
Collaborator Author

eemeli commented Mar 23, 2024

@mihnita Why would we want to explicitly preserve option order?

@macchiati
Copy link
Member

macchiati commented Mar 23, 2024 via email

@mihnita
Copy link
Collaborator

mihnita commented Mar 23, 2024

Stas' argument.

We want the data model to be used by tools, so that they can do refactoring and all kind of cool stuff without writing a parser / serializer.

But now I edit a file in various editors, or refactored / auto-formatted by tools written different programming languages, and you end up with big differences for nothing. Messing up the diff tools, version control, etc.

And in general I would find it very annoying to write my options one way and have some dumb tool reshuffle them based on some invisible criteria (a hash value, most often).
I would sort options alphabetically, or grouped by functionality.


So it makes no difference in runtime functionality, but in user convenience.

@mihnita
Copy link
Collaborator

mihnita commented Mar 23, 2024

We can maybe have it non-normative?

As in:
"If the data model in a given implementation is not public, and not intended for reuse to build tooling, then it can be any map desired. But if the data model is intended for tooling that will load / save messages, or show options in some kind of UI, then the map must preserve the original order of the arguments"

WDYT?

@eemeli
Copy link
Collaborator Author

eemeli commented Mar 23, 2024

I would not oppose something like the proposed text, but I honestly don't think it's necessary to encourage or require tools to keep source order when parsing and reserialising option bags; this will happen naturally because it's so easy to do and, as mentioned, it prevents churn.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data model Issues related with MF data Model Preview-Feedback Feedback gathered during the technical preview
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants