[FEEDBACK] The data model could be simplified for options and attributes #716

eemeli · 2024-03-09T12:40:50Z

Parts of this issue have been broken out into their own issues #717 and #718 after this was initially posted.

While working on some Python code, I ended up needing to put together a pythonic representation of the message data model. While doing so, I encountered a few places where I could apply some simplifications to the data model with no loss of fidelity.

I think we should consider applying this change to how options and attributes are represented in the data model:

 interface FunctionAnnotation {
   type: "function";
   name: string;
-  options: Option[];
+  options: Map<string, Literal | VariableRef>;
 }

-interface Option { name: string, value: Literal | VariableRef }

 interface LiteralExpression { ...
 interface VariableExpression { ...
 interface FunctionExpression { ...
 interface UnsupportedExpression { ...
 interface Markup { ...
-  attributes: Attribute[];
+  attributes: Map<string, Literal | VariableRef | null>;
 }

-interface Attribute { name: string, value?: Literal | VariableRef }

As discussed most recently in #710, option names cannot be duplicated. This means that instead of an array of name+value wrapper objects, options could be represented by a 1:1 mapping. I'm using a JS Map here rather than an Object, as the former more explicitly does not include any prototype fields. In the JSON Schema, this could equivalently be represented by a JSON Object.

One explicit benefit of this change would be that messages coming from JSON interchange could not include any Duplicate Option Name errors.

The same applies for attributes as for options, but with the added note that values may be null because the attribute value is optional.

Separately, we probably should add a requirement for functions not to assign any meaning to option order, such that :func foo=42 bar=13 and :func bar=13 foo=42 would always be synonymous.

The text was updated successfully, but these errors were encountered:

aphillips · 2024-03-09T14:40:57Z

Separately, we probably should add a requirement for functions not to assign any meaning to option order, such that :func foo=42 bar=13 and :func bar=13 foo=42 would always be synonymous.

I think we should explicitly say that this is so. It's not just DM that's affected. Translation tools or message authors shouldn't have to remember option orders.

--

Side note: while I don't expect "normal people" (don't read too much into 'normal') to file feedback in this format, could we, as WG members, try to keep separate topics in separate issues?

eemeli · 2024-03-09T16:24:31Z

could we, as WG members, try to keep separate topics in separate issues?

Fair point. I'll break this up into three, keeping the options here because that's what you already commented on.

stasm · 2024-03-22T12:18:43Z

While I'm not opposed to this change, I'd like to note potential reasons to prefer arrays over maps:

With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.
When map iteration order is not deterministic (or even explicitly random), special care must be taken when testing, for example for expected emitted errors. Otherwise, tests may be flaky.
In lazy-evaluating implementations (IIUC), the order of options may influence the order of evaluations, e.g. :func foo=$foo bar=$bar, where $foo and $bar are locals which depend on each other.

aphillips · 2024-03-22T14:02:34Z

With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.

I think this is an interesting problem. A message that passes through the data model and is reserialized might have declarations or variants in a different order. This could negatively affect leverage in naive translation memory tools that look only at the value of the message string.

In lazy-evaluating implementations (IIUC), the order of options may influence the order of evaluations, e.g. :func foo=$foo bar=$bar, where $foo and $bar are locals which depend on each other.

We guard against this:

A declaration MUST NOT bind a variable that appears as a variable anywhere within a previous declaration.

eemeli · 2024-03-22T14:20:01Z

With maps for options, the serialization of a message becomes unstable in languages where maps keys are unordered. This may result in false positives when comparing serialized messages for equality, for example when checking if one of them needs to updated. Or may produce different hashes for the same message.

On the other hand, our current approach is rather prone to producing false negative results for comparisons, as the annotations :func a=13 b=42 and :func b=42 a=13 would not be considered equivalent by a naive implementation.

catamorphism · 2024-03-22T19:13:28Z

A different approach:

interface NoOperand {
  type: "no-operand"
}

interface Option {
  name: string;
  value: Literal | VariableRef | NoOperand;
}

Then, the value type can be named:

interface Operand {
  type: "operand";
  value: Literal | VariableRef | NoOperand;
}

interface Option {
  name: string;
  value: Operand;
}

(Keeping NoOperand instead of making the Operand have a value? field, for clarity, but maybe those are equivalent changes.)

Then, the same type can be used for both option maps and attribute maps:

Map<string, Operand>

This has the disadvantage that it allows constructing an option map that maps some option name onto an operand with no value. But it has the advantage of making option maps and attribute maps uniform. (This is what the data model in my ICU4C implementation does.)

eemeli · 2024-03-22T20:16:18Z

What's the benefit of allowing for an option with no value to be represented in the data model?

catamorphism · 2024-03-22T20:29:59Z

What's the benefit of allowing for an option with no value to be represented in the data model?

I was going to say that there's no direct benefit, but the indirect benefit is that having a single name for operands -- which can be the right-hand side of an option, the right-hand side of an attribute, or the subject of an expression -- makes the data model easier to read and understand.

However, thinking about it, this would mean it would be possible to construct a data model that can't be serialized to a syntactically correct message (e.g. {:foo x=} or {:foo x} would be a syntax error), so I withdraw that suggestion. (In my implementation, the API prevents that.)

It still seems like a bad code smell to me to repeat Literal | VariableRef without naming it as a type, but maybe that's intrinsic to allowing attributes with optional values.

mihnita · 2024-03-22T21:10:51Z

I'm fine with a map, and I think it is a good idea.
We can declare it to be a map the preserves order.

The data model is to describe how the data looks like, and hint on how it works.

See my comment on "map as an abstract data type" here #718 (comment)

See my old data model:

// The order matters.
// So we need a "special map" that keeps the insertion order.
export type OrderedMap<K, V> = Map<K, V>;

https://github.com/unicode-org/message-format-wg/blob/experiments/experiments/data_model/ts_mihai/src/imessageformat.ts#L34

When one implements the data model in a certain programming language, they must use an order preserving map if available (for example LinkedHashMap in Java), or implement something that works the same.

For example protobuffer v2 use a list of pairs:
https://protobuf.dev/programming-guides/encoding/#maps

It is still a map, conceptually.

mihnita · 2024-03-22T21:13:01Z

So I would propose changing Eemeli's proposal to something like this:

export type OrderPreservingMap<K, V> = Map<K, V>;
...
+  options: OrderPreservingMap<string, Literal | VariableRef>;
...
+  attributes: OrderPreservingMap<string, Literal | VariableRef | null>;

We can also define Literal | VariableRef as a type:

export type OrderPreservingMap<K, V> = Map<K, V>;
export type LiteralOrVariableRef = Literal | VariableRef
...
+  options: OrderPreservingMap<string, LiteralOrVariableRef>;
...
+  attributes: OrderPreservingMap<string, LiteralOrVariableRef | null>;

eemeli · 2024-03-23T07:48:35Z

@mihnita Why would we want to explicitly preserve option order?

macchiati · 2024-03-23T18:08:42Z

I would like to understand that too. That is for .input and .local. clearly the order is significant. Options seem quit different, and having the order be significant seems likely to be error-prone.

…

On Sat, Mar 23, 2024, 00:49 Eemeli Aro ***@***.***> wrote: @mihnita <https://github.com/mihnita> Why would we want to explicitly preserve option order? — Reply to this email directly, view it on GitHub <#716 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ACJLEMDAEPBHPQ365RMGMM3YZUXW3AVCNFSM6AAAAABEOCSIKKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMJWGQYDAMBYGM> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

mihnita · 2024-03-23T18:09:24Z

Stas' argument.

We want the data model to be used by tools, so that they can do refactoring and all kind of cool stuff without writing a parser / serializer.

But now I edit a file in various editors, or refactored / auto-formatted by tools written different programming languages, and you end up with big differences for nothing. Messing up the diff tools, version control, etc.

And in general I would find it very annoying to write my options one way and have some dumb tool reshuffle them based on some invisible criteria (a hash value, most often).
I would sort options alphabetically, or grouped by functionality.

So it makes no difference in runtime functionality, but in user convenience.

mihnita · 2024-03-23T18:31:39Z

We can maybe have it non-normative?

As in:
"If the data model in a given implementation is not public, and not intended for reuse to build tooling, then it can be any map desired. But if the data model is intended for tooling that will load / save messages, or show options in some kind of UI, then the map must preserve the original order of the arguments"

WDYT?

eemeli · 2024-03-23T18:49:43Z

I would not oppose something like the proposed text, but I honestly don't think it's necessary to encourage or require tools to keep source order when parsing and reserialising option bags; this will happen naturally because it's so easy to do and, as mentioned, it prevents churn.

eemeli added Preview-Feedback Feedback gathered during the technical preview data model Issues related with MF data Model labels Mar 9, 2024

This was referenced Mar 9, 2024

[FEEDBACK] The data model could be simplified for literals #717

Open

[FEEDBACK] The data model could be simplified for declarations #718

Open

eemeli changed the title ~~[FEEDBACK] The data model could be simplified in multiple places~~ [FEEDBACK] The data model could be simplified for options and attributes Mar 9, 2024

eemeli mentioned this issue Mar 25, 2024

Clarify that options and attributes are unordered #751

Merged

eemeli mentioned this issue May 22, 2024

Use mappings for data model options and attributes #800

Merged

aphillips closed this as completed in #800 Jun 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEEDBACK] The data model could be simplified for options and attributes #716

[FEEDBACK] The data model could be simplified for options and attributes #716

eemeli commented Mar 9, 2024 •

edited

aphillips commented Mar 9, 2024 •

edited

eemeli commented Mar 9, 2024

stasm commented Mar 22, 2024 •

edited

aphillips commented Mar 22, 2024

eemeli commented Mar 22, 2024

catamorphism commented Mar 22, 2024 •

edited

eemeli commented Mar 22, 2024

catamorphism commented Mar 22, 2024

mihnita commented Mar 22, 2024

mihnita commented Mar 22, 2024 •

edited

eemeli commented Mar 23, 2024

macchiati commented Mar 23, 2024 via email

mihnita commented Mar 23, 2024 •

edited

mihnita commented Mar 23, 2024 •

edited

eemeli commented Mar 23, 2024

[FEEDBACK] The data model could be simplified for options and attributes #716

[FEEDBACK] The data model could be simplified for options and attributes #716

Comments

eemeli commented Mar 9, 2024 • edited

aphillips commented Mar 9, 2024 • edited

eemeli commented Mar 9, 2024

stasm commented Mar 22, 2024 • edited

aphillips commented Mar 22, 2024

eemeli commented Mar 22, 2024

catamorphism commented Mar 22, 2024 • edited

eemeli commented Mar 22, 2024

catamorphism commented Mar 22, 2024

mihnita commented Mar 22, 2024

mihnita commented Mar 22, 2024 • edited

eemeli commented Mar 23, 2024

macchiati commented Mar 23, 2024 via email

mihnita commented Mar 23, 2024 • edited

mihnita commented Mar 23, 2024 • edited

eemeli commented Mar 23, 2024

eemeli commented Mar 9, 2024 •

edited

aphillips commented Mar 9, 2024 •

edited

stasm commented Mar 22, 2024 •

edited

catamorphism commented Mar 22, 2024 •

edited

mihnita commented Mar 22, 2024 •

edited

mihnita commented Mar 23, 2024 •

edited

mihnita commented Mar 23, 2024 •

edited