Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-3968 - [Java] Support for custom @AvroNamespace annotation #2887

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

henryf1991
Copy link

Add support for custom AvroTypeName annotation to allow overriding namespace when generating schema with reflection

What is the purpose of the change

Add support for custom AvroTypeName annotation to allow overriding namespace when generating schema with reflection

Verifying this change

This change added tests and can be verified as follows:

  • Added test that validates that Java throws an AvroRuntimeException on invalid binary data

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

Add support for custom AvroTypeName annotation to allow overriding namespace when generating schema with reflection
@github-actions github-actions bot added the Java Pull Requests for Java binding label May 1, 2024
Copy link
Contributor

@opwvhk opwvhk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is nicely contained and tested. Well done!
However, it does not implement the annotation as described in Jira (yet). I'm missing the option to override the type name, as the annotation name suggests.

return avroTypeName.value();
}
if (c.getEnclosingClass() != null) // nested class
return c.getEnclosingClass().getName().replace('$', '.');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the enclosing class has an @AvroTypeName?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@opwvhk Thats a great point! I was assuming that we should only be overriding the namespace when the class has the AvroTypeName annotation.

Just to clarify if the class structure is as below:

@AvroTypeName("org.apache.avro.reflect.OverrideNamespace")
  private static class NamespaceTest {
    private static class EnclosingNamespaceTest {
    }
  }

When the schema definition is generated for EnclosingNamespaceTest.class, the namespace should be org.apache.avro.reflect.OverrideNamespace ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now addressed

String space = c.getPackage() == null ? "" : c.getPackage().getName();
if (c.getEnclosingClass() != null) // nested class
space = c.getEnclosingClass().getName().replace('$', '.');
String space = getNamespace(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would the annotation @AvroTypeName not specify the type name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not clear on this. Could you please explain this?

Do you mean to rename the function getNamespace to getTypeName ?

@Retention(RetentionPolicy.RUNTIME)
@Target({ ElementType.TYPE })
public @interface AvroTypeName {
String value() default "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want to specify the name here, or also allow to override the namespace only?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the responsibility of this annotation should be solely to allow users to override the namespace only given the annotation can only be applied to classes.

@henryf1991
Copy link
Author

@opwvhk regarding your comment -> I'm missing the option to override the type name, as the annotation name suggests.

Based on my understanding the ask was the ability to allow overriding the namespace when schema is created using java reflection. The change does this at the moment by removing the default assignment of namepsace which was previously done as below

String space = c.getPackage() == null ? "" : c.getPackage().getName();
 if (c.getEnclosingClass() != null) // nested class
          space = c.getEnclosingClass().getName().replace('$', '.');

Kindly let me know if my understanding is incorrect

@ex-ratt
Copy link

ex-ratt commented May 15, 2024

Sorry for barging into the discussion uninvited, but I am following this PR, since I like the idea of such an annotation. If I understood the comments on the original issue correctly, then the annotation should not only allow to specify the namespace, but also the name of a record/enum - hence the name @AvroTypeName for the annotation instead of @AvroNamespace. The following suggestions are from Oscar's comment on the issue, I just adjusted the name of the annotation:

  1. @AvroTypeName("simpleOrFullName") for a named type would set its name, optionally with namespace
  2. @AvroTypeName(value="simpleName", namespace="my.space") for a named type would set its name and namespace
  3. @AvroTypeName(namespace="my.space") for a named type would set only the namespace and let the name be inferred

I guess this is what @opwvhk is referring to.

On a related note: I really like the idea of this kind of annotation, because namespace & name of a record schema can be independent from package & name of the corresponding class. However, it does not end there: ReflectData and ReflectDatumReader are not able to deserialize data when the schema of a record/enum has a changed name or namespace. Currently, it assumes that the class can be inferred from namespace & name of a record, see SpecificData.getClass(Schema), which is used by SpecificData.newRecord(Object, Schema) (and ReflectData derives from SpecificData). I stumbled upon this issue this week when trying to implement an @AvroNamespace annotation myself.

What solved this issue for me was to add a property to the generated schema (SpecificData.CLASS_PROP, which is already used to specify classes for Avro arrays) and override ReflectData.getClass(Schema) to read this property (and cache the loaded class, just like ReflectData.getClassProp(Schema,String)).

I would appreciate an official solution, so I can throw away my code which may break with new versions of the Avro library.

@henryf1991
Copy link
Author

Sorry for barging into the discussion uninvited, but I am following this PR, since I like the idea of such an annotation. If I understood the comments on the original issue correctly, then the annotation should not only allow to specify the namespace, but also the name of a record/enum - hence the name @AvroTypeName for the annotation instead of @AvroNamespace. The following suggestions are from Oscar's comment on the issue, I just adjusted the name of the annotation:

  1. @AvroTypeName("simpleOrFullName") for a named type would set its name, optionally with namespace
  2. @AvroTypeName(value="simpleName", namespace="my.space") for a named type would set its name and namespace
  3. @AvroTypeName(namespace="my.space") for a named type would set only the namespace and let the name be inferred

I guess this is what @opwvhk is referring to.

On a related note: I really like the idea of this kind of annotation, because namespace & name of a record schema can be independent from package & name of the corresponding class. However, it does not end there: ReflectData and ReflectDatumReader are not able to deserialize data when the schema of a record/enum has a changed name or namespace. Currently, it assumes that the class can be inferred from namespace & name of a record, see SpecificData.getClass(Schema), which is used by SpecificData.newRecord(Object, Schema) (and ReflectData derives from SpecificData). I stumbled upon this issue this week when trying to implement an @AvroNamespace annotation myself.

What solved this issue for me was to add a property to the generated schema (SpecificData.CLASS_PROP, which is already used to specify classes for Avro arrays) and override ReflectData.getClass(Schema) to read this property (and cache the loaded class, just like ReflectData.getClassProp(Schema,String)).

I would appreciate an official solution, so I can throw away my code which may break with new versions of the Avro library.

@ex-ratt Thanks for your suggestion and your comment was really helpful for me to understand.

Regarding you point where we can extend the @AvroTypeName annotation functionality to also override the name of the record, in my opinion there is already @AvroName annotation that specially does this. Letting the users override the record name using both @AvroTypeName and @AvroName might be confusing and may also cause conflicts. Take the example below

@AvroTypeName(value = "OverridenAddressValue1") class Address { string address; }

class Person { @AvroName(value = "OverridenAddressValue2") string address; }

When the Person schema is generated, what should the name of the Address record be, OverridenAddressValue1 or OverridenAddressValue2?

Hence I would rather have @AvroTypeName only applied at class level to only override the namespace. Having thought about its better to rename @AvroTypeName to @AvroNamespace. @opwvhk what are your thoughts on this?

Regarding your second point about deserialization failing when using the annotation, that a good point, I l write a testcase to confirm and fix if necessary.

@opwvhk
Copy link
Contributor

opwvhk commented May 16, 2024

Regarding you point where we can extend the @AvroTypeName annotation functionality to also override the name of the record, in my opinion there is already @AvroName annotation that specially does this.

Actually, @AvroName if for fields only; not types. In the original discussion we came to the conclusion that adding a namespace field to that annotation would lead to confusion. On the other hand, using a separate @AvroNamespace annotation would be a start into the annotation problems (as written about quite a bit).

With this, I think we should make a choice:

  1. Use @AvroName for field names, and @AvroTypeName for the full type names (i.e., including namespace)
  2. Adjust @AvroName for fields and types, and @AvroNamespace for type namespaces
  3. Use @AvroName only, but add a special syntax like my.namespace._ to override the namespace only

The advantage of option 2 is that this sidesteps this entire discussion, but it also means there are 2 annotations to adjust the full type name. The latter is what will stay with us forever.

The advantage of option 3 is that we don't have two annotations on full type names, but I'm not happy with the extra syntax.

@henryf1991
Copy link
Author

  1. Use @AvroName for field names, and @AvroTypeName for the full type names (i.e., including namespace)
  2. Adjust @AvroName for fields and types, and @AvroNamespace for type namespaces
  3. Use @AvroName only, but add a special syntax like my.namespace._ to override the namespace only

The advantage of option 2 is that this sidesteps this entire discussion, but it also means there are 2 annotations to adjust the full type name. The latter is what will stay with us forever.

The advantage of option 3 is that we don't have two annotations on full type names, but I'm not happy with the extra syntax.

I prefer the option 2 where @AvroNamespace would let you only override the namespace only and can only be applied at class level. And use the existing @AvroName annotation let you override the name only (no changes to be done for this as this is an existing functionality).

I dont think option 3 works too well because @AvroName annotation can be applied on primitive types as well today and primitive types dont have a namespace attribute on their own.

If we agree that option 2 is the best course of action to implement this feature, then the current code anyways does that today with the only difference being the name of the annotation is @AvroTypeName instead of @AvroNamespace.

@opwvhk if thats fine I l just update the name to @AvroNamespace and the code in its current state does what is expected.

@henryf1991
Copy link
Author

Hi @opwvhk.

Kind reminder. Awaiting your response so I can make the changes required

@opwvhk
Copy link
Contributor

opwvhk commented May 31, 2024

Although I really prefer option 1, I can also live with option 2. Can you please make the changes for @AvroName, @AvroNamespace and the java-class property?

update annotation name
add handling for deserialization
@henryf1991
Copy link
Author

henryf1991 commented Jun 30, 2024

@opwvhk I have made the changes discussed

  1. Updated name of @AvroTypeName to @AvroNamespace
  2. Updated implementation of ReflectDatumReader to ensure deserialization works as expected
  3. Added check to make sure if enclosing class has @AvroNamespace annotation, then the namespace is respected

Kindly request you to review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java Pull Requests for Java binding
Projects
None yet
3 participants