NIFI-12674 Modified ValidateCSV to make the schema optional if a head… #8362

Open
wants to merge 1 commit into base: main

Freedom9339
Contributor

…er is provided. Added validate on attribute option.

Summary

NIFI-12674

Made the schema optional for the ValidateCSV processor if a header is provided. In this case, only the structure of the CSV will be validated, using the header to determine how many fields each line should have.
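The header-only check described above can be illustrated with a minimal, self-contained sketch. This is not the code from this pull request: the class and method names are hypothetical, the parsing is hand-rolled rather than delegated to a CSV library, and quoted fields containing line breaks are not handled. Treating empty input as invalid is also an assumption.

```java
import java.util.Iterator;

// Hypothetical helper, for illustration only; not part of NiFi.
final class HeaderOnlyCsvCheck {

    // Counts the fields in one line. Delimiters inside double-quoted
    // sections are ignored. An escaped quote ("") toggles the state
    // twice, so it leaves the quoted/unquoted state unchanged.
    static int countFields(String line, char delimiter) {
        int fields = 1;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == delimiter && !inQuotes) {
                fields++;
            }
        }
        return fields;
    }

    // Returns the 1-based number of the first line whose field count
    // differs from the header's, or -1 if every line matches.
    // Assumption: input with no header line is reported as invalid at line 1.
    static long firstInvalidLine(String csv, char delimiter) {
        Iterator<String> lines = csv.lines().iterator();
        if (!lines.hasNext()) {
            return 1;
        }
        int expected = countFields(lines.next(), delimiter);
        long lineNumber = 1;
        while (lines.hasNext()) {
            lineNumber++;
            if (countFields(lines.next(), delimiter) != expected) {
                return lineNumber;
            }
        }
        return -1;
    }
}
```

With a three-column header, a row such as `4,"5,6",7` still counts as three fields because the comma inside the quotes is skipped, while a row such as `1,2` is reported as the first invalid line.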

Additionally, a new validation strategy was implemented, Validate on Attribute. This works similarly to ValidateXML: the value of a given FlowFile attribute is treated as the contents of a CSV file, and validation is performed on that attribute rather than on the content of the FlowFile.

Note that the stream is now a variable that can be assigned either the FlowFile content or the value of the attribute. This removes the need for an inner method, so all of the variables that previously needed to be Atomic References can now be regular variables.
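A rough sketch of that refactoring idea follows. It is self-contained and does not use the NiFi API: the FlowFile content is stood in for by a byte array, the attributes by a plain map, and the enum, class, and method names are all hypothetical. The point it illustrates is a general Java one: a lambda or anonymous inner class can only capture effectively final locals, which is why state mutated inside a callback tends to end up in atomic wrappers, and why reading from a directly held stream lets that state be ordinary local variables.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in for the processor logic; not the PR's code.
final class CsvSourceSketch {

    // Assumed strategy names, chosen for this example only.
    enum Strategy { FLOWFILE_CONTENT, FLOWFILE_ATTRIBUTE }

    // Picks the stream to validate: either the content bytes or the
    // UTF-8 bytes of the named attribute (empty if the attribute is absent).
    static InputStream openSource(Strategy strategy, byte[] content,
                                  Map<String, String> attributes, String attributeName) {
        if (strategy == Strategy.FLOWFILE_ATTRIBUTE) {
            String value = attributes.getOrDefault(attributeName, "");
            return new ByteArrayInputStream(value.getBytes(StandardCharsets.UTF_8));
        }
        return new ByteArrayInputStream(content);
    }

    // Reads the chosen stream directly, with no callback, so the counter
    // is a plain local variable rather than an atomic wrapper.
    static int countLines(Strategy strategy, byte[] content,
                          Map<String, String> attributes, String attributeName) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                openSource(strategy, content, attributes, attributeName),
                StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

The same downstream code runs regardless of where the bytes came from, which is what makes the attribute strategy a small change once the stream is selected up front.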

Tracking

Please complete the following tracking steps prior to pull request creation.

Issue Tracking

Pull Request Tracking

  • Pull Request title starts with Apache NiFi Jira issue number, such as NIFI-00000
  • Pull Request commit message starts with Apache NiFi Jira issue number, such as NIFI-00000

Pull Request Formatting

  • Pull Request based on current revision of the main branch
  • Pull Request refers to a feature branch with one commit containing changes

Verification

Please indicate the verification steps performed prior to pull request creation.

Build

  • Build completed using mvn clean install -P contrib-check
    • JDK 21

Licensing

  • New dependencies are compatible with the Apache License 2.0 according to the License Policy
  • New dependencies are documented in applicable LICENSE and NOTICE files

Documentation

  • Documentation formatting appears as expected in rendered files

@pvillard31
Contributor

Any reason for not using the ValidateRecord processor if the only requirement is to confirm that the data is valid CSV without any specific constraint?

@pvillard31 changed the title from "nifi-12674 Modified ValidateCSV to make the schema optional if a head…" to "NIFI-12674 Modified ValidateCSV to make the schema optional if a head…" on Feb 6, 2024
@Freedom9339
Contributor Author

@pvillard31 The ValidateRecord processor splits the input FlowFile into two FlowFiles, one for valid and another for invalid records. With the change to ValidateCSV, the whole file will be routed to either valid or invalid.
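The distinction drawn in this comment can be shown with a simplified sketch. It models only the routing decision as described here, not either processor's actual implementation; the class, method, and key names are hypothetical, and lines stand in for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration of the two routing styles; not NiFi code.
final class RoutingSketch {

    // Record-level routing: the input is split into two outputs,
    // one holding the valid lines and one holding the invalid lines.
    static Map<String, List<String>> splitByRecord(List<String> lines, Predicate<String> isValid) {
        List<String> valid = new ArrayList<>();
        List<String> invalid = new ArrayList<>();
        for (String line : lines) {
            (isValid.test(line) ? valid : invalid).add(line);
        }
        return Map.of("valid", valid, "invalid", invalid);
    }

    // File-level routing: the whole input goes to a single destination,
    // and one bad line is enough to send everything to "invalid".
    static String routeWholeFile(List<String> lines, Predicate<String> isValid) {
        return lines.stream().allMatch(isValid) ? "valid" : "invalid";
    }
}
```

For a flow that must keep a file intact and reject it as a unit, the second style avoids having to reassemble the two halves downstream.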

@dan-s1
Contributor

dan-s1 commented Feb 22, 2024

@exceptionfactory Can you please restart the failed job? It does not seem related to the changes. Thanks!

@Freedom9339
Contributor Author

Any updates on reviewing this change?

@mattyb149
Contributor

There are merge conflicts that need to be resolved

@Freedom9339
Contributor Author

@mattyb149 I've rebased against main. Thank you.

…er is provided. Added validate on attribute option.
if (schema != null) {
    this.parseSchema(schema);
} else if (!headerProp.asBoolean()) {
    throw(new Exception("Schema cannot be empty if header is false."));
Contributor

Rather than throwing an exception, a ValidationResult should be added to a list to be returned at the end of the method.
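One way to read this suggestion is sketched below. To keep the example self-contained, it uses a hypothetical `Result` record as a stand-in; in NiFi itself the corresponding type would be `org.apache.nifi.components.ValidationResult`, built with `ValidationResult.Builder` and returned from the processor's validation method. The method name, parameters, and subject string here are assumptions, not the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types; the real NiFi type is ValidationResult.
final class SchemaValidationSketch {

    // Minimal substitute for a validation result: what was checked,
    // whether it passed, and why it failed.
    record Result(String subject, boolean valid, String explanation) {}

    // Collects problems into a list instead of throwing, so the caller
    // receives every configuration error at once.
    static List<Result> validate(String schema, boolean headerPresent) {
        List<Result> results = new ArrayList<>();
        boolean schemaMissing = schema == null || schema.isBlank();
        if (schemaMissing && !headerPresent) {
            results.add(new Result("Schema", false,
                    "Schema cannot be empty if header is false."));
        }
        return results;
    }
}
```

An empty list means the configuration passed. Further checks can append their own entries without changing the method's control flow, which is the main advantage over throwing on the first failure.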
