unexpected behaviour for metadata with NULL or NA content #203

Open
ElsLommelen opened this issue Apr 16, 2024 · 3 comments
Labels: complexity:low (Likely not complex to implement), enhancement (New feature or request), function:write_package (Function write_package())

Comments

@ElsLommelen

When a metadata item has no content (e.g. because only the other tables or columns have content for it), writing the package and reading it back alters that content: NA becomes NULL, and NULL becomes Named list(). The written datapackage.json also changes from one write to the next. I don't care about the distinction between NA and NULL, so I don't mind if both are saved and reloaded as the same value (or not at all). What I do find annoying is that the written file changes when the metadata value starts out as NA, which is more or less the default that R functions return when no data are available.

The reprex demonstrates the issue for metadata at the table level, but metadata at the column level behaves similarly.

library(frictionless)
#> Warning: package 'frictionless' was built under R version 4.3.3

# creating a package with metadata title = NULL and description = NA
my_package <-
  create_package() |>
  add_resource(
    resource_name = "iris",
    data = iris,
    title = NULL,
    description = NA
  )
str(my_package)
#> List of 2
#>  $ resources:List of 1
#>   ..$ :List of 10
#>   .. ..$ name       : chr "iris"
#>   .. ..$ data       :'data.frame':   150 obs. of  5 variables:
#>   .. .. ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>   .. .. ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>   .. .. ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>   .. .. ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>   .. .. ..$ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>   .. ..$ profile    : chr "tabular-data-resource"
#>   .. ..$ format     : NULL
#>   .. ..$ mediatype  : NULL
#>   .. ..$ encoding   : NULL
#>   .. ..$ dialect    : NULL
#>   .. ..$ title      : NULL
#>   .. ..$ description: logi NA
#>   .. ..$ schema     :List of 1
#>   .. .. ..$ fields:List of 5
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Sepal.Length"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Sepal.Width"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Petal.Length"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Petal.Width"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 3
#>   .. .. .. .. ..$ name       : chr "Species"
#>   .. .. .. .. ..$ type       : chr "string"
#>   .. .. .. .. ..$ constraints:List of 1
#>   .. .. .. .. .. ..$ enum: chr [1:3] "setosa" "versicolor" "virginica"
#>  $ directory: chr "."
#>  - attr(*, "class")= chr [1:2] "datapackage" "list"

# writing the package
write_package(my_package, "irisdir")

# in datapackage.json, title = {} and description = null

# when reading the package again, title = Named list() and description = NULL
my_loaded_package <- read_package("irisdir/datapackage.json")
str(my_loaded_package)
#> List of 2
#>  $ resources:List of 1
#>   ..$ :List of 9
#>   .. ..$ name       : chr "iris"
#>   .. ..$ path       : chr "iris.csv"
#>   .. ..$ profile    : chr "tabular-data-resource"
#>   .. ..$ format     : chr "csv"
#>   .. ..$ mediatype  : chr "text/csv"
#>   .. ..$ encoding   : chr "utf-8"
#>   .. ..$ title      : Named list()
#>   .. ..$ description: NULL
#>   .. ..$ schema     :List of 1
#>   .. .. ..$ fields:List of 5
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Sepal.Length"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Sepal.Width"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Petal.Length"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 2
#>   .. .. .. .. ..$ name: chr "Petal.Width"
#>   .. .. .. .. ..$ type: chr "number"
#>   .. .. .. ..$ :List of 3
#>   .. .. .. .. ..$ name       : chr "Species"
#>   .. .. .. .. ..$ type       : chr "string"
#>   .. .. .. .. ..$ constraints:List of 1
#>   .. .. .. .. .. ..$ enum: chr [1:3] "setosa" "versicolor" "virginica"
#>  $ directory: chr "irisdir"
#>  - attr(*, "class")= chr [1:2] "datapackage" "list"

write_package(my_loaded_package, "irisdir2")

# and in this datapackage.json, title = {} and description = {}

Created on 2024-04-16 with reprex v2.0.2

@peterdesmet
Member

peterdesmet commented Apr 17, 2024

Thanks for reporting. We have a helper function clean_list() that can sanitize NULL, list() etc. We could run it on the resource or datapackage before writing, but I'm afraid it might have unintended side effects.
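
To illustrate the kind of sanitizing meant here, a minimal sketch in base R. Note that `drop_empty()` is a hypothetical helper written for this example; it is not the package's `clean_list()`, whose behaviour may differ. It only looks at the top level of a flat metadata list, which is exactly why running something like it blindly on a whole resource could have side effects.

```r
# Hypothetical helper (not part of frictionless): drop top-level
# properties that are NULL or a single NA.
drop_empty <- function(x) {
  is_empty <- vapply(
    x,
    function(v) is.null(v) || (is.atomic(v) && length(v) == 1L && is.na(v)),
    logical(1)
  )
  x[!is_empty]
}

# Metadata as in the reprex: title = NULL, description = NA
meta <- list(name = "iris", title = NULL, description = NA)
cleaned <- drop_empty(meta)
str(cleaned)
```

With this input only `name` remains, so neither empty property would reach datapackage.json.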

It's probably better to extend this line:

https://github.com/frictionlessdata/frictionless-r/blob/5024c909e67591a6eae9347e31ac99d6fa795749/R/write_package.R#L77C29-L77C35

With the properties null = "null" and na = "null", so values are always exported the same way (as NULL, which is the default for lists).
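
The effect of these two arguments can be tried directly with `jsonlite::toJSON()`. This is a sketch only: it assumes the linked line calls `toJSON()` with `auto_unbox = TRUE` and currently uses `null = "list"`, and the `meta` list is a stand-in for a resource, not the real package object.

```r
library(jsonlite)

# Metadata as in the reprex: one NULL and one NA property
meta <- list(name = "iris", title = NULL, description = NA)

# Assumed current settings: NULL is written as {}, NA as null
current <- toJSON(meta, auto_unbox = TRUE, null = "list", na = "null")
current
#> {"name":"iris","title":{},"description":null}

# Proposed settings: NULL and NA are both written as null
proposed <- toJSON(meta, auto_unbox = TRUE, null = "null", na = "null")
proposed
#> {"name":"iris","title":null,"description":null}

# Reading the proposed output back gives NULL for both properties,
# so a second write would produce the same file
back <- fromJSON(proposed)
str(back)
```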

@peterdesmet
Member

Using na = "string" would cause all NA values to be exported as "NA" (and thus different than NULL values). This is probably not desirable, since reading the package would not interpret those automatically as NA. At least NULL has an inherent meaning in lists.
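
A small sketch of that difference, again calling `jsonlite::toJSON()` directly on a stand-in list rather than through `write_package()` (the `auto_unbox = TRUE` setting is an assumption):

```r
library(jsonlite)

meta <- list(title = NULL, description = NA)

# With na = "string", NA is written as the literal string "NA",
# while NULL is still written as null
as_string <- toJSON(meta, auto_unbox = TRUE, null = "null", na = "string")
as_string
#> {"title":null,"description":"NA"}
```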

@ElsLommelen
Author

> With the properties null = "null" and na = "null", so values are always exported the same way (as NULL, which is the default for lists).

This indeed seems a good solution. I suppose the current setting is null = "list" and na = "null", so replacing the first with null = "null" would give NULL and NA the same behaviour after writing.

@peterdesmet peterdesmet added this to the 1.2.0 milestone Apr 22, 2024
@peterdesmet peterdesmet added enhancement New feature or request function:write_package Function write_package() labels Apr 22, 2024
@peterdesmet peterdesmet added the complexity:low Likely not complex to implement label Jul 3, 2024