
Self reporting tooling schema and processes definition #12

Merged
merged 31 commits
Jun 17, 2024

Conversation

Relequestual
Member

@Relequestual Relequestual commented May 28, 2024

Related to json-schema-org/community#412 (but does not close).
(This PR probably needs a specific Issue. Sorry.)

This PR introduces a JSON Schema which details the data structure that tooling repositories will be asked to include in order to self-identify to the JSON Schema Ecosystem project.

When the JSON Schema Ecosystem project ingests the data, it will be the Single Source of Truth for that data (beyond the source repositories). The data will then be pulled to the website repository and the landscape repository.

At this stage I'm looking for feedback and comments on the schema. I'll follow up with an Issue and provide a readme for tooling implementers to help them understand the expectations, what will be done with the data, and why.

Work to do based on feedback:

  • Describe all of the tooling categories in the readme
  • Specify the process for pulling in data and that it triggers a PR rather than just being accepted, as any change should be manually reviewed by a member of the JSON Schema team
  • Pluralize consistently
  • Clarify purpose of languages field
  • Require the full URL of depended-on validators, and detail that the validator should already be known to the ecosystem repo to be accepted
  • Correct use of format to uri
  • Add field to declare support for other defined dialects
  • Define ingested data as automated into one file, and curated data into another file with overrides as manual with PR (created automatically)
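
Taken together, a self-identification entry incorporating the changes above might look roughly like this. This is only a sketch: all field names other than name, description, toolingType, repositoryURL, and languages (which are quoted elsewhere in this thread) are hypothetical placeholders, not the final schema.

```json
{
  "name": "example-validator",
  "description": "A hypothetical JSON Schema validator, for illustration only.",
  "toolingType": ["validator"],
  "repositoryURL": "https://github.com/example/example-validator",
  "languages": ["TypeScript"],
  "supportedDialects": ["draft-07", "2020-12"]
}
```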

@DarhkVoyd
Member

DarhkVoyd commented May 30, 2024

@Relequestual Hello, I am currently working to prepare the consolidated new data source for the tooling page. The current data model stores tooling by categories of tooling type and sub-categorizes them by programming language (if applicable). I like the approach of storing each tool as a separate entity, which matches this schema. This gives us more control over filters, sorting, and other data transformations if ever required. I want to propose that we add a field indicating whether a tool is language dependent (to sub-categorize tools such as validators by language, as in the current default view) or language agnostic (for tooling such as editors). This way we can consistently maintain both the file and the page.

@benjagm

benjagm commented May 31, 2024

  "required": [
    "name",
    "repositoryURL"
  ],

Regarding the required fields: name, description, and toolingType should be mandatory. In addition, repositoryURL should not be mandatory, as some of the tooling is not open source.
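
Applied to the fragment quoted above, that proposal would presumably read something like:

```json
"required": [
  "name",
  "description",
  "toolingType"
]
```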

@DarhkVoyd
Member

   "toolingType": {
       "description": "The category of tooling of the project",
       "type": "string",
       "enum": [
       ...
       ]
     }

I believe that the tooling type should be an array instead of a string, as some tools provide a whole JSON Schema framework including validators, schema generators, etc. Case: https://www.newtonsoft.com/jsonschema
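
A sketch of what that change to the schema might look like (the enum values shown are illustrative, not the actual list):

```json
"toolingType": {
  "description": "The categories of tooling of the project",
  "type": "array",
  "items": {
    "type": "string",
    "enum": ["validator", "schema-generator", "documentation-generator"]
  },
  "minItems": 1
}
```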

@Relequestual
Member Author

Relequestual commented Jun 6, 2024

  "required": [
    "name",
    "repositoryURL"
  ],

Regarding the required fields: name, description, and toolingType should be mandatory. In addition, repositoryURL should not be mandatory, as some of the tooling is not open source.
-@benjagm

The self reporting tooling will only be tooling which has an associated repository.

We can and should track non-open source tooling here also, but we can do that later.

As discussed, we should define a roadmap of changes we will make for tooling data, and then share it publicly.

@Relequestual Relequestual linked an issue Jun 6, 2024 that may be closed by this pull request
@Relequestual Relequestual marked this pull request as ready for review June 6, 2024 14:39
Member

@Julian Julian left a comment


Nice.

Took a first pass and left some comments with an eye on "low friction", so I've been a bit intentionally harsh, but hopefully helpful.

Member

@gregsdennis gregsdennis left a comment


I'm not sure that we should be leaving some of this information to the creators of the projects. I think such a system can be easily abused, and we would need to have checks for that.

For example, the compliance field: can we really trust maintainers to admit that their libraries aren't 100% compliant by default? We already know of at least one that doesn't.

I'm not sure that I agree that this is the right direction. I like the automation aspect, but I still think that the tooling list is something that needs to be curated by a trusted source (someone directly associated with our org).

"projectType": {
"description": "The type of project, classified by Nadia Eghbal in Working in Public - https://project-types.github.io",
"type": "string",
"enum": [

I agree.

Relequestual and others added 8 commits June 11, 2024 09:17
@Relequestual
Member Author

I'm not sure that we should be leaving some of this information to the creators of the projects. I think such a system can be easily abused, and we would need to have checks for that.

Agreed. We can make it so the automation creates a PR for us to review, rather than blanket acceptance.

For example, the compliance field: can we really trust maintainers to admit that their libraries aren't 100% compliant by default? We already know of at least one that doesn't.

Not always, no. But I count SIX implementations where we said that they document what must be done to be fully compliant with the specification. Some libraries are written in specific contexts where they don't want that by default, I guess.
If we include the field, some more may add such details if they exist.

I'm not sure that I agree that this is the right direction. I like the automation aspect, but I still think that the tooling list is something that needs to be curated by a trusted source (someone directly associated with our org).

Agreed. We can make it so the automation creates a PR. We really need to do this anyway, because someone could put ANYTHING there. If an implementation repo was compromised, they could add undesirable content.

Fix typos.
Remove schema-to-documentation category from tooling types in schema.
This seems the same as the  category
Member

@Julian Julian left a comment


Left another round of comments, but after addressing whichever of those suit you, lgtm.

Member

@gregsdennis gregsdennis left a comment


I'm interested in how the override is expected to work. What happens when the maintainer updates an overridden field?

I think it's important to explicitly state that this is data mining and that we will still be curating the end results of the website and landscape.

projects/tooling-self-identification/readme.md

This project aims to enable tooling authors and maintainers to detail their tool's existence and additional information, to be listed on the JSON Schema website and Landscape diagram.

The approach is to define a data structure for a file which is located in their own repo, which will then be located and extracted into a single file within this repository. Other repositories such as the website and landscape repositories, will then copy and transform the data as required. The data may be used to augment or totally replace the data they hold, if any.

Suggested change
The approach is to define a data structure for a file which is located in their own repo, which will then be located and extracted into a single file within this repository. Other repositories such as the website and landscape repositories, will then copy and transform the data as required. The data may be used to augment or totally replace the data they hold, if any.
The approach is to define a data structure for a file hosted in their own repo, which will then be located and extracted into a single file within this repository. Other repositories such as the website and landscape repositories, will then copy and transform the data as required. The data may be used to augment or totally replace the data they hold, if any.

I think maybe just do a read through and see how many "which"'s there are...

Member Author


Can you explain this please? It feels like you're assuming it should be obvious why this is bad? Does it make it harder to read or understand?

The primary purpose of the readme is to explain things. It often goes into more detail about things.
I'm open to more specific suggestions if you have them.

@gregsdennis gregsdennis dismissed their stale review June 13, 2024 06:46

I'm okay with this as a data mining effort, but not as a relay directly to our sites. This data should be curated before it's published in any form.

@Relequestual
Member Author

I'm okay with this as a data mining effort, but not as a relay directly to our sites. This data should be curated before it's published in any form. - @gregsdennis

The readme states that the process will create a PR into this repo and not just mine the data. So it will be curated.

Data that's collected when it meets the above stated criteria will be used to create a Pull Request into the ecosystem repo to add or update the information.

Is this enough? Did you maybe miss this?

If we don't use this data for the website and landscape, we're losing a LOT of the value from doing this work in the first place.

@gregsdennis
Member

Yeah, posting that comment was more to make it public rather than something we just discussed in private. I'm aware that we'll have an opportunity to review a PR before anything goes in.

I was thinking data ingestion would be purely automatic. Then, separately, we'd have a curation step where the curated data is a new file (including any overrides/changes we wanted).

Instead, you've chosen to have ingestion and curation together, but we also have overrides as a separate file. It's just a different way to do it. Also, I suppose the curation part would allow us to just decline to add something, but we'd need some way of tracking that we declined it or the bot will keep finding it.

@Relequestual
Member Author

You know, I can easily think of several reasons why we would actually want to have a separate ingestion file vs curated file. So, let me amend the readme.

For example, it would be helpful to keep ingestion events logged in a file which can then be used to more easily make time series graphs.

...we'd need some way of tracking that we declined it or the bot will keep finding it.
Even with ingestion being automated and copying to the curated data being manual, it could be beneficial to know if an updated entry was rejected before, and why.
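
One way the two-file split could work — purely a sketch, since the actual layout is defined in the readme, and every file and key name here is hypothetical — is for the curated file to carry overrides and a record of declined entries alongside the automatically ingested data, so the bot can skip rejected tools:

```json
{
  "overrides": {
    "example-validator": {
      "description": "Curated description replacing the self-reported one."
    }
  },
  "declined": [
    {
      "repositoryURL": "https://github.com/example/spam-tool",
      "reason": "Did not meet listing criteria.",
      "declinedAt": "2024-06-17"
    }
  ]
}
```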

@Relequestual Relequestual changed the title Self reporting tooling Self reporting tooling schema and processes definition Jun 17, 2024
@Relequestual
Member Author

🙏 Huge thanks for all your feedback on this work!
I'm convinced this is going to be the start of elevating the ecosystem to the next level.

@Relequestual Relequestual merged commit 5635ca6 into json-schema-org:main Jun 17, 2024
1 check passed
Relequestual added a commit that referenced this pull request Jun 17, 2024
Relequestual added a commit that referenced this pull request Jun 17, 2024
…ct originates outside of the JSON Schema project.

In response to #12
Relequestual added a commit that referenced this pull request Jun 17, 2024
Development

Successfully merging this pull request may close these issues.

Define self reporting tooling data structure
5 participants