Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to gengo #1152

Draft
wants to merge 25 commits into
base: main
Choose a base branch
from
Draft

Switch to gengo #1152

wants to merge 25 commits into from

Conversation

spenserblack
Copy link
Collaborator

@spenserblack spenserblack commented Aug 25, 2023

This makes some changes to our language support, which I'll be happy to discuss.

Some notable things:

  • IDK about the gix side, but gengo supports bare repos, so maybe we can remove that error message
  • Some ASCII art has been removed. Currently, that's JSX, TSX, Fish, and Bash.
  • Turns out having a dependency that hasn't been released for a while that depends on clap 2 was seriously bloating our dependency tree 😆 Gengo won't have anything like that since the library and binary are separate crates.

To Do

Resolves #26

Gengo will report the type (category) of the detected language.
@spenserblack
Copy link
Collaborator Author

Should the Language type in src/info/langs be dropped completely? I'm a bit Object-oriented obsessed, so I like calling language.get_ascii_art over get_ascii_art(language), but I'm not really seeing a point to this type anymore. Gengo returns a string instead of an enum (e.g. "C++" instead of CPlusPlus). So you'd really just match on the strings.

match language {
    "C++" => c_plus_plus_ascii
}

The other option is to make language a wrapper for String (struct Language(String)).

@spenserblack
Copy link
Collaborator Author

Hey, looks like the language titles in our vercel deployment are prettier now 🎉

@spenserblack
Copy link
Collaborator Author

spenserblack commented Aug 28, 2023

@o2sh How much do we care about breaking changes in the onefetch library?
6ac0ee0 is trying to be non-breaking by continuing to use a language enum so this usage doesn't break, but TBH they could probably just become functions with no implementing type that takes a &str, and also probably pub(crate) since I can't imaging these functions would be reasonable in a public API.

Like this:

Old

impl Language {
    pub fn get_ascii_art(&self) -> &'static str {
        // ...
    }
}

New

pub(crate) get_ascii_art(s: &str) -> &'static str {
    // ...
}

@o2sh
Copy link
Owner

o2sh commented Aug 28, 2023

@o2sh How much do we care about breaking changes in the onefetch library?

No one seems to be using onefetch as a dependency (https://crates.io/crates/onefetch/reverse_dependencies), so we can do whatever we want 😅

Should the Language type in src/info/langs be dropped completely? I'm a bit Object-oriented obsessed, so I like calling language.get_ascii_art over get_ascii_art(language)

Is there a reason why gengo doesn't use an enum to represent the list of possible languages? I'm a bit surprised, to be honest. 🤔 It seems like a natural choice.

@spenserblack
Copy link
Collaborator Author

spenserblack commented Aug 28, 2023

Is there a reason why gengo doesn't use an enum to represent the list of possible languages? I'm a bit surprised, to be honest. 🤔 It seems like a natural choice.

Yeah, I went back and forth a bit on this one. If you look in the commit history when I was initializing the project, you'll see that I did start of with an enum, building out Rust code with tera like onefetch and tokei does. I also experimented with a macro.

But after thinking about it a bit, I chose Language { name: String } over LanguageType::Foo for a few reasons:

  • I can drop the dependency on a templating library (I also wasn't happy with the macro because the results didn't look good in generated docs)
  • Simplifies contributing since you don't have to worry about the language name being valid Rust syntax
  • Allows definitions to be defined at runtime. A small delay to interpret those, and the variants aren't known at compile-time, but it makes the code more flexible and testable by allowing for arbitrary definitions.

But mostly, and this might be a bias from taking a break from Rust for a while, but a branch for every single language honestly just seemed like overkill. With several methods on an enum that each match on themselves to return a literal, it just kind of felt like an array of structs with extra steps 😆

In terms of testing: I recognize that coverage on its own isn't indicative of quality, but I don't really like a pattern where a simple addition to a data file will raise or lower test coverage. It's kind of cheating when you get +90% coverage only because you have 100+ branches and a repetitive test for each one 😆 Though, to be fair, all those tests would certainly guarantee that a language will never be removed.

@@ -17,7 +16,6 @@ Abap:
- "#EEEEEE"
chip: "#E8274B"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the chip color could also be given by gengo ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, we need it for the vercel UI

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if that's the only reason to keep it, we might as well remove it 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds good to me. I didn't add it initially since gengo's colors are different (onefetch's are all from linguist, right?)

Guess we could still get gengo's colors into the vercel UI with WASM, but that feels like major overkill 😆

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just how I prefer how PRs/diffs are made, but I'm thinking we could remove chip colors and use gengo's in a separate PR.

@o2sh
Copy link
Owner

o2sh commented Aug 28, 2023

Makes sense. It's just unfortunate that this makes the interface with onefetch somewhat awkward. Additionally, we might encounter runtime errors if we misspell a language name.

But I agree, that if we decide to take this route, we might as well drop the enum in onefetch as well.

@spenserblack
Copy link
Collaborator Author

🤔 With tokei we had to reference the docs or languages.json in order to find the correct way to name a language. Perhaps there's a way to include the contents of languages.yaml in the generated docs so it's more visible.

@spenserblack
Copy link
Collaborator Author

@o2sh Looping you in on some conversations over at gengo (spenserblack/gengo#164).

Gengo can now analyze submodules (thanks @Byron!), but they are excluded from statistics by default (marked as vendored). In fact, they're kind of a special case where they're always vendored. So a couple of questions since onefetch is a potential user of gengo as a library:

  1. Should onefetch report on submodule data?
  2. What should the "escape hatch" be to include submodules (a few possibilities are mentioned in Mark submodules as vendored by default spenserblack/gengo#164 (comment))?

Cargo.toml Outdated
@@ -37,6 +37,7 @@ bytecount = "0.6.3"
clap.workspace = true
clap_complete = "4.3.2"
crossbeam-channel = "0.5.8"
gengo = "^0.5.0"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that gengo is also using gix it's worth noting that both need to coordinate their gix versions while pre-1.0. Otherwise you would end up with two distinct gix-dependency trees which costs double to compile and in the final binary.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would want to create a group in the dependabot config so that gix and gengo are updated simultaneously. On gengo's side I can try to promptly create a new release for significant gix updates.

@o2sh
Copy link
Owner

o2sh commented Sep 9, 2023

Sorry for the late answer @spenserblack

Should onefetch report on submodule data?

It looks like linguist doesn't do that, so maybe we should follow their lead. And yeah, it'd be cool if gengo skipped processing submodules by default to speed things up, like mentioned in spenserblack/gengo#164 (comment)

In preparation of #1166.
Merge remote-tracking branch 'upstream/main' into chore/26/switch-to-gengo
@spenserblack
Copy link
Collaborator Author

Should onefetch report on submodule data?

It looks like linguist doesn't do that, so maybe we should follow their lead.

FYI

@spenserblack spenserblack mentioned this pull request Oct 17, 2023
1 task
@spenserblack
Copy link
Collaborator Author

Heads up that the latest version of gengo supports analyzing directories. You can play with it and compare performance between a fully-generic implementation that analyzes the filesystem vs. a git-specific implementation that analyzes git objects.

@o2sh
Copy link
Owner

o2sh commented Jan 14, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@spenserblack
Copy link
Collaborator Author

Just want to point out that analyzing a git history can make some fun tricks available. For example, building a history of language info:

# this example is for the gengo executable, but could easily be converted to a
# onefetch version.
for rev in $(git log --reverse --format="%H"); do
	gengo git --rev "$rev"
	echo  # add a newline
done

@spenserblack
Copy link
Collaborator Author

v0.11 Now uses an enum to define languages

@spenserblack spenserblack reopened this Feb 22, 2024
@spenserblack spenserblack marked this pull request as draft February 22, 2024 17:29
@o2sh o2sh removed the stale label Mar 16, 2024
@spenserblack spenserblack mentioned this pull request Apr 8, 2024
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect language detected (C++ as C, XML as TypeScript, etc.)
3 participants