This repository has been archived by the owner on Mar 16, 2024. It is now read-only.

Extend and format docstrings throughout the codebase #56

Open
emrgnt-cmplxty opened this issue Jun 23, 2023 · 0 comments
Labels
help wanted Extra attention is needed

Comments

@emrgnt-cmplxty
Owner

emrgnt-cmplxty commented Jun 23, 2023

Title: Enhancing Docstring Quality through LLM Integration and Formatting

Issue Description:

Our project's documentation is a unique hybrid of human contributions and content generated by our Language Model (LLM). The structure comprises three layers:

  • L1: Inline docstrings authored by humans
  • L2: Docstrings generated by our LLM, which utilizes both source code and L1 docstrings, enriched with SymbolRank
  • L3: Docstrings created by the LLM using source code, L1 & L2 docstrings, and SymbolRank
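The three layers above could be modeled with a small container per symbol. The sketch below is purely illustrative (the `SymbolDocs` class, its field names, and the fallback policy are assumptions, not part of the actual codebase):

```python
from dataclasses import dataclass


@dataclass
class SymbolDocs:
    """Hypothetical container for the three documentation layers of one symbol."""

    symbol: str
    l1_human: str = ""       # inline docstring authored by a human
    l2_generated: str = ""   # LLM output from source code + L1, enriched with SymbolRank
    l3_generated: str = ""   # LLM output from source code + L1 + L2 + SymbolRank

    def best(self) -> str:
        # Prefer the richest available layer, falling back toward the human L1 text.
        return self.l3_generated or self.l2_generated or self.l1_human
```

A consumer that displays documentation alongside source code would call `best()` and transparently pick up richer layers as they become available.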

L3 documentation aids coding by being available alongside the source code, and it also presents an opportunity for improvement: we could use it to enrich our L1 docstrings, thereby enhancing the original human-authored documentation.

The purpose of this task is to create a mechanism to "bubble up" information from L3 back to L1. This means that we will take the enhanced documentation of L3, refine it, and integrate it into the L1 docstrings. By doing so, we can provide coders with more comprehensive and insightful documentation to follow.

Furthermore, this integration allows for recursive improvement over iterations. If the LLM pipeline is run a second time, it would start with the enriched docstrings from the first run, thus progressively amplifying the quality and richness of the documentation.
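One possible shape for the bubble-up step is a source rewriter that replaces each function's inline (L1) docstring with its refined L3 counterpart. This is a minimal sketch, assuming Python 3.9+ for `ast.unparse`; the function name `bubble_up_docstrings` and the name-to-docstring mapping interface are hypothetical, not the issue's prescribed design:

```python
import ast


def bubble_up_docstrings(source: str, enhanced: dict[str, str]) -> str:
    """Rewrite `source` so each named function carries its enhanced (L3)
    docstring; functions without an entry in `enhanced` are left unchanged."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name in enhanced:
            doc = ast.Expr(value=ast.Constant(value=enhanced[node.name]))
            body = node.body
            if (body and isinstance(body[0], ast.Expr)
                    and isinstance(body[0].value, ast.Constant)
                    and isinstance(body[0].value.value, str)):
                body[0] = doc            # overwrite the existing L1 docstring
            else:
                body.insert(0, doc)      # add a docstring where none existed
    return ast.unparse(tree)


if __name__ == "__main__":
    src = 'def add(a, b):\n    """Add."""\n    return a + b\n'
    print(bubble_up_docstrings(src, {"add": "Return the sum of a and b."}))
```

Running the pipeline again on the rewritten source naturally yields the recursive improvement described below, since the new L1 layer is the previous run's refined L3 output.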

Let's take the TensorFlow library as an example to illustrate the level of docstring comprehensiveness we aim to achieve:

def run_with_all_saved_model_formats(
    test_or_class=None,
    exclude_formats=None):
  """Execute the decorated test with all Keras saved model formats).

  This decorator is intended to be applied either to individual test methods in
  a `keras_parameterized.TestCase` class, or directly to a test class that
  extends it. Doing so will cause the contents of the individual test
  method (or all test methods in the class) to be executed multiple times - once
  for each Keras saved model format.

  The Keras saved model formats include:
  1. HDF5: 'h5'
  2. SavedModel: 'tf'

  Note: if stacking this decorator with absl.testing's parameterized decorators,
  those should be at the bottom of the stack.

  Various methods in `testing_utils` to get file path for saved models will
  auto-generate a string of the two saved model formats. This allows unittests
  to confirm the equivalence between the two Keras saved model formats.

  For example, consider the following unittest:

  ```python
  class MyTests(testing_utils.KerasTestCase):

    @testing_utils.run_with_all_saved_model_formats
    def test_foo(self):
      save_format = testing_utils.get_save_format()
      saved_model_dir = '/tmp/saved_model/'
      model = keras.models.Sequential()
      model.add(keras.layers.Dense(2, input_shape=(3,)))
      model.add(keras.layers.Dense(3))
      model.compile(loss='mse', optimizer='sgd', metrics=['acc'])

      keras.models.save_model(model, saved_model_dir, save_format=save_format)
      model = keras.models.load_model(saved_model_dir)

  if __name__ == "__main__":
    tf.test.main()
  ```
  ...

This task, therefore, is not just about enhancing the quality of our documentation; it is also about establishing a novel approach to docstring creation and maintenance. A recursive model of improvement lets us combine the strengths of human input and machine learning models to deliver effective, continuously improving documentation. This blend of human intuition and machine efficiency could change how we think about and generate documentation in the coding process.

Feel free to post any questions or concerns you have about this implementation. Your contribution to this project is highly appreciated!

@emrgnt-cmplxty emrgnt-cmplxty added the help wanted Extra attention is needed label Jun 23, 2023
@emrgnt-cmplxty emrgnt-cmplxty changed the title Extend and format docstrings Extend and format docstrings throughout the codebase Jun 28, 2023
Huntemall pushed a commit to Huntemall/automata-dev that referenced this issue Oct 30, 2023