This project is a VS Code extension that generates a comprehensive README.md file for a codebase using a Retrieval-Augmented Generation (RAG) approach. The extension analyzes the codebase, extracts key information, and uses a large language model (LLM) to write the README. It also generates a JSON summary and a PDF report of the codebase.
Codebase Extraction: The VS Code extension uses codebaseExtractor.js to recursively traverse the open workspace folder, gathering information about files and directories. It ignores common folders like node_modules and .git.
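The traversal logic can be sketched as follows. This is a minimal illustration in Python, not the extension's actual JavaScript in codebaseExtractor.js. The function name `walk_codebase` and the exact ignore list are assumptions; the text above names only node_modules and .git as examples of ignored folders.

```python
import os

# Assumed ignore list: only these two folders are named in this README.
IGNORED_DIRS = {"node_modules", ".git"}


def walk_codebase(root):
    """Return sorted paths (relative to root) of all files, skipping ignored folders."""
    files = []
    for current_dir, dirnames, filenames in os.walk(root):
        # Prune ignored folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            full_path = os.path.join(current_dir, name)
            files.append(os.path.relpath(full_path, root))
    return sorted(files)
```

Pruning `dirnames` in place avoids reading the contents of large folders such as node_modules at all, rather than filtering them out afterwards.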
JSON and PDF Generation: The extracted data is saved as a JSON file (codesummary.json), and a PDF report (codesummary.pdf) is generated from it using jsPDF. The PDF includes codebase statistics and source code snippets from text files.
Text Extraction and Embedding (Python): The extension passes codesummary.pdf to the Python backend (app.py). app.py extracts the text from the PDF, splits it into chunks, and creates embeddings for the chunks using Google Gemini's embedding model.
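The chunking step might look like the sketch below. It assumes fixed-size character chunks with overlap, which is a common strategy but may not be the one app.py uses. The function name `split_into_chunks` and the default sizes are hypothetical.

```python
def split_into_chunks(text, chunk_size=1000, overlap=200):
    """Split text into character chunks, each sharing `overlap` characters with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a function or sentence that straddles a chunk boundary intact in at least one chunk, at the cost of embedding some text twice.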
Vector Store Creation: The embeddings are stored in a FAISS vector store for efficient similarity search.
RAG Query: The user's question (a prompt requesting a comprehensive README) is embedded, and a similarity search retrieves the most relevant code chunks.
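To show what the similarity search computes, here is a brute-force sketch using cosine similarity in plain Python. It is a stand-in for the FAISS lookup, not the FAISS API, and FAISS may be configured with a different distance metric. The names `cosine_similarity` and `top_k_chunks` and the toy vectors are assumptions; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, indexed_chunks, k=2):
    """Return the texts of the k chunks most similar to the query.

    indexed_chunks is a list of (chunk_text, vector) pairs.
    """
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 2-dimensional vectors standing in for real embeddings.
index = [
    ("def main(): ...", [1.0, 0.0]),
    ("LICENSE text", [0.0, 1.0]),
    ("def helper(): ...", [0.9, 0.1]),
]
results = top_k_chunks([1.0, 0.0], index, k=2)
```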
README Generation (LLM): The relevant code chunks and the user's question are passed to the Google Gemini chat model to generate the README content.
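A sketch of how the retrieved chunks and the question could be combined into one prompt is shown below. The prompt wording and the function name `build_prompt` are hypothetical, and the call to the Gemini chat model itself is omitted.

```python
def build_prompt(question, chunks):
    """Combine retrieved chunks and the user's question into a single prompt string."""
    # Number each chunk so the model can tell where one ends and the next begins.
    context = "\n\n".join(
        f"[Chunk {i}]\n{chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
```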
README Post-processing: The generated README is post-processed to ensure proper formatting (headers, code blocks, etc.).
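The README does not list the exact post-processing rules, so the sketch below assumes two plausible ones: insert a blank line before each header, and close a code fence the model left open. The function name `postprocess_readme` is hypothetical.

```python
FENCE = "`" * 3  # Markdown code fence marker


def postprocess_readme(markdown):
    """Normalize header spacing and close an unterminated code fence."""
    out = []
    in_code_block = False
    for line in markdown.strip().splitlines():
        if line.lstrip().startswith(FENCE):
            in_code_block = not in_code_block
        elif not in_code_block and line.startswith("#"):
            # Outside code blocks, a header gets a blank line before it.
            if out and out[-1].strip():
                out.append("")
        out.append(line.rstrip())
    if in_code_block:
        # The model stopped inside a code block, so close it.
        out.append(FENCE)
    return "\n".join(out) + "\n"
```

Tracking `in_code_block` matters because a `#` at the start of a line inside a code block is usually a comment, not a header.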
Output: The generated README.md is saved to the workspace folder.
Installation
Prerequisites: Ensure you have Node.js and npm installed. The Python backend requires Python 3 and the packages listed in python/requirements.txt. You'll also need a Google Cloud project with the Gemini API enabled and a GOOGLE_API_KEY.
Install Python Dependencies:
cd python
pip install -r requirements.txt
Install VS Code Extension: Install the "Code Summary Generator" extension from the VS Code Marketplace (or clone this repository and build it).
Usage
Open your VS Code project.
Open the command palette (Ctrl+Shift+P on Windows/Linux, Cmd+Shift+P on macOS).
Type "Generate Code Summary" and select the command.
The extension will generate codesummary.json, codesummary.pdf, and README.md in your workspace folder.
The Python backend (app.py) exposes a single function, process_pdf, which takes the path to a PDF file and returns a JSON object containing the generated README content and related statistics. app.py reads its inputs from command-line arguments.
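A command-line wrapper around process_pdf could be sketched as follows. The body of `process_pdf` here is a placeholder, and the argument name `pdf_path` and the result keys `readme` and `stats` are assumptions; the actual arguments and JSON shape in app.py may differ.

```python
import argparse
import json


def process_pdf(pdf_path):
    """Placeholder: the real function runs the RAG pipeline on the PDF."""
    # Hypothetical result shape; the key names are assumptions.
    return {
        "readme": f"# README generated from {pdf_path}\n",
        "stats": {"pdf_path": pdf_path},
    }


def main(argv=None):
    """Parse the PDF path from the command line and print the result as JSON."""
    parser = argparse.ArgumentParser(description="Generate a README from a code summary PDF.")
    parser.add_argument("pdf_path", help="path to codesummary.pdf")
    args = parser.parse_args(argv)

    output = json.dumps(process_pdf(args.pdf_path), indent=2)
    print(output)
    return output
```

Printing JSON to standard output gives the calling process one well-defined result to parse. In a script, `main()` would be called from an `if __name__ == "__main__":` guard.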
Configuration
The extension requires a GOOGLE_API_KEY environment variable. You can set this in your system's environment variables or in a .env file in the python directory. The example in extension.js shows how to set it directly in the code for testing purposes. Do not commit your actual API key to version control.
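As a rough sketch of the lookup order described above, the helper below checks the environment first and then falls back to a .env file. It uses a small hand-written parser for illustration; libraries such as python-dotenv are commonly used for this instead, and this README does not say which approach app.py takes. The function name `load_api_key` is hypothetical.

```python
import os


def load_api_key(env_path=".env", environ=None):
    """Return GOOGLE_API_KEY from the environment, falling back to a .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("GOOGLE_API_KEY")
    if key:
        return key

    if os.path.isfile(env_path):
        with open(env_path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                # Skip blank lines, comments, and lines without an assignment.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                name, _, value = line.partition("=")
                if name.strip() == "GOOGLE_API_KEY":
                    # Remove surrounding whitespace and optional quotes.
                    return value.strip().strip("\"'")

    raise RuntimeError("GOOGLE_API_KEY is not set in the environment or in the .env file")
```

Failing with an explicit error at startup is easier to diagnose than an authentication failure later in the pipeline.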
Contributing
Contributions are welcome! Please open an issue or submit a pull request. Ensure your code follows the style guidelines and includes comprehensive tests.