
Create This Chatbot To Understand and Chat with Any GitHub Repo

Have you ever found yourself lost in a sea of code, trying to understand a new codebase on GitHub? Or maybe you’re just curious about how a particular project works, but don’t have the time or energy to read through all the code. Well, fear not! The GitHub Code Summary Chatbot is here to help.

This powerful tool allows you to chat with a chatbot about any codebase on GitHub and generate a summary of its code. Built using Streamlit, Langchain, OpenAI, and other libraries, this chatbot provides an interactive and user-friendly interface for exploring and understanding any codebase on GitHub.

In this blog post, we’ll take a closer look at how this chatbot works and how it can be implemented. We’ll break down the process into several steps to make it easier to follow and understand. So sit back, relax, and let’s dive into the world of the GitHub Code Summary Chatbot!

Step 1: Setting up the environment

The first step in implementing the GitHub Chatbot is to set up the environment. This involves installing the necessary libraries and dependencies, such as Streamlit, Langchain, OpenAI, and PyGithub. It also involves loading environment variables from a .env file and checking if OpenAI secrets are available.
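The dependencies can be installed with pip. Note that the package names differ from the import names used below: `PyGithub` provides the `github` module and `python-dotenv` provides `dotenv` (`fpdf` is used later for the PDF export):

```shell
pip install streamlit langchain openai PyGithub python-dotenv fpdf
```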

Here is an example of how this can be done in the script:

import os

import streamlit as st
from dotenv import load_dotenv
from github import Github

# Load environment variables from the .env file
load_dotenv()

# Check that an OpenAI API key is available and keep it for later use
openai_api_key = os.getenv("OPENAI_API_KEY")
assert openai_api_key, "OPENAI_API_KEY is not set"
secrets = {"api_key": openai_api_key}

# Get the GitHub personal access token from an environment variable
github_token = os.getenv("GITHUB_TOKEN")
g = Github(github_token)

In this code snippet, we import the necessary libraries and use the load_dotenv function from the dotenv library to load environment variables from a .env file. We then check that an OpenAI API key is available in the environment and keep it in a secrets dictionary for later use. Finally, we use the os library to get the GitHub personal access token from an environment variable and use it to initialize a Github object from the PyGithub library.
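For reference, the .env file read by load_dotenv might look like this (placeholder values; never commit real tokens to version control):

```shell
# .env — placeholder values
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxx
```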

Step 2: Initializing Langchain

The next step in implementing the GitHub Chatbot is to initialize a Langchain object with memory, prompt templates, and the OpenAI secrets. One note on naming: the langchain package does not actually export a Langchain class; here Langchain is a small helper of our own that stores information in memory, renders prompt templates for human, AI, and system messages, and submits prompts to OpenAI.

Here is an example of how this can be done in the script:

# `Langchain` here is a small helper class of our own (the langchain
# package does not export one); it stores memory, renders prompt
# templates, and calls the OpenAI API.

# Initialize Langchain with memory, prompt templates, and OpenAI secrets
lc = Langchain(memory={"repo_link": None, "code": None}, prompt_templates={
    "human": {
        "ask_repo_link": "Enter a GitHub repository link:",
        "invalid_link": "Invalid link. Please enter a valid GitHub repository link.",
        "ask_question": "Do you have any questions about the code?",
        "answer_question": "{answer}",
        "summary": "{summary}",
        "download_summary": "Download Summary as PDF"
    },
    "ai": {
        "summarize_code": "Write a summary of the following code, describing its main features and how they work: {code}",
        "answer_question": "Answer the following question about the code: {question}"
    },
    "system": {
        "title": "GitHub Code Summary Chatbot"
    }
}, openai_secrets=secrets)

In this code snippet, we initialize a new Langchain helper object. We pass several arguments to its constructor, including a dictionary of initial memory values, a dictionary of prompt templates for human, AI, and system messages, and the OpenAI secrets retrieved earlier.

The memory dictionary allows us to store information in memory for later use. In this case, we initialize it with two keys: repo_link and code, both set to None. These keys will be used later in the script to store the GitHub repository link entered by the user and the code retrieved from that repository.

The prompt templates dictionary allows us to define templates for human, AI, and system messages that will be used throughout the script. In this case, we define several human prompt templates for asking for a GitHub repository link, displaying an error message if the link is invalid, asking if the user has any questions about the code, displaying an answer to their question, displaying a summary of the code, and allowing them to download that summary as a PDF. We also define AI prompt templates for writing a summary of the code and answering questions about it. Finally, we define a system prompt template for setting the title of the Streamlit app.
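Since the langchain package does not provide a Langchain class, the script needs a thin helper with this interface to run end-to-end. A minimal sketch (the class shape, the text-davinci-003 model name, and the max_tokens value are our own assumptions; the openai call assumes the pre-1.0 openai package, matching the response["choices"][0]["text"] access used later in the post):

```python
class Langchain:
    """Thin helper: stores memory, renders prompt templates, calls OpenAI."""

    def __init__(self, memory, prompt_templates, openai_secrets=None):
        self.memory = memory
        self.templates = prompt_templates
        self.openai_secrets = openai_secrets or {}

    def human(self, key, **kwargs):
        # Render a human-facing template (labels, messages)
        return self.templates["human"][key].format(**kwargs)

    def ai(self, key, **kwargs):
        # Render a prompt template destined for the model
        return self.templates["ai"][key].format(**kwargs)

    def system(self, key, **kwargs):
        # Render a system-level template (e.g. the app title)
        return self.templates["system"][key].format(**kwargs)

    def openai(self, prompt):
        # Deferred import so the helper is importable without the package
        import openai

        openai.api_key = self.openai_secrets.get("api_key")
        # Pre-1.0 openai API; the response is a dict with "choices"
        return openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=512
        )
```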

Step 3: Setting up Streamlit

The next step in implementing the GitHub Code Summary Chatbot is to set up Streamlit. Streamlit is a powerful library that allows us to create interactive web apps using Python. In this case, we will use Streamlit to create the user interface for our chatbot app.

Here is an example of how this can be done in the script:

# Set title of Streamlit app using Langchain system prompt template
st.title(lc.system("title"))

# Create text input field for user to enter GitHub repository link using Langchain human prompt template
repo_link = st.text_input(lc.human("ask_repo_link"))

In this code snippet, we use the st.title method from the streamlit library to set the title of our Streamlit app. We pass the title text to this method using the system method of our Langchain object, which retrieves the title text from the system prompt template we defined earlier.

We then use the st.text_input method from the streamlit library to create a text input field for the user to enter a GitHub repository link. We pass the label text for this field to the method using the human method of our Langchain object, which retrieves the label text from the human prompt template we defined earlier.

Step 4: Retrieving and summarizing the code

The next step in implementing the GitHub Code Summary Chatbot is to retrieve the code from the GitHub repository entered by the user and generate a summary of its main features. This involves using the PyGithub library to access the GitHub repository and retrieve its files, and then using an OpenAI GPT model to generate a summary of the code.

Here is an example of how this can be done in the script:

# Check if the user has entered a repository link
if repo_link:
    # Check if the entered link looks like a GitHub repository link
    if "github.com" not in repo_link:
        # Display an error message using a Langchain human prompt template
        st.error(lc.human("invalid_link"))
    else:
        # Store the repository link in Langchain memory
        lc.memory["repo_link"] = repo_link

        # Derive the "owner/name" pair (tolerating a trailing slash)
        # and get the repository object using the PyGithub library
        owner, name = repo_link.rstrip("/").split("/")[-2:]
        repo = g.get_repo(f"{owner}/{name}")
        # Get the files at the repository root
        files = repo.get_contents("")
        code = ""
        # Loop through the files and concatenate their content into one string
        for file in files:
            if file.type == "file":
                try:
                    code += file.decoded_content.decode("utf-8") + "\n\n"
                except UnicodeDecodeError:
                    continue  # skip binary files

        # Store the code in Langchain memory
        lc.memory["code"] = code

        # Create the prompt for the OpenAI GPT model to write a summary
        # of the code, using a Langchain AI prompt template
        prompt = lc.ai("summarize_code", code=code)
        # Submit the prompt to the OpenAI GPT model and get the response
        response = lc.openai(prompt)
        # Extract the summary text from the response
        summary = response["choices"][0]["text"]

        # Display the summary in the Streamlit app
        st.write(lc.human("summary", summary=summary))

In this code snippet, we check if the user has entered a GitHub repository link using an if statement. If they have, we check if the entered link is a valid GitHub repository link by checking if it contains “github.com”. If it does not, we display an error message on the Streamlit app using the st.error method and a human prompt template from our Langchain object.
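The substring check is deliberately loose: any string containing "github.com" passes. A stricter alternative is to parse the link with a regular expression (a sketch; the pattern and helper name are our own):

```python
import re

# Matches e.g. https://github.com/owner/repo, with an optional trailing slash
GITHUB_REPO_RE = re.compile(r"^https?://github\.com/([\w.-]+)/([\w.-]+)/?$")


def parse_repo_link(link):
    """Return the 'owner/name' string for a GitHub repo URL, or None."""
    match = GITHUB_REPO_RE.match(link.strip())
    return f"{match.group(1)}/{match.group(2)}" if match else None
```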

If the entered link is valid, we store it in our Langchain memory using the memory attribute of our Langchain object. We then use our Github object from the PyGithub library to get a repository object for that link. We use this repository object to get all files in that repository and loop through them, concatenating their content into one string. We then store this string in our Langchain memory as well.
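One caveat: repo.get_contents("") returns only the top-level entries of the repository. To include files in subdirectories, you can walk the tree recursively; a sketch (the helper name is our own, built on PyGithub's get_contents):

```python
def get_all_files(repo, path=""):
    """Recursively collect all file entries from a repository tree."""
    files = []
    for item in repo.get_contents(path):
        if item.type == "dir":
            # Descend into the subdirectory
            files.extend(get_all_files(repo, item.path))
        else:
            files.append(item)
    return files
```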

Next, we create a prompt for an OpenAI GPT model to write a summary of our code using an AI prompt template from our Langchain object. We pass this prompt to the openai method of our Langchain object, which submits it to an OpenAI GPT model and returns its response. We extract the summary text from this response and display it on our Streamlit app using another human prompt template from our Langchain object.
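A practical caveat: the concatenated code of a whole repository will usually exceed the model's context window. A common workaround is to split the code into chunks, summarize each chunk, and then summarize the per-chunk summaries. A minimal chunking sketch (the 12,000-character budget is an arbitrary assumption):

```python
def chunk_text(text, max_chars=12000):
    """Split text into consecutive chunks of at most max_chars characters."""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# Each chunk can be summarized separately, and the resulting summaries
# concatenated and summarized once more to produce the final answer.
```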

Step 5: Adding interactive features

The final step in implementing the GitHub Code Summary Chatbot is to add interactive features that allow users to ask questions about the code and get answers from an OpenAI GPT model. This involves creating additional input fields and buttons on our Streamlit app and using more AI prompt templates from our Langchain object.

Here is an example of how this can be done in the script:

# Create text input field for user to ask questions about the code using Langchain human prompt template
question = st.text_input(lc.human("ask_question"))

# Check if the user has asked a question about the code
if question:
    # Create the prompt using the Langchain AI prompt template
    prompt = lc.ai("answer_question", question=question)
    # Prepend the stored code so the model has context to answer from
    prompt = f"{lc.memory['code'] or ''}\n\n{prompt}"
    # Submit the prompt to the OpenAI GPT model and get the response
    response = lc.openai(prompt)
    # Extract the answer text from the response
    answer = response["choices"][0]["text"]
    
    # Display answer on Streamlit app using Langchain human prompt template
    st.write(lc.human("answer_question", answer=answer))

In this code snippet, we use the st.text_input method from the streamlit library to create a text input field for the user to ask questions about the code. We pass the label text for this field to the method using the human method of our Langchain object, which retrieves the label text from the human prompt template we defined earlier.

We then use an if statement to check if the user has asked a question about the code. If they have, we create a prompt for an OpenAI GPT model to answer their question using an AI prompt template from our Langchain object. Note that for the model to answer accurately, the prompt must include the code itself, not just the question. We pass the prompt to the openai method of our Langchain object, which submits it to an OpenAI GPT model and returns its response. We extract the answer text from this response and display it on our Streamlit app using another human prompt template from our Langchain object.

In addition to allowing users to ask questions about the code, we can also add other interactive features, such as allowing users to download a PDF with a summary of the code. Here is an example of how this can be done in the script:

import base64

from fpdf import FPDF

# Create button to download summary as PDF (code not shown) using Langchain human prompt template
if st.button(lc.human("download_summary")):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Arial", size=12)
    pdf.multi_cell(0, 10, txt=summary)
    pdf.output("summary.pdf")
    
    with open('summary.pdf', 'rb') as f:
        pdf_content = f.read()
    
    b64_pdf_content = base64.b64encode(pdf_content).decode('utf-8')
    
    href = f'<a href="data:application/octet-stream;base64,{b64_pdf_content}" download="summary.pdf">Download Summary as PDF</a>'
    st.markdown(href, unsafe_allow_html=True)

In this code snippet, we import the FPDF class from the fpdf library and use it to create a new PDF document. We add a page to this document and set its font and size. We then use the multi_cell method of our FPDF object to add the summary text of our code to this page. We output this PDF document to a file named “summary.pdf”.

We then read the content of this file and encode it in base64. We use this encoded content to create a download link for our PDF document and display it on our Streamlit app using the st.markdown method. (Newer Streamlit releases also provide an st.download_button component, which handles this without a manual base64 link.)

This allows users to download a PDF with a summary of the code by clicking on the download link displayed on our Streamlit app.

Conclusion

In conclusion, implementing the GitHub Code Summary Chatbot involves several steps: setting up the environment, initializing Langchain, setting up Streamlit, retrieving and summarizing the code, and adding interactive features. By following these steps and using libraries such as Streamlit, OpenAI, and PyGithub, we can create a chatbot that lets users discuss any codebase on GitHub and generate a summary of its code.

This chatbot provides users with an interactive and user-friendly interface to explore and understand any codebase on GitHub. It can be customized and extended with additional features and functionality according to your needs. We hope that this blog post has provided you with a better understanding of how this chatbot works and how it can be implemented.