Build This Custom AI Knowledge Bot to Have Conversations with Your Website

As a website owner or marketer, getting insights into your website’s performance and user behavior is crucial for making informed decisions and improving your online presence. However, analyzing website data can be a time-consuming and complex task, especially if you’re not familiar with tools like Google Analytics.

This is where AI comes in. By using AI, you can create a chatbot that can have conversations with you about your website and its data. This makes it easier for you to get insights into your website’s performance and user behavior, without having to spend hours analyzing data.

In this tutorial, we will show you how to build a chat app using Streamlit, LangChain, BeautifulSoup, and the Google Analytics API. The app lets you enter your website URL and connect your Google Analytics account. It then retrieves analytics data for that site, uses BeautifulSoup to crawl the site and extract its text, and uses LangChain to build a custom-knowledge chatbot that draws on both sources to provide contextually rich answers about your website. Finally, the chatbot is wired into the Streamlit interface so you can talk to it right in the browser.
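
Before diving in, install the dependencies. The exact set depends on your choices; assuming the stack above plus OpenAI embeddings, a FAISS vector store for the LangChain step, and pandas for the charting idea at the end (all choices this tutorial makes later), something like this should cover it:

pip install streamlit langchain openai faiss-cpu beautifulsoup4 requests google-api-python-client google-auth pandas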

Step 1: Set up Streamlit app and site info

First, we need to set up our Streamlit app and create user input fields for the website URL and Google Analytics authentication. We can do this using Streamlit’s set_page_config, title, text_input, and button functions:

import streamlit as st

st.set_page_config(page_title="Website Chatbot", page_icon=":robot_face:")
st.title("Website Chatbot")

website_url = st.text_input("Enter your website URL:")
auth_button = st.button("Authenticate Google Analytics")

This code sets the page title and icon of our Streamlit app, creates a text input field for your website URL, and adds a button for authenticating your Google Analytics account. The button click is where you would kick off your OAuth flow; to keep this tutorial focused, we assume the resulting credentials end up in Streamlit's secrets store, which the next step reads from.

Step 2: Retrieve data from Google Analytics

Once you have entered your website URL and authenticated your Google Analytics account, we can use the Google Analytics Reporting API to retrieve data for that site. We can do this with the google-auth and google-api-python-client libraries:

from urllib.parse import urlparse

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

# Build an Analytics Reporting API v4 client from the stored OAuth credentials
creds = st.secrets["google"]
credentials = Credentials.from_authorized_user_info(info=dict(creds))
service = build("analyticsreporting", "v4", credentials=credentials)

# Filter on ga:hostname -- ga:pagePath holds only the path portion of a URL,
# so matching it against a full website URL would return no rows
hostname = urlparse(website_url).netloc
response = service.reports().batchGet(
    body={
        "reportRequests": [
            {
                "viewId": creds["view_id"],
                "dateRanges": [{"startDate": "30daysAgo", "endDate": "today"}],
                "metrics": [{"expression": "ga:sessions"}],
                "dimensions": [{"name": "ga:pagePath"}],
                "dimensionFilterClauses": [
                    {
                        "filters": [
                            {
                                "dimensionName": "ga:hostname",
                                "operator": "EXACT",
                                "expressions": [hostname],
                            }
                        ]
                    }
                ],
            }
        ]
    }
).execute()

# "totals" sums ga:sessions across every page row in the report
ga_data = response["reports"][0]["data"]["totals"][0]["values"][0]

This code uses your stored Google Analytics credentials to build a service object for the Reporting API. It then sends a batchGet request for the past 30 days of sessions, filtered to your site's hostname (ga:pagePath contains only the path portion of a URL, so we filter on ga:hostname instead). The report's totals give the overall session count, which is stored in the ga_data variable.
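
For reference, the st.secrets lookups above assume a .streamlit/secrets.toml file along these lines. The section and key names are this tutorial's conventions, not anything Streamlit requires; the [openai] section is used by the LangChain step later on:

# .streamlit/secrets.toml (key names are this tutorial's convention)
[google]
token = "..."
refresh_token = "..."
client_id = "..."
client_secret = "..."
token_uri = "https://oauth2.googleapis.com/token"
view_id = "..."

[openai]
api_key = "..."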

Step 3: Scrape your entire website

Next, we need to use BeautifulSoup to scrape your website and extract its text. We can do this by defining a recursive function that follows the links found on each page, restricted to your own domain so the crawl doesn't wander off-site:

from bs4 import BeautifulSoup
import requests
from urllib.parse import urljoin, urlparse

def scrape_website(url, visited=None):
    if visited is None:
        visited = set()
    if url in visited:
        return ""
    visited.add(url)
    page_text = ""
    try:
        page = requests.get(url, timeout=10)
        soup = BeautifulSoup(page.content, "html.parser")
        page_text += soup.get_text()
        for link in soup.find_all("a"):
            href = link.get("href")
            if href and not href.startswith("#"):
                full_url = urljoin(url, href)
                # Only follow links on the same domain, so the crawl stays on your site
                if urlparse(full_url).netloc == urlparse(url).netloc:
                    page_text += scrape_website(full_url, visited)
    except requests.RequestException:
        # Skip pages that fail to load rather than aborting the whole crawl
        pass
    return page_text

page_text = scrape_website(website_url)

This code defines a scrape_website function that takes a URL and recursively follows the links found on each page, skipping anchors, already-visited pages, and links that point off your domain. BeautifulSoup extracts the text from each page, pages that fail to load are skipped rather than aborting the crawl, and the function returns one concatenated string. Calling it with your website URL scrapes the site and stores its text in the page_text variable.
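
On large sites the concatenated text can get very long, which makes the embedding step in the next section slow and expensive. One simple safeguard is to cap the text at a rough character budget; the exact number here is an assumption you should tune for your site and embedding model:

# Rough character budget -- an assumption to tune for your site and model
MAX_CHARS = 100_000
page_text = page_text[:MAX_CHARS]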

Step 4: Create a custom knowledge chatbot

Once we have the Google Analytics data and the scraped site text, we can use LangChain to create a custom-knowledge chatbot that answers questions grounded in that data. A common pattern is retrieval-augmented generation: split the text into chunks, embed the chunks into a vector store, and wire the store to a question-answering chain. Here we assume an OpenAI API key stored in Streamlit's secrets for both the embeddings and the chat model:

from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

# Chunk the site text (embedding models have input limits) and index it
# in a vector store along with the Google Analytics summary
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
texts = [f"The number of sessions on {website_url} in the past 30 days is {ga_data}."]
texts += splitter.split_text(page_text)
vectorstore = FAISS.from_texts(texts, OpenAIEmbeddings(openai_api_key=st.secrets["openai"]["api_key"]))
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(openai_api_key=st.secrets["openai"]["api_key"]),
    retriever=vectorstore.as_retriever(),
)

This code splits the scraped text into overlapping chunks, embeds the chunks together with the Google Analytics summary into a FAISS vector store, and builds a RetrievalQA chain on top of it. When the chatbot receives a question, the chain retrieves the most relevant chunks and passes them to the language model, so responses stay grounded in your website's data.
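
As a quick sanity check, you can query the chain directly before wiring it into the UI:

# Ask the chain a question directly to verify the knowledge base works
answer = qa_chain.run(f"How many sessions did {website_url} get in the past 30 days?")
st.write(answer)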

Step 5: Integrate chatbot into Streamlit app

Finally, we need to integrate our chatbot into the Streamlit app. We can do this with Streamlit's form, text_input, form_submit_button, and write functions:

with st.form(key="chat_form"):
    user_input = st.text_input("Enter your message:")
    submit_button = st.form_submit_button(label="Send")
    if submit_button:
        response = qa_chain.run(user_input)
        st.write(f"User: {user_input}")
        st.write(f"Chatbot: {response}")

This code creates a form with a text input field for your message and a submit button. When you click Send, your message goes to the RetrievalQA chain, which generates a response using the knowledge base we built earlier, and both your message and the chatbot's response are displayed with st.write. If you're on a recent Streamlit release, you can swap the form for Streamlit's dedicated chat elements, as sketched below.
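
Here is a minimal sketch using those chat elements (available in Streamlit 1.24 and later), assuming the qa_chain from Step 4. It keeps a running conversation history in session state so earlier turns stay on screen:

# Replay the conversation so far, then handle a new message
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

if prompt := st.chat_input("Ask about your website"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.write(prompt)
    response = qa_chain.run(prompt)
    st.session_state.messages.append({"role": "assistant", "content": response})
    with st.chat_message("assistant"):
        st.write(response)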

Conclusion

In this tutorial, we showed you how to build a chat app using Streamlit, LangChain, BeautifulSoup, and the Google Analytics API. The app makes it easier for website owners and marketers to understand their site's performance and user behavior by letting them ask questions in plain language instead of digging through analytics reports. We hope this tutorial was helpful and that you can now build your own chat app with these tools!

Make sure to make it your own: add more data sources and play with templating and styling. You could also take advantage of Streamlit's built-in charting components to let the chatbot create graphs for you; a starting point is sketched below. What do you plan to build?
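
For instance, the batchGet response from Step 2 already contains per-page session counts, so a few lines of pandas plus st.bar_chart can turn it into a chart. This is a sketch, assuming the response object from Step 2 is still in scope:

import pandas as pd

# Each report row pairs a page path with its session count
rows = response["reports"][0]["data"].get("rows", [])
df = pd.DataFrame(
    [(r["dimensions"][0], int(r["metrics"][0]["values"][0])) for r in rows],
    columns=["page", "sessions"],
)
st.bar_chart(df.set_index("page"))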