Intro
I'm a geek at heart and I'm impressed. I'm probably well behind the times, but ChatGPT has just coded me a script in Python.
I keep seeing social media posts suggesting I use AI, so I did. At first I thought I was wasting my time, but it got interesting quickly.
This all started with manually creating a list of all topics in a phpBB forum with one or more replies (for this list here).
Initially, I asked ChatGPT to do this directly:
On my website, fiestamk7.com, can you create a list/links to all topics with more than one reply in the Fiesta MK7 ‘Issues & Questions’ category
I got a vague response with two examples. So I was a bit more forceful and blunt:
Create a complete list
After telling me it knew it was browsing https://www.fiestamk7.com/viewforum.php?f=5, it said it couldn't access the page directly without me giving it the address. So I gave it the address:
https://www.fiestamk7.com/viewforum.php?f=5
To which it replied: 'Would you like me to continue gathering results from more pages to build a truly complete list across the whole forum?' Which of course was a yes. But it told me it can't navigate to each page itself; it needs a list of links to each page. This particular category in the forum has 52 pages, with 25 topics on each, so:
can you first create a list of urls following the pattern: https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=25 https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=50 https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=75 up to https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=1325
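(In hindsight, with the Python script at the end of this post in mind, that URL list is a couple of lines of code. A minimal sketch, using the domain and offsets from my prompt:)

base = "https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start="
urls = [f"{base}{offset}" for offset in range(25, 1326, 25)]  # 25, 50, ... up to 1325
print("\n".join(urls))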
Lovely: ChatGPT gives me a complete list, which I kindly give back to it with another prompt:
Create a complete list of topics with one or more replies from the following URLs https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=25 https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=50 https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=75 https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start=100
etc all the way to start=1325
And I get back: This may take a bit of time due to the volume of pages (54+), but I’ll deliver a clean and complete list once it’s all ready. Would you prefer:
- Plain text format,
- HTML list (for embedding on a site), or
- Downloadable file (CSV or Markdown)?
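I went with HTML in the end, but for what it's worth, the CSV option would only be a few lines of Python too. A hypothetical sketch using the standard csv module (the sample topic data here is made up for illustration):

import csv

# Hypothetical CSV version of the output step (the real script writes HTML).
# all_topics holds (title, replies, link) tuples, as in the script below.
all_topics = [("spongy brakes", 3, "https://www.fiestamk7.com/viewtopic.php?t=123")]
with open("forum_scraper_output.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Title", "Replies", "Link"])
    writer.writerows(all_topics)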
Long story short, it very kindly replies to all of my requests, and what I end up with is a list like this, HTML-formatted for pasting into my page:
spongy brakes
Note: 3 replies
Discussion on causes and potential fixes for spongy brake feel.
LINK TO PAGE
DRL Installation Guide
Note: 23 replies
Step-by-step instructions and discussion on installing daytime running lights.
LINK TO PAGE
MK7 TDCI, Died out and won’t start
Note: 3 replies
Troubleshooting a Mk7 TDCi that stalled and now won’t restart.
LINK TO PAGE
How good is that?
It has even generated a description based on the content. FFS, I couldn't achieve that in weeks, let alone minutes.
Onto the Python script.
At some point it mentioned a script. So I started another chat to make this script so I can run it myself (not via ChatGPT). It told me it would do it in Python (a coding language).
At first it came out with a basic script that achieved what I needed. But then I thought, let's make this a bit more generic, and after many, many prompts this is what it spat out.
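Before the AI's overview of the final version, here's my rough idea of what that first, basic version boiled down to (a reconstruction for illustration, not the actual first draft it gave me): everything hardcoded for the one category.

import time
import requests
from bs4 import BeautifulSoup

# Reconstruction of the first, hardcoded version: one forum category,
# fixed page offsets, and it only prints the topic titles.
BASE = "https://www.fiestamk7.jamescooke.me.uk/viewforum.php?f=5&start="

for offset in range(0, 1326, 25):
    page = requests.get(BASE + str(offset))
    soup = BeautifulSoup(page.text, "html.parser")
    for row in soup.select("li.row"):
        title = row.select_one("a.topictitle")
        if title:
            print(title.text.strip())
    time.sleep(2)  # pause between pages to be polite to the server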
A brief overview of the script (the following description is AI-generated):
This script is a phpBB forum scraper that:
🔍 Purpose
Scrapes topics from a phpBB-based forum category and collects:
- Topic titles
- Number of replies (only if replies ≥ 1)
- Direct topic URLs
🧩 Main Features
- Dependency Check: Ensures requests, beautifulsoup4, and tqdm are installed before proceeding.
- User Input: Prompts the user to enter:
  - Forum domain (e.g., https://exampleforum.com)
  - Forum path (e.g., /viewforum.php?f=5)
  - Start and end page offsets
  - Page step value (e.g., 25 if 25 topics per page)
  - Output filename for results
- Scraping Logic:
- Iterates through forum pages using the specified range and step
- Extracts topics that have at least one reply
- Constructs the full URL for each topic
- Output:
- Writes the results to an HTML file with topic titles, number of replies, and clickable links
💡 Notes
- Makes use of tqdm for a progress bar during scraping
- Adds a 2-second delay between page requests to be polite to the server
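One detail I liked in there: phpBB shows the reply count as text like 'Replies: 23', and the script just strips out the digits before comparing. A quick illustration of that trick from the scrape_topics() function in the code below:

# How the script turns phpBB's "Replies: 23" text into a number:
# keep only the digit characters, then convert to int.
replies_text = "Replies: 23"
replies_number = ''.join(filter(str.isdigit, replies_text))
print(int(replies_number))  # prints 23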
My thoughts
Although I’ve not learnt a great deal, these AI tools are incredibly powerful. I would love to learn to write this code from scratch but in reality, I don’t need to and I would rather spend the time doing other things.
Without knowing some basics (using the terminal, understanding the fundamentals of languages and dependencies) I probably wouldn't have got this far. But then again, if this code can be written by AI, and the AI itself suggests prompts to improve it further, it can show you everything else you don't know how to do.
In this instance it has saved me time generating the list I wanted; going forward it will be doing other tasks for me:
– Like the featured image for this post
– And helping create a php/mysql based website for a specific use case. (soon)
– And anything else that has a reasonable amount of data that needs analysis (excel sheets anyone?)
– Come up with obscure set lists for DJing?
The possibilities are pretty open, so why not embrace it?
Link to my conversation with ChatGPT about the Python code: https://chatgpt.com/share/6806c448-8a14-8002-9ca4-7eab18c444e5
The code itself:
# -*- coding: utf-8 -*-

# Dependency Check
try:
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin
    from tqdm import tqdm
    print("\n✅ Dependencies checked and loaded: requests, beautifulsoup4, tqdm")
except ImportError as e:
    missing = str(e).split()[-1].strip("'")
    print(f"\n🚨 Missing dependency: {missing}")
    print("🔧 Please install all required packages by running:\n")
    print("    pip3 install requests beautifulsoup4 tqdm\n")
    exit(1)

import time

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; ForumScraper/1.0)"
}
def scrape_topics(base_url, start, headers, domain):
    """Fetch one forum page and return (title, replies, link) tuples."""
    url = f"{base_url}{start}"
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print(f"❌ Failed to fetch page {start} (status {response.status_code})")
        return []
    soup = BeautifulSoup(response.text, "html.parser")
    topics = []
    for row in soup.select("li.row"):
        title_tag = row.select_one("a.topictitle")
        if not title_tag:
            continue
        replies_tag = row.select_one("dd.posts")
        if not replies_tag:
            continue
        replies_text = replies_tag.text.strip()
        # Keep only the digits from text like "Replies: 23"
        replies_number = ''.join(filter(str.isdigit, replies_text))
        if not replies_number:
            continue
        replies = int(replies_number)
        if replies >= 1:
            title = title_tag.text.strip()
            # Un-escape '&amp;' in the href before building the full URL
            link = urljoin(domain, title_tag["href"].replace("&amp;", "&"))
            topics.append((title, replies, link))
    return topics
def main():
    # Confirm what this script does
    print("\n📘 This script scrapes a phpBB forum category for topics with one or more replies.")
    print("It outputs an HTML file listing each topic title, number of replies, and a link to the topic.")
    proceed = input("\n❓ Do you want to continue? (yes/no): ").strip().lower()
    if proceed not in ["yes", "y"]:
        print("\n👋 Exiting script.")
        exit(0)

    print("\n🔧 Forum Scraper Setup 🔧\n")
    domain = input(
        "🌐 Enter the base domain of the forum (e.g., https://exampleforum.com):\n"
        "   Make sure it includes 'https://' or 'http://'\n> ").strip().rstrip('/')
    forum_path = input(
        "\n📁 Enter the forum path (e.g., /viewforum.php?f=5):\n> ").strip()
    base_url = f"{domain}{forum_path}&start="

    try:
        start_page = int(input(
            "\n📄 Enter the START PAGE number\n"
            "   This is usually 0 for the first page.\n> "))
        end_page = int(input(
            "\n📄 Enter the END PAGE number\n"
            "   This is the number from the end of the URL when viewing the last page of topics.\n"
            "   For example, if the URL ends in 'viewforum.php?f=5&start=1325', enter 1325.\n> "))
        step = int(input(
            "\n🔁 Enter the STEP value\n"
            "   This is how much the page offset increases per page.\n"
            "   For example, if there are 25 topics per page, enter '25'.\n> "))
    except ValueError:
        print("\n⚠️ Invalid input. Please enter numeric values for start, end, and step.")
        return

    output_file = input(
        "\n💾 Enter output filename (e.g., forum_scraper_output.html)\n"
        "   Press Enter to use default: 'forum_scraper_output.html'\n> ").strip()
    if not output_file:
        output_file = "forum_scraper_output.html"

    print("\n⏳ Scraping in progress...\n")
    all_topics = []
    # Page offsets run from start_page to end_page inclusive, in steps of 'step'
    page_offsets = list(range(start_page, end_page + step, step))
    for start in tqdm(page_offsets, desc="📄 Scraping pages", unit="page"):
        topics = scrape_topics(base_url, start, headers, domain)
        all_topics.extend(topics)
        time.sleep(2)  # 2-second delay between requests, to be polite to the server

    # Write each topic as a small HTML block: title, reply count, link
    with open(output_file, "w", encoding="utf-8") as f:
        for title, replies, link in all_topics:
            f.write('<p>\n')
            f.write(f'  <strong>{title}</strong><br>\n')
            f.write(f'  <em>Note: {replies} replies</em><br>\n')
            f.write(f'  <a href="{link}">LINK TO PAGE</a><br>\n')
            f.write('</p>\n\n')

    print(f"\n✅ Done! Output saved to: {output_file}")

if __name__ == "__main__":
    main()