Tags: url, website
Asked 3 years ago
1 Jul 2021
Views 373
Elroy posted

Getting sub URLs from a website

noob
answered Feb 27 '23 00:00

To get the sub-URLs from a website, you can use PHP's cURL extension to fetch a page's HTML and then parse that HTML to extract the links. Here's an example that demonstrates how to do this for a single page:



<?php
// Define the URL of the website to fetch
$url = 'https://www.example.com';

// Initialize cURL session
$curl = curl_init();

// Set the cURL options
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);

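// Execute the cURL session
$response = curl_exec($curl);

// curl_exec() returns false on failure, so stop before parsing
if ($response === false) {
    $error = curl_error($curl);
    curl_close($curl);
    exit('cURL error: ' . $error);
}

// Close the cURL session
curl_close($curl);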

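// Create a DOM document from the HTML content.
// Real-world HTML is rarely valid, so collect libxml warnings quietly
// instead of hiding every error with the @ operator.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($response);
libxml_clear_errors();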

// Get all the anchor tags in the HTML content
$anchors = $dom->getElementsByTagName('a');

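// Loop through the anchor tags and extract the sub-URLs
$base = rtrim($url, '/'); // avoid a double slash when joining
$sub_urls = array();
foreach ($anchors as $anchor) {
    $href = $anchor->getAttribute('href');
    // A single leading '/' marks a root-relative path on the same site.
    // '//host/path' is protocol-relative and may point to another domain.
    if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
        $sub_urls[] = $base . $href;
    }
}
$sub_urls = array_unique($sub_urls); // drop repeated links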

// Output the sub-URLs (escaped, since they come from a remote page)
echo 'Sub-URLs: <br>';
foreach ($sub_urls as $sub_url) {
    echo htmlspecialchars($sub_url) . '<br>';
}
?>

In this code, we first define the URL to fetch and use a cURL session to download its HTML. We then load that HTML into a DOMDocument object and collect every anchor tag with the getElementsByTagName() method.

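We then loop through the anchor tags and read the href attribute of each one. If the href starts with a slash (/), we treat it as a sub-URL and append it to the base URL to get the full address. Note that this check only catches root-relative links: relative links without a leading slash (such as page.html) and absolute links to the same site are skipped, and it assumes $url is the site root rather than a deeper page.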
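
If you also want absolute links that point back to the same site, one possible approach is to compare hosts instead of checking for a leading slash. The following is only a minimal sketch: the function name same_host_links and the sample HTML are made up for illustration, and it assumes the base URL has a scheme and host with no port. Relative links without a leading slash are still skipped.

```php
<?php
// Hypothetical helper: returns unique absolute URLs on the same host as $baseUrl.
// Assumes $baseUrl looks like "https://host" (scheme and host, no port).
function same_host_links(string $html, string $baseUrl): array
{
    $base = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host'];

    libxml_use_internal_errors(true); // tolerate malformed HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue; // empty href or in-page fragment
        }
        if (strpos($href, '//') === 0) {
            $href = $base['scheme'] . ':' . $href; // protocol-relative link
        } elseif ($href[0] === '/') {
            $href = $origin . $href; // root-relative link
        }
        // Keep only absolute URLs whose host matches the base host
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && strcasecmp($host, $base['host']) === 0) {
            $links[$href] = true; // array keys deduplicate
        }
    }
    return array_keys($links);
}

// Sample input, made up for this example
$html = '<a href="/about">About</a>'
      . '<a href="https://www.example.com/blog">Blog</a>'
      . '<a href="//cdn.example.net/lib.js">CDN</a>'
      . '<a href="/about">About again</a>';

print_r(same_host_links($html, 'https://www.example.com'));
```

With the sample HTML, the CDN link is dropped because its host differs, and the repeated /about link appears only once.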

Finally, we output the sub-URLs using the echo statement. You can modify this code to suit your specific needs, such as filtering the sub-URLs based on certain criteria or storing them in a database.
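
As one example of such filtering, here is a small sketch that keeps only URLs under a given path prefix and skips common static-file extensions. The function name filter_sub_urls, the '/blog/' prefix, and the extension list are illustrative assumptions, so adjust them to your own site.

```php
<?php
// Hypothetical filter: keep URLs whose path starts with $prefix and
// does not end in one of the skipped file extensions.
function filter_sub_urls(array $urls, string $prefix, array $skipExt = ['jpg', 'png', 'css', 'js', 'pdf']): array
{
    $kept = [];
    foreach (array_unique($urls) as $url) {
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path) || strpos($path, $prefix) !== 0) {
            continue; // no path, or outside the wanted section
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $skipExt, true)) {
            continue; // static asset, not a page
        }
        $kept[] = $url;
    }
    return $kept;
}

// Sample data, made up for this example
$sub_urls = [
    'https://www.example.com/blog/post-1',
    'https://www.example.com/blog/post-1',
    'https://www.example.com/blog/cover.png',
    'https://www.example.com/contact',
];

print_r(filter_sub_urls($sub_urls, '/blog/'));
```

Here only the single blog post URL survives: the duplicate is removed, the image is skipped by extension, and the contact page falls outside the prefix.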