Universal Scraper Enterprise

Welcome to the official documentation for Universal Scraper Enterprise – our premium solution for dynamic web scraping. Designed for enterprise use, the product combines advanced stealth techniques for bypassing modern web security, a custom in-browser instruction set, and client integrations in multiple languages.

Introduction

Universal Scraper Enterprise provides your organization with unparalleled control over web data extraction. Our solution handles:

JavaScript rendering of dynamic pages
Captcha and Cloudflare bypass
Custom in-browser instruction execution
Persistent browser sessions (reusable for up to 30 minutes)
Proxy routing with per-country targeting

Server Endpoints

Root Endpoint

URL: /
Method: GET
Description: Serves this documentation page.

Scraping Job Endpoint

URL: /v1
Methods: GET, POST
Description: Initiates a scraping job and returns a jobId along with a status URL.
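A successful request returns a JSON body carrying the job identifier. Only the jobId field is relied on by the helper functions below; the remaining field in this illustrative response is an assumption:

{ "jobId": "ab12cd34", "statusUrl": "https://advanced-scraper.com/status/ab12cd34" }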

Job Status Endpoint

URL: /status/:jobId
Method: GET
Description: Retrieves the current status and results of your scraping job.
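While the job runs, status reads processing; on success it reads completed and the payload carries the result; failures surface an error message (field names taken from the polling logic in the helpers below). Illustrative responses – note the exact failure status value is an assumption, since the helpers treat anything other than completed as failure:

{ "status": "processing" }
{ "status": "completed", "result": "<html>...</html>" }
{ "status": "failed", "error": "Navigation timeout" }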

Request Parameters & Options

The /v1 endpoint accepts the following parameters. The names are taken directly from the integration helpers in this document; the short descriptions summarize their apparent roles:

url: The target URL to scrape (required).
instructions: JSON-encoded array of in-browser actions (see the JS Instruction Set below).
useProxy: "true" to route the request through the supplied proxies.
proxies: JSON-encoded list of proxy servers.
customHeaders: "true" when custom request headers are supplied.
headers: JSON-encoded request headers.
requestMethod: HTTP method used against the target URL (GET or POST).
postData: Request body sent when requestMethod is POST.
block_resources: JSON-encoded list of resource types to block during rendering.
js_render: "true" to render the page in a full browser.
json_response: "true" to wrap the result in a JSON envelope.
wait_param: Fixed wait (in seconds) before the page is captured.
wait_for_param: "true" to wait for a selector before the page is captured.
session_id: Identifier of a persistent browser session to reuse.
delete_session_id: "true" to terminate the named session after the request.
country_code: Preferred proxy exit country ("any" by default).

JS Instruction Set

The instructions parameter defines the actions executed within the browser. Supported instructions fall into the following categories; a sketch of the array's shape follows the list:

Basic Navigation & Waits

User Interactions

Scrolling & Evaluation

Captcha & Cloudflare Bypass

Frame-specific Actions
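
As an illustration of the shape an instructions array takes: only wait_for appears in this document's usage examples, so the other names below (click, scroll_y) are hypothetical placeholders, not confirmed instruction names.

<?php
// Illustrative instructions array. "wait_for" appears in the usage examples
// below; "click" and "scroll_y" are hypothetical names for illustration only.
$instructions = [
    [ "wait_for" => "#main-content" ],  // wait until the selector is present
    [ "click"    => "#load-more" ],     // hypothetical click action
    [ "scroll_y" => 1000 ],             // hypothetical vertical scroll (px)
];
?>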

Session & Proxy Management

Persistent browser sessions are maintained for 30 minutes. Provide a session_id to reuse an instance, and set delete_session_id to terminate it. Proxies are configured via the proxies parameter and can be pinned to a specific exit country with country_code. A sketch of session and proxy usage follows.
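
A minimal sketch, assuming the scrape() helper defined in the Integration Examples section. The session identifier and the proxy string format are illustrative assumptions, and js_render is enabled on the assumption that persistent sessions imply a live browser instance:

<?php
// Create (or reuse) the browser session "my-session"; state persists 30 min.
$page1 = scrape("https://example.com/step-1", [ [ "wait_for" => "#content" ] ],
                [], null, 'GET', false, [], true, false, 0, false, "my-session");

// Reuse the same session within 30 minutes: cookies and storage are intact.
$page2 = scrape("https://example.com/step-2", [],
                [], null, 'GET', false, [], true, false, 0, false, "my-session");

// Terminate the session explicitly via delete_session_id = true.
scrape("https://example.com/logout", [],
       [], null, 'GET', false, [], true, false, 0, false, "my-session", true);

// Route through a proxy, pinned to a US exit via country_code.
$viaProxy = scrape("https://example.com", [],
                   [ "http://user:pass@proxy.example:8080" ],  // format assumed
                   null, 'GET', false, [], false, false, 0, false,
                   null, false, "us");
?>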

Integration Examples

The following helper functions demonstrate how to integrate with Universal Scraper Enterprise in PHP, Python, and Node.js. For each language, separate examples are provided for GET and POST requests.

PHP Helper Function


<?php
/**
 * Create a scraping job, poll its status until it leaves 'processing',
 * and return the result. Throws an Exception on validation errors,
 * transport failures, malformed responses, polling timeout, or job failure.
 */
function scrape(
    $targetUrl,
    $instructions = [],
    $proxies = [],
    $headers = null,
    $httpMethod = 'GET',
    $postData = false,
    $block_resources = [],
    $js_render = false,
    $json_response = false,
    $wait_param = 0,
    $wait_for_param = false,
    $session_id = null,
    $delete_session_id = false,
    $country_code = "any",
    $waitSeconds = 60,
    $maxAttempts = 20
) {
    if (empty($targetUrl) || !filter_var($targetUrl, FILTER_VALIDATE_URL)) {
        throw new Exception("Parameter 'targetUrl' must be a valid URL.");
    }
    if (!is_array($instructions)) {
        throw new Exception("Parameter 'instructions' must be an array.");
    }
    // Job-creation and status-polling endpoints.
    $baseUrl = "https://advanced-scraper.com/v1/";
    $statusBaseUrl = "https://advanced-scraper.com/status/";
    // Complex values are JSON-encoded; booleans travel as "true"/"false" strings.
    $params = [
        "url"               => $targetUrl,
        "instructions"      => json_encode($instructions),
        "useProxy"          => (!empty($proxies)) ? "true" : "false",
        "proxies"           => json_encode($proxies),
        "customHeaders"     => ($headers !== null) ? "true" : "false",
        "headers"           => ($headers !== null) ? json_encode($headers) : "null",
        "requestMethod"     => strtoupper($httpMethod),
        "postData"          => $postData ?? "false",
        "block_resources"   => json_encode($block_resources),
        "js_render"         => $js_render ? "true" : "false",
        "json_response"     => $json_response ? "true" : "false",
        "wait_param"        => $wait_param,
        "wait_for_param"    => $wait_for_param ? "true" : "false",
        "session_id"        => ($session_id !== null) ? $session_id : "null",
        "delete_session_id" => $delete_session_id ? "true" : "false",
        "country_code"      => $country_code
    ];
    $queryString = http_build_query($params);
    $requestUrl = $baseUrl . "?" . $queryString;
    // Create the job. The options are sent both in the query string and in the
    // form-encoded POST body.
    $ch = curl_init($requestUrl);
    curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
    curl_setopt($ch, CURLOPT_POSTFIELDS, $queryString);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/x-www-form-urlencoded"));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $response = curl_exec($ch);
    if (curl_errno($ch)) {
        $err = curl_error($ch);
        curl_close($ch);
        throw new Exception('Job creation error: ' . $err);
    }
    curl_close($ch);
    $data = json_decode($response, true);
    if (!$data || !isset($data['jobId'])) {
        throw new Exception("Job creation failed. Response: " . $response);
    }
    $jobId = $data['jobId'];
    $statusUrl = $statusBaseUrl . $jobId;
    $attempts = 0;
    $statusData = null;
    // Poll the status endpoint until the job leaves the 'processing' state,
    // sleeping $waitSeconds between attempts, up to $maxAttempts polls.
    do {
        sleep($waitSeconds);
        $chStatus = curl_init($statusUrl);
        curl_setopt($chStatus, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($chStatus, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($chStatus, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($chStatus, CURLOPT_TIMEOUT, 30);
        $statusResponse = curl_exec($chStatus);
        if (curl_errno($chStatus)) {
            $errStatus = curl_error($chStatus);
            curl_close($chStatus);
            throw new Exception("Failed to fetch job status: " . $errStatus);
        }
        curl_close($chStatus);
        $statusData = json_decode($statusResponse, true);
        if (!$statusData || !isset($statusData['status'])) {
            throw new Exception("Invalid status response: " . $statusResponse);
        }
        $attempts++;
        // Give up only if the job is still processing after the final attempt,
        // so a job that completes on the last poll is not discarded.
        if ($attempts >= $maxAttempts && $statusData['status'] === 'processing') {
            throw new Exception("Maximum polling attempts reached.");
        }
    } while ($statusData['status'] === 'processing');
    // Any terminal status other than 'completed' is treated as a failure.
    if ($statusData['status'] === 'completed') {
        return $statusData['result'];
    } else {
        throw new Exception("Job failed: " . ($statusData['error'] ?? "Unknown error."));
    }
}
?>
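
With the defaults above (waitSeconds = 60, maxAttempts = 20), the helper sleeps before every poll, so even a fast job takes at least 60 seconds to return, and a slow one is abandoned after roughly 20 minutes (20 polls × 60 s each). Tune both values to the job latency you expect.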
            

Usage Example (GET):


// Scrape a page via GET, waiting for a selector before capture
$result = scrape(
    "https://example.com",
    [ [ "wait_for" => "#main-content" ] ]
);

Usage Example (POST):


// Submit JSON data via POST with a matching Content-Type header
$result = scrape(
    "https://example.com/api/submit",
    [],
    [],
    [ "Content-Type" => "application/json" ],
    "POST",
    '{"name":"John Doe","email":"john@example.com"}'
);
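
Since scrape() reports every failure path by throwing, production callers should wrap it in try/catch:

<?php
try {
    $html = scrape("https://example.com", [ [ "wait_for" => "#main-content" ] ]);
    echo substr($html, 0, 200); // inspect the first 200 characters
} catch (Exception $e) {
    // Thrown on invalid input, transport errors, malformed responses,
    // polling timeout, or a failed job.
    error_log("Scrape failed: " . $e->getMessage());
}
?>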