How Much Does It Really Cost to Run Browser Based Web Scraping at Scale?

Learn the real cost of running 1000 browser based web scraping requests. Compare commercial JS rendering providers vs cloud setups, and find out when it’s cheaper to run your own scraping infrastructure.

A robot showing how it is using a browser

Introduction

Web scraping at scale comes with a variety of challenges. Once you move from local development to production, issues inevitably arise. The most common are anti-scraping mechanisms, but there are other, subtler problems. In this post, I’ll focus on one of those silent but significant challenges:

Running browsers at scale

Why Use Browsers for Web Crawling and Scraping?

There are two main reasons to use browsers:

  1. JavaScript Rendering: Some websites won’t display content unless JavaScript is executed, which only a browser can handle properly.

  2. Avoiding Detection: Sending raw HTTP requests instead of using a browser can quickly flag your scraper as a bot. That can lead to bans and force frequent proxy changes. This is an oversimplification, but enough to make the point.

How much does it cost to run 1000 requests in a browser?

Commercial Options

Before diving into technical details, it's helpful to benchmark against commercial providers. We're not talking about advanced anti-scraping solutions, just basic JavaScript rendering.

The following prices reflect high-volume use, assuming you're already spending thousands of euros monthly. If you're under €1,000/month, expect higher costs.

| Provider | Price for JS rendering [$/1,000 requests] |
|---|---|
| Blat | 0.364 |
| Firecrawl | 0.798 |
| Scrapingbee | 0.374 |
| Spider Cloud | 0.310 |
| Zenrows | 0.394 |

These prices give a good sense of market rates. Self-hosting is generally cheaper.

Real Cost of Running 1,000 Browser Requests (Excluding Proxies)

We're focusing on the browser cost only. Proxy pricing varies greatly depending on the vendor, quality, and target sites.
The key cost variable when running a pool of browsers in the cloud is the cloud provider.

Cloud Provider

The pricing breakdown shown below is based on the following assumptions:

  • Each browser instance needs ~2GB RAM and 1 CPU.

  • Average page load time: 10 seconds.

More powerful machines might decrease the time required to load a page in the browser, but they are also more expensive.

Serverless Function (Lambda)
Description

Serverless functions are ideal for absorbing bursts of requests that need to be handled in near real time.

However, some of your requests will pay a cold-start penalty, because the Docker image has to be loaded into the serverless function before it can run. For instance, Google Cloud Functions promises cold starts of 2 seconds, while on other providers such as Scaleway Serverless Functions the cold start is closer to 15 seconds.

Keep in mind that, depending on the provider and billing mode, a single request may be charged not only for its own execution time but for the entire time the function instance stays up and running. This retention period is not explicitly documented but is generally observed to last between 5 and 15 minutes of inactivity.

Pricing

The cost of executing 1,000 requests is given by the following formula:

total_cost = 1000 · (CC · T + F / 1,000,000)

Where:

  • CC is the compute cost per second [$/s].

  • T is the average execution time per invocation in seconds.

  • F is the fixed cost per 1 million requests [$/1M requests], hence the division by 1,000,000.

For instance, the cost for Google Cloud Functions is calculated as follows:

1000 * (0.000029*10 + 0.4/1000000) = 0.29040

| Cloud Provider | Compute cost of 2 GB and 1 CPU (CC) [$/s] | Request cost (F) [$/1M requests] | Total cost [$/1k requests] |
|---|---|---|---|
| AWS Lambda | 0.0000333 | 0.20 | 0.33320 |
| Azure Functions | 0.000052 | 0.20 | 0.52020 |
| Google Cloud Functions | 0.000029 | 0.40 | 0.29040 |
| Scaleway Serverless Functions | 0.000024 | 0.15 | 0.24015 |
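As a quick check, the serverless formula can be sketched in a few lines of Python. The function and dictionary names are illustrative, the prices are simply the ones listed in the table, and T defaults to the 10-second page load assumed earlier. Cold starts and any idle time are not modelled.

```python
# Per-second compute price for 2 GB / 1 CPU (CC) and per-million request fee (F),
# copied from the table above. Treat them as a snapshot, not as current prices.
SERVERLESS_PRICES = {
    "AWS Lambda": (0.0000333, 0.20),
    "Azure Functions": (0.000052, 0.20),
    "Google Cloud Functions": (0.000029, 0.40),
    "Scaleway Serverless Functions": (0.000024, 0.15),
}


def serverless_cost_per_1k(cc: float, f: float, t: float = 10.0) -> float:
    """Cost in $ of 1,000 invocations that each run for t seconds."""
    # F is quoted per 1 million requests, so convert it to a per-request fee.
    return 1000 * (cc * t + f / 1_000_000)


for provider, (cc, f) in SERVERLESS_PRICES.items():
    print(f"{provider}: ${serverless_cost_per_1k(cc, f):.5f} per 1k requests")
```

With these inputs the compute term dominates: the per-request fee adds well under a tenth of a cent per 1,000 requests, so the page load time T is the lever that matters most.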

Virtual Servers (on demand)
Description

Unlike serverless functions, virtual servers require you to manage the infrastructure yourself, which adds complexity. Additionally, launching a new virtual server takes minutes, significantly longer than the seconds needed for a serverless function cold start.

The main benefit of this approach is cost: compared with serverless functions, it can reduce the cost per request by a factor of ~3.

Pricing

We are comparing machines with 4 GB of RAM and 2 CPUs. This means we can run 2 browsers on the same machine, since 1 browser needs around 2 GB of RAM and 1 CPU to run smoothly.

In this case the formula to calculate the `Total Cost` of 1,000 requests is a bit different:

total_cost = 1000 · c_vm · T / (3600 · N)

Where:

  • c_vm is the cost of the virtual machine [$/h].

  • T is the average execution time per invocation in seconds.

  • N is the number of browsers you can run on the instance. In this case it is equal to 2, as the machines have 4 GB of RAM and 2 CPUs.

For instance, the cost for AWS EC2 is calculated as follows:

(0.08925 * 10 / (3600 * 2)) * 1000 ≈ 0.12396

| Cloud Provider | Machine | Cost per hour c_vm [$/h] | total_cost [$/1k requests] |
|---|---|---|---|
| AWS EC2 | c7i.large | 0.08925 | 0.12396 |
| Azure Virtual Machines | F2s v2 | 0.0846 | 0.1175 |
| Google Virtual Machines | c2d-highcpu-2 | 0.07496 | 0.10411 |
| Scaleway Virtual Instances | POP2-HC-2C-4G | 0.06100 | 0.084722 |
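The virtual-server calculation can be sketched the same way. This is a minimal illustration with made-up function and variable names, using the hourly prices from the table and the assumptions stated earlier (10 seconds per page, 2 browsers per machine). Note that the formula implicitly assumes both browsers are busy for the whole billed hour, so idle time would push the real cost up.

```python
# On-demand hourly prices [$/h] for the 2 CPU / 4 GB machines in the table above.
VM_PRICES = {
    "AWS EC2 (c7i.large)": 0.08925,
    "Azure Virtual Machines (F2s v2)": 0.0846,
    "Google Virtual Machines (c2d-highcpu-2)": 0.07496,
    "Scaleway Virtual Instances (POP2-HC-2C-4G)": 0.06100,
}


def vm_cost_per_1k(c_vm: float, t: float = 10.0, n: int = 2) -> float:
    """Cost in $ of 1,000 requests on a VM billed at c_vm $/h running n browsers."""
    # Each browser finishes one page every t seconds, so the machine
    # serves n * 3600 / t requests per billed hour at full utilisation.
    requests_per_hour = n * 3600 / t
    return 1000 * c_vm / requests_per_hour


for machine, c_vm in VM_PRICES.items():
    print(f"{machine}: ${vm_cost_per_1k(c_vm):.5f} per 1k requests")
```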

Long-term consumption commitments give you access to more competitive pricing. Some cloud providers offer ~30% and ~50% savings for 1-year and 3-year commitments respectively.

Threshold (n)

Knowing the cost of running 1,000 requests in a browser (c), we can now calculate the number of requests (the threshold) above which it no longer makes sense, from a cost perspective, to keep externalizing the execution of browsers.

As a rule of thumb, we can use the formula below to decide whether it's better to internalize or externalize your pool of browsers.

(p - c) · n <= 2 · s

Where:

  • p is the price per request offered by the commercial solutions [$/request]

  • c is your own cost per request when running the browsers yourself [$/request]

  • n is the number of requests per month [requests/month]

  • s is the salary of a senior data engineer [$/month]

It makes sense to run your own pool of browsers (or even proxies) when the inequality above does not hold, that is, when the monthly saving (p - c) · n exceeds the cost of two engineers.
Even then, there might still be reasons to externalize it, such as:

  • A shorter time to market, so you can validate products fast.

  • Keeping your company focused on its core business and externalizing the rest.

The formula above assumes that you need at least 2 engineers to keep your pool of browsers up and running, so that someone is always available (holidays, sick leave, etc.) in case something breaks.

Assuming your team does not have the knowledge to manage its own infrastructure, it is forced to use serverless functions (Lambda).

Cost (c): $0.24015/1000 requests (Scaleway Serverless Functions)
Price of the commercial solution (p): $0.364/1000 requests (Blat solution).
Salary (s): $80,361.86 per year, i.e. about $6,696.82 per month (considering the average salary of a data engineer in Germany)

n = 2 * s / (p - c)
n = 2 * (80361.86/12) / ((0.364 - 0.24015)/1000)
n ≈ 108 million requests / month
n / 30 ≈ 3.6 million requests / day

So, with these prices (p), consider internalizing your pool of browsers once you exceed roughly 3.6 million requests / day.
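The break-even point can also be computed programmatically, which makes it easy to plug in your own prices and salaries. The sketch below is only an illustration of the rule of thumb: the function name and the `engineers` parameter are assumptions, and the inputs are the serverless figures used in this section.

```python
def break_even_requests_per_month(
    p_per_1k: float,
    c_per_1k: float,
    salary_per_year: float,
    engineers: int = 2,
) -> float:
    """Monthly request volume at which (p - c) * n equals engineers * s."""
    monthly_staff_cost = engineers * salary_per_year / 12
    saving_per_request = (p_per_1k - c_per_1k) / 1000
    if saving_per_request <= 0:
        # Self-hosting never pays off if it is not cheaper per request.
        raise ValueError("commercial price must exceed your own cost")
    return monthly_staff_cost / saving_per_request


n = break_even_requests_per_month(p_per_1k=0.364, c_per_1k=0.24015,
                                  salary_per_year=80361.86)
print(f"{n / 1e6:.0f} million requests / month")
print(f"{n / 30 / 1e6:.1f} million requests / day")
```

Below that volume the inequality holds and externalizing is cheaper; above it, the saving per request starts to outweigh the two salaries.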

If your team does have the knowledge (and the time) to manage its own infrastructure, it might be interested in using virtual servers (on demand).

In this case, the cost (c) is most probably $0.084722/1000 requests (Scaleway Virtual Instances) instead of $0.24015/1000 requests.

If we run the numbers again:

n = 2 * s / (p - c)
n = 2 * (80361.86/12) / ((0.364 - 0.084722)/1000)
n ≈ 48 million requests / month
n / 30 ≈ 1.6 million requests / day

In this case the threshold is not 3.6 million requests / day, but 1.6 million requests / day.

Tip: keep in mind that these calculations only take into account the cost of running browsers in the cloud, without considering proxies. Adding the cost of proxies increases your own cost (c), which shrinks the saving per request (p - c) you get from leaving the commercial solution and therefore raises the threshold (n).
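To see how sensitive the threshold is to proxy costs, the sketch below recomputes it for the virtual-server scenario while adding a proxy cost on top of c. The proxy figures are hypothetical values chosen purely for illustration, not real vendor prices, and the commercial price p is held fixed.

```python
def threshold_per_month(p_per_1k: float, c_per_1k: float, s_month: float) -> float:
    """Requests per month at which (p - c) * n equals 2 * s."""
    return 2 * s_month / ((p_per_1k - c_per_1k) / 1000)


P = 0.364                 # commercial price [$/1k requests]
C_BROWSER = 0.084722      # virtual-server browser cost [$/1k requests]
S_MONTH = 80361.86 / 12   # monthly salary [$]

# Hypothetical proxy costs in $/1k requests, for illustration only.
for proxy_cost in (0.0, 0.05, 0.10, 0.20):
    n = threshold_per_month(P, C_BROWSER + proxy_cost, S_MONTH)
    print(f"proxy ${proxy_cost:.2f}/1k -> {n / 30 / 1e6:.1f} million requests / day")
```

The threshold grows faster than linearly as c approaches p, because the saving per request sits in the denominator.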

Conclusion

If you're scraping at scale (1.6M - 3.6M requests/day or more), talk to your browser provider. You may be eligible for significant discounts, or it might be time to consider other browser providers or building your own pool.
