Scraping mobile app traffic

Intercept network traffic to scrape mobile apps. Capture HTTP requests to understand how data is exchanged. This tutorial can be used to scrape both Android and iOS apps, even though the example uses Android.

Robot scraping mobile app traffic

What will you learn?

This tutorial teaches you how to intercept HTTP traffic from Android mobile apps, so that you can scrape data from them. It covers the basics, but not advanced reverse engineering techniques.

Introduction

The first step when scraping mobile apps is to intercept the traffic between the mobile app and the server. This is done by setting up a proxy server and configuring the mobile device to route its traffic through the proxy server.

In other words, the proxy server acts as a middleman between the mobile app and the server, so it can see all the traffic between them. This traffic, with all its requests and responses, will be saved in an HTTP Archive (also known as a HAR file).

The intercepted traffic helps us understand how the mobile app communicates with the server, which in turn allows us to write code that scrapes the mobile app. This technique can be used to scrape Android apps, but also iPhone apps.

Intercept Traffic

Requirements

What do you need?

  1. Target application

  2. Rooted mobile device (real or emulator)

  3. HTTP Traffic Interceptor (Proxy Server)

Target Application

The target application is the app that you want to scrape. The following tutorial is focused on Android, so we are looking to download an APK file.

It can be any app that you want to extract data from. You can usually download its APK from one of the third-party APK download websites.

Personally, I like uptodown, but this is just a matter of taste. As long as you find your target APK, you are good to go.

BONUS
These websites usually let you download different versions of the same app. This is useful because older versions might be less protected and still in use, so it can be easier to intercept their traffic.

Rooted Mobile Device

Just pick one: a rooted real device or a rooted emulator. Whenever possible, I recommend rooting an emulator, as it removes the dependency on an extra physical device.

Root Real Device

To root a real device you will most probably need Magisk, and there can be some custom instructions depending on your vendor. I recommend searching on Google for "How to root <your phone model>". Example: how to root a s22 smartphone

If you want to root an Android Virtual Device (emulator), I recommend the rootAVD project.
There are already many good tutorials about how to do that.

For those who prefer to run the emulator in a Docker container, I recommend docker-android.

BONUS
Some mobile applications detect that they are running on a rooted phone and stop working. If that's the case, you need to apply advanced reverse engineering techniques.

HTTP Traffic Interceptor

There are multiple tools that you can use to intercept the traffic between the mobile app and the server. Some of the most popular ones are:

Personally, we recommend HTTP Toolkit, as it is very easy to set up and you can quickly use it to intercept the traffic of any mobile app on both real devices and emulators. The emulators can be running locally or in the cloud (such as Genymotion). It can also automatically inject Frida to intercept only the traffic that comes from your target app, ignoring everything else (yes, that's cool!).

Hands-on Tutorial

In this case we are going to intercept the API that the Booking APK (version 53.7.0.1) uses to communicate with its servers and request information such as hotel prices (among other things).

Download the APK

In this tutorial we are intercepting some requests from Booking Android APK using HTTP Toolkit. You can download the booking APK (version 53.7.0.1) from this link.

Setup your Android device

In this tutorial we are using an emulator instead of a real device, as it has some advantages when intercepting traffic (keep reading). In particular, we recommend following this tutorial to install Android Studio. Once installed, you can configure the Android Virtual Device (AVD) following this tutorial.

Once you have installed the emulator, you can run it following these instructions. Now you should be able to see an image similar to this:

Be aware, though, that using an emulator comes with some potential drawbacks.

  1. Some APKs cannot be installed on some emulators straight away because they require an ARM architecture. For instance, this problem exists with Genymotion emulators when you run them on your local machine. Other people have faced this problem before, as you can see here, and you can fix it by installing the ARM Translation.

  2. Even though the point above solves the problem, we recommend using emulators that do not have this problem in the first place, like the Android Emulator.

  3. Some APKs detect that they are running in an emulator and behave differently on purpose. This can even affect the traffic. As a result, the traffic you intercept from an emulator can differ from the traffic you would intercept from a real device: some requests might not be sent at all when running in the emulator, or the tokens used in the headers might change.

Install the APK

This can be as easy as dragging and dropping the APK file into the emulator.

Or, if you prefer to do it programmatically, you can use the Android Debug Bridge (adb) to install the APK. This requires some familiarity with the terminal.

First, double-check that you have access to the emulator via adb by listing the devices that adb can see:

adb devices

If the list is empty, adb is not detecting the emulator and something went wrong with the installation.

If you can see the emulator in the list, you can install the APK by running the following command (replace the placeholder with the path to the APK you downloaded):

adb install <path-to-your-apk>

Install the HTTP Traffic Interceptor

In this case, we will use HTTP Toolkit for the reasons explained above. You can easily download it from their website; it works on Linux, Mac and Windows. Install it on your local computer.

Configure the emulator to use the HTTP Traffic Interceptor

Long story short, there is nothing to do here if you are using HTTP Toolkit and an emulator.

BONUS
One of the best advantages of HTTP Toolkit is that it can automatically configure the emulator to use the proxy server. It automatically installs the certificates in the emulator, so you can start intercepting traffic straight away. If you use a real device, this step is a bit more complicated, as you need to root the phone and also install the certificates manually.

Connect the emulator to the HTTP Traffic Interceptor

Open HTTP Toolkit and you will see a screen like this (in this case we show HTTP Toolkit next to the emulator):

Click the Android Device via ADB button, marked in the image above, to start intercepting the traffic. You will not see this option if there is no emulator running on your local computer.

HTTP Toolkit will start intercepting the traffic of the entire emulator.

This can easily become problematic, as a lot of the HTTP traffic you end up intercepting is not related to your target application at all. HTTP Toolkit offers a solution for that: the Android App via Frida option.

This option installs Frida in your emulator, and HTTP Toolkit uses it to filter out all the traffic that does not come from your target application. Even though this feature is really cool and we encourage you to use it, many APKs have mechanisms to detect whether Frida is running on the Android device, in the same way that they can detect whether they are running on a rooted device.

Start intercepting the traffic

Now you should be able to see the traffic in HTTP Toolkit. You can see the requests and responses in the right panel. All I've done here is:

  1. Open the booking.com mobile app

  2. Search for hotels in Barcelona for the 4th and 5th of March (point 1 in the image).

As you can see in the image above, point 2 shows that HTTP Toolkit has properly intercepted 40 HTTP requests.

Generate the HAR file

It's time to click that button in the following image (point 1).

This will generate a HAR file with all the requests and responses that you have intercepted. You can save this file and use it later to understand how the app communicates with the server. You can easily import this file into HTTP Toolkit, or even into the Network tab of the Chrome or Firefox DevTools.
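Because a HAR file is plain JSON, you can also inspect it with a few lines of code. The following is a minimal sketch, assuming the standard HAR layout (a top-level `log` object with an `entries` list). The `SAMPLE_HAR` data and the `api.example.com` URLs are made up for illustration, and `load_har` expects whatever path you saved your own export to.

```python
import json
from pathlib import Path


def load_har(path):
    """Read a HAR file (plain JSON) from disk and return it as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_har(har):
    """Return a (method, url, status) tuple for every captured request."""
    return [
        (e["request"]["method"], e["request"]["url"], e["response"]["status"])
        for e in har["log"]["entries"]
    ]


# Tiny hand-written HAR, used here only to illustrate the structure.
SAMPLE_HAR = {
    "log": {
        "version": "1.2",
        "entries": [
            {
                "request": {
                    "method": "GET",
                    "url": "https://api.example.com/search?city=Barcelona",
                    "headers": [],
                },
                "response": {"status": 200, "headers": [], "content": {}},
            },
            {
                "request": {
                    "method": "POST",
                    "url": "https://api.example.com/events",
                    "headers": [],
                },
                "response": {"status": 204, "headers": [], "content": {}},
            },
        ],
    }
}

for method, url, status in summarize_har(SAMPLE_HAR):
    print(status, method, url)
```

With a real export you would call `summarize_har(load_har(...))` on your saved file instead of using the sample data.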

Analyze HAR file

You can also analyze the HTTP traffic straight away in HTTP Toolkit.
This guide lists all the filters you can use in the tool.

For example, with the traffic above, we can search for the HTTP requests whose content includes the prices shown in the app: $148, $1,394, etc.

You can easily do that in HTTP Toolkit:

  1. In the HTTP Toolkit search bar, just add the following filter: contains(1,394)

In point 1 you add the filter contains(1,394), in point 2 you can see the content that you are looking for, and in point 3 you can also see other interesting information provided by the endpoint, like the reviewCount. There is also the checkinDate, the reviewScoreWord, and much more!
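If you prefer to run the same kind of search on the exported HAR file, a rough equivalent could look like the sketch below. It is not how HTTP Toolkit implements its filter: it only looks at response bodies, and it assumes the standard HAR fields `content.text` and `content.encoding` (bodies may be stored as base64). The sample data and URLs are invented for illustration.

```python
import base64


def response_text(entry):
    """Return the response body of a HAR entry as text, decoding base64 if needed."""
    content = entry["response"].get("content", {})
    text = content.get("text") or ""
    if content.get("encoding") == "base64":
        text = base64.b64decode(text).decode("utf-8", errors="replace")
    return text


def find_entries_containing(har, needle):
    """Return the URLs of all requests whose response body contains `needle`."""
    return [
        e["request"]["url"]
        for e in har["log"]["entries"]
        if needle in response_text(e)
    ]


def _entry(url, body, encoding=None):
    """Build a minimal HAR entry for the example (hypothetical helper)."""
    content = {"text": body}
    if encoding:
        content["encoding"] = encoding
    return {
        "request": {"method": "GET", "url": url},
        "response": {"status": 200, "content": content},
    }


# Made-up traffic: two responses mention the price, one does not.
SAMPLE_HAR = {
    "log": {
        "entries": [
            _entry(
                "https://api.example.com/search",
                '{"hotels": [{"price": "$1,394", "reviewCount": 812}]}',
            ),
            _entry("https://api.example.com/config", '{"theme": "dark"}'),
            _entry(
                "https://api.example.com/prices",
                base64.b64encode(b'{"total": "1,394"}').decode("ascii"),
                encoding="base64",
            ),
        ]
    }
}

for url in find_entries_containing(SAMPLE_HAR, "1,394"):
    print(url)
```

Searching for a value you can see on the screen of the app, such as a price, is usually the quickest way to locate the endpoint that returns it.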

As a result, you can isolate the request sent from the mobile application to the server, you can see it below:

curl --request GET \
  --url 'https://mobile-apps.booking.cn/json/mobile.saba?search_type=city&dest_ids=-372490&dest_name=Barcelona&location_source=autocomplete&autoextend=1&add_district_filter=1&encoded_autocomplete_meta=GiA2N2M1YzkwMTAxN2QwMWJlOWJjNmQ4NjAzNzFhODVlMCAAKAEyAmVuOgliYXJjZWxvbmFAAEoAUAA%3D&arrival_date=2025-03-04&departure_date=2025-03-05&flex_window=0&guest_qty=2&room_qty=1&children_qty=0&children_age=&categories_filter=&order_by=auto&source=startup_page&reason=new_search&outcome=search_results&detail_level=1&rows=auto&include_url=1&include_count=1&low_av_alternatives=0&show_is_personalized_result=0&show_reinforcement_text=v2&urgency=1&add_ctrip_info=1&include_taxes=1&show_extra_charges=1&include_translations=2&show_if_no_cc_allowed=2&show_deals=flash%2Clastm%2Csmart%2Cgenius&show_flash_saving=1&show_last_minute_saving=1&show_last_reservation_text=1&show_if_class_is_estimated=1&request_less_fields=2&show_business_badge=1&show_if_can_checkin_today=1&show_no_prepayment=1&add_autoextend_filter_count=1&include_extended_count=1&unblock_cuba_exp=1&show_filtered_facilities=1&include_hd_rackrate=1&ga_enhanced_ecommerce_tracking=1&include_auto_late_deals=1&add_breakfast_included_filter=1&include_accommodation_type_name=1&show_if_china_pos=1&use_layouts=1&is_include_review_recommendation=1&include_composite_breakdown=1&include_badges_in_price_breakdown=1&blackout_old_breakdown=1&include_genius_badge=1&add_cma_banner=1&timezone=Europe%2FBerlin&currency_code=USD&include_rtb=1&show_location_score=1&show_refundable=1&price_filter_currencycode=USD&check_price_is_final=1&show_ufi_top_rated_flag=1&include_city_in_trans=1&include_persuasion_fix=2&add_inclusion=1&saba=1&saba_version=123&min_saba_version=31&languagecode=en-us&24_hour_format=0&temperature_unit=f&distance_unit=imperial&screen_density=4&saba_response=0&time_zone=Europe%2FBerlin&sls=2&price_buckets=100&show_if_city_center=1&show_facilities_review_score=1&show_continent_id=1&show_if_hotel_is_rare_find=1&include_user_nr_bookings_for_searched_dates=1&generate_map_bounding_box=1&include_in_city_trans=1&fix_dotd=1&units=imperial&include_bh_quality_classification=1&include_cpc_non_trader_copy=1&exposure_api_version=2.55&saba_payload_root=1&route_path=appsearch&mobile_saba_variant=1&notification_auth_status=1&user_os=15&user_version=53.7.0.1-android&device_id=00fe3297-ff78-461b-a7db-96a5f2d62d29&network_type=wifi&display=normal_xxxhdpi&affiliate_id=337862' \
  --compressed \
  --header 'accept-encoding: gzip' \
  --header 'authorization: Basic dGhlc2FpbnRzYnY6ZGdDVnlhcXZCeGdN' \
  --header 'connection: Keep-Alive' \
  --header 'cookie: aws-waf-token=5e447f42-b460-40a8-8984-bb858dad5f17:HQoAe5dqdJwDBAAA:zc1L/fnID+CzbwZAykBjl8Jq494UsgGoWK2TJvcUbgMhCyeyJ1TYgL9prlHpcLWHYpygnUtG6qx+fDY8p3+z1TqgECoEBBRU1QfsUOIuvxzFfwk=' \
  --header 'host: mobile-apps.booking.cn' \
  --header 'user-agent: Booking.App/53.7.0.1 Android/15; Type: mobile; AppStore: vivo; Brand: google; Model: sdk_gphone64_x86_64;' \
  --header 'x-access-token: CAESmAUSxgQIAhDu1vCpARq7BO74RX8X8Z1-h74Z_FqX7lCRyYt0PWoHInZ5sG_zsAhAkfVvdq-J20Aaa1-5KHAcxl2qAOaSPiFuLlpcy3AM5REd73Rr7eQtAQe1_0Yj2Cp10P6v1CZToRlo1QdkyzYppGOINKE0lLRi1qH08G4LggonVFF-ubzBSDJeAAXgJX0SXrYTxQmZLc-_qFMykcPv3scTRSDwDa5TEFb8SN8WNyZwSOYa9oZHsHvZOGg8VV3B5VPem01EzCeSn7IOh6hSGW6GJhL42UT7FVy1ms1edZIaYlbpmq52uP92KBC8916-2WEzV9jdy10HOYZI2tEenMlSF9m_qY1ALgyoEDMJIun3INSU4YYjSlfpUIN7BZddZD7M8SVBhoDQq2LTNIs0RykDNRQmNwWWncAVaqt9dOkZJXcIyMYQdYGuzAS6ZRLfei8_wARC60Z8JppV5jKjcAykG9SE5bEPhJKyHEJiKTMRySeZTPlTWjwcljhuY9Hni1aDQpFQMPqXOyxroBVsKzXDDtQzUwRPeTlu1BPwILfde2p3U1Jy5JsXsbznKk7rXls9FpMKpr1SN5Grffmxp7XHkhH8KxJ5bB__R14Z9W_Y6rp4-ew7aVpikR_ryZrKF99vLYauzr4bNZ63tTKRSNzxeNtXAjnKxSqY1ALsDfccXQSNa4EbpxbVHfsnhcGCL0a1KPK3qXWeRI2V4Tcq3W17BvFEL6S7Z3VHlXoJf33uDCkck2Z3O3ig6V2gyTe5iReJZN0-IrphK9IaQAPY4Cn0TmFynq_Qh9EnzJHSouIJSHeDpS0oa0ib6T9jH5Mv4V0GPR7pVwZmiSced1UjPYS-QHnZXZhPGHh6pAQgAioBBTDt6_aKAjgB' \
  --header 'x-booking-api-version: 1' \
  --header 'x-booking-iam-access-token: CAESmAUSxgQIAhDu1vCpARq7BO74RX8X8Z1-h74Z_FqX7lCRyYt0PWoHInZ5sG_zsAhAkfVvdq-J20Aaa1-5KHAcxl2qAOaSPiFuLlpcy3AM5REd73Rr7eQtAQe1_0Yj2Cp10P6v1CZToRlo1QdkyzYppGOINKE0lLRi1qH08G4LggonVFF-ubzBSDJeAAXgJX0SXrYTxQmZLc-_qFMykcPv3scTRSDwDa5TEFb8SN8WNyZwSOYa9oZHsHvZOGg8VV3B5VPem01EzCeSn7IOh6hSGW6GJhL42UT7FVy1ms1edZIaYlbpmq52uP92KBC8916-2WEzV9jdy10HOYZI2tEenMlSF9m_qY1ALgyoEDMJIun3INSU4YYjSlfpUIN7BZddZD7M8SVBhoDQq2LTNIs0RykDNRQmNwWWncAVaqt9dOkZJXcIyMYQdYGuzAS6ZRLfei8_wARC60Z8JppV5jKjcAykG9SE5bEPhJKyHEJiKTMRySeZTPlTWjwcljhuY9Hni1aDQpFQMPqXOyxroBVsKzXDDtQzUwRPeTlu1BPwILfde2p3U1Jy5JsXsbznKk7rXls9FpMKpr1SN5Grffmxp7XHkhH8KxJ5bB__R14Z9W_Y6rp4-ew7aVpikR_ryZrKF99vLYauzr4bNZ63tTKRSNzxeNtXAjnKxSqY1ALsDfccXQSNa4EbpxbVHfsnhcGCL0a1KPK3qXWeRI2V4Tcq3W17BvFEL6S7Z3VHlXoJf33uDCkck2Z3O3ig6V2gyTe5iReJZN0-IrphK9IaQAPY4Cn0TmFynq_Qh9EnzJHSouIJSHeDpS0oa0ib6T9jH5Mv4V0GPR7pVwZmiSced1UjPYS-QHnZXZhPGHh6pAQgAioBBTDt6_aKAjgB' \
  --header 'x-library: okhttp+network-api' \
  --cookie 'aws-waf-token=5e447f42-b460-40a8-8984-bb858dad5f17:HQoAe5dqdJwDBAAA:zc1L/fnID+CzbwZAykBjl8Jq494UsgGoWK2TJvcUbgMhCyeyJ1TYgL9prlHpcLWHYpygnUtG6qx+fDY8p3+z1TqgECoEBBRU1QfsUOIuvxzFfwk='

Usually the filtering offered by HTTP Toolkit is enough, but if you find yourself intercepting a lot of HTTP traffic for scraping purposes, you might wonder whether there are extra tools that could make your life easier.

Which tools though?

Check the request above and think about how you would send it over and over again to get data for all the Booking hotels in any city in the world.
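A first step towards repeating the request is to treat the captured URL as a template and rewrite only the query parameters you care about. The sketch below uses a shortened copy of the captured URL (only a handful of its parameters are kept for readability) and the hypothetical helpers `with_params` and `nightly_urls`. It only builds URLs; whether the server accepts them without fresh tokens is exactly the open question discussed in this section.

```python
from datetime import date, timedelta
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Shortened copy of the captured URL; the real one carries many more parameters.
CAPTURED_URL = (
    "https://mobile-apps.booking.cn/json/mobile.saba"
    "?search_type=city&dest_ids=-372490&dest_name=Barcelona"
    "&arrival_date=2025-03-04&departure_date=2025-03-05"
    "&guest_qty=2&room_qty=1&currency_code=USD"
)


def with_params(url, **overrides):
    """Return `url` with the given query parameters replaced or added."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


def nightly_urls(url, first_night, nights):
    """Yield one URL per one-night stay, starting at `first_night`."""
    for offset in range(nights):
        arrival = first_night + timedelta(days=offset)
        departure = arrival + timedelta(days=1)
        yield with_params(
            url,
            arrival_date=arrival.isoformat(),
            departure_date=departure.isoformat(),
        )


urls = list(nightly_urls(CAPTURED_URL, date(2025, 3, 4), 3))
for u in urls:
    print(u)
```

The same helper could change `dest_ids` and `dest_name` to target another city, provided you first capture the identifiers the app uses for that destination.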

The HTTP request above has many tokens in both headers and cookies.

Here an example:

How are all those tokens created? Are they returned in the responses of other requests, or are they created internally by the APK? Do they actually matter when calling the API?

If you are asking yourself these questions, you might be interested in taking a look at what we are building at Blat.
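One way to start answering the first two questions is to check whether a header value of the target request already appeared in an earlier response of the same capture. The sketch below does that over a list of HAR entries. It assumes the entries are in chronological order and that bodies are stored as plain text; `trace_header_origins`, the sample entries and the `api.example.com` URLs are all invented for illustration. A header that maps to `None` was not seen in any earlier response, which suggests (but does not prove) that the app generates it locally.

```python
def response_haystack(entry):
    """Concatenate a response's body and header values into one searchable string."""
    response = entry["response"]
    body = response.get("content", {}).get("text") or ""
    header_values = " ".join(h["value"] for h in response.get("headers", []))
    return body + " " + header_values


def trace_header_origins(entries, target_index, min_length=16):
    """Map each long request header of the target entry to the URL of the
    earliest previous response that contains the same value, or None."""
    origins = {}
    for header in entries[target_index]["request"]["headers"]:
        value = header["value"]
        if len(value) < min_length:
            continue  # short values such as "gzip" would match by accident
        origins[header["name"]] = None
        for earlier in entries[:target_index]:
            if value in response_haystack(earlier):
                origins[header["name"]] = earlier["request"]["url"]
                break
    return origins


# Made-up capture: the first response hands out a token that the second request reuses.
SAMPLE_ENTRIES = [
    {
        "request": {"url": "https://api.example.com/auth", "headers": []},
        "response": {
            "headers": [],
            "content": {"text": '{"access_token": "tok-abcdefghijklmnop1234"}'},
        },
    },
    {
        "request": {
            "url": "https://api.example.com/search",
            "headers": [
                {"name": "accept-encoding", "value": "gzip"},
                {"name": "x-access-token", "value": "tok-abcdefghijklmnop1234"},
                {"name": "x-device-sig", "value": "0123456789abcdef0123"},
            ],
        },
        "response": {"headers": [], "content": {"text": "{}"}},
    },
]

origins = trace_header_origins(SAMPLE_ENTRIES, target_index=1)
for name, url in origins.items():
    print(name, "<-", url or "not found in earlier responses")
```

To find out whether a token actually matters, a common follow-up is to replay the request with that header removed and compare the responses.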

BONUS
If you are struggling to intercept the HTTP traffic from your target app, let us know. We are planning to keep writing about advanced reverse engineering techniques to scrape mobile app data.

Conclusions

As you can see, there are multiple tools out there to generate the HAR file, and the process is pretty straightforward as long as you do not need to start applying reverse engineering strategies.

Once intercepted, the information in the HAR file is really powerful, as it contains much of what you need to understand how the app communicates with the server. The problem, though, is that it can be noisy: there is a lot of information in there, and you can easily feel overwhelmed. Sometimes you might not even know where certain headers, bodies or cookies are created. The entire message could even be encrypted!
