Boost Your Web Scraping Game with Axios and Proxies
Axios is a JavaScript champ when it comes to fetching data from the web. But sometimes, websites aren't so welcoming—they block your scraper. No worries! You can dodge those blockers using proxies.
In this guide, we'll get into how you can do this with some cool, hands-on examples. From free to premium proxies, we've got it all covered. Let's dive in!
Setting Up Your Project
Step 1: Check If NodeJS and npm Are Installed
First off, you need NodeJS and npm (Node Package Manager) on your computer. Open your terminal (Command Prompt on Windows, Terminal on Mac) and type the following:
node -v
npm -v
If you see version numbers for both, you're good to go! If not, you'll need to install NodeJS and npm first.
Step 2: Create a New Folder
Next, let's make a special folder where all the magic will happen. Type in:
mkdir myAxiosScraper
This makes a new folder named "myAxiosScraper."
Step 3: Enter the Folder
You need to "go into" this folder in the terminal. Simple, just type:
cd myAxiosScraper
Step 4: Initialize Your NodeJS Project
Now, let's set up a new NodeJS project within this folder. Type this:
npm init -y
This will create a file called package.json in your folder. Think of it as the recipe book for your project.
Step 5: Install Axios
Axios is the tool that'll help you get data from websites. To put it in your toolkit, type:
npm install axios
And there you have it! You've successfully set up a NodeJS project and installed Axios. You're all ready to start scraping websites like a pro!
Simple Proxy Setup with Axios
Let's kick things off with a basic example. We'll use httpbin.org as our target website and a fictional proxy IP.
IP: '203.42.142.32', Port: '8080'
const axios = require('axios');

axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: '203.42.142.32',
    port: 8080
  }
})
  .then(res => console.log(res.data))
  .catch(err => console.log('Oops!', err));
Run it. If you see the IP 203.42.142.32 in the output, you've just sent a request via a proxy. High five!
Handling JSON
Axios is pretty smart; it parses JSON responses automatically, so res.data is usually already an object. But if a server sends JSON with the wrong content type, or sends something that isn't JSON at all, res.data can arrive as a plain string. Here's a defensive way to handle both cases:
axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: '203.42.142.32',
    port: 8080
  }
})
  .then(res => {
    let data = res.data;
    // If Axios couldn't parse the body, try once ourselves
    if (typeof data === 'string') {
      try {
        data = JSON.parse(data);
      } catch (e) {
        // Not JSON after all; keep the raw string
      }
    }
    console.log(data);
  })
  .catch(err => console.log('Uh-oh!', err));
Premium Proxies for Smooth Sailing
Free proxies are often slow, short-lived, and sometimes outright malicious, so we don't recommend relying on them. For something more reliable, consider paid options. Here's how to set it up (replace the fields with your actual proxy data):
axios.get('https://httpbin.org/ip', {
  proxy: {
    protocol: 'http',
    host: 'premium.proxy.com',
    port: 8080,
    auth: {
      username: 'yourUsername',
      password: 'yourPassword'
    }
  }
})
  .then(res => console.log(res.data))
  .catch(err => console.log('Yikes!', err));
Going Auto-Pilot with Environment Variables
You can automate proxy settings by storing them as environment variables. When running in Node, Axios picks up HTTP_PROXY (and HTTPS_PROXY for HTTPS URLs) automatically. Do this in your terminal:
export HTTP_PROXY=http://203.42.142.32:8080
(On Windows Command Prompt, use set HTTP_PROXY=http://203.42.142.32:8080 instead.)
Then, your Axios request becomes:
axios.get('https://httpbin.org/ip')
.then(res => console.log(res.data))
.catch(err => console.log('Oops!', err));
Rotate Proxies Like a DJ Spins Tracks
If you want to level up your proxy rotation, you might want to add features like priority-based selection and error-handling.
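First, a baseline: the simplest rotation is round-robin, cycling through your proxy list one request at a time. A minimal sketch (the proxy addresses are the placeholders from earlier; swap in your own):

```javascript
// A minimal round-robin rotator over a list of proxy configs.
const proxies = [
  { protocol: 'http', host: '203.42.142.32', port: 8080 },
  { protocol: 'http', host: '150.24.126.73', port: 8080 },
];

let current = 0;

function getNextProxy() {
  const proxy = proxies[current];
  current = (current + 1) % proxies.length; // wrap back to the start
  return proxy;
}
```

Each call to getNextProxy() hands you the next proxy in line, so you can pass it straight into an Axios request's proxy option.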
Priority-Based Proxy Selection
Keep track of each proxy's performance and choose the best one for your next request.
let performanceMetrics = {
  '203.42.142.32:8080': { successCount: 10, errorCount: 2 },
  '150.24.126.73:8080': { successCount: 5, errorCount: 5 },
};

const getBestProxy = () => {
  // Pick the address with the highest success rate recorded so far
  return Object.entries(performanceMetrics)
    .map(([addr, m]) => [addr, m.successCount / (m.successCount + m.errorCount)])
    .sort((a, b) => b[1] - a[1])[0][0];
};
Mixing Proxy Types
You can use different kinds of proxies for better success rates.
const proxies = [
  { ip: '203.42.142.32', port: '8080', type: 'residential' },
  { ip: '150.24.126.73', port: '8080', type: 'datacenter' },
];
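One way to use that type field is to prefer residential proxies for stricter targets and fall back to whatever is available. A sketch, where pickProxy is our own made-up helper, not part of Axios:

```javascript
const proxies = [
  { ip: '203.42.142.32', port: '8080', type: 'residential' },
  { ip: '150.24.126.73', port: '8080', type: 'datacenter' },
];

// Prefer a given proxy type; fall back to the first proxy in the list.
function pickProxy(preferredType) {
  return proxies.find(p => p.type === preferredType) || proxies[0];
}
```

For a site that blocks datacenter IPs you'd call pickProxy('residential'); for cheap bulk requests, pickProxy('datacenter').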
Parallel Requests
Use Promise.all to send multiple requests at once, each with a different proxy.
const urls = ['https://httpbin.org/ip', 'https://httpbin.org/headers'];
const requests = urls.map(url => axios.get(url, { proxy: getNextProxy() }));

Promise.all(requests)
  .then(responses => {
    // Handle responses
  })
  .catch(err => {
    // Handle the first error
  });
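One caveat: Promise.all rejects as soon as any single request fails, discarding the rest. If you'd rather keep every outcome, good and bad, Promise.allSettled does exactly that. A small wrapper (fetchAll is our own name, not an Axios API):

```javascript
// Resolve every request and report each outcome individually,
// instead of failing fast like Promise.all.
async function fetchAll(requests) {
  const results = await Promise.allSettled(requests);
  return results.map(r =>
    r.status === 'fulfilled'
      ? { ok: true, data: r.value }
      : { ok: false, error: r.reason.message }
  );
}
```

Call it as fetchAll([axios.get(url, { proxy: getNextProxy() }), ...]) and filter on the ok flag afterwards.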
Error Handling
Add retries and timeouts to your axios requests.
axios.get('https://httpbin.org/ip', { proxy: getNextProxy(), timeout: 5000 })
  .then(res => console.log(res.data))
  .catch(err => {
    if (err.code === 'ECONNABORTED') {
      // The request timed out; retry with another proxy
    }
    console.log('Error!', err);
  });
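That retry comment can be filled in with a small generic helper. Here's one sketch; withRetry is our own helper, not an Axios feature:

```javascript
// Call fn until it succeeds or maxRetries attempts are used up.
async function withRetry(fn, maxRetries = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next attempt
    }
  }
  throw lastError; // every attempt failed
}
```

With it, the request becomes withRetry(() => axios.get('https://httpbin.org/ip', { proxy: getNextProxy(), timeout: 5000 })), which swaps in a fresh proxy on every attempt.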
There you go. With these snippets, you're not just rotating proxies; you're optimizing your entire scraping process.
Acing with Premium Proxy Services
Consider going for high-quality, optimized residential proxies. These proxies allow you to easily bypass website blocks and run your Axios project smoothly! Here's an example:
axios.get('https://restrictedwebsite.com', {
  proxy: {
    protocol: 'http',
    host: 'residential.proxy.com',
    port: 8080,
    auth: {
      username: 'yourProviderAPIKey',
      password: 'topSecret'
    }
  }
})
  .then(res => console.log(res.data))
  .catch(err => console.log('Ouch!', err));
Wrapping It Up
Using Axios with a well-picked proxy can make your web scraping unstoppable. We've gone from setting up a basic proxy to smoothly rotating between multiple proxies. And if you want reliability and efficiency, premium proxies are worth the investment. Now, go out there and scrape the web like a pro!