Archive for April, 2009

Noobies Guide on How to Scrape: Part 2 – URLs, URL Variables, and using Live HTTP Headers

Wednesday, April 8, 2009 21:11 1 Comment

Understanding the fundamentals of how sites communicate with themselves, and how we communicate with them, is crucial to being able to reverse engineer a site for our scraper.  Luckily, it’s pretty easy for the most part.
Anatomy of a URL

The protocol you’re using.
The website you’re trying to get to.  Although www is synonymous with the base [...]
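
As a rough illustration of the URL parts named above, here is a minimal Python sketch using the standard library's urllib.parse. The example URL, the describe_url helper, and the "path" and "variables" labels are assumptions made for illustration; they are not taken from the original post, which is cut off in this excerpt.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url):
    """Split a URL into the pieces a scraper usually cares about."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,               # e.g. http or https
        "website": parts.netloc,                # the host you're trying to reach
        "path": parts.path,                     # the page on that host
        "variables": parse_qs(parts.query),     # URL variables after the "?"
    }


# Hypothetical URL, used only to show the breakdown.
example = describe_url("http://www.example.com/search?q=scraping&page=2")
for name, value in example.items():
    print(name, "=", value)
```

Note that parse_qs returns each variable's value as a list, since the same variable name may appear more than once in a URL.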

This was posted under category: Automation, Blackhat, Noobie Scraping Guide

Noobies Guide on How to Scrape: Part 1 – Intro & Tools

Monday, April 6, 2009 0:03 2 Comments

Welcome to the Noobies Guide to Scraping: Part 1.  In this installment we are only going to focus on a few very basic things that we need to get started, and no code will be written.
What is scraping?  Scraping is the process of getting / gathering data from some web source, whether [...]

This was posted under category: Automation, Blackhat, Noobie Scraping Guide