Archive for the ‘Blackhat’ Category

Noobies Guide on How to Scrape: Part 4 – cURL

Monday, May 11, 2009 13:01 3 Comments

Now we get the idea of POST and GET. We found our target, we know its URL structure, and we know where the data is, but how do we use PHP to fetch the webpages?
Luckily we have what is called cURL (from PHP.net):
PHP supports libcurl, a library created by Daniel Stenberg, that allows [...]

This was posted under category: Blackhat, Noobie Scraping Guide

Noobies Guide on How to Scrape: Part 2 – URLs, URL Variables, and using Live HTTP Headers

Wednesday, April 8, 2009 21:11 1 Comment

Understanding the fundamentals of how sites communicate with themselves, and how we communicate with them, is crucial to being able to reverse engineer a site for our scraper. Luckily, it’s pretty easy for the most part.
Anatomy of a URL

The protocol you’re using.
The website you’re trying to get to. Although www is synonymous with the base [...]
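
As a rough illustration of the URL pieces listed above, here is a minimal sketch that splits a URL into its protocol, website, path, and URL variables. It uses Python's standard urllib.parse module purely for illustration. The URL, the split_url helper, and the dictionary keys are hypothetical and do not come from the original post.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url):
    """Break a URL into the parts a scraper usually cares about."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,             # e.g. http or https
        "website": parts.netloc,              # host name, including any www prefix
        "path": parts.path,                   # the page being requested
        "variables": parse_qs(parts.query),   # URL variables after the "?"
    }


# Hypothetical URL, used only as an example.
info = split_url("http://www.example.com/search.php?q=scraping&page=2")
for name, value in info.items():
    print(name, "=", value)
```

Each URL variable comes back as a list, because the same name may appear more than once in a query string.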

This was posted under category: Automation, Blackhat, Noobie Scraping Guide

Noobies Guide on How to Scrape: Part 1 – Intro & Tools

Monday, April 6, 2009 0:03 2 Comments

Welcome to the Noobies Guide to Scraping: Part 1. In this installment we are only going to focus on a few very basic things that we need to get started, and no code will be written.
What is scraping?  Scraping is the process of getting / gathering data from some web source, whether [...]

This was posted under category: Automation, Blackhat, Noobie Scraping Guide