000 11268nam a22005173i 4500
001 EBC3017639
003 MiAaPQ
005 20240729124046.0
006 m o d |
007 cr cnu||||||||
008 240724s2012 xx o ||||0 eng d
020 _a9781593274320
_q(electronic bk.)
020 _z9781593273972
035 _a(MiAaPQ)EBC3017639
035 _a(Au-PeEL)EBL3017639
035 _a(CaPaEBR)ebr10574793
035 _a(OCoLC)795714370
040 _aMiAaPQ
_beng
_erda
_epn
_cMiAaPQ
_dMiAaPQ
050 4 _aTK5105.884 .S37 2012eb
082 0 _a025.04
100 1 _aSchrenk, Michael.
245 1 0 _aWebbots, Spiders, and Screen Scrapers :
_bA Guide to Developing Internet Agents with PHP/CURL.
250 _a2nd ed.
264 1 _aSan Francisco :
_bNo Starch Press, Incorporated,
_c2012.
264 4 _c©2012.
300 _a1 online resource (394 pages)
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
505 0 _aIntro -- Brief Contents -- Contents In Detail -- Introduction -- Old-School Client-Server Technology -- The Problem with Browsers -- What to Expect from This Book -- Learn from My Mistakes -- Master Webbot Techniques -- Leverage Existing Scripts -- About the Website -- About the Code -- Requirements -- Hardware -- Software -- Internet Access -- A Disclaimer (This Is Important) -- PART I: Fundamental Concepts and Techniques -- 1: What's in It for You? -- Uncovering the Internet's True Potential -- What's in It for Developers? -- Webbot Developers Are in Demand -- Webbots Are Fun to Write -- Webbots Facilitate "Constructive Hacking" -- What's in It for Business Leaders? -- Customize the Internet for Your Business -- Capitalize on the Public's Inexperience with Webbots -- Accomplish a Lot with a Small Investment -- Final Thoughts -- 2: Ideas for Webbot Projects -- Inspiration from Browser Limitations -- Webbots That Aggregate and Filter Information for Relevance -- Webbots That Interpret What They Find Online -- Webbots That Act on Your Behalf -- A Few Crazy Ideas to Get You Started -- Help Out a Busy Executive -- Save Money by Automating Tasks -- Protect Intellectual Property -- Monitor Opportunities -- Verify Access Rights on a Website -- Create an Online Clipping Service -- Plot Unauthorized Wi-Fi Networks -- Track Web Technologies -- Allow Incompatible Systems to Communicate -- Final Thoughts -- 3: Downloading Web Pages -- Think About Files, Not Web Pages -- Downloading Files with PHP's Built-in Functions -- Downloading Files with fopen() and fgets() -- Downloading Files with file() -- Introducing PHP/CURL -- Multiple Transfer Protocols -- Form Submission -- Basic Authentication -- Cookies -- Redirection -- Agent Name Spoofing -- Referer Management -- Socket Management -- Installing PHP/CURL -- LIB_http.
505 8 _aFamiliarizing Yourself with the Default Values -- Using LIB_http -- Learning More About HTTP Headers -- Examining LIB_http's Source Code -- Final Thoughts -- 4: Basic Parsing Techniques -- Content Is Mixed with Markup -- Parsing Poorly Written HTML -- Standard Parse Routines -- Using LIB_parse -- Splitting a String at a Delimiter: split_string() -- Parsing Text Between Delimiters: return_between() -- Parsing a Data Set into an Array: parse_array() -- Parsing Attribute Values: get_attribute() -- Removing Unwanted Text: remove() -- Useful PHP Functions -- Detecting Whether a String Is Within Another String -- Replacing a Portion of a String with Another String -- Parsing Unformatted Text -- Measuring the Similarity of Strings -- Final Thoughts -- Don't Trust a Poorly Coded Web Page -- Parse in Small Steps -- Don't Render Parsed Text While Debugging -- Use Regular Expressions Sparingly -- 5: Advanced Parsing with Regular Expressions -- Pattern Matching, the Key to Regular Expressions -- PHP Regular Expression Types -- PHP Regular Expressions Functions -- Resemblance to PHP Built-In Functions -- Learning Patterns Through Examples -- Parsing Numbers -- Detecting a Series of Characters -- Matching Alpha Characters -- Matching on Wildcards -- Specifying Alternate Matches -- Regular Expressions Groupings and Ranges -- Regular Expressions of Particular Interest to Webbot Developers -- Parsing Phone Numbers -- Where to Go from Here -- When Regular Expressions Are (or Aren't) the Right Parsing Tool -- Strengths of Regular Expressions -- Disadvantages of Pattern Matching While Parsing Web Pages -- Which Are Faster: Regular Expressions or PHP's Built-In Functions? -- Final Thoughts -- 6: Automating Form Submission -- Reverse Engineering Form Interfaces -- Form Handlers, Data Fields, Methods, and Event Triggers -- Form Handlers -- Data Fields -- Methods.
505 8 _aMultipart Encoding -- Event Triggers -- Unpredictable Forms -- JavaScript Can Change a Form Just Before Submission -- Form HTML Is Often Unreadable by Humans -- Cookies Aren't Included in the Form, but Can Affect Operation -- Analyzing a Form -- Final Thoughts -- Don't Blow Your Cover -- Correctly Emulate Browsers -- Avoid Form Errors -- 7: Managing Large Amounts of Data -- Organizing Data -- Naming Conventions -- Storing Data in Structured Files -- Storing Text in a Database -- Storing Images in a Database -- Database or File? -- Making Data Smaller -- Storing References to Image Files -- Compressing Data -- Removing Formatting -- Thumbnailing Images -- Final Thoughts -- PART II: Projects -- 8: Price-Monitoring Webbots -- The Target -- Designing the Parsing Script -- Initialization and Downloading the Target -- Further Exploration -- 9: Image-Capturing Webbots -- Example Image-Capturing Webbot -- Creating the Image-Capturing Webbot -- Binary-Safe Download Routine -- Directory Structure -- The Main Script -- Further Exploration -- Final Thoughts -- 10: Link-Verification Webbots -- Creating the Link-Verification Webbot -- Initializing the Webbot and Downloading the Target -- Setting the Page Base -- Parsing the Links -- Running a Verification Loop -- Generating Fully Resolved URLs -- Downloading the Linked Page -- Displaying the Page Status -- Running the Webbot -- LIB_http_codes -- LIB_resolve_addresses -- Further Exploration -- 11: Search-Ranking Webbots -- Description of a Search Result Page -- What the Search-Ranking Webbot Does -- Running the Search-Ranking Webbot -- How the Search-Ranking Webbot Works -- The Search-Ranking Webbot Script -- Initializing Variables -- Starting the Loop -- Fetching the Search Results -- Parsing the Search Results -- Final Thoughts -- Be Kind to Your Sources -- Search Sites May Treat Webbots Differently Than Browsers.
505 8 _aSpidering Search Engines Is a Bad Idea -- Familiarize Yourself with the Google API -- Further Exploration -- 12: Aggregation Webbots -- Choosing Data Sources for Webbots -- Example Aggregation Webbot -- Familiarizing Yourself with RSS Feeds -- Writing the Aggregation Webbot -- Adding Filtering to Your Aggregation Webbot -- Further Exploration -- 13: FTP Webbots -- Example FTP Webbot -- PHP and FTP -- Further Exploration -- 14: Webbots That Read Email -- The POP3 Protocol -- Logging into a POP3 Mail Server -- Reading Mail from a POP3 Mail Server -- Executing POP3 Commands with a Webbot -- Further Exploration -- Email-Controlled Webbots -- Email Interfaces -- 15: Webbots That Send Email -- Email, Webbots, and Spam -- Sending Mail with SMTP and PHP -- Configuring PHP to Send Mail -- Sending an Email with mail() -- Writing a Webbot That Sends Email Notifications -- Keeping Legitimate Mail out of Spam Filters -- Sending HTML-Formatted Email -- Further Exploration -- Using Returned Emails to Prune Access Lists -- Using Email as Notification That Your Webbot Ran -- Leveraging Wireless Technologies -- Writing Webbots That Send Text Messages -- 16: Converting a Website into a Function -- Writing a Function Interface -- Defining the Interface -- Analyzing the Target Web Page -- Using describe_zipcode() -- Final Thoughts -- Distributing Resources -- Using Standard Interfaces -- Designing a Custom Lightweight "Web Service" -- PART III: Advanced Technical Considerations -- 17: Spiders -- How Spiders Work -- Example Spider -- LIB_simple_spider -- harvest_links() -- archive_links() -- get_domain() -- exclude_link() -- Experimenting with the Spider -- Adding the Payload -- Further Exploration -- Save Links in a Database -- Separate the Harvest and Payload -- Distribute Tasks Across Multiple Computers -- Regulate Page Requests -- 18: Procurement Webbots and Snipers.
505 8 _aProcurement Webbot Theory -- Get Purchase Criteria -- Authenticate Buyer -- Verify Item -- Evaluate Purchase Triggers -- Make Purchase -- Evaluate Results -- Sniper Theory -- Get Purchase Criteria -- Authenticate Buyer -- Verify Item -- Synchronize Clocks -- Time to Bid? -- Submit Bid -- Evaluate Results -- Testing Your Own Webbots and Snipers -- Further Exploration -- Final Thoughts -- 19: Webbots and Cryptography -- Designing Webbots That Use Encryption -- SSL and PHP Built-in Functions -- Encryption and PHP/CURL -- A Quick Overview of Web Encryption -- Final Thoughts -- 20: Authentication -- What Is Authentication? -- Types of Online Authentication -- Strengthening Authentication by Combining Techniques -- Authentication and Webbots -- Example Scripts and Practice Pages -- Basic Authentication -- Session Authentication -- Authentication with Cookie Sessions -- Authentication with Query Sessions -- Final Thoughts -- 21: Advanced Cookie Management -- How Cookies Work -- PHP/CURL and Cookies -- How Cookies Challenge Webbot Design -- Purging Temporary Cookies -- Managing Multiple Users' Cookies -- Further Exploration -- 22: Scheduling Webbots and Spiders -- Preparing Your Webbots to Run as Scheduled Tasks -- The Windows XP Task Scheduler -- Scheduling a Webbot to Run Daily -- Complex Schedules -- The Windows 7 Task Scheduler -- Non-calendar-based Triggers -- Final Thoughts -- Determine the Webbot's Best Periodicity -- Avoid Single Points of Failure -- Add Variety to Your Schedule -- 23: Scraping Difficult Websites with Browser Macros -- Barriers to Effective Web Scraping -- AJAX -- Bizarre JavaScript and Cookie Behavior -- Flash -- Overcoming Webscraping Barriers with Browser Macros -- What Is a Browser Macro? -- The Ultimate Browser-Like Webbot -- Installing and Using iMacros -- Creating Your First Macro -- Final Thoughts.
505 8 _aAre Macros Really Necessary?
588 _aDescription based on publisher supplied metadata and other sources.
590 _aElectronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
650 0 _aWeb search engines.
650 0 _aInternet programming.
650 0 _aInternet searching.
650 0 _aIntelligent agents (Computer software).
655 4 _aElectronic books.
776 0 8 _iPrint version:
_aSchrenk, Michael
_tWebbots, Spiders, and Screen Scrapers
_dSan Francisco : No Starch Press, Incorporated, c2012
_z9781593273972
797 2 _aProQuest (Firm)
856 4 0 _uhttps://ebookcentral.proquest.com/lib/orpp/detail.action?docID=3017639
_zClick to View
999 _c59208
_d59208