<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="FeedCreator 1.8" -->
<?xml-stylesheet href="http://mynotes.babies.vn/lib/exe/css.php?s=feed" type="text/css"?>
<rdf:RDF
    xmlns="http://purl.org/rss/1.0/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel rdf:about="http://mynotes.babies.vn/feed.php">
        <title>my notes - crawler</title>
        <description></description>
        <link>http://mynotes.babies.vn/</link>
        <image rdf:resource="http://mynotes.babies.vn/lib/exe/fetch.php?media=wiki:dokuwiki.svg" />
        <dc:date>2026-05-17T09:24:48+00:00</dc:date>
        <items>
            <rdf:Seq>
                <rdf:li rdf:resource="http://mynotes.babies.vn/doku.php?id=crawler:scrapy&amp;rev=1667060147&amp;do=diff"/>
                <rdf:li rdf:resource="http://mynotes.babies.vn/doku.php?id=crawler:scrapyarchitecturecode&amp;rev=1667060147&amp;do=diff"/>
                <rdf:li rdf:resource="http://mynotes.babies.vn/doku.php?id=crawler:scrapyexamples&amp;rev=1667060147&amp;do=diff"/>
            </rdf:Seq>
        </items>
    </channel>
    <image rdf:about="http://mynotes.babies.vn/lib/exe/fetch.php?media=wiki:dokuwiki.svg">
        <title>my notes</title>
        <link>http://mynotes.babies.vn/</link>
        <url>http://mynotes.babies.vn/lib/exe/fetch.php?media=wiki:dokuwiki.svg</url>
    </image>
    <item rdf:about="http://mynotes.babies.vn/doku.php?id=crawler:scrapy&amp;rev=1667060147&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-10-29T16:15:47+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>scrapy</title>
        <link>http://mynotes.babies.vn/doku.php?id=crawler:scrapy&amp;rev=1667060147&amp;do=diff</link>
        <description>Scrapy

References:

	*  &lt;http://doc.scrapy.org/en/latest/&gt;

Code examples:

	*  &lt;https://code.google.com/p/scrapy-tutorial/&gt;
git clone https://code.google.com/p/scrapy-tutorial/

	*  &lt;https://code.google.com/p/scrapy-spider/&gt;
svn checkout http://scrapy-spider.googlecode.com/svn/trunk/ scrapy-spider-read-only

	*  &lt;https://code.google.com/p/kateglo-crawler/&gt;
svn checkout http://kateglo-crawler.googlecode.com/svn/trunk/ kateglo-crawler-read-only</description>
    </item>
    <item rdf:about="http://mynotes.babies.vn/doku.php?id=crawler:scrapyarchitecturecode&amp;rev=1667060147&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-10-29T16:15:47+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>scrapyarchitecturecode</title>
        <link>http://mynotes.babies.vn/doku.php?id=crawler:scrapyarchitecturecode&amp;rev=1667060147&amp;do=diff</link>
        <description>Scrapy Architecture Code

Scrapy commands

Overview of Scrapy commands

	*  Scrapy command format

scrapy --help
Scrapy 1.0.3 - project: templatedownload

Usage:
  scrapy &lt;command&gt; [options] [args]

Available commands:
  bench         Run quick benchmark test
  check         Check spider contracts
  commands
  crawl         Run a spider
  edit          Edit spider
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  list    …</description>
    </item>
    <item rdf:about="http://mynotes.babies.vn/doku.php?id=crawler:scrapyexamples&amp;rev=1667060147&amp;do=diff">
        <dc:format>text/html</dc:format>
        <dc:date>2022-10-29T16:15:47+00:00</dc:date>
        <dc:creator>Anonymous (anonymous@undisclosed.example.com)</dc:creator>
        <title>scrapyexamples</title>
        <link>http://mynotes.babies.vn/doku.php?id=crawler:scrapyexamples&amp;rev=1667060147&amp;do=diff</link>
        <description>Scrapy Examples

Download entire site with scrapy
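
The allow argument of a Scrapy link extractor takes regular expressions, not shell globs, and as far as the Scrapy documentation describes, a URL is kept when a pattern is found anywhere in it. The sketch below uses only the standard library to show how a glob-style pattern and an escaped, anchored pattern can select different URLs. The sample URLs and the allowed() helper are made up for illustration and are not part of Scrapy.

```python
import re

# Hypothetical sample URLs, made up for illustration only.
urls = [
    "http://shop.example.com/index.php",
    "http://shop.example.com/toys.html",
    "http://shop.example.com/xhtml-guide",
]


def allowed(pattern, url):
    """Return True if the regex pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


# Glob-style pattern read as a regex: zero or more "/", any one character, then "html".
glob_style = [u for u in urls if allowed("/*.html", u)]

# Escaped and anchored regex: a literal ".html" at the very end of the URL.
anchored = [u for u in urls if allowed(r"\.html$", u)]

print(glob_style)
print(anchored)
```

Here the glob-style pattern also keeps the "xhtml-guide" URL, while the anchored pattern keeps only the URL that ends in ".html".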


from scrapy.selector import HtmlXPathSelector
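# scrapy.contrib was deprecated in Scrapy 1.0; these are the current import paths.
from scrapy.spiders import CrawlSpider, Rule
# LinkExtractor replaces the deprecated SgmlLinkExtractor.
from scrapy.linkextractors import LinkExtractor as sle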

class BabySpider(CrawlSpider):
    name = &quot;baby&quot;
    allowed_domains = [&quot;babies.vn&quot;]
    start_urls = [
        &quot;http://shop.babies.vn/index.php&quot;
    ]
    rules = [
        # allow takes regular expressions, not globs: keep URLs ending in .html
        Rule(sle(allow=(r&quot;\.html$&quot;,)), callback=&#039;parse_template&#039;),
    ]

    def parse_template(self, response)…</description>
    </item>
</rdf:RDF>
