transmogrify.webcrawler-1.2.1-2.lbn13.noarch

Package Attributes
RPM transmogrify.webcrawler-1.2.1-2.lbn13.noarch.rpm
Architecture noarch
Size 1219768
Created 2017/08/04 11:07:55 UTC
Package Specification
Summary Crawling and feeding HTML content into a transmogrifier pipeline
Group Application/Internet
License ZPL
Home Page http://pypi.python.org/packages/source/t/transmogrify.webcrawler/transmogrify.webcrawler-1.2.1.zip
Description

A source blueprint for crawling content from a site or from local HTML files.

Webcrawler imports HTML from a live website, from a folder on disk, or from a folder on disk containing HTML that originally came from a live website and may still have absolute links referring to that website.

To crawl a live website, supply the crawler with a base HTTP URL to start crawling from. This URL must be a prefix of every other URL you want to crawl from the site.
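
The prefix rule above can be illustrated with a minimal Python sketch. This is not the package's own code: the function name `in_crawl_scope` and the example URLs are hypothetical, and the sketch assumes a plain string-prefix comparison, which may differ from how the crawler actually normalizes and matches URLs.

```python
def in_crawl_scope(base_url, candidate_url):
    """Return True if candidate_url starts with base_url.

    Hypothetical helper for illustration only; it assumes a plain
    string-prefix test with no URL normalization.
    """
    return candidate_url.startswith(base_url)


# Hypothetical base URL and discovered links.
base = "http://www.example.com/docs/"
links = [
    "http://www.example.com/docs/intro.html",
    "http://www.example.com/blog/post.html",
    "http://other.example.org/docs/intro.html",
]

# Keep only the links that fall under the base URL.
in_scope = [url for url in links if in_crawl_scope(base, url)]
print(in_scope)
```

Under this assumption, a base URL that is too specific (for example, one pointing at a single page) would exclude most of the site, so the base is best chosen as the shortest URL shared by all the content you want.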

Requires
rpmlib(PayloadFilesHavePrefix)  
rpmlib(FileDigests)  
/bin/sh  
rpmlib(CompressedFileNames)  
rpmlib(PartialHardlinkSets)  
rpmlib(PayloadIsXz)  
Provides
transmogrify.webcrawler
Obsoletes
transmogrify.webcrawler-egginfo