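Summary: LogParser is open source on GitHub and can be installed via pip or git. It can run as a service alongside Scrapyd, exposing log-analysis results as JSON under Scrapyd's /logs/ path; it can be combined with ScrapydWeb to visualize crawl progress; and it can be called directly from Python code.
Open source on GitHub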
my8100 / logparser
Installation
Via pip:
pip install logparser
Via git:
git clone https://github.com/my8100/logparser.git
cd logparser
python setup.py install
Usage
Run as a service
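First make sure Scrapyd is installed and running on the current host.
Start LogParser with the command logparser
Visit http://127.0.0.1:6800/logs/stats.json (assuming Scrapyd is running on port 6800)
Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the detailed log analysis for a specific crawl job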
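The two URLs above can also be read from a script instead of a browser. The following is a minimal sketch, not part of LogParser itself: the helper names stats_url, job_stats_url and fetch_json are hypothetical, it uses only the Python standard library, and it assumes Scrapyd and LogParser are running on the host and port given in the steps above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def stats_url(host="127.0.0.1", port=6800):
    """Build the overall stats URL (hypothetical helper, mirrors the URL above)."""
    return f"http://{host}:{port}/logs/stats.json"


def job_stats_url(project, spider, job, host="127.0.0.1", port=6800):
    """Build the per-job URL in the projectname/spidername/jobid.json form."""
    # Percent-encode each path segment so unusual names cannot break the URL.
    path = "/".join(quote(part, safe="") for part in (project, spider, job))
    return f"http://{host}:{port}/logs/{path}.json"


def fetch_json(url, timeout=5):
    """Fetch a URL and decode the body as JSON.

    Only works while Scrapyd and LogParser are actually running.
    """
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URLs needs no running service.
    print(stats_url())
    print(job_stats_url("projectname", "spidername", "jobid"))
```

With the services running, fetch_json(job_stats_url("projectname", "spidername", "jobid")) would return the analysis for that job as a Python dict; the exact keys are whatever LogParser writes to the JSON file.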
Visualize crawl progress with ScrapydWeb
See my8100 / scrapydweb for details.
Use in code
In [1]: from logparser import parse

In [2]: log = """2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)
   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
   ...: {'downloader/exception_count': 3,
   ...:  'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,
   ...:  'downloader/request_bytes': 1336,
   ...:  'downloader/request_count': 7,
   ...:  'downloader/request_method_count/GET': 7,
   ...:  'downloader/response_bytes': 1669,
   ...:  'downloader/response_count': 4,
   ...:  'downloader/response_status_count/200': 2,
   ...:  'downloader/response_status_count/302': 1,
   ...:  'downloader/response_status_count/404': 1,
   ...:  'dupefilter/filtered': 1,
   ...:  'finish_reason': 'finished',
   ...:  'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),
   ...:  'httperror/response_ignored_count': 1,
   ...:  'httperror/response_ignored_status_count/404': 1,
   ...:  'item_scraped_count': 2,
   ...:  'log_count/CRITICAL': 5,
   ...:  'log_count/DEBUG': 14,
   ...:  'log_count/ERROR': 5,
   ...:  'log_count/INFO': 75,
   ...:  'log_count/WARNING': 3,
   ...:  'offsite/domains': 1,
   ...:  'offsite/filtered': 1,
   ...:  'request_depth_max': 1,
   ...:  'response_received_count': 3,
   ...:  'retry/count': 2,
   ...:  'retry/max_reached': 1,
   ...:  'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,
   ...:  'scheduler/dequeued': 7,
   ...:  'scheduler/dequeued/memory': 7,
   ...:  'scheduler/enqueued': 7,
   ...:  'scheduler/enqueued/memory': 7,
   ...:  'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}
   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)"""

In [3]: d = parse(log, headlines=1, taillines=1)

In [4]: d
Out[4]:
OrderedDict([('head', '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),
             ('tail', '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),
             ('first_log_time', '2018-10-23 18:28:34'),
             ('latest_log_time', '2018-10-23 18:29:42'),
             ('elapsed', '0:01:08'),
             ('first_log_timestamp', 1540290514),
             ('latest_log_timestamp', 1540290582),
             ('datas', []),
             ('pages', 3),
             ('items', 2),
             ('latest_matches', {'resuming_crawl': '', 'latest_offsite': '', 'latest_duplicate': '', 'latest_crawl': '', 'latest_scrape': '', 'latest_item': '', 'latest_stat': ''}),
             ('latest_crawl_timestamp', 0),
             ('latest_scrape_timestamp', 0),
             ('log_categories', {'critical_logs': {'count': 5, 'details': []}, 'error_logs': {'count': 5, 'details': []}, 'warning_logs': {'count': 3, 'details': []}, 'redirect_logs': {'count': 1, 'details': []}, 'retry_logs': {'count': 2, 'details': []}, 'ignore_logs': {'count': 1, 'details': []}}),
             ('shutdown_reason', 'N/A'),
             ('finish_reason', 'finished'),
             ('last_update_timestamp', 1547559048),
             ('last_update_time', '2019-01-15 21:30:48')])

In [5]: d['elapsed']
Out[5]: '0:01:08'

In [6]: d['pages']
Out[6]: 3

In [7]: d['items']
Out[7]: 2

In [8]: d['finish_reason']
Out[8]: 'finished'
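As a small follow-up to the session above, the parsed result can be turned into simple throughput figures. This is only a sketch: elapsed_seconds and crawl_rates are hypothetical helpers, not part of logparser, and the dict below copies just the four values needed from Out[4] rather than calling parse, so it runs without logparser installed. It assumes pages and items are integers, as they are in that output.

```python
def elapsed_seconds(stats):
    """Seconds between the first and the latest log line of a parsed log."""
    return stats["latest_log_timestamp"] - stats["first_log_timestamp"]


def crawl_rates(stats):
    """Return (pages per minute, items per minute) for a parsed log."""
    minutes = elapsed_seconds(stats) / 60
    if minutes <= 0:
        # Avoid dividing by zero when the log spans less than a second.
        return 0.0, 0.0
    return stats["pages"] / minutes, stats["items"] / minutes


# Values copied from the Out[4] result shown above.
d = {
    "first_log_timestamp": 1540290514,
    "latest_log_timestamp": 1540290582,
    "pages": 3,
    "items": 2,
}

pages_per_min, items_per_min = crawl_rates(d)
print(elapsed_seconds(d))  # 68, matching the '0:01:08' elapsed value
print(f"{pages_per_min:.2f} pages/min, {items_per_min:.2f} items/min")
```

Using the two timestamps rather than the elapsed string avoids having to parse a time format by hand.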
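Copyright of this article belongs to the author. Do not repost it without permission. If this article violates any rules, you can contact the administrator to have it removed.
When reposting, please cite this article's address: https://www.ucloud.cn/yun/43066.html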