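I have recently been crawling news from Bloomberg, so I am recording the process here. Approach: collect the article links from the site's sitemap, parse them, and crawl the pages with the Scrapy framework. Getting the site's links: https://www.bloomberg.com/robots.txt is the site's robots.txt, which reads as follows: # Bot rules:# 1. A bot may not injure a human being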
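The sitemap step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual code: it assumes the robots.txt body declares its sitemaps on `Sitemap:` lines and that each sitemap is standard XML with `<loc>` elements. The function names and the `example.com` sample data are hypothetical, and nothing is fetched over the network.

```python
import xml.etree.ElementTree as ET


def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own "://" survives.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


def locs_from_sitemap(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    locs = []
    for element in root.iter():
        # Tags look like '{namespace}loc'; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            locs.append(element.text.strip())
    return locs


if __name__ == "__main__":
    # Hypothetical sample data standing in for the downloaded files.
    sample_robots = (
        "# Bot rules:\n"
        "User-agent: *\n"
        "Disallow: /search\n"
        "Sitemap: https://example.com/sitemap-index.xml\n"
    )
    sample_sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/news/a</loc></url>"
        "<url><loc>https://example.com/news/b</loc></url>"
        "</urlset>"
    )
    print(sitemap_urls_from_robots(sample_robots))
    print(locs_from_sitemap(sample_sitemap))
```

Since the post uses Scrapy, it may be worth noting that Scrapy also ships a `SitemapSpider` class whose `sitemap_urls` attribute can point at a robots.txt or a sitemap directly. Whether the author used it is not stated here.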
Personal blog: https://mengjiexu.com/post/bloomberg-api/ Motivation: Bloomberg has integrated massive amounts of data from a variety of data vendors. However, as a typical finance terminal designed for traders, it’s tech