Scrapy 2.3: Selecting element attributes

Updated 2021-06-03 10:55

There are several ways to get the value of an attribute. First, you can use XPath syntax:

>>> response.xpath("//a/@href").getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

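The XPath syntax has a couple of advantages: it is a standard XPath feature, and @attributes can be used in other parts of an XPath expression, for example to filter elements by attribute value.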

Scrapy also provides an extension to CSS selectors, ::attr(...), which allows getting attribute values:

>>> response.css('a::attr(href)').getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

In addition, there is the .attrib property of Selector. Use it if you prefer to look up attributes in Python code, without XPath or the CSS extension:

>>> [a.attrib['href'] for a in response.css('a')]
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']

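This property is also available on SelectorList; it returns a dictionary with the attributes of the first matching element. It is convenient when a selector is expected to yield a single result (for example, when selecting by element ID, or when selecting a unique element on the page):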

>>> response.css('base').attrib
{'href': 'http://example.com/'}
>>> response.css('base').attrib['href']
'http://example.com/'

The .attrib property of an empty SelectorList is empty:

>>> response.css('foo').attrib
{}
