手把手带你爬虫_开云全站APP官网
发布时间:2023-05-26
本文摘要:

url=self.url

f.write(result+ 'n')

fromlxml importetree


手把手带你爬虫_开云全站APP官网(图1)

results=target.xpath( '//div[@class="read-content j_readContent"]/p/text')

fromlxml importetree

def__init__(self):

defmian(self):

#剖析链接地址

forresult inresults:


手把手带你爬虫_开云全站APP官网(图2)

3.获取图片的链接地址。

url=self.url

f.write(result+ 'n')

fromlxml importetree


手把手带你爬虫_开云全站APP官网(图3)

开云全站APP官网

results=target.xpath( '//div[@class="read-content j_readContent"]/p/text')

fromlxml importetree

def__init__(self):

defmian(self):

#剖析链接地址

forresult inresults:


手把手带你爬虫_开云全站APP官网(图5)

3.获取图片的链接地址。importrequests


手把手带你爬虫_开云全站APP官网(图6)

2.发送请求,获取网页。defget_html(self,url):

开云全站APP官网

print(name)

links=target.xpath( '//ul[@class="cf"]/li/a/@href')

defparse_html(self,html):

fori inrange( 1, 100):

names=target.xpath( '//span[@class="content-wrap"]/text')

html=response.content.decode( 'utf-8')

开云全站APP官网

forlink,name inzip(links,names):

开云全站APP官网

self.headers = {

html=response.content.decode( 'utf-8')

links=target.xpath( '//ul[@class="cf"]/li/a/@href')

开云全站APP官网

self.headers = {

res=requests.get(host,headers=self.headers)

defmain(self):

开云全站APP官网

print(name)

c=res.content.decode( 'utf-8')

target=etree.HTML(c)

fromlxml importetree

forname innames:

打开网址:

fromfake_useragent importUserAgent

forlink inlinks:

classphoto_spider(object):

5.生存为txt文件到当地。

withopen( 'F:/pycharm文件/document/'+ name + '.txt', 'a') asf:

fromfake_useragent importUserAgent

names=target.xpath( '//span[@class="content-wrap"]/text')

fori inrange( 1, 100):

'User-Agent': ua.random

开云全站APP官网

forlink,name inzip(links,names):

开云全站APP官网

self.headers = {

html=response.content.decode( 'utf-8')

开云全站APP官网

self.headers = {

res=requests.get(host,headers=self.headers)

defmain(self):

开云全站APP官网

print(name)

html=self.get_html(url)

forlink inlinks:

classphoto_spider(object):

5.生存为txt文件到当地。withopen( 'F:/pycharm文件/document/。


本文关键词:开云全站APP官网

本文来源:开云全站APP官网-www.lywtzz.cn

咨询电话
068-68047683