Requests

1. Installing requests

Install with pip from the command line: pip install requests

2. GET requests

2.1 The get method

The get method sends a request to the target URL and receives the response.

Commonly used parameters of the get method:

  • url: required; the target URL of the request
  • params: a dict of query parameters to send with the request

The method returns a Response object. Its commonly used methods and attributes are shown below:

import requests

r = requests.get('https://api.github.com/events')  # send a GET request to the GitHub API
print(r.status_code)                      # response status code
print(r.text)                             # response body as str
print(r.url)                              # final URL of the request
print(r.encoding)                         # encoding used to decode the response body
print(r.cookies)                          # cookies set by the response
print(r.headers)                          # response headers
print(r.content)                          # response body as bytes
print(r.json())                           # response body parsed into a dict
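The relationship between r.content, r.encoding, and r.text can be made concrete without a live response: r.text is essentially r.content decoded with r.encoding. A minimal sketch, where content and encoding are stand-ins for a real response's values:

```python
# stand-ins for what a real response would hold (not a live request)
content = '{"message": "ok"}'.encode('utf-8')  # what r.content would hold (bytes)
encoding = 'utf-8'                             # what r.encoding would report

text = content.decode(encoding)                # what r.text would produce (str)
print(text)  # → {"message": "ok"}
```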

2.2 Passing URL parameters

Note: httpbin.org is an open-source service for testing HTTP requests.

In a GET request, the params keyword argument takes a dict of custom parameters and appends them to the URL as a query string.

import requests

url = 'http://httpbin.org/get'   # target URL
params = {                       # custom request parameters
    'key1': 'value1',
    'key2': 'value2'
}
r = requests.get(url, params=params)  # pass the custom parameters with the GET request
print(r.status_code)               # status code
print(r.url)                       # URL after the parameters are appended
print(r.json())                    # response parsed into a dict
The custom parameters appear both in the URL and in the args field of the returned dict:

200
http://httpbin.org/get?key1=value1&key2=value2
{'args': {'key1': 'value1', 'key2': 'value2'}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '119.123.77.203, 119.123.77.203', 'url': 'https://httpbin.org/get?key1=value1&key2=value2'}
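The query string that params produces can be reproduced offline with the standard library's urllib.parse.urlencode, which is handy for checking how a dict will be appended to a URL before sending anything. A minimal sketch:

```python
from urllib.parse import urlencode

params = {'key1': 'value1', 'key2': 'value2'}
query = urlencode(params)          # same key=value&... encoding requests applies to params
print('http://httpbin.org/get?' + query)
# → http://httpbin.org/get?key1=value1&key2=value2
```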

2.3 Example

Take a Baidu search as an example:

import requests

url = 'http://www.baidu.com/s'
params = {'wd': 'python'}
r = requests.get(url, params=params)  # params passed to get are appended to the URL
print(r.url)                          # URL after the parameters are appended
print(r.text)                         # response body
http://www.baidu.com/s?wd=python

3. POST requests

3.1 data

To send form-encoded data, pass a dict to the data keyword argument; the dict is automatically encoded as a form when the request is sent.

import requests

# pass a dict to the data keyword argument
r = requests.post('http://httpbin.org/post', data={'key1': 'value1', 'key2': 'value2'})
print(r.text)
{
  "args": {},
  "data": "",
  "files": {},
  "form": {               # the data dict ends up here, form-encoded
    "key1": "value1",
    "key2": "value2"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "23",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": null,
  "origin": "119.123.77.203, 119.123.77.203",
  "url": "https://httpbin.org/post"
}

3.2 json

import requests
import json

data = {'key1': 'value1', 'key2': 'value2'}
# note: json=json.dumps(data) serializes the dict twice;
# json=data would let requests serialize it once
r = requests.post('http://httpbin.org/post', json=json.dumps(data))
print(r.text)
{
  "args": {},
  "data": "\"{\\\"key1\\\": \\\"value1\\\", \\\"key2\\\": \\\"value2\\\"}\"",   # the json payload ends up here
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "46",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  },
  "json": "{\"key1\": \"value1\", \"key2\": \"value2\"}",
  "origin": "119.123.77.203, 119.123.77.203",
  "url": "https://httpbin.org/post"
}
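The escaped quotes in the data field come from double serialization: json.dumps already turns the dict into a JSON string, and the json keyword argument then serializes that string a second time. Passing the dict directly (json=data) serializes it only once. A stdlib sketch of the two encodings:

```python
import json

data = {'key1': 'value1', 'key2': 'value2'}

once = json.dumps(data)               # what requests sends for json=data
twice = json.dumps(json.dumps(data))  # what was sent above for json=json.dumps(data)

print(once)   # → {"key1": "value1", "key2": "value2"}
print(twice)  # → "{\"key1\": \"value1\", \"key2\": \"value2\"}"
```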

3.3 Example

A login example against a local ecshop installation:

First, capture the login request with Fiddler and check the Content-Type in the request headers. The data is sent to the server as a form, as shown in the screenshot:

So when sending the POST request with requests, the username and password are passed through the data keyword argument.

import requests
import re

url = 'http://localhost:82/ecshop/user.php'
datas = {'username': 'liyihang', 'password': 'tashi123', 'act': 'act_login'}
r = requests.post(url, data=datas)
print(r.text)
# extract the logged-in username from the page with a regular expression
t = re.findall('<font class="f4_b">(.+?)</font>', r.text)
print(t)
['liyihang']

4. Keeping a session

In most projects many operations require logging in first, so a login request is sent, and after a successful login the logged-in state must be kept for the operations that follow. In requests a session is kept like this:

import requests

# log in, then recharge 100 while staying logged in
s = requests.session()  # create a session

# data for the login and recharge requests
url = 'http://localhost:82/ecshop/user.php'
datas1 = {'username': 'liyihang', 'password': 'tashi123', 'act': 'act_login'}
datas2 = {'amount': '100', 'user_note': 'jiekou', 'payment_id': '2', 'act': 'act_account'}

# send the login request through the session object;
# the cookies returned after a successful login are stored in the session
r = s.post(url, data=datas1)
# print(r.cookies)

# send the recharge request through the same session
r2 = s.post(url, data=datas2)
print(r2.text)
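The cookie handling can be illustrated without the local ecshop server: a Session stores cookies and attaches them to every later request made through it. The sketch below plants a made-up cookie (sessionid is a placeholder name, not something ecshop necessarily uses) and inspects the request that would be sent, with no network traffic:

```python
import requests

s = requests.Session()
# simulate a cookie the server would set after a successful login
s.cookies.set('sessionid', 'abc123')

# prepare (but do not send) a follow-up request and inspect its Cookie header
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/cookies'))
print(prepared.headers.get('Cookie'))  # → sessionid=abc123
```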

5. Request headers

5.1 Inspecting request headers

http://httpbin.org/headers echoes back the headers of the request it receives.

Note that r.headers holds the response headers; the request headers that were actually sent can be read from r.request.headers.

import requests

r = requests.get('http://httpbin.org/headers')
print(r.headers)   # response headers
print(r.text)      # body echoing the request headers

The result is shown below; the User-Agent identifies the client as requests:

{
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.18.4"
  }
}
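The default headers that requests would send can also be inspected before anything goes over the network, by preparing a request through a Session. A minimal sketch:

```python
import requests

s = requests.Session()
# prepare (but do not send) a request; the session merges in its default headers
prepared = s.prepare_request(requests.Request('GET', 'http://httpbin.org/headers'))
print(prepared.headers['User-Agent'])   # e.g. python-requests/2.18.4
```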

5.2 Custom request headers

Many sites set up anti-scraping measures for security, i.e. they refuse to respond to non-browser clients. In that case the request headers must be modified so that the request masquerades as a browser.

Getting the headers a browser sends, via httpbin

Open a browser and enter http://httpbin.org/headers in the address bar.

The URL echoes the headers of the request it receives, so the browser's request headers are returned, as below:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
  }
}

The headers serve the following purposes:

  • Accept: content types the client accepts; */* means any type, the rest are literal
  • Accept-Encoding: encodings the client accepts
  • Accept-Language: languages the client accepts; q=0.9 is a preference weight, a missing q means 1, and higher values are preferred
  • Connection: how to handle the connection; keep-alive asks to keep the transport connection open (HTTP itself is stateless)
  • Host: the server's host name
  • User-Agent: identifies the client; the server learns the client's system type, version, and so on from it
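The q weights in Accept-Language can be made concrete with a small parser, a sketch using only the standard library (a missing q counts as 1, and higher values are preferred):

```python
header = 'zh-CN,zh;q=0.9,en;q=0.8'

langs = []
for part in header.split(','):
    if ';q=' in part:
        lang, q = part.split(';q=')
        langs.append((lang.strip(), float(q)))
    else:
        langs.append((part.strip(), 1.0))   # no q value means q=1

# sort by preference, highest q first
langs.sort(key=lambda pair: -pair[1])
print(langs)  # → [('zh-CN', 1.0), ('zh', 0.9), ('en', 0.8)]
```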

Passing the captured headers to get

import requests

url = 'http://httpbin.org/headers'
header = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
# url = 'http://www.baidu.com'
r = requests.get(url, headers=header)  # pass the custom headers via the headers keyword argument
# print(r.headers)
print(r.text)

With the custom headers, the returned result is:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
  }
}

5.3 Example

Take the Zhihu Explore page as an example. Without a custom request header masquerading as a real browser, the information on the page cannot be retrieved; only after the request header is made to look like a real browser's can the page be fetched.

First, capture the Zhihu Explore page request with Fiddler, find the User-Agent value in the headers, and copy it.

Then define a custom header in the code:

import requests

# before defining the header:
# r = requests.get('https://www.zhihu.com/explore')
# print(r.text)   # the page content cannot be retrieved

header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"
}
# pass the custom header via the headers keyword argument
r = requests.get('https://www.zhihu.com/explore', headers=header)
print(r.text)    # the Explore page is now returned normally

6. HTTPS

When requests sends an HTTPS request, it may raise a certificate error. The code and error are shown below:

import requests

url = 'https://login.51job.com/login.php'
r = requests.post(url)
print(r.encoding)   # check the response encoding
r.encoding = 'gbk'  # set it explicitly, otherwise the Chinese text in the body is garbled
print(r.text)       # response body

Running the code above raises a certificate error: requests.exceptions.SSLError: HTTPSConnectionPool

The workaround:

import requests
import urllib3

urllib3.disable_warnings()   # suppress the InsecureRequestWarning from urllib3
url = 'https://login.51job.com/login.php'
r = requests.post(url, verify=False)    # disable certificate verification
print(r.encoding)
r.encoding = 'gbk'
print(r.text)
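verify=False turns certificate checking off entirely, which is acceptable for test environments but not for production traffic. The verify option also accepts the path to a CA bundle file so the certificate can actually be validated, and it can be set once on a Session instead of on every call. A minimal sketch (the bundle path in the comment is a placeholder, not a real file):

```python
import requests

s = requests.Session()
s.verify = False   # applies to every request made through this session
# s.verify = '/path/to/ca-bundle.pem'   # placeholder: validate against a CA bundle instead
print(s.verify)    # → False
```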