Yes, I know. But my PC is behind a proxy, so I just found a tutorial about urllib2 and BeautifulSoup. Wait, requests also supports proxies. Thanks for pointing that out.
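For reference, here is a minimal sketch of the proxy setup with requests instead of urllib2. The proxy address and URL are just the placeholders from the urllib2 version; requests takes a proxy mapping per call, so there is no opener-install step.

```python
import requests

# hypothetical proxy address, same placeholder as in the urllib2 version
proxies = {"http": "http://192.168.1.1:8080"}

def fetch(url):
    # pass the proxy mapping directly to the request
    return requests.get(url, proxies=proxies).text
```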
Problem solved, still using urllib2 and bs4. I redirect the results to a *.txt file; the next step will be to write a function that calls a download manager program and runs it automatically.
Python version : 2.7.10
OS Platform : Windows 7
# coding:utf-8
import urllib2
from bs4 import BeautifulSoup

# declare proxy configuration
proxy = urllib2.ProxyHandler({'http': 'http://192.168.1.1:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)

# specify the url
quote_page = "http://web_you_wanna_crawl"
page = urllib2.urlopen(quote_page)
soup = BeautifulSoup(page, "html.parser")

# create an empty list of links
link = []
for i in soup.find_all('a'):
    j = i.get('href')
    # get() returns None when an <a> tag has no href, so guard first;
    # endswith avoids the regex pitfall that '.' matches any character
    if j and j.endswith('.rar'):
        print j
        link.append(j)

with open('linkdownload.txt', 'wb') as file:
    for item in link:
        # j is unicode, so encode before writing to the binary file
        file.write(("%s\n" % item).encode('utf-8'))
Your solution gives an error. Could you check it?
print(link.attrs['href'])
AttributeError: 'ResultSet' object has no attribute 'attrs'
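That error happens because find_all() returns a ResultSet, which is list-like and has no .attrs of its own; .attrs lives on each individual Tag inside it. A minimal sketch with a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# tiny hypothetical document, just to demonstrate the ResultSet behaviour
html = '<a href="file1.rar">one</a><a href="page.html">two</a>'
soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet; calling .attrs on it raises AttributeError
links = soup.find_all('a')

# access attrs on each Tag inside the ResultSet instead
hrefs = [tag.attrs['href'] for tag in links]
print(hrefs)
```

So either loop over the ResultSet as in the working code above, or use find(), which returns a single Tag.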