
Web Scraping with bs4

BeautifulSoup

Conditional logic with soup.select

'''Make a new list "ls_expose" and add hrefs to it if they contain 'expose' in the URL.'''

ls_expose = []
for a in soup.select('a[href]'):  # only anchors that actually have an href attribute
    if 'expose' in a['href']:
        ls_expose.append(a['href'])
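
The loop above can be tried out without a live page. The following is a minimal sketch, assuming a small made-up HTML snippet (the URLs are hypothetical) and the built-in "html.parser" so that no extra parser needs to be installed. It also shows a CSS attribute selector, a[href*="expose"], as one possible alternative that does the substring check inside the selector itself.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the URLs are made up.
html = """
<a href="/expose/12345">Flat 1</a>
<a href="/search?page=2">Next page</a>
<a name="top">Anchor without href</a>
<a href="/expose/67890">Flat 2</a>
"""

soup = BeautifulSoup(html, "html.parser")

# a[href] skips anchors that have no href, so a["href"] cannot raise KeyError
ls_expose = [a["href"] for a in soup.select("a[href]") if "expose" in a["href"]]

# Alternative: let the CSS selector do the substring match (*= means "contains")
ls_expose_css = [a["href"] for a in soup.select('a[href*="expose"]')]

print(ls_expose)
print(ls_expose_css)
```

Both lists should contain the same two links here; which form to use is mostly a matter of taste.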

Get all dropdown values from HTML

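(using a page saved locally, for testing)
from bs4 import BeautifulSoup

my_page = "Wohnung mieten im Umkreis von 50 km von Mannheim - Immob.html"

# Open the saved page in binary mode so BeautifulSoup can detect the encoding,
# and close the file again when done (the "lxml" parser needs the lxml package)
with open(my_page, "rb") as f:
    soup = BeautifulSoup(f, "lxml")

# Narrow the search to the div that wraps the dropdown we want
soup = soup.find('div', class_="select-input-wrapper")
items = soup.select('option[value]')
values = [item.get('value') for item in items]
textValues = [item.text for item in items]
print(textValues)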
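
To see what the script produces without the saved page, here is a small self-contained sketch. It assumes a made-up dropdown (the option values and labels are hypothetical; only the class name "select-input-wrapper" is taken from the script above) and uses the built-in "html.parser". It also guards against soup.find returning None, which happens when the wrapper div is not on the page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the saved page; values and labels are made up.
html = """
<div class="select-input-wrapper">
  <select>
    <option value="1">Page 1</option>
    <option value="2">Page 2</option>
    <option value="3">Page 3</option>
  </select>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find("div", class_="select-input-wrapper")

if wrapper is None:
    # find() returns None when nothing matches, so avoid calling .select on it
    values, text_values = [], []
else:
    items = wrapper.select("option[value]")
    values = [item.get("value") for item in items]               # the value attributes
    text_values = [item.get_text(strip=True) for item in items]  # the visible labels

print(values)
print(text_values)
```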

soup.select('option[value]')

How to get dropdown option values with bs4
The HTML contains 127 option values; we need to be able to get them with our code.

soup.select output
Output from the script: the values come back as text (strings), so they need to be converted to integers if you are going to use them as numbers in a for loop. See below for the code to do this, using a list comprehension.
from bs4 import BeautifulSoup

my_page = "Wohnung mieten im Umkreis von 50 km von Mannheim - Immob.html"
# Binary mode lets BeautifulSoup detect the file's encoding itself
with open(my_page, "rb") as f:
    soup = BeautifulSoup(f, "lxml")
soup = soup.find('div', class_="select-input-wrapper")
items = soup.select('option[value]')
values = [item.get('value') for item in items]
# Convert the string values to integers
x = [int(item) for item in values]
print(x)
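
One thing to watch for: int() raises ValueError on anything that is not a number, such as an empty value="" placeholder option. The sketch below is one possible way to handle that, assuming you simply want to skip such entries. It swaps the list comprehension for a small helper (to_int_list is a hypothetical name, and the sample values are made up).

```python
def to_int_list(values):
    """Convert strings to ints, skipping entries that are not valid integers."""
    result = []
    for v in values:
        try:
            result.append(int(v))
        except ValueError:
            # e.g. "" from a placeholder option, or a non-numeric value
            continue
    return result


# Hypothetical values, shaped like what item.get('value') might return
values = ["1", "2", "", "10", "all"]
x = to_int_list(values)
print(x)

# The integers can now be used as numbers in a loop
for n in x:
    print(n + 1)
```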

Note how I use soup.find to narrow down the search first; otherwise soup.select would have picked up the options from other dropdowns on the page as well.
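
The effect of narrowing can be shown on a tiny example. This is only a sketch, assuming a made-up page with two dropdowns (the second wrapper class, "sort-wrapper", and all option values are hypothetical). It also shows that the narrowing can be written as a single CSS selector instead of a separate soup.find call.

```python
from bs4 import BeautifulSoup

# Hypothetical page with two dropdowns; only "select-input-wrapper" is from the script.
html = """
<div class="select-input-wrapper">
  <select><option value="1">1</option><option value="2">2</option></select>
</div>
<div class="sort-wrapper">
  <select><option value="asc">Ascending</option></select>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Unscoped: picks up the options of every dropdown on the page
all_values = [o["value"] for o in soup.select("option[value]")]

# Scoped: only options inside the wrapper div we care about
scoped_values = [
    o["value"] for o in soup.select("div.select-input-wrapper option[value]")
]

print(all_values)
print(scoped_values)
```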