Note: This article is published for learning purposes only and does not encourage scraping any website without its owner's permission.
Yelp is an online marketplace that hosts information about local businesses and lets users leave reviews of them. The information available on the site includes phone numbers, addresses, and user reviews.
Scraping this information is relatively simple using UiPath.
- We start by searching Yelp for restaurants in the San Francisco, CA area, which returns a page of results.
- We need three pieces of information from this page: the restaurant name, the phone number, and the address.
- Using the Data Scraping activity, we will first select the title of the first restaurant, which is "Fog Harbor Fish House".
- Next, we will select the title of the second restaurant, which is "Kitchen Story". Selecting two examples allows UiPath to work out the pattern and map all the titles available on the current webpage.
- Once the second title is successfully selected, UiPath asks how the column should be extracted. As we want text information, we will select Extract Text and choose a column name. In this example, we are naming it "Restaurant name".
- UiPath extracts the titles from the webpage. Now we need to extract the phone numbers of all those restaurants. To do so, we will use the "Extract Correlated Data" option.
- We select the phone number of the first restaurant, shown in the top-right corner of its listing, and then do the same for the second element.
- Once the mapping succeeds, UiPath extracts the phone numbers for all the restaurants on the page.
- Next, we will extract the addresses from the web page using the same approach: select the element containing the address for the first restaurant, and then for the second.
- Each address is split across two lines, and the two lines sit in two different elements. As a result, we need to extract the address into two separate columns.
- After the selection succeeds, UiPath scrapes the first address line of every restaurant.
- However, we also need to scrape the second address line, so we will use "Extract Correlated Data" again.
- Selecting the second address line works the same way as in the previous cases.
- Upon successful selection, UiPath should be able to grab all the addresses from the second line into a new column.
- This marks the end of the entire scraping operation using the Data Scraping Wizard.
- To make the UiPath Robot scrape data from multiple web pages, we need to set up pagination after clicking the "Finish" button, when the wizard asks whether the data spans multiple pages.
- On selecting "Yes", the cursor turns into a pointer that acts as a selector, which we use to point to the next-page button on the webpage.
- There are 24 pages in total, and the UiPath Robot will go to each of those pages and scrape all the data.
- Once the > button is selected, the Data Scraping Wizard closes and automatically adds a Data Scraping sequence to the main workflow.
- The extracted information is stored in the ExtractDataTable variable. Using this variable, a user can write the table to a CSV file or an Excel worksheet.
- The final result is the data from the Yelp search pages in tabular form. This data can be further manipulated in UiPath as per a user's requirements.
- To write this table into an Excel worksheet, we need to use either the Write Range or the Append Range activity.
- The Append Range activity is the safer option, as it does not replace pre-existing data and only adds to whatever is already present on the user-defined worksheet.
- We add it from System > File > Workbook > Append Range, which inserts the activity into the workflow.
- The first box takes the save location of the Excel file, whereas the second box takes the name of the sheet to which the data will be written.
- Finally, the last box takes the data to be written, which is supplied using the ExtractDataTable variable generated by the Data Scraping activity.
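
For readers who want to see the same data flow outside UiPath, here is a minimal Python sketch of two ideas from the walkthrough: joining the two separately scraped address columns into one, and appending rows to an output file without replacing what is already there. It is not how UiPath implements these steps. It writes CSV rather than an Excel workbook so that it needs only the standard library, and every column name other than "Restaurant name", along with all the sample values, is made up for illustration.

```python
import csv
import os
import tempfile

# Output columns. "Restaurant name" comes from the walkthrough;
# "Phone" and "Address" are assumed names for illustration.
FIELDS = ["Restaurant name", "Phone", "Address"]


def merge_address(line1, line2):
    """Join the two scraped address lines, skipping any empty one."""
    parts = [p.strip() for p in (line1, line2) if p and p.strip()]
    return ", ".join(parts)


def to_output_rows(scraped_rows):
    """Turn rows with two address columns into rows with a single address."""
    return [
        {
            "Restaurant name": row["Restaurant name"],
            "Phone": row["Phone"],
            "Address": merge_address(row["Address line 1"], row["Address line 2"]),
        }
        for row in scraped_rows
    ]


def append_rows(path, rows):
    """Append rows to a CSV file; write the header only if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


def read_rows(path):
    """Read the CSV file back as a list of dictionaries."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Made-up rows standing in for two scraped result pages.
    page_1 = [{"Restaurant name": "Example Diner", "Phone": "555-0100",
               "Address line 1": "1 Example St", "Address line 2": "Sample District"}]
    page_2 = [{"Restaurant name": "Sample Cafe", "Phone": "555-0101",
               "Address line 1": "2 Example Ave", "Address line 2": ""}]

    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "restaurants.csv")
        append_rows(out, to_output_rows(page_1))  # creates the file with a header
        append_rows(out, to_output_rows(page_2))  # adds rows, keeps the earlier ones
        for row in read_rows(out):
            print(row)
```

Opening the file in append mode is what keeps the earlier rows, which mirrors the reason given above for preferring Append Range over Write Range when the worksheet already holds data.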