What is data scraping in UiPath?
The Data Scraping activity is used to scrape information from specific elements inside an application or web browser. The user selects an example element, and the Data Scraping activity then scrapes information from all similar elements present in the window. Hence, it is useful for getting data such as titles, links, and numbers.
Data Scraping in Amazon Using UiPath
In this article, we will learn how to scrape data from Amazon using UiPath RPA. Amazon is an online shopping site that lists millions of products and hence contains a lot of information that can be useful for various purposes. We will search for mobiles and scrape the product name, price, and product description. The search results will vary depending on your country, but the overall process stays the same.
To scrape product information, we need to open each search result. To do this, we can either use Data Scraping to get all the URLs or use Find Children. Each page has approximately 20 search results, and to scrape information from all of them we can use Find Children and pair it with a Click activity.
In this tutorial, we will use Find Children instead of the Data Scraping activity. This is because Data Scraping fails to recognize all the search results, as we will see in the following section.
Using Data Scraping Wizard
- First, open Data Scraping Wizard and select the product name as the first item, as shown in the image. Similarly, select the second result.
- Once the two search results are selected, UiPath maps all the search results present on that webpage and brings up the Configure Columns dialog box. Here, we enable both Extract Text and Extract URL and save them under different column names.
- As is evident from the image, the Data Scraping activity is unable to get all the search results. We will use Find Children as a workaround for this issue.
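To make the idea of extracting text and a URL into two columns more concrete, here is a minimal sketch in Python rather than UiPath. It is only an analogy for what the wizard produces, not how UiPath works internally. The `LinkExtractor` class, the `extract_rows` helper, and the sample HTML are all hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (text, url) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished (text, url) rows
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


# Hypothetical stand-in for a search results page.
SAMPLE_HTML = """
<div><a href="/item/1"><span>Phone A</span></a></div>
<div><a href="/item/2"><span>Phone B</span></a></div>
"""


def extract_rows(html):
    """Return one (text, url) row per link, like two scraped columns."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for text, url in rows:
    print(text, url)
```

Each tuple corresponds to one row with a text column and a URL column, which is the shape of table the wizard is configured to produce.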
Using Find Children
- First, we need an Open Browser activity that navigates to Amazon.in, followed by a Type Into activity to search for the required keyword.
- We assume that an Excel file containing a list of keywords is available. We read it using Read Range and then use a For Each Row activity to loop through the keywords.
- Then, we type the keyword into Amazon’s search box, which brings up the search results.
- To use Find Children, we need to identify the common element that contains each search result. For that, we will use UI Explorer in UiPath.
- We select the title of a search result, which brings up the following information in UI Explorer.
- The class of the element is a-size-medium a-color-base a-text-normal. We can confirm this by indicating the second search result using UI Explorer.
- Now, we have found the required Filter to be used with Find Children. First, we will select the entire search result using Indicate on screen as shown in the image.
- Next, we enter the class filter that we identified using UI Explorer.
- The tag that we have added is an essential requirement for the filter selector to work correctly. If you observe UI Explorer closely, it shows the tag just below the properties panel under Target Element. The output is saved inside the Children variable.
- Another crucial property that needs to be changed is the Scope of the Find Children activity. We change the value from FIND_CHILDREN to FIND_DESCENDANTS. This tells the Find Children activity to look at every element that descends from the selected element, at any depth, and return those that match the filter.
- Next, we will use the For Each activity to loop through each item that was found.
- Now, we need to change the TypeArgument from Object to UiPath.Core.UiElement. If you are unable to find this Data Type, just select Browse for Types and select UiPath.Core.UiElement.
The final output will look like this.
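The difference between the two scope values can be illustrated outside UiPath. The following is a minimal Python sketch, assuming a made-up page structure in which the title elements are nested two levels below the selected container. The function names `find_children` and `find_descendants` are hypothetical and only mirror the scope names; they are not UiPath APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each title is nested inside a result block.
PAGE = """
<div id="results">
  <div class="result">
    <h2><span class="title">Phone A</span></h2>
  </div>
  <div class="result">
    <h2><span class="title">Phone B</span></h2>
  </div>
</div>
"""


def matches(element, tag, css_class):
    """Filter on tag and class, similar in spirit to a selector filter."""
    return element.tag == tag and element.get("class") == css_class


def find_children(root, tag, css_class):
    """Look only at the direct children of root."""
    return [e for e in root if matches(e, tag, css_class)]


def find_descendants(root, tag, css_class):
    """Look at every element below root, at any depth."""
    return [
        e for e in root.iter()
        if e is not root and matches(e, tag, css_class)
    ]


root = ET.fromstring(PAGE)
children = find_children(root, "span", "title")
descendants = find_descendants(root, "span", "title")

print(len(children))                   # direct children only
print([e.text for e in descendants])   # all nested matches
```

In this toy structure the direct-children search finds nothing because the titles are not immediate children of the container, while the descendant search finds both. Each returned item is an element object, which is why the loop variable in the workflow needs an element type rather than a generic object.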
- Setting the TypeArgument to UiPath.Core.UiElement is crucial because it allows the items inside the Children variable to be used as elements inside other activities. We include a Click activity inside the For Each, which clicks the search results one by one.
- Now we need three Get Text activities, one for each piece of information that we need: first the Name, then the M.R.P, and finally the Product Description.
- We can also get the URL of the product by using Get Attribute activity and passing the “URL” as the attribute to look for.
- We need to add this to a Data Table in order to write it to an Excel or CSV file. Using Add Data Row we will add all Get Text values into different columns like this. We have supplied a Data Table variable DToutput where the information will be saved.
- We also need a Go Back activity to return to the previous page (the search results) so that the next click can be performed.
- We need to Build a Data Table which will contain three columns and we will add this activity at the beginning of the workflow sequence.
- The search results span multiple pages, so we need an activity that lets UiPath navigate to the next page. We simply add a Click activity and point it to the Next button.
- The scraped information can be saved to the local machine using a Write Range or Append Range activity. This can be placed inside the For Each loop over the children or inside the For Each Row loop, but it needs a Clear Data Table activity alongside it so that the same data is not written multiple times.
The final output looks similar to the image above.
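The build, add row, write, and clear pattern described in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the UiPath workflow itself: the column names, the `scrape_all` and `append_rows` helpers, and the fake page data are all hypothetical, and an in-memory stream stands in for the Excel or CSV file.

```python
import csv
import io

# Three columns, built once at the start (like Build Data Table).
COLUMNS = ["Name", "MRP", "Description"]


def append_rows(stream, rows, write_header):
    """Append rows to the output, writing the header only once."""
    writer = csv.writer(stream)
    if write_header:
        writer.writerow(COLUMNS)
    writer.writerows(rows)


def scrape_all(pages, stream):
    """Collect one row per product, flushing and clearing after each page."""
    buffer = []               # plays the role of the data table
    header_written = False
    for page in pages:
        for product in page:
            # Like Add Data Row: one value per column.
            buffer.append(
                [product["name"], product["mrp"], product["description"]]
            )
        append_rows(stream, buffer, write_header=not header_written)
        header_written = True
        # Like Clear Data Table: without this, rows from earlier
        # pages would be written again on the next append.
        buffer.clear()


# Hypothetical data standing in for two pages of search results.
FAKE_PAGES = [
    [
        {"name": "Phone A", "mrp": "9999", "description": "Sample description A"},
        {"name": "Phone B", "mrp": "14999", "description": "Sample description B"},
    ],
    [
        {"name": "Phone C", "mrp": "19999", "description": "Sample description C"},
    ],
]

out = io.StringIO()
scrape_all(FAKE_PAGES, out)
print(out.getvalue())
```

Removing the `buffer.clear()` call makes the rows from the first page appear twice in the output, which is the duplication the clearing step is meant to prevent.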
Note: This article is for learning purposes only.