Sabari M
Updated date Oct 27, 2020
In this article, we will see how to scrape data from the LinkedIn social media website using UiPath RPA. We will use UiPath to extract company as well as employee information from LinkedIn based on data stored inside an Excel file.

LinkedIn is an online platform focused on hiring, recruitment, jobs, and employment. Millions of employees of various organizations around the world use it to connect, collaborate, share ideas, or look for better opportunities. As a result, the website holds a large amount of valuable information, ranging from company details to employee details.

We have a list of 10 companies for which we will look up information on LinkedIn. We are interested in three pieces of information: number of employees, website, and location.

  • First, we will use two Read Range activities and save their output into two distinct Data Tables, DTinput and DTinput2. DTinput holds the input list that we loop over. We will empty DTinput2 with Clear Data Table and then use it to collect the data extracted from LinkedIn using Add Data Row.

  • Now, we will use the Open Browser activity and point it to the duckduckgo.com search engine. We are using DuckDuckGo instead of Google because Google has anti-scraping mechanisms that redirect to CAPTCHAs very often. DuckDuckGo allows easy searching and is far less likely to interrupt automated searches in this way.
  • We will then create a loop using For Each Row and set DTinput as its input Data Table.

Then we will point to the DuckDuckGo search bar and type the company name followed by “site:linkedin.com/company/”. This restricts the search to results from LinkedIn, and specifically to company details pages.
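As a rough illustration of the search string built in this step, the following Python sketch assembles the same query outside UiPath. The function names are hypothetical, and the `https://duckduckgo.com/?q=` URL form is an assumption used only to show the encoded query; in the workflow itself the text is typed into the search bar.

```python
from urllib.parse import quote_plus


def company_query(company: str) -> str:
    """Build the search text: company name plus the LinkedIn company-page filter."""
    return f"{company.strip()} site:linkedin.com/company/"


def search_url(query: str) -> str:
    """Encode the query into a DuckDuckGo search URL (illustrative alternative to typing)."""
    return "https://duckduckgo.com/?q=" + quote_plus(query)


if __name__ == "__main__":
    query = company_query("Wells Fargo")
    print(query)
    print(search_url(query))
```

Keeping the filter in one small function makes it easy to apply the same string to every row of the input list.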

  • For example, searching for Wells Fargo brings up the exact company page for Wells Fargo, which we can open up to extract the required information. Let us open up the link and see what LinkedIn provides us.

  • The required information is present under the About section, which does not open by default. We could use a Click activity to navigate to About, but the element containing the About option changes frequently, so a static Click activity will not work reliably across an arbitrary number of searches.

  • Hence, we will use a workaround. First, we will get the required Link using Get Text by pointing it to the element containing the URL.

  • Now, we will use an If activity to check whether the link already contains “about”. If it does, we will navigate straight to that link; otherwise, we will append “/about/” and open the resulting URL. This opens the LinkedIn page directly at the About section, where the required information is available.

  • Now we will use three separate Get Text activities, one for each piece of information (number of employees, website, and location), and store each result in its own variable.

  • Using Add Data Row we will input the extracted information into DTinput2 Data Table.

  • Finally, we will append the Data Table to an Excel worksheet using Append Range. Because Append Range runs once for each row containing company information, we use Clear Data Table to empty the rows added by Add Data Row after each append; otherwise, rows that were already written would be duplicated in the Excel worksheet.

  • The last activity we will use is Go Back, which sends the browser back to the DuckDuckGo search page so that the next company can be entered.
  • The final extracted information for the 10 companies listed above looks like this:
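The About-link check performed by the If activity in the steps above can be sketched in Python as follows. This is a minimal illustration, not the workflow itself; the function name `about_url` and the example URLs are assumptions.

```python
def about_url(link: str) -> str:
    """Return a URL that opens a LinkedIn company page at its About section.

    Mirrors the If activity: keep the link when it already contains "about",
    otherwise append "/about/".
    """
    if "about" in link.lower():
        return link
    # Strip any trailing slash first so we never produce "//about/".
    return link.rstrip("/") + "/about/"


if __name__ == "__main__":
    print(about_url("https://www.linkedin.com/company/wellsfargo"))
    print(about_url("https://www.linkedin.com/company/wellsfargo/about/"))
```

A plain substring test matches the check described above, but it would also match a company whose URL slug happens to contain “about”. Testing whether the path ends in “/about” or “/about/” may be more robust.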

Mining Employee information from LinkedIn

Suppose we have a list of job titles, employee names, and company names to look up on LinkedIn and scrape information for. We will again use DuckDuckGo to perform our search and, for simplicity, will look for employees associated with RPA development at Wells Fargo. This entire process can be made much simpler by using LinkedIn Sales Navigator, but for this tutorial we will stick with a normal search engine.

  • Using a Type Into activity, we will search for the required job title, “RPA Developer”, followed by “site:linkedin.com/in”. This restricts the search results to LinkedIn employee profile pages.
  • We are also using the “-” operator to exclude terms, so that most of the results are user profiles rather than other kinds of pages. Searching for RPA developers at Wells Fargo gives the following search results.

  • As is evident from the search results, the filters bring up accurate results that we can use to get the required information. Now let us open a LinkedIn profile and use Get Text to extract it.

  • We will use three Get Text activities to scrape the job title, the employee name, and the company information from the Experience section of the profile page.
  • It is important to note that we are not scraping information from the first “about” section, because there the job title and company name sit within the same element and are hard to separate.
  • There are 10 results per page, and for each of them we need to perform a click, go to the Experience section, and scrape the data. To do this, we need only a few more activities, described next.
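Before turning to those activities, the profile search query from the first two steps of this list can be sketched in Python. The function name and parameters are hypothetical, wrapping the job title in double quotes is an assumption, and the excluded term in the example is purely illustrative because the article does not list the terms that were excluded.

```python
from typing import Iterable


def profile_query(job_title: str, company: str, exclude: Iterable[str] = ()) -> str:
    """Build a search string restricted to LinkedIn profile pages.

    job_title is quoted as an exact phrase (an assumption); each term in
    `exclude` is prefixed with "-" so that results containing it are dropped.
    """
    parts = [f'"{job_title}"', company, "site:linkedin.com/in"]
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


if __name__ == "__main__":
    # "jobs" is a hypothetical exclusion, not one taken from the article.
    print(profile_query("RPA Developer", "Wells Fargo", exclude=["jobs"]))
```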

Find Children

Find Children was already explained in detail in the previous tutorial. It returns the search result elements on the page, so the 10 clicks can be performed in a loop using For Each.

For Each

Creates a loop over each child present within the variable; that is, it iterates over each search result present on the web page.

Send Hotkey

We need to scroll down to the Experience section. For that, we will use Send Hotkey with Page Down as the key.

Click

Performs a click on each search result when the loop variable “item” is passed as the Element under its properties.

Get Text

Three of these activities are used to scrape the job title, the company, and the employee name.

Go Back

Returns to the search results page so that the next click can be performed.
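The order in which these activities run inside the loop can be sketched in Python. This shows only the control flow under stated assumptions: `FakeBrowser` is a hypothetical stand-in that records actions instead of driving a real page, and its method names merely echo the activity names above; none of this is a UiPath API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FIELDS = ("Employee Name", "Job Title", "Company")


@dataclass
class FakeBrowser:
    """Hypothetical in-memory browser: maps each search result to its profile text."""
    pages: Dict[str, Dict[str, str]]
    actions: List[Tuple[str, str]] = field(default_factory=list)
    current: Optional[str] = None

    def click(self, result: str) -> None:
        self.actions.append(("click", result))
        self.current = result

    def send_hotkey(self, key: str) -> None:
        self.actions.append(("hotkey", key))

    def get_text(self, name: str) -> str:
        return self.pages[self.current][name]

    def go_back(self) -> None:
        self.actions.append(("back", ""))
        self.current = None


def scrape_results(browser: FakeBrowser, results: List[str]) -> List[Dict[str, str]]:
    """For each search result: click, scroll, read three fields, go back."""
    rows = []
    for item in results:                      # For Each over the found children
        browser.click(item)                   # Click with "item" as the element
        browser.send_hotkey("PageDown")       # scroll towards the Experience section
        rows.append({name: browser.get_text(name) for name in FIELDS})  # three Get Text calls
        browser.go_back()                     # return to the search results page
    return rows


if __name__ == "__main__":
    pages = {"result-1": {"Employee Name": "Example Person",
                          "Job Title": "RPA Developer",
                          "Company": "Wells Fargo"}}
    print(scrape_results(FakeBrowser(pages), ["result-1"]))
```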

  • The final sequence will look like this:

  • We can use Add Data Row to save this information into a Data Table and write to an Excel or CSV file.
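For readers who want to see the save step outside UiPath, here is a minimal Python sketch of appending scraped rows to a CSV file using the standard `csv` module. The column names, the function name `append_rows`, and the sample row are assumptions for illustration.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable

FIELDS = ["Employee Name", "Job Title", "Company"]


def append_rows(path: str, rows: Iterable[Dict[str, str]]) -> None:
    """Append rows to a CSV file, writing the header only when the file is new or empty."""
    target = Path(path)
    is_new = not target.exists() or target.stat().st_size == 0
    # newline="" lets the csv module control line endings itself.
    with target.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    append_rows("employees.csv", [
        {"Employee Name": "Example Person",
         "Job Title": "RPA Developer",
         "Company": "Wells Fargo"},
    ])
```

Passing only the newly scraped rows on each call avoids writing the same row twice, which is the same concern that Clear Data Table addresses in the company workflow.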
