Scrape Facebook Posts

Data scientists often need to analyse the data shared on different social networks, whether to extract topics for analysis or to find solutions to company or societal problems.

So, in this post, I'll show you how to scrape public data from a Facebook page, even if you are neither an admin nor a follower of that page.
I'll use Python 2.7 in my code, and I need to create a Facebook Graph API app.

Facebook Graph API

To gather data from Facebook, you need to create a Graph API app, which acts as an intermediary between your program and the Facebook page. Before creating it, you should register as a developer at https://developers.facebook.com/
Follow the instructions to register your developer account. Note that you must specify your phone number in your Facebook profile (www.facebook.com) in order to receive a confirmation code via SMS.

Now after validating your account, you can start working.

Create a Graph API app:

    1. Click on 'add new app'


    2. Give your app a name, then click on 'Create App ID'


Now you are redirected to the app dashboard, where you will find:

    - The App id
    - The App secret
    - The app version


You will need this information later.
If you have completed this step, your app has been created successfully.

Access Token

Each Graph API app has an access token, which acts as a ticket that identifies your app when you gather data from Facebook. There are two types of access tokens.

    1. Temporary access token: it lasts 1 to 2 hours, and you get it from the Graph API Explorer (https://developers.facebook.com/tools/explorer). In the Graph API Explorer, select your app, then select 'Get User Access Token' (you can select all permissions, or just the main ones). A token will then be generated for you; remember that it is only valid for 1 to 2 hours (you can check this by clicking the (i) icon).


    2. Permanent access token: this is obtained by combining your App ID and App Secret in the form APP_ID|APP_SECRET; it is valid for 60 days.
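
As a small illustration of the second form, the token string can be assembled in Python. The function name and the placeholder values below are hypothetical; only the APP_ID|APP_SECRET format comes from the text above.

```python
def build_app_access_token(app_id, app_secret):
    """Join the App ID and App Secret with a pipe character."""
    return "{0}|{1}".format(app_id, app_secret)


# Placeholder values: replace them with those shown on your app dashboard.
access_token = build_app_access_token("YOUR_APP_ID", "YOUR_APP_SECRET")
print(access_token)
```

Since this string contains the App Secret, it is best kept out of any code you share or publish.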

How to write your URL?

To gather data from Facebook, you write a URL that returns data in JSON format. This URL can be built in the Graph API Explorer: when you click on Submit, you get the requested information, and you can select all the fields that you need in your request, according to the permissions granted to your app.

In this article, I'll focus on page posts.
To get the page data, you can provide the ID of the page in the URL. There are two ways to find the ID of a page:

    1. Go to the Facebook page that you want to analyse, press Ctrl+U to view the page source, and search for page_ID; the ID is a number.
    2. Use this site, which returns the ID of a given page or user: https://findmyfbid.com/

   Alternatively, you can give the name of the page directly.

As shown in the figure, replace 'me' with the 'pageID' or 'PageName', and then start selecting the fields that you need.


After that, you can copy the requested fields and combine them into a URL of this form: https://graph.facebook.com/v2.11/page_name/posts?fields=list_of_fiels_separated_by_comma&access_token=your_access_token
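
The same URL can be assembled in code. This is a minimal sketch: the helper name build_posts_url and the constant GRAPH_BASE are hypothetical, and the three fields (id, message, created_time) are only an example selection; the available fields depend on the API version and on your app's permissions.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

GRAPH_BASE = "https://graph.facebook.com/v2.11"


def build_posts_url(page, fields, access_token):
    """Return the /posts URL for a page ID or page name."""
    query = urlencode([
        ("fields", ",".join(fields)),      # comma-separated list of fields
        ("access_token", access_token),
    ])
    return "{0}/{1}/posts?{2}".format(GRAPH_BASE, page, query)


url = build_posts_url("page_name", ["id", "message", "created_time"],
                      "your_access_token")
print(url)
```

Note that urlencode percent-encodes the commas between fields as %2C, which is equivalent to writing them literally in the browser.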


When pasted into your browser, this URL returns the requested information in JSON format. For more information about the list and meaning of the fields, see the official Facebook documentation (it is frequently updated): https://developers.facebook.com/docs/graph-api/reference
For our example, posts, see the 'Reading' section of this page: https://developers.facebook.com/docs/graph-api/reference/v2.11/post

Of course, you can experiment by modifying this URL, https://graph.facebook.com/v2.11/page_name/posts?fields=list_of_fiels_separated_by_comma&access_token=your_access_token, adding all the fields that you need and analysing the results each time.

Note that this URL returns only part of the data, because the results are paginated. To move to the next page, look for 'paging' at the end of the results and click on 'next' to get the next part of the data, and so on.
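
In a program, "clicking on next" amounts to requesting the URL stored under paging, then next, until no such URL is left. The sketch below assumes the response has a 'data' list and an optional 'paging' object with a 'next' URL; the names fetch_json and iter_posts are hypothetical, and the demonstration uses two fake pages so that it runs without any network access.

```python
import json

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2.7


def fetch_json(url):
    """Download one page of results and decode the JSON body."""
    response = urlopen(url)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()


def iter_posts(first_url, fetch=fetch_json):
    """Yield every post, following paging.next until it is absent."""
    url = first_url
    while url:
        payload = fetch(url)
        for post in payload.get("data", []):
            yield post
        url = payload.get("paging", {}).get("next")


# Offline demonstration: two fake pages stand in for real requests.
fake_pages = {
    "page1": {"data": [{"id": "1"}, {"id": "2"}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": "3"}], "paging": {}},
}
ids = [post["id"] for post in iter_posts("page1", fetch=fake_pages.get)]
print(ids)
```

With a real URL, calling iter_posts(url) without the fetch argument would use fetch_json and issue one HTTP request per page.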

Python Code

In this section, I give you the Python 2.7 code that you can use to gather the public posts of a page and store them in a CSV file. You only need to modify your access token, the name of the page, and the location and name of your output file.

Of course, you can add or remove fields in the URL according to your needs; refer to https://developers.facebook.com/docs/graph-api/reference/v2.11/post to avoid errors.

You can run this file on Linux by typing (python postsVF.py) in a terminal; don't forget to first move to the folder where you put the code. You can also run it using Jupyter Notebook.
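
As a minimal sketch of the storage step only (this is not the postsVF.py script itself), posts can be written to CSV with the standard csv module. The function name write_posts_csv and the sample posts are made up for illustration, and the sketch is written for Python 3.

```python
import csv
import io


def write_posts_csv(posts, out_file, fields):
    """Write one CSV row per post, keeping only the chosen fields."""
    writer = csv.DictWriter(out_file, fieldnames=fields,
                            restval="", extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        writer.writerow(post)  # missing fields are written as empty cells


# Made-up sample posts; real ones would come from the Graph API response.
sample_posts = [
    {"id": "1", "message": "Hello, world",
     "created_time": "2018-01-01T10:00:00+0000"},
    {"id": "2", "created_time": "2018-01-02T11:30:00+0000"},  # no message
]
buffer = io.StringIO()
write_posts_csv(sample_posts, buffer, ["id", "message", "created_time"])
print(buffer.getvalue())
```

To write to a real file in Python 3, pass a file opened with open(path, "w", newline="", encoding="utf-8") instead of the in-memory buffer. Under Python 2.7, the csv module works with byte strings, so the file would be opened in "wb" mode and text values encoded to UTF-8 before writing.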


