Welcome to My Blog
Twitter Mining
According to Wikipedia, Twitter is an American microblogging and social networking service on which users post and interact with messages known as “tweets”. Registered users can post, like, and retweet tweets, but unregistered users can only read them. Twitter offers developer access, which lets us analyze its data through the Twitter API. To extract data from the Twitter API, first create a developer account on the Twitter apps site and log in. Then create an app to obtain the consumer key, consumer secret, access key, and access secret, which you will use for authentication.
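Rather than pasting the four keys directly into your code, one option is to read them from environment variables so they never end up in a shared notebook. The sketch below assumes hypothetical variable names (`TWITTER_CONSUMER_KEY` and so on); use whatever names you actually set in your shell.

```python
import os

# Hypothetical environment variable names; adjust them to match your setup
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials():
    """Return the four Twitter credentials as a dict read from the environment.

    Raises KeyError listing every missing variable, so a typo is caught
    before any request is sent to the API.
    """
    missing = [var for var in CREDENTIAL_VARS.values() if var not in os.environ]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {name: os.environ[var] for name, var in CREDENTIAL_VARS.items()}
```

You would then call `creds = load_credentials()` and pass `creds["consumer_key"]`, `creds["consumer_secret"]`, and so on to Tweepy instead of hard-coded strings.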
Installation
To use Tweepy, a Python library for accessing the Twitter API, first install it with pip and then import it:
**Modules**
```
# Run in a Jupyter notebook cell; in a terminal use `pip install tweepy`
!pip install tweepy
import tweepy

# Fill in the X's with the credentials obtained from your Twitter developer app
consumer_key = "XXXXXXXXXXXXXXXXXXXXXXXX"
consumer_secret = "XXXXXXXXXXXXX"
access_key = "XXXXXXXXXXXXXXXXXXXXXX"
access_secret = "XXXXXXXXXXXXXXXXXX"

# Function to extract tweets
def get_tweets(username):
    # Authenticate the app with the consumer key and consumer secret
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    # Authorize access with the user's access key and access secret
    auth.set_access_token(access_key, access_secret)
    # Create a client for Twitter's REST API (this is not the Streaming API)
    api = tweepy.API(auth)
    # Fetch the 20 most recent tweets from the user's timeline
    tweets = api.user_timeline(screen_name=username, count=20)
    # Keep only the text of each tweet
    u = [tweet.text for tweet in tweets]
    # Print the tweets and return them for further analysis
    print(u)
    return u
```
In the code above, we first install Tweepy and import it, then fill in the credentials obtained from the Twitter developer account. The get_tweets function authenticates with those keys, creates an API client, and extracts the 20 most recent tweets from the given Twitter handle's timeline. For more details, see Twitter mining.
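For data analysis it is often convenient to store the extracted tweets in a file. The sketch below writes them to a CSV file with Python's standard `csv` module. It assumes each tweet object exposes `id`, `created_at`, and `text` attributes, as the Tweepy `Status` objects returned by `api.user_timeline` do; `save_tweets_csv` is a hypothetical helper name, not part of Tweepy.

```python
import csv


def save_tweets_csv(tweets, path):
    """Write one row per tweet (id, created_at, text) to a CSV file at `path`.

    Assumes each tweet has .id, .created_at and .text attributes.
    Returns the number of tweets written.
    """
    count = 0
    # newline="" lets the csv module handle line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])  # header row
        for tweet in tweets:
            writer.writerow([tweet.id, tweet.created_at, tweet.text])
            count += 1
    return count
```

For example, `save_tweets_csv(api.user_timeline(screen_name="some_handle", count=20), "tweets.csv")` would save a user's recent tweets, where `some_handle` is a placeholder. The `csv` module quotes fields that contain commas or newlines, which tweets often do.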
Web Scraping
Web scraping is the process of extracting data from websites: you download a page's HTML and then pull the data you need out of it. To do this in Python, install the Beautiful Soup package, which parses HTML so you can search it for the elements you want.
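To illustrate what Beautiful Soup does with a page's HTML, here is a minimal sketch that parses a small, made-up HTML string (standing in for a downloaded page) and pulls out the page title and every link. It uses Python's built-in `html.parser` backend, so only `beautifulsoup4` needs to be installed.

```python
from bs4 import BeautifulSoup

# A small, made-up page standing in for HTML downloaded from a website
html = """
<html>
  <head><title>Example Page</title></head>
  <body>
    <a href="https://example.com/a">First link</a>
    <a href="https://example.com/b">Second link</a>
  </body>
</html>
"""

# Parse the HTML with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# The text inside the <title> tag
title = soup.title.get_text()

# (link text, URL) pairs for every <a> tag that has an href attribute
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

print(title)
for text, url in links:
    print(text, url)
```

On a real site you would pass the downloaded HTML instead of this string, and choose tags and attributes that match that page's structure.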
Data extraction from the web using Python’s Beautiful Soup module
**Modules**
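```
# Run in a Jupyter notebook cell; in a terminal use `pip install requests beautifulsoup4`
!pip install requests beautifulsoup4
from contextlib import closing

from requests import get
from requests.exceptions import RequestException
from bs4 import BeautifulSoup  # used to parse the HTML that simple_get returns

def simple_get(url):
    """Make an HTTP GET request to `url`.

    Return the raw content (bytes) if the response looks like HTML,
    otherwise return None.
    """
    try:
        with closing(get(url, stream=True)) as resp:
            if is_good_response(resp):
                return resp.content
            return None
    except RequestException as e:
        log_error('Error during requests to {0} : {1}'.format(url, str(e)))
        return None

def is_good_response(resp):
    """Return True if the response seems to be HTML, False otherwise."""
    # .get() avoids a KeyError when the server sends no Content-Type header
    content_type = resp.headers.get('Content-Type', '').lower()
    return resp.status_code == 200 and 'html' in content_type

def log_error(e):
    """Log errors; this version just prints them."""
    print(e)
```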
To scrape data, first install requests and Beautiful Soup. The simple_get function attempts to get the content at a URL by making an HTTP GET request. If the response has status 200 and an HTML content type, it returns the raw content (as bytes); otherwise, or if the request fails, it returns None. The is_good_response function returns True if the response seems to be HTML, and False otherwise. Finally, log_error logs errors; this version just prints them.
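Printing is fine in a notebook, but for longer-running scrapers you may prefer Python's standard `logging` module, which adds timestamps and severity levels and can write to a file. Below is a minimal sketch of a drop-in replacement for `log_error`; the logger name `scraper` is an arbitrary choice.

```python
import logging

# Configure logging once: timestamp, level, and message on each line
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("scraper")  # arbitrary logger name


def log_error(e):
    """Log the error message at ERROR level instead of printing it."""
    logger.error(e)
```

Because the function name and signature stay the same, simple_get can call it without any other changes.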
For more information about web scraping, see Real Python or DataCamp.
Author
Name: Evelyne Umuhire
Email: Uevelyne44@gmail.com