In this post we will create a Python script that retrieves data from a WordPress (WP) blog using the WP API. The script saves the downloaded data to a CSV file for further analysis or other purposes.
The WP API returns data in JSON format and is accessible at http://hostname.com/wp-json/wp/v2/posts. The WP API is packaged as a plugin, so it has to be added to the WP blog from the plugins page [6]. (Since WordPress 4.7 the REST API is part of WordPress core, so on newer versions no plugin is needed.)
Once it is added and activated in the WordPress blog, here is how you would typically interact with WP API resources:
GET /wp-json/wp/v2/posts to get a collection of Posts.
Other operations are possible too, such as getting a random post, getting posts for a specific category, retrieving a specific post, and adding a post to the blog (with the POST method, shown later in this post).
During testing, the default number of posts per page turned out to be 10. You can specify a different number, from 1 up to 100, with the per_page parameter. To fetch more than 100 posts you need to iterate through pages [3].
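Iterating through pages can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the names fetch_page and fetch_all_posts are hypothetical, the base URL is assumed to carry no query string of its own, and the loop assumes that a page with fewer than per_page posts is the last one and that an HTTP error after the first page means we asked for a page past the end.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, page, per_page):
    """Download one page of posts and return it as a list of dicts."""
    query = urlencode({"page": page, "per_page": per_page})
    with urlopen(f"{base_url}?{query}") as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all_posts(base_url, per_page=100, fetch=fetch_page):
    """Collect posts page by page until a short or missing page is seen."""
    posts = []
    page = 1
    while True:
        try:
            batch = fetch(base_url, page, per_page)
        except HTTPError:
            if page == 1:
                raise  # a failure on the very first page is a real error
            break  # assumption: we requested a page past the last one
        posts.extend(batch)
        if len(batch) < per_page:
            break  # assumption: a short page is the last page
        page += 1
    return posts


# Example call (needs a reachable blog):
# posts = fetch_all_posts("http://hostname.com/wp-json/wp/v2/posts/")
```

The fetch argument exists only so that the paging logic can be tried out without a network connection, by passing in a function that returns canned pages.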
Here are some examples. To get 100 posts per page from the category sport:
http://hostname.com/wp-json/wp/v2/posts/?filter[category_name]=sport&per_page=100
To return posts from all categories:
http://hostname.com/wp-json/wp/v2/posts/?per_page=100
To return posts from a specific page, for example page 2 (if there are several pages of posts), use the following:
http://hostname.com/wp-json/wp/v2/posts/?page=2
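The query strings above can also be assembled in Python instead of being typed by hand. The helper below is only an illustration; build_posts_url is a hypothetical name, and the site address is the same placeholder used throughout this post.

```python
from urllib.parse import urlencode


def build_posts_url(site, **params):
    """Return the posts endpoint URL for site, with params as a query string."""
    url = site.rstrip("/") + "/wp-json/wp/v2/posts/"
    if params:
        url += "?" + urlencode(params)
    return url


print(build_posts_url("http://hostname.com", per_page=100))
# http://hostname.com/wp-json/wp/v2/posts/?per_page=100
print(build_posts_url("http://hostname.com", page=2))
# http://hostname.com/wp-json/wp/v2/posts/?page=2
```

Using urlencode means that values containing spaces or other special characters are escaped automatically.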
Here is how you can make a POST request to add a new post to the blog. You need to use the post method of requests instead of get. The code below, however, will not work as is: it returns response code 401, which means "Unauthorized". To add a post successfully you also need to supply actual credentials, which is not shown here because it is not required for getting post data.
import requests
url_link = "http://hostname.com/wp-json/wp/v2/posts/"
data = {"title": "new title", "content": "new content"}
r = requests.post(url_link, data=data)
print(r.status_code)
print(r.headers["content-type"])
print(r.json())  # json is a method, so it has to be called
The script provided in this post performs the following steps:
1. Get data from the blog using the WP API and urlopen from urllib.request, as below:
from urllib.request import urlopen
with urlopen(url_link) as url:
    data = url.read()
print(data)
2. Save the JSON data as a text file. This is also helpful when we need to see which fields are available, how to navigate to the needed fields, or if we need to extract more information later.
3. Open the saved text file, read the JSON data, extract the needed fields (such as title and content) and save the extracted information into a CSV file.
In future posts we will look at how to do text mining on the extracted data.
Here is the full source code of the script for retrieving post data.
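To make the navigation in step 3 concrete, here is a small sketch that runs without a blog. The sample string is a made-up, heavily trimmed stand-in for a real response (real post objects contain many more fields), and extract_fields is a hypothetical helper name.

```python
import json

# Hypothetical, heavily trimmed response: a JSON array with one post object.
sample = (
    '[{"id": 1, '
    '"title": {"rendered": "Hello"}, '
    '"content": {"rendered": "<p>Hi</p>"}}]'
)


def extract_fields(raw_json):
    """Return one dict per post holding only the title and content."""
    posts = json.loads(raw_json)
    return [
        {
            "Title": post["title"]["rendered"],
            "Content": post["content"]["rendered"],
        }
        for post in posts
    ]


rows = extract_fields(sample)
print(rows)
```

Note that the content comes back as rendered HTML, so tags such as <p> are part of the extracted text.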
# -*- coding: utf-8 -*-
import os
import csv
import json
from urllib.request import urlopen

url_link = "http://hostname.com/wp-json/wp/v2/posts/?per_page=100"

# Download the raw JSON response (bytes)
with urlopen(url_link) as url:
    data = url.read()
print(data)

# Write data to file
filename = "posts json1.txt"
with open(filename, "wb") as file_:
    file_.write(data)


def save_to_file(fn, row, fieldnames):
    # Append to an existing file, otherwise create it and write the header first
    if os.path.isfile(fn):
        m = "a"
    else:
        m = "w"
    with open(fn, m, encoding="utf8", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        if m == "w":
            writer.writeheader()
        writer.writerow(row)


# Read the saved JSON back and write one CSV row per post
with open(filename, encoding="utf8") as json_file:
    json_data = json.load(json_file)

for n in json_data:
    r = {}
    r["Title"] = n["title"]["rendered"]
    r["Content"] = n["content"]["rendered"]
    save_to_file("posts.csv", r, ["Title", "Content"])
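One thing to be aware of: save_to_file reopens posts.csv for every row and appends when the file already exists, so running the script twice leaves duplicate rows in the file. A possible alternative, shown here only as a sketch with the hypothetical name save_rows, is to collect all rows first and write them in one pass, overwriting any previous file.

```python
import csv


def save_rows(fn, rows, fieldnames):
    """Write the header and all rows in one pass, replacing any existing file."""
    with open(fn, "w", encoding="utf8", newline="") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example call with rows built from the JSON data:
# rows = [{"Title": n["title"]["rendered"], "Content": n["content"]["rendered"]}
#         for n in json_data]
# save_rows("posts.csv", rows, ["Title", "Content"])
```

Whether overwriting or appending is preferable depends on whether the CSV file is meant to accumulate posts across runs.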
Below are links that are related to the topic or were used in this post. Some of them describe the same thing in a JavaScript/jQuery environment.
References
1. WP REST API
2. Glossary
3. Get all posts for a specific category
4. How to Retrieve Data Using the WordPress API (JavaScript, jQuery)
5. What is an API and Why is it Useful?
6. Using the WP API to Fetch Posts
7. urllib Tutorial Python 3
8. Using Requests in Python
9. Basic urllib get and post with and without data
10. Submit WordPress Posts from Front-End with the WP API (JavaScript)