Python is widely used programming language in many fields, such as data science, machine learning, web development. It is also a great choice for automation of different computer tasks such as data downloading from websites or data manipulation. Python can open, read or save data in different file formats.
In this post we will look how to use python for extracting data from different CSV files and saving the data in some archive file. This is a typical task that can appear when you download data from one or more websites into different files and then need to combine and save the data in some location that can be database or archive file.
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. [1]
In our sample script the python program goes through specified folder and finds all CSV files.
The CSV file then is opened and needed data is selected. Here we skip first five header rows.
The selected data is then saved in one master.
Here is the full source script.
import os
import csv
## utf8 was used to support international characters
## newline='' was used because otherwise it will put blank line between lines
## F1, F2 - just field names
## path is the path folders where CSV files with the data are located
with open('data.csv', 'w', encoding="utf8", newline='' ) as csvfile:
fieldnames = ['F1', 'F2']
writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
writer.writeheader()
path="C:\\Users\\Owner\\Downloads"
list_of_files = {}
for filename in os.listdir(path):
# if the element is an CSV file then...
if filename[-4:] == ".csv":
list_of_files[filename] = path + "\\" + filename
print (list_of_files[filename])
with open(list_of_files[filename], encoding="utf8" ) as f:
csv_f = csv.reader(f)
for i, row in enumerate(csv_f):
if i > 5 and len(row) > 1 :
print(row)
writer.writerow({'F1': row[0], 'F2': row[1]})
In case you want save combined data in database for example MongoDB here is the script source code. The script is using MongoDB in the cloud at MongoLab. Refer to [2] if you need to install MongoDB on windows system
import os
import csv
from pymongo import MongoClient
#To connect using the MongoDB on MongoLab via the standard MongoDB URI:
#mongodb://:@xxxxxxx.mlab.com:xxxxx/database_name
# to learn more about MongoLab visit http://www.mongolab.com [3]
# insert your data in the line below for your MongoLab connection string data
connection = MongoClient("mongodb://user_name:password@xxx.mlab.com:xxx/database_name")
# insert your data for database_name and collection_name
db = connection.database_name.collection_name
path="C:\\Users\\Owner\\Downloads"
list_of_files = {}
for filename in os.listdir(path):
# if the element is a csv file then..
if filename[-4:] == ".csv":
list_of_files[filename] = path + "\\" + filename
print (list_of_files[filename])
with open(list_of_files[filename], encoding="utf8" ) as f:
csv_f = csv.reader(f)
for i, row in enumerate(csv_f):
if i > 5 and len(row) > 1 :
print(row)
db.insert({'F1': row[0], 'F2': row[1]})
# find all documents
results = db.find()
print()
print('==============================')
# display documents from collection
for record in results:
# print out the document
print(record['F1'] + ',',record['F2'])
print()
# close the connection to MongoDB
connection.close()
The scripts above can be easy customized for your specific needs. Feel free to ask questions or provide any feedback or comments. Below you can find references and useful links.
References
1. CSV File Reading and Writing
2. How to install mongodb on windows
3. mLab – Database-as-a-Service for MongoDB
4. MongoDB and Python: A Simple Example