Below are scripts for writing large datasets to CSV files and reading them back.
Writing a dataset to a CSV file.
import csv

# Python 2: open the file in binary mode for the csv module
writer = csv.writer(open('dataset.csv', 'wb', buffering=0))
writer.writerows([
    ('GOOG', 'Google Inc.', 123.44, 0.32, 0.09),
    ('YHOO', 'Yahoo! Inc.', 2.33, 99.23, 0.123),
    ('IBM', 'IBM Inc.', 223.44, 212.32, 6.42),
])
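The snippet above is Python 2 only (binary mode plus the unbuffered open). On Python 3 the csv module expects a text-mode file opened with newline='', and for genuinely large datasets it is usually better to stream rows one at a time rather than build the whole list in memory first. A minimal sketch, assuming Python 3 and a hypothetical generate_rows() iterator standing in for whatever produces your records:

import csv

# generate_rows() is a hypothetical placeholder for the source of your records;
# yielding rows keeps a large dataset from being held in memory all at once.
def generate_rows():
    yield ('GOOG', 'Google Inc.', 123.44, 0.32, 0.09)
    yield ('YHOO', 'Yahoo! Inc.', 2.33, 99.23, 0.123)
    yield ('IBM', 'IBM Inc.', 223.44, 212.32, 6.42)

# Python 3: text mode with newline='' so the csv module controls line endings
with open('dataset.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in generate_rows():
        writer.writerow(row)  # one row at a time; no big list is built up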
Reading a large dataset from a CSV file.
import csv

# Python 2: read the csv file in binary mode; csv.reader iterates record by record
dataset = csv.reader(open('dataset.csv', 'rb'))
status_labels = {-1: 'down', 0: 'unchanged', 1: 'up'}

for ticker, name, price, change, pct in dataset:
    status = status_labels[cmp(float(change), 0.0)]  # cmp() returns -1, 0, or 1
    print '%s is %s (%s%%)' % (name, status, pct)
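The reader above is also Python 2 only: the print statement and the built-in cmp() were removed in Python 3. A minimal sketch of the equivalent on Python 3, spelling out the -1/0/1 comparison that cmp() used to provide:

import csv

status_labels = {-1: 'down', 0: 'unchanged', 1: 'up'}

# Python 3: text mode with newline=''; the reader still streams one record at a time
with open('dataset.csv', newline='') as f:
    for ticker, name, price, change, pct in csv.reader(f):
        change = float(change)
        direction = (change > 0) - (change < 0)  # reproduces cmp(change, 0.0)
        print('%s is %s (%s%%)' % (name, status_labels[direction], pct))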
These scripts are useful for preparing large datasets for your Hadoop jobs.
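For example, with Hadoop Streaming the same reading pattern can run as a mapper, consuming CSV records from stdin and emitting tab-separated key/value pairs to stdout. The file name csv_mapper.py and the choice of output fields are illustrative assumptions, not part of the original script; the mapper would be handed to the streaming jar via its -mapper option.

#!/usr/bin/env python
# csv_mapper.py -- hypothetical Hadoop Streaming mapper (the name and the
# emitted fields are assumptions). Reads CSV records from stdin and writes
# "ticker<TAB>change" lines for downstream reducers.
import csv
import sys

for ticker, name, price, change, pct in csv.reader(sys.stdin):
    print('%s\t%s' % (ticker, change))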