Experiments / Project 702

CRU import using BCP

I have been having a bit of trouble with the bcp utility for SQL Server. IT seems that the format of the input file is the critical thing, and that you need to make sure it is just right in order to avoid bcp errors. Ideally, I would like to get into this more, try to break the problem apart and find the offending lines, but at this stage I need to get this going and move on. As a temporary work-around, I am converting the TSV CRU files to CSV files (using python) and then calling the bcp utility within a batch script:

import os
import sys
import time

def convert(dir):
    os.chdir(dir)
    cruFiles = os.listdir(dir)
    csvFiles = [i[:-4] + '.csv' for i in cruFiles]
    for fIndex in range(len(cruFiles)):
        with open(csvFiles[fIndex], 'w') as fout:
            for line in open(cruFiles[fIndex]):
                data = ','.join(line.split()) + '\n'
                fout.write(data)

def bcpImport():
    """ Call a bcp shell script to load data """
    os.system("D:/Users/nick-burns/Desktop/bcpImport.bat")

if __name__ == '__main__':
    homeDir = sys.argv[1]
    start = time.time()

    print("Converting CRU -> CSV")
    convert(homeDir)

    print("BCP import...")
    bcpImport()   

    print("TOTAL TIME: ", time.time() - start)

Tested this with the 2009 CRU data, and timing are as follows:

process Time (sec)
convert() 13.80
bcpImport() 94.16
TOTAL 107.96
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s