Using Databases with Python-Counting Organizations

Counting Organizations

This application will read the mailbox data (mbox.txt) count up the number email messages per organization (i.e. domain name of the email address) using a database with the following schema to maintain the counts.

CREATE TABLE Counts (org TEXT, count INTEGER)


If you run the program multiple times in testing or with dfferent files, make sure to empty out the data before each run.

The data file for this application: http://www.pythonlearn.com/code/mbox.txt.

The program can be speeded up greatly by moving the commit operation outside of the loop. In any database program, there is a balance between the number of operations you execute between commits and the importance of not losing the results of operations that have not yet been committed.


import sqlite3

#connects to database, creates a sqlite file
conn = sqlite3.connect('emaildb.sqlite')
cur = conn.cursor()

cur.execute('''
DROP TABLE IF EXISTS Counts''')

cur.execute('''
CREATE TABLE Counts (org TEXT, count INTEGER)''')

#convert txt into table
fname = raw_input ('Enter file name:') #<span style="font-family: Arial, Helvetica, sans-serif;">fname = 'mbox-short.txt'</span>
if (len(fname) < 1) : fname = 'mbox-short.txt'
fh = open(fname)
for line in fh:
    if not line.startswith('From: ') : continue
    pieces = line.split('@')
    org = pieces[1]
    org = org.rstrip()#remove the blank spaces
    cur.execute('SELECT count FROM Counts WHERE org = ?', (org, ))
    try:
        count = cur.fetchone()[0]#get the current count
        cur.execute('UPDATE Counts SET count=count+1 WHERE org = ?', (org, ))
    except:
        cur.execute('''INSERT INTO Counts (org, count)
            VALUES(?, 1)''', (org, ))
    conn.commit()


sqlstr = 'SELECT org, count FROM Counts ORDER BY count DESC' #DESC LIMIT 10

for row in cur.execute(sqlstr):
    print str(row[0]), row[1]

cur.close()


版权声明:本文为suiqiji206原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。