Add data to a database from a CSV using Python, Pandas, and SQL
Do you have a CSV file and a database to transfer the data into?
The data needs to go into your database and you want to automate this?
For instance : you have been web scraping and extracted publicly available data and now need to store it in a database table?
This guide will show you a basic example of how to get the data from a CSV, convert it to a pandas dataframe, and insert it into your chosen database table.
# |r|e|d|a|n|d|g|r|e|e|n|.|c|o|.|u|k| # CSV to PANDAS to SQL example
Import a CSV and convert it into a pandas dataframe
import pandas as pd
df = pd.read_csv(r’claims.csv‘)
df
pat_num | claim | status | |
---|---|---|---|
0 | 111222 | 123 | accepted |
1 | 111333 | 234 | accepted |
2 | 222444 | 432 | rejected |
3 | 444555 | 431 | rejected |
4 | 666777 | 655 | accpted |
Connect to existing SQL database and check we can query some data
# Module Imports import mariadb import sys # Connect to MariaDB Platform try: conn = mariadb.connect( user="user2", password="password2", host="192.168.1.9", port=3306, database="newz" ) except mariadb.Error as e: print(f"Error connecting to MariaDB Platform: {e}") sys.exit(1) # Get Cursor cur = conn.cursor()
# Select the first 8 results as a check cur.execute( "SELECT title,story FROM tnewz WHERE publication=? LIMIT 8", ("independent.co.uk",) )
# Print results selection for (title, publication) in cur: print(f"Title: {title}, Story: {publication}")
Title: We need to get back to normality, and fast – I appear to have fallen in love with my robot vacuum cleaner, Story: It is absurdly easy to anthropomorphise this creature, which spins around picking up fluff – especially in the absence of a puppy or a baby in the house, writes Title: ‘This ends now’: Biden demands Trump make national address to end Capitol siege, Story: President-elect condemns Capitol violence: 'This is not dissent. It's disorder. It's chaos. It borders on sedition’ Title: ‘We love you, you’re all very special’: Trump tells mob to ‘go home’ but repeats false claims about election, Story: 'I know your pain. I know you're hurt’ Title: Senate evacuates after sudden recess while certifying Biden’s win amid reports of shots fired on Capitol Hill, Story: Congress calls surprise recess as protesters storm Capitol building Title: Capitol Hill riots: Dramatic videos show Trump supporters storming Capitol building, Story: Videos posted to social media from reporters and lawmakers depicted a chaotic, terrifying scene of pro-Trump rioters breaking into the Capitol building Title: Capitol riots: The Simpsons eerily predicted incident in 1996, Story: Scene in question shows characters running up Capitol steps, firing guns Title: Prince William tells children of NHS staff sacrifices ‘every day’ during pandemic, Story: 'We're making sure the children understand all of the sacrifices that all of you are making,' says Duke of Cambridge Title: Sriwijaya Air flight SJ182: Boeing 737 loses contact in Indonesia, Story: The aircraft is a ‘classic’ 737 — two generations older than the Boeing 737 Max
Connect to new database and prepare to insert dataframe
# Connect to MariaDB Platform try: conn = mariadb.connect( user="user3", password="redandgreen", host="192.168.1.9", port=3306, database="exampledb" ) except mariadb.Error as e: print(f"Error connecting to MariaDB Platform: {e}") sys.exit(1) # Get Cursor cursor = conn.cursor()
Create new table (do this just once)
# Create Table cursor.execute('CREATE TABLE IF NOT EXISTS patients3 (id INT AUTO_INCREMENT PRIMARY KEY, pat_num varchar(50), claim varchar(50), status varchar(50))')
for row in df.itertuples(): cursor.execute(""" INSERT INTO patients3 (pat_num, claim, status) VALUES (%s,%s,%s) """, (row.pat_num, row.claim, row.status, ) ) conn.commit()
Check your database table for new data
If you have access to your database server then you can create a database and user with these commands:
## Useful mariadb commands :
# sudo mysql -u root -p
# CREATE DATABASE exampledb;
# CREATE USER 'user3'@'localhost' IDENTIFIED BY 'redandgreen';
# DROP USER IF EXISTS 'user3'@'%';
# CREATE USER 'user3'@'%' IDENTIFIED BY 'redandgreen';
# GRANT ALL PRIVILEGES ON exampledb.* TO 'user3'@'%';
# FLUSH PRIVILEGES;
# The % allows remote connections
Query the SQL database and print the Patient Numbers
cur = conn.cursor() cur.execute( "SELECT pat_num FROM patients3" )
# Print results selection from SQL query for (pat_num) in cur: print(f"PatientNumber: {pat_num}")
PatientNumber: ('111222',) PatientNumber: ('111333',) PatientNumber: ('222444',) PatientNumber: ('3444555',) PatientNumber: ('4666777',) PatientNumber: ('111222',) PatientNumber: ('111333',)
Print results selection from SQL using Pandas with an SQL query
# Create SQL query for Pandas to run sql = """ SELECT * FROM patients3 """ df = pd.read_sql(sql, conn) df.head()
id | pat_num | claim | status | |
---|---|---|---|---|
0 | 1 | 111222 | 123 | accepted |
1 | 2 | 111333 | 234 | accepted |
2 | 3 | 222444 | 432 | rejected |
3 | 4 | 3444555 | 431 | rejected |
4 | 5 | 4666777 | 65 | accpted |
Summary
We have shown how to :
- Import a CSV into a pandas dataframe using Python
- Connect to a MariaDB (SQL) database
- INSERT the contents of the dataframe into a table
- SELECT values from the database using SQL and also using Pandas
Note : (If you don’t have access to the SQL database and the table you will need the database administrator to give you the permissions to it).