Categories
MySQL Pandas Python Code Raspberry Pi web scraping

Add data to a database from a CSV using Python, Pandas, and SQL

Do you have a CSV file and a database to transfer the data into?

The data needs to go into your database and you want to automate this?

For instance : you have been web scraping and extracted publicly available data and now need to store it in a database table?

This guide will show you a basic example of how to get the data from a CSV, convert it to a pandas dataframe, and insert it into your chosen database table.

# |r|e|d|a|n|d|g|r|e|e|n|.|c|o|.|u|k|

# CSV to PANDAS to SQL example

Import a CSV and convert it into a pandas dataframe

import pandas as pd

df = pd.read_csv(r’claims.csv‘)

df

pat_numclaimstatus
0111222123accepted
1111333234accepted
2222444432rejected
3444555431rejected
4666777655accpted
The example csv as a pandas dataframe

Connect to existing SQL database and check we can query some data

# Module Imports
import mariadb
import sys

# Connect to MariaDB Platform
try:
    conn = mariadb.connect(
        user="user2",
        password="password2",
        host="192.168.1.9",
        port=3306,
        database="newz"

    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Get Cursor
cur = conn.cursor()
# Select the first 8 results as a check

cur.execute(
    "SELECT title,story FROM tnewz WHERE publication=? LIMIT 8", ("independent.co.uk",)
)
# Print results selection

for (title, publication) in cur:
    print(f"Title: {title}, Story: {publication}")
Title: We need to get back to normality, and fast – I appear to have fallen in love with my robot vacuum cleaner, Story: It is absurdly easy to anthropomorphise this creature, which spins around picking up fluff – especially in the absence of a puppy or a baby in the house, writes 
Title: ‘This ends now’: Biden demands Trump make national address to end Capitol siege, Story: President-elect condemns Capitol violence: 'This is not dissent. It's disorder. It's chaos. It borders on sedition’
Title: ‘We love you, you’re all very special’: Trump tells mob to ‘go home’ but repeats false claims about election, Story: 'I know your pain. I know you're hurt’
Title: Senate evacuates after sudden recess while certifying Biden’s win amid reports of shots fired on Capitol Hill, Story: Congress calls surprise recess as protesters storm Capitol building
Title: Capitol Hill riots: Dramatic videos show Trump supporters storming Capitol building, Story: Videos posted to social media from reporters and lawmakers depicted a chaotic, terrifying scene of pro-Trump rioters breaking into the Capitol building
Title: Capitol riots: The Simpsons eerily predicted incident in 1996, Story: Scene in question shows characters running up Capitol steps, firing guns
Title: Prince William tells children of NHS staff sacrifices ‘every day’ during pandemic, Story: 'We're making sure the children understand all of the sacrifices that all of you are making,' says Duke of Cambridge
Title: Sriwijaya Air flight SJ182: Boeing 737 loses contact in Indonesia, Story: The aircraft is a ‘classic’ 737 — two generations older than the Boeing 737 Max

Connect to new database and prepare to insert dataframe

# Connect to MariaDB Platform
try:
    conn = mariadb.connect(
        user="user3",
        password="redandgreen",
        host="192.168.1.9",
        port=3306,
        database="exampledb"

    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Get Cursor
cursor = conn.cursor()

Create new table (do this just once)

# Create Table
cursor.execute('CREATE TABLE IF NOT EXISTS patients3 (id INT AUTO_INCREMENT PRIMARY KEY, pat_num varchar(50), claim varchar(50), status varchar(50))')
for row in df.itertuples():
    cursor.execute("""
                INSERT INTO patients3
                (pat_num, claim, status)
                VALUES 
                (%s,%s,%s)
                """,
                (row.pat_num, 
                row.claim,
                row.status,
                )
)
conn.commit()

Check your database table for new data

SQL-output

If you have access to your database server then you can create a database and user with these commands:

## Useful mariadb commands :

# sudo mysql -u root -p
# CREATE DATABASE exampledb;
# CREATE USER 'user3'@'localhost' IDENTIFIED BY 'redandgreen';

# DROP USER IF EXISTS 'user3'@'%';
# CREATE USER 'user3'@'%' IDENTIFIED BY 'redandgreen';
# GRANT ALL PRIVILEGES ON exampledb.* TO 'user3'@'%';
# FLUSH PRIVILEGES;

# The % allows remote connections 

Query the SQL database and print the Patient Numbers

cur = conn.cursor()
cur.execute(
    "SELECT pat_num FROM patients3"
)
# Print results selection from SQL query

for (pat_num) in cur:
    print(f"PatientNumber: {pat_num}")
PatientNumber: ('111222',)
PatientNumber: ('111333',)
PatientNumber: ('222444',)
PatientNumber: ('3444555',)
PatientNumber: ('4666777',)
PatientNumber: ('111222',)
PatientNumber: ('111333',)

Print results selection from SQL using Pandas with an SQL query

# Create SQL query for Pandas to run
sql = """

SELECT *
FROM patients3

"""
df = pd.read_sql(sql, conn)
df.head()
idpat_numclaimstatus
01111222123accepted
12111333234accepted
23222444432rejected
343444555431rejected
45466677765accpted
Data retrieved from the SQL table using Pandas “pd.read”

Summary

We have shown how to :

  • Import a CSV into a pandas dataframe using Python
  • Connect to a MariaDB (SQL) database
  • INSERT the contents of the dataframe into a table
  • SELECT values from the database using SQL and also using Pandas

Note : (If you don’t have access to the SQL database and the table you will need the database administrator to give you the permissions to it).

Leave a Reply