Sooner or later you will need to resort to using regular expressions if you are scraping text. This article will show you some useful Python (version 3) examples which you can use over and over again..
We’ll be using “re.search” and “group”
example 1 – parse a paragraph of text for the number of floors, so you need to find the word “floor” or equivalent, and then extract the number(s) directly before it.
example 2 – parse a paragraph of text and extract the vendor details, so locate the text that immediately follows, but no more…
Once in production you will need to ensure you have a copy of your data for many reasons which we can all identify with.
So, before getting too far into a web scraping project using Scrapy with MySQL let’s spare a moment to get familiar with some basic back up and restore commands.
This is about as simple as can be, so why not try them for yourself if you have MySQL installed, make a sample database, and create a back up, check it, and then delete the original and restore your back up!
First, inspect the server to check your database name(s) that you want to back up for future restores…
sudo mysql -u root -p
MariaDB [(none)]> show databases;
MariaDB [(none)]> exit;
Now run this command (using your own username and password that has access privileges to the database)
— MySQL dump 10.17 Distrib 10.3.25-MariaDB, for debian-linux-gnueabihf (armv7l)
— Host: localhost Database: newz
— Server version 10.3.25-MariaDB-0+deb10u1
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8mb4 */; /*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */; /*!40103 SET TIME_ZONE=’+00:00′ */; /*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */; /*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */; /*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE=’NO_AUTO_VALUE_ON_ZERO’ */; /*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
— Table structure for table tnewz
DROP TABLE IF EXISTS tnewz; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE tnewz ( id int(11) NOT NULL AUTO_INCREMENT, title varchar(255) NOT NULL, publication varchar(255) DEFAULT NULL, author varchar(50) DEFAULT NULL, story varchar(255) DEFAULT NULL, url varchar(255) DEFAULT NULL, posted timestamp NOT NULL DEFAULT current_timestamp() ON UPDATE current_timestamp(), PRIMARY KEY (id) ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4; /*!40101 SET character_set_client = @saved_cs_client */;
— Dumping data for table tnewz
LOCK TABLES tnewz WRITE; /*!40000 ALTER TABLE tnewz DISABLE KEYS */; INSERT INTO tnewz VALUES (1,’title’,NULL,’author’,’story’,’/news/uk/1377686/tier-shake-up-boris-johnson-meeting-tier-5-school-closures-reopening-coronavirus-update’,’2020-12-29 15:33:34′),(2,’title’,NULL,’author’,’story’,’/news/politics/1377709/nicola-sturgeon-scotland-lockdown-update-announcement-boris-johnson-tiers-tier-5′,’2020-12-29 15:33:3
To Restore the backup of your Database
pi@pi4:~ $ mysql -u user1 -p newz < newzdump.sql
We have looked at how to back up and restore a MySQL database using:
mysqldump -u user1 -p newz > newzdump.sql
mysql -u user1 -p newz < newzdump.sql
If you want to transfer a copy of your database to a remote location then you can use “rsync” or “scp”, or if you are doing from a home pc you could always use “winscp”.