Python for Data Science – Importing XML to Pandas DataFrame

In my previous post, I showed how easy to import data from CSV, JSON, Excel files using Pandas package. Another popular format to exchange data is XML. Unfortunately Pandas package does not have a function to import data from XML so we need to use standard XML package and do some extra work to convert the data to Pandas DataFrames.

Here’s a sample XML file (save it as test.xml):

We want to convert his to a dataframe which contains customer name, email, phone and street:

As you can see, we need to read attribute of an XML tag (customer name), text value of sub elements (address/street), so although we will use a very simple method, it will show you how to parse even complex XML files using Python.

How to Import Data (to Oracle RDBMS) from XML

I wrote a blog post about how to import data from a XML file to Oracle about 7 years ago. I demonstrated how we can use DBMS_XMLSTORE package to parse XML data. Yesterday, I replied a question from a blog reader about that post. He had problem with running my sample script, I did troubleshoot and find out the problem with his script, and while testing the sample script, I noticed that I can write much better script.

This is my sample XML (employees.xml):

I created a table to load the data and a directory link so we can access the file:

How to Generate XML from the Oracle Database

Oracle provides plenty of XML generation methods, I’ll try to demonstrate two of them: SYS_XMLGEN and DBMS_XMLQUERY.

I’ll start with creating a simple table and fill it with sample data:

To be able to use SYS_XMLGEN, I’ll create an object to map our data:

How to Import Data from XML

One of my blog readers asked me how to import data from XML file. There are lots of ways to do it but I’ll show only a simple one by using DBMS_XMLSTORE.

We will need a simple XML file to import. So I created a folder as “/oracle/xmltest” and a text file named (sample.xml) in this folder. Oracle should be able to access this folder and file, so check OS permissions of them and be sure that they are readable by oracle user.

I entered the followed data into my sample xml file:

As you see, our employees have only 2 properties, “no” and “name”. So I’ll create a table to store these properties: