Python for Data Science – Importing table data from a web page

This is another blog post about using the Pandas package. This time, I’ll show you how to import table data from a web page. To get table data, the page we access must define a table with HTML table tags (table, tr, td). Unfortunately, most web sites do not use “tables” anymore. They usually prefer “div” tags, so if this code doesn’t work, check the HTML source code of the page.
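One quick way to check the HTML source is to count how many table tags it actually contains. The sketch below uses only Python’s standard `html.parser` module. The class name, the helper `count_layout_tags` and the sample markup are hypothetical illustrations, not part of Pandas. If the `table` count is zero, `read_html` has nothing to parse.

```python
from html.parser import HTMLParser


class TableTagCounter(HTMLParser):
    """Counts <table> and <div> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {"table": 0, "div": 0}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TABLE> and <table> both match
        if tag in self.counts:
            self.counts[tag] += 1


def count_layout_tags(html_text):
    # Hypothetical helper: feed the whole page and return the tag counts
    parser = TableTagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


# Hypothetical page fragment: one real table, the rest built from divs
sample = """
<div class="rates"><div>AAA</div><div>1.0</div></div>
<table><tr><td>BBB</td><td>2.0</td></tr></table>
"""
print(count_layout_tags(sample))  # {'table': 1, 'div': 3}
```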

For testing purposes, I’ll try to fetch exchange rates from the CNN Money International web site. There are two tables on the page: one for the exchange rates and one for the world markets.

The Python code is very simple:
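The original code is not reproduced here, so the following is only a minimal sketch of the same idea. It assumes an inline HTML string stands in for the downloaded page (the table contents are made up), wrapped in `StringIO` because recent Pandas versions expect a URL, path or file-like object. For a live page you would pass the page URL to `pd.read_html` instead. Note that `read_html` needs `lxml`, or `beautifulsoup4` together with `html5lib`, to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two tables, like the
# exchange rates and world markets tables described above
html = """
<html><body>
<table id="rates">
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>AAA</td><td>1.5</td></tr>
</table>
<table id="markets">
  <tr><th>Index</th><th>Change</th></tr>
  <tr><td>BBB</td><td>-0.2</td></tr>
</table>
</body></html>
"""

# read_html parses every <table> it finds and returns one DataFrame per table
df_list = pd.read_html(StringIO(html))

print(len(df_list))  # 2
print(df_list[0])
```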

I examined the HTML code of the page and saw that these tables have different IDs. The ID of the exchange rates table is “wsod_currencyExhangeRatesTable”, so I use this ID to fetch only the exchange rates table:
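A sketch of filtering by ID, again assuming a made-up inline page instead of the live site. Only the table ID comes from the post; the second table’s ID and all cell values are hypothetical. The `attrs` parameter of `read_html` restricts parsing to tables whose HTML attributes match, and the result is still a list, so the single match is taken with `df_list[0]`.

```python
from io import StringIO

import pandas as pd

# Hypothetical page fragment; only the first table's ID is taken from the post
html = """
<table id="wsod_currencyExhangeRatesTable">
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>AAA</td><td>1.5</td></tr>
</table>
<table id="some_other_table">
  <tr><th>Index</th><th>Change</th></tr>
  <tr><td>BBB</td><td>-0.2</td></tr>
</table>
"""

# attrs keeps only tables whose attributes match, here the id attribute
df_list = pd.read_html(StringIO(html), attrs={"id": "wsod_currencyExhangeRatesTable"})

# read_html still returns a list, even though only one table matched
rates = df_list[0]
print(len(df_list))  # 1
print(rates)
```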

The read_html function returns a list of DataFrames even if there’s only one table, so we need to use an index (e.g. df_list[0]) to access the first table.

You probably noticed that the last column contains both the min and max values; it would be better to extract these values into separate columns. Here’s the script:
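Since the original script and the page’s column layout are not shown here, this is a sketch under stated assumptions: the column name `Range`, the other column names and the `"min - max"` text format are hypothetical, chosen only to illustrate the technique. `Series.str.split` with `expand=True` turns each string into separate columns, which are then converted from text to floats.

```python
import pandas as pd

# Hypothetical table: column names and the "min - max" format are assumptions
df = pd.DataFrame(
    {
        "Currency": ["AAA", "BBB"],
        "Range": ["1.10 - 1.25", "0.80 - 0.95"],
    }
)

# Split each "min - max" string at the separator into two new columns
df[["Min", "Max"]] = df["Range"].str.split(" - ", expand=True)

# The pieces are still strings; convert them so they can be compared or summed
df["Min"] = df["Min"].astype(float)
df["Max"] = df["Max"].astype(float)

# Drop the combined column now that its values live in Min and Max
df = df.drop(columns="Range")
print(df)
```

If the real column used a different separator, only the argument to `str.split` would need to change.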

and the output:

So we successfully fetched table data from a web site and parsed it. Did you see how easy it is to manipulate the columns of Pandas DataFrames? See you in the next blog post!

Gokhan Atil is a database administrator who has hands-on experience with both RDBMS and NoSQL databases (Oracle, PostgreSQL, Microsoft SQL Server, Sybase IQ, MySQL, Cassandra, MongoDB and ElasticSearch), and a strong background in software development. He is an Oracle Certified Professional (OCP) and was named an Oracle ACE (in 2011) and an Oracle ACE Director (in 2016) for his continuous contributions to the Oracle users community.
