BraidenKirkland/TableToCSV

TableToCSV

A program that reads the contents of an HTML file from standard input and writes the contents of the table(s) in that file to standard output in CSV format. The table data is extracted using regular expressions.
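The regex-based extraction described above could look roughly like the following. This is a minimal sketch, not the code in table_to_csv.py: the patterns and the tables_to_csv helper are assumptions made for illustration, and the sketch does no CSV quoting, so a cell that itself contains a comma would break the row.

```python
import re

# Assumed patterns; the regular expressions in table_to_csv.py may differ.
TABLE_RE = re.compile(r"<table\b[^>]*>(.*?)</table>", re.IGNORECASE | re.DOTALL)
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
CELL_RE = re.compile(r"<t[hd]\b[^>]*>(.*?)</t[hd]>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def tables_to_csv(html):
    """Return every table in `html` as labeled CSV text (hypothetical helper)."""
    blocks = []
    for number, table in enumerate(TABLE_RE.findall(html), start=1):
        lines = [f"TABLE {number}:"]
        for row in ROW_RE.findall(table):
            cells = []
            for cell in CELL_RE.findall(row):
                # Drop any nested tags and collapse runs of whitespace.
                text = TAG_RE.sub("", cell)
                cells.append(" ".join(text.split()))
            if cells:
                lines.append(",".join(cells))
        blocks.append("\n".join(lines))
    # Separate tables with a blank line, as in the example output.
    return "\n\n".join(blocks)
```

To use the sketch as a filter, pass it the text read from sys.stdin and print the result.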

Getting Started

  1. Clone the repository: git clone https://github.com/YOURUSERNAME/TableToCSV.git
  2. Run the program from the command line, redirecting the HTML file to standard input and the output to a file: python table_to_csv.py < input.html > output.txt

Example

The code for the following tables was taken from https://www.w3schools.com/html/html_tables.asp. You can view this code in the example file index.html, which is located in the repository.

| Company | Contact | Country |
| --- | --- | --- |
| Alfreds Futterkiste | Maria Anders | Germany |
| Centro comercial Moctezuma | Francisco Chang | Mexico |
| Ernst Handel | Roland Mendel | Austria |
| Island Trading | Helen Bennett | UK |
| Laughing Bacchus Winecellars | Yoshi Tannamuri | Canada |
| Magazzini Alimentari Riuniti | Giovanni Rovelli | Italy |

| Firstname | Lastname | Age |
| --- | --- | --- |
| Jill | Smith | 50 |
| Eve | Jackson | 94 |

$ python table_to_csv.py < input.html > output.txt

The contents of output.txt are shown below. Notice that tables are labeled in the same order as they appear in the document.

TABLE 1:
Company,Contact,Country
Alfreds Futterkiste,Maria Anders,Germany
Centro comercial Moctezuma,Francisco Chang,Mexico
Ernst Handel,Roland Mendel,Austria
Island Trading,Helen Bennett,UK
Laughing Bacchus Winecellars,Yoshi Tannamuri,Canada
Magazzini Alimentari Riuniti,Giovanni Rovelli,Italy

TABLE 2:
Firstname,Lastname,Age
Jill,Smith,50
Eve,Jackson,94

Use

The purpose of this program is to extract the contents of HTML tables and convert them to CSV format. Once the data is in CSV format, it can be analyzed with other programs. One such program is the online analytical processing program (OLAP.py) located in the OnlineAnalyticalProcessing repository.
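Because the output labels each table with a "TABLE n:" line, a downstream program needs to split the file before treating each part as CSV. The following is one possible way to do that with Python's csv module. The split_tables helper is hypothetical and not part of this repository, and it assumes no data row looks like a "TABLE n:" label.

```python
import csv


def split_tables(text):
    """Map each 'TABLE n' label to its list of CSV rows (hypothetical helper)."""
    tables = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines only separate tables
        if stripped.startswith("TABLE ") and stripped.endswith(":"):
            current = stripped[:-1]  # drop the trailing colon
            tables[current] = []
        elif current is not None:
            # Parse one CSV line into a list of fields.
            tables[current].append(next(csv.reader([stripped])))
    return tables
```

For the example output above, split_tables(open("output.txt").read())["TABLE 2"] would hold the header row followed by the rows for Jill and Eve.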

Restrictions

This program does not work on tables that use the rowspan or colspan HTML attributes.
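To check an input file for this restriction before converting it, something like the following could be used. It is a sketch only: the uses_spans helper and its pattern are assumptions and are not part of table_to_csv.py.

```python
import re

# Matches a <td> or <th> opening tag that carries a rowspan or colspan attribute.
SPAN_RE = re.compile(r"<t[hd]\b[^>]*\b(rowspan|colspan)\s*=", re.IGNORECASE)


def uses_spans(html):
    """Return True if any table cell in `html` uses rowspan or colspan."""
    return SPAN_RE.search(html) is not None
```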
