Skip to content

Excel

Excel Module for dost. This module contains functions for working with excel and spreadsheets

Examples:

>>> excel.get_row_column_count(df=df)
    (10, 5)
>>> excel.get_single_cell(df=df,column_name="Column1", cell_number=1)
    "abc"
>>> excel.excel_create_file(output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1")

This module contains the following functions:

  • authenticate_google_spreadsheet(credential_file_path): This creates authentication object for google spreadsheet.
  • get_dataframe_from_google_spreadsheet(auth, spreadsheet_url, sheet_name): Get dataframe from google spreadsheet.
  • tabular_data_from_website(website_url, table_number): Get tabular data from website.
  • upload_dataframe_to_google_spreadsheet(auth, spreadsheet_url, sheet_name, df): Upload dataframe to google spreadsheet.
  • create_file(output_folder, output_filename, output_sheetname): Create excel file.
  • to_dataframe(input_filepath, input_sheetname, header): Convert excel file to dataframe.
  • get_row_column_count(df): Get row and column count of dataframe.
  • dataframe_to_excel(df, output_folder, output_filename, output_sheetname, mode): Convert dataframe to excel file.
  • set_single_cell(df, column_name, cell_number, value): Set single cell value in excel file.
  • get_single_cell(df, column_name, cell_number, header): Get single cell value from excel file.
  • get_all_header_columns(df): Get all header columns from excel file.
  • get_all_sheet_names(input_filepath): Get all sheet names from excel file.
  • drop_columns(df, cols): Drop columns from data frame.
  • clear_sheet(df): Clear sheet from excel file.
  • remove_duplicates(df, column_name): Remove duplicates from excel file.
  • isNaN(value): Check if value is NaN.
  • df_from_list(list_of_lists, column_names): Create dataframe from list of lists.
  • df_from_string(df_string, word_delimiter, line_delimiter, column_names): Create dataframe from string.
  • df_extract_sub_df(df, row_start, row_end, column_start, column_end): Extract sub dataframe from dataframe.
  • set_value_in_df(df, row_number, column_number, value): Set value in dataframe.
  • get_value_in_df(df, row_number, column_number): Get value from dataframe.
  • df_drop_rows(df, row_start, row_end): Drop rows from dataframe.

authenticate_google_spreadsheet(credential_file_path)

Creates authentication object for google spreadsheet.

Parameters:

Name Type Description Default
credential_file_path str || WindowsPath

Credential file path.

required

Returns:

Name Type Description
object object

Authentication object.

Examples:

>>> auth = excel.authenticate_google_spreadsheet(credential_file_path="C:\Users\user\Desktop\credentials.json")

clear_sheet(df)

Clears the sheet

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.clear_sheet(df=df)
df

create_file(output_folder, output_filename, output_sheetname='Sheet1')

Creates an excel file with a sheet in the specified folder.

Parameters:

Name Type Description Default
output_folder str || WindowsPath

Output folder path.

required
output_filename str

Output file name.

required
output_sheetname str || list

Output sheet name. Defaults to "Sheet1".

'Sheet1'

Examples:

>>> excel.create_file(output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1")

dataframe_to_excel(df, output_folder, output_filename, output_sheetname='Sheet1', mode='a')

Converts the dataframe to excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
output_folder str || WindowsPath

Output folder path.

required
output_filename str

Output file name.

required
output_sheetname str

Output sheet name. Defaults to "Sheet1".

'Sheet1'
mode str

Mode of the excel file. Defaults to 'a'.

'a'

Examples:

>>> excel.dataframe_to_excel(df=df, output_folder="C:\Users\user\Desktop", output_filename="test.xlsx", output_sheetname="Sheet1", mode='a')

df_drop_rows(df, row_start, row_end)

Drops rows from dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_start int

row start (inclusive)

required
row_end int

row end (exclusive)

required

Returns:

Name Type Description
df pd.DataFrame

dataframe with rows dropped

Examples:

>>> df = excel.df_drop_rows(df=df, row_start=1, row_end=2)

df_extract_sub_df(df, row_start, row_end, column_start, column_end)

Extracts sub dataframe from the given dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_start int

row start (inclusive)

required
row_end int

row end (exclusive)

required
column_start int

column start (inclusive)

required
column_end int

column end (exclusive)

required

Returns:

Name Type Description
df pd.DataFrame

sub dataframe

Examples:

>>> excel.df_extract_sub_df(df=df,row_start= 1, row_end=2, column_start=3, column_end=4)
sub_dataframe

df_from_list(list_of_lists, column_names)

Converts list of lists to dataframe

Parameters:

Name Type Description Default
list_of_lists list

list of lists to be converted to dataframe

required
column_names list

column names

required

Returns:

Name Type Description
df pd.DataFrame

dataframe

Examples:

>>> excel.df_from_list(list_of_lists=[[1,2,3],[4,5,6]], column_names=["col1", "col2", "col3"])
dataframe
..   col1  col2  col3
0     1     2     3
1     4     5     6

df_from_string(df_string, word_delimiter=' ', line_delimiter='\n', column_names=None)

Converts string to dataframe

Parameters:

Name Type Description Default
df_string str

string to be converted to dataframe

required
word_delimiter str

word delimiter.Defaults to space

' '
line_delimiter str

line delimiter. Defaults to new line

'\n'
column_names list

column names. Defaults to None

None

Returns:

Name Type Description
df pd.DataFrame

dataframe

Examples:

>>> print(excel.df_from_string(df_string="a b c;d e f",word_delimiter=" ",line_delimiter= ";",column_names= ["Column 1","Column 2","Column 3"]))
dataframe 
..   Column 1  Column 2  Column 3
0     a     b     c
1     d     e     f

drop_columns(df, cols)

Drops the columns from the excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
cols str || list

Column name to be dropped.

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.drop_columns(df=df, cols="column_name")
df

get_all_header_columns(df)

Gets all header columns from the excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
data list

List of header columns

Examples:

>>> excel.get_all_header_columns(df=df)
["Column1", "Column2"]

get_all_sheet_names(input_filepath)

Gets the sheet names from the excel file

Parameters:

Name Type Description Default
input_filepath str || Windowspath

Path of the excel file.

required

Returns:

Name Type Description
data list

List of sheet names

Examples:

>>> excel.get_all_sheet_names(input_filepath="demo")
["Sheet1", "Sheet2"]

get_dataframe_from_google_spreadsheet(auth, spreadsheet_url, sheet_name='Sheet1')

Get dataframe from google spreadsheet

Parameters:

Name Type Description Default
auth object

Authentication object.

required
spreadsheet_url str

Spreadsheet URL.

required
sheet_name str

Sheet name.

'Sheet1'

Returns:

Name Type Description
df pd.DataFrame

Dataframe object.

Examples:

>>> excel.get_dataframe_from_google_spreadsheet(auth=auth,spreadsheet_url="https://docs.google.com/spreadsheets/d/1X2X3X4X5X6X7X8X9X/edit#gid=0", sheet_name="Sheet1")
df

get_row_column_count(df)

Returns the row and column count of the dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required

Returns:

Name Type Description
tuple tuple

Row and column count of the dataframe.

Examples:

>>> excel.get_row_column_count(df=df)
(10, 5)

get_single_cell(df, column_name, cell_number, header=1)

Gets the text from the desired column/cell number for the given excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str

Column name of the excel file. Defaults to "".

required
cell_number int

Cell number of the excel file. Defaults to 1.

required
header int

Header row number. Defaults to 1.

1

Returns:

Name Type Description
data str

Text from the desired column/cell number for the given excel file

Examples:

>>> excel.get_single_cell(df=df, column_name="Column 1",cell_number= 1)
"abc"

get_value_in_df(df, row_number, column_number)

Gets value from dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe

required
row_number int

Row number of the cell

required
column_number int

Column number of the cell

required

Returns:

Name Type Description
str str

value in the cell

Examples:

>>> excel.get_value_in_df(df=df, row_number=1, column_number=2)
abc

isNaN(value)

Checks if the value is NaN

Parameters:

Name Type Description Default
value str of number

value to be checked

required

Returns:

Name Type Description
bool bool

True if value is NaN, False otherwise

Examples:

>>> excel.isNaN(value="abc")
False

remove_duplicates(df, column_name)

Removes the duplicates from the given column

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str || int || list

Column name of the excel file..

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> excel.remove_duplicates(df=df, column_name="column 1")
df

set_single_cell(df, column_name, cell_number, value)

Description

Writes the given text to the desired column/cell number for the given excel file

Parameters:

Name Type Description Default
df pd.DataFrame

Dataframe of the excel file.

required
column_name str

Column name of the excel file. Defaults to "".

required
cell_number int

Cell number of the excel file. Defaults to 1.

required
value str

Text to be written to the excel file. Defaults to "".

required

Returns:

Name Type Description
df pd.DataFrame

Modified dataframe

Examples:

>>> df=excel.set_single_cell(df=df, column_name="Column 1",cell_number= 1, value="abc")
df

set_value_in_df(df, row_number, column_number, value)

Sets value in dataframe

Parameters:

Name Type Description Default
df pd.DataFrame

dataframe to be modified

required
row_number int

Row number of the cell

required
column_number int

Column number of the cell

required
value str

value to be set in the cell

required

Returns:

Type Description
pd.DataFrame

pndas dataframe (pd.DataFrame): dataframe with value set

Examples:

>>> excel.set_value_in_df(df=df, row_number=1, column_number=2, value="abc")
modified_dataframe

tabular_data_from_website(website_url, table_number=1)

Returns a dataframe from a website table.

Parameters:

Name Type Description Default
website_url str

Website URL.

required
table_number int

Table number. Defaults to 1.

1

Returns: pd.DataFrame: Table from the website.

Examples:

>>> excel.tabular_data_from_website(website_url="https://en.wikipedia.org/wiki/Wiki")

to_dataframe(input_filepath, input_sheetname, header=1)

Converts excel file to dataframe.

Parameters:

Name Type Description Default
input_filepath str || WindowsPath

Input file path.

required
input_sheetname str

Input sheet name.

required
header int

Header row number. Defaults to 1.

1

Returns:

Name Type Description
df pd.DataFrame

Dataframe of the excel file.

Examples:

>>> excel.to_dataframe(input_filepath="C:\Users\user\Desktop\test.xlsx", input_sheetname="Sheet1")
dataframe

upload_dataframe_to_google_spreadsheet(auth, spreadsheet_url, sheet_name, df)

Uploads a dataframe to a google spreadsheet.

Parameters:

Name Type Description Default
auth object

Authentication object.

required
spreadsheet_url str

Spreadsheet URL.

required
sheet_name str

Sheet name.

required
df pd.DataFrame

Dataframe object.

required

Examples:

>>> excel.upload_dataframe_to_google_spreadsheet(auth=auth, spreadsheet_url="https://docs.google.com/spreadsheets/d/1X2X3X4X5X6X7X8X9X/edit#gid=0", sheet_name="Sheet1", df=df)