d6tstack package

Submodules

d6tstack.combine_csv module

class d6tstack.combine_csv.CombinerCSV(fname_list, sep=', ', nrows_preview=3, chunksize=1000000.0, read_csv_params=None, columns_select=None, columns_select_common=False, columns_rename=None, add_filename=True, apply_after_read=None, log=True, logger=None)[source]

Bases: object

Core combiner class. Sniffs columns, generates preview, combines aka stacks to various output formats.

Parameters:
  • fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
  • sep (string) – CSV delimiter, see pandas.read_csv()
  • has_header (boolean) – data has header row
  • nrows_preview (int) – number of rows in preview
  • chunksize (int) – number of rows to read into memory while processing, see pandas.read_csv()
  • read_csv_params (dict) – additional parameters to pass to pandas.read_csv()
  • columns_select (list) – list of column names to keep
  • columns_select_common (bool) – keep only common columns. Use this instead of columns_select
  • columns_rename (dict) – dict of columns to rename `{‘name_old’:’name_new’}
  • add_filename (bool) – add filename column to output data frame. If False, will not add column.
  • apply_after_read (function) – function to apply after reading each file. needs to return a dataframe
  • log (bool) – send logs to logger
  • logger (object) – logger object with send_log()
columns(*args, **kwargs)
columns_common(*args, **kwargs)
columns_unique(*args, **kwargs)
combine_preview(*args, **kwargs)
get_sniff_results(*args, **kwargs)
head(*args, **kwargs)
is_all_equal(*args, **kwargs)
is_column_present(*args, **kwargs)
is_column_present_common(*args, **kwargs)
is_column_present_unique(*args, **kwargs)
preview_rename(*args, **kwargs)
preview_select(*args, **kwargs)
sniff_columns(*args, **kwargs)
to_csv_align(*args, **kwargs)
to_csv_combine(*args, **kwargs)
to_csv_head(*args, **kwargs)
to_mssql_combine(*args, **kwargs)
to_mysql_combine(*args, **kwargs)
to_pandas(*args, **kwargs)
to_parquet_align(*args, **kwargs)
to_parquet_combine(*args, **kwargs)
to_psql_combine(*args, **kwargs)
to_sql_combine(*args, **kwargs)

d6tstack.convert_xls module

class d6tstack.convert_xls.XLSSniffer(fname_list, logger=None)[source]

Bases: object

Extracts available sheets from MULTIPLE Excel files and runs diagnostics

Parameters:
  • fname_list (list) – file paths, eg [‘dir/a.csv’,’dir/b.csv’]
  • logger (object) – logger object with send_log(), optional
all_contain_sheetname(*args, **kwargs)
all_have_idx(*args, **kwargs)
all_same_count(*args, **kwargs)
all_same_names(*args, **kwargs)
sniff(*args, **kwargs)
class d6tstack.convert_xls.XLStoBase(if_exists='skip', output_dir=None, logger=None)[source]

Bases: object

convert_single(*args, **kwargs)
class d6tstack.convert_xls.XLStoCSVMultiFile(fname_list, cfg_xls_sheets_sel_mode='idx_global', cfg_xls_sheets_sel=0, output_dir=None, if_exists='skip', logger=None)[source]

Bases: d6tstack.convert_xls.XLStoBase

Converts xls|xlsx files to csv files. Selects a SINGLE SHEET from each file. To extract MULTIPLE SHEETS from a file use XLStoCSVMultiSheet

Parameters:
  • fname_list (list) – file paths, eg [‘dir/a.csv’,’dir/b.csv’]
  • cfg_xls_sheets_sel_mode (string) –

    mode to select tabs

    • name: select by name, provide name for each file, can customize by file
    • name_global: select by name, one name for all files
    • idx: select by index, provide index for each file, can customize by file
    • idx_global: select by index, one index for all files
  • cfg_xls_sheets_sel (dict) – values to select tabs {‘filename’:’value’}
  • output_dir (str) – If present, file is saved in given directory, optional
  • if_exists (str) – Possible values: skip and replace, default: skip, optional
  • logger (object) – logger object with send_log(‘msg’,’status’), optional
convert_all(*args, **kwargs)
set_files(*args, **kwargs)
set_select_mode(*args, **kwargs)
class d6tstack.convert_xls.XLStoCSVMultiSheet(fname, sheet_names=None, output_dir=None, if_exists='skip', logger=None)[source]

Bases: d6tstack.convert_xls.XLStoBase

Converts ALL SHEETS from a SINGLE xls|xlsx files to separate csv files

Parameters:
  • fname (string) – file path
  • sheet_names (list) – list of int or str. If not given, will convert all sheets in the file
  • output_dir (str) – If present, file is saved in given directory, optional
  • if_exists (str) – Possible values: skip and replace, default: skip, optional
  • logger (object) – logger object with send_log(‘msg’,’status’), optional
convert_all(*args, **kwargs)
convert_single(*args, **kwargs)
d6tstack.convert_xls.read_excel_advanced(fname, remove_blank_cols=True, remove_blank_rows=True, collapse_header=True, header_xls_range=None, header_xls_start=None, header_xls_end=None, is_preview=False, nrows_preview=3, **kwds)[source]

Read Excel files to pandas dataframe with advanced options like set header ranges and remove blank columns and rows

Parameters:
  • fname (str) – Excel file path
  • remove_blank_cols (bool) – remove blank columns
  • remove_blank_rows (bool) – remove blank rows
  • collapse_header (bool) – to convert multiline header to a single line string
  • header_xls_range (string) – range of headers in excel, eg: A4:B16
  • header_xls_start (string) – Starting cell of excel for header range, eg: A4
  • header_xls_end (string) – End cell of excel for header range, eg: B16
  • is_preview (bool) – Read only first nrows_preview lines
  • nrows_preview (integer) – Initial number of rows to be used for preview columns (default: 3)
  • kwds (mixed) – parameters for pandas.read_excel() to pass through
Returns:

pandas dataframe

Return type:

df (dataframe)

Note

You can pass in any pandas.read_excel() parameters in particular sheet_name

d6tstack.helpers module

Module with several helper functions

d6tstack.helpers.check_valid_xls(fname_list)[source]
d6tstack.helpers.cols_filename_tofront(_list)[source]
d6tstack.helpers.columns_all_equal(col_list)[source]

Checks that all lists in col_list are equal.

Parameters:col_list (list) – columns, eg [[‘a’,’b’],[‘a’,’b’,’c’]]
Returns:all lists in list are equal?
Return type:bool
d6tstack.helpers.compare_pandas_versions(version1, version2)[source]
d6tstack.helpers.df_filename_tofront(dfg)[source]
d6tstack.helpers.file_extensions_all_equal(ext_list)[source]

Checks that all file extensions are equal.

Parameters:ext_list (list) – file extensions, eg [‘.csv’,’.csv’]
Returns:all extensions are equal to first extension in list?
Return type:bool
d6tstack.helpers.file_extensions_contains_csv(ext_list)[source]
d6tstack.helpers.file_extensions_contains_xls(ext_list)[source]
d6tstack.helpers.file_extensions_contains_xlsx(ext_list)[source]
d6tstack.helpers.file_extensions_get(fname_list)[source]

Returns file extensions in list

Parameters:fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
Returns:file extensions for each file name in input list, eg [‘.csv’,’.csv’]
Return type:list
d6tstack.helpers.file_extensions_valid(ext_list)[source]

Checks if file list contains only valid files

Notes

Assumes all file extensions are equal! Only checks first file

Parameters:ext_list (list) – file extensions, eg [‘.csv’,’.csv’]
Returns:first element in list is one of [‘.csv’,’.txt’,’.xls’,’.xlsx’]?
Return type:bool
d6tstack.helpers.list_common(_list, sort=True)[source]
d6tstack.helpers.list_tofront(_list, val)[source]
d6tstack.helpers.list_unique(_list, sort=True)[source]

d6tstack.helpers_ui module

d6tstack.sniffer module

Finds CSV settings and Excel sheets in multiple files. Often needed as input for stacking

class d6tstack.sniffer.CSVSniffer(fname, nlines=10, delims=', ;t|')[source]

Bases: object

Automatically detects settings needed to read csv files. SINGLE file only, for MULTI file use CSVSnifferList

Parameters:
  • fname (string) – file path
  • nlines (int) – number of lines to sample from each file
  • delims (string) – possible delimiters, default “,; |
check_column_length_consistent(*args, **kwargs)
count_skiprows(*args, **kwargs)
get_delim(*args, **kwargs)
has_header(*args, **kwargs)
has_header_inverse(*args, **kwargs)
read_nlines(*args, **kwargs)
scan_delim(*args, **kwargs)
class d6tstack.sniffer.CSVSnifferList(fname_list, nlines=10, delims=', ;t|')[source]

Bases: object

Automatically detects settings needed to read csv files. MULTI file use

Parameters:
  • fname_list (list) – file names, eg [‘a.csv’,’b.csv’]
  • nlines (int) – number of lines to sample from each file
  • delims (string) – possible delimiters, default ‘,; |
count_skiprows(*args, **kwargs)
get_all(*args, **kwargs)
get_delim(*args, **kwargs)
has_header(*args, **kwargs)
d6tstack.sniffer.csv_count_rows(fname)[source]
d6tstack.sniffer.sniff_settings_csv(fname_list)[source]

d6tstack.sync module

d6tstack.utils module

class d6tstack.utils.PrintLogger[source]

Bases: object

send(data)[source]
send_log(msg, status)[source]

Module contents