Hi, what is the best way to manage the databases I use across different projects? I work with a lot of CSVs that I have to prepare every time I use them (right now I just copy and paste the code from other projects), but I would like to make a module that I can import and that contains all the processing for each database. For example, for this database I usually define columns = [(configuration of, my columns)], names = [names], dates = [list of date columns], dtypes = {column: type}.

Then database_1 = pd.read_fwf(**kwargs), database_2 = pd.read_fwf(**kwargs), database_3 = pd.read_fwf(**kwargs)...
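A minimal sketch of that repeated step, assuming the files are fixed-width text files sitting in one folder: the column layout, names, dtypes and date columns live in a single dict, and a loop replaces the copy-pasted `pd.read_fwf` calls. `SALES_KWARGS`, `read_parts`, the column spans and the column names are all hypothetical placeholders, not the real layout.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical layout: colspecs are (start, end) character spans per column.
# Swap in the real positions, names, dtypes and date columns of your files.
SALES_KWARGS = {
    "colspecs": [(0, 6), (6, 16), (16, 24)],
    "names": ["store_id", "sale_date", "amount"],
    "dtype": {"store_id": str, "amount": "float64"},
    "parse_dates": ["sale_date"],
}


def read_parts(folder: Path, pattern: str = "*.txt") -> list[pd.DataFrame]:
    """Read every matching file in `folder` with the same keyword arguments."""
    return [pd.read_fwf(path, **SALES_KWARGS) for path in sorted(folder.glob(pattern))]


# Demo: write two tiny files to a temporary folder and read them back.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "database_1.txt").write_text("A000012024-01-31  150.25\n")
    (folder / "database_2.txt").write_text("A000022024-02-29   99.50\n")
    parts = read_parts(folder)

print(len(parts), parts[0].dtypes.to_dict())
```

Keeping the settings in one dict means a layout change is edited in one place instead of in every copied call.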

Then database = pd.concat([database_1...])
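A small sketch of the concatenation step, assuming you also want to know which file each row came from: passing a dict to `pd.concat` uses its keys as an outer index level, which is then turned into a regular `source` column. The two frames below are made-up stand-ins for `database_1` and `database_2`.

```python
import pandas as pd

# Made-up frames standing in for database_1, database_2, ...
parts = {
    "database_1": pd.DataFrame({"store_id": ["A00001"], "amount": [150.25]}),
    "database_2": pd.DataFrame({"store_id": ["A00002"], "amount": [99.50]}),
}

# The dict keys become the outer index level ("source"); reset_index moves it
# into a column, and the second reset_index renumbers rows from 0.
database = (
    pd.concat(parts, names=["source", "row"])
    .reset_index(level="source")
    .reset_index(drop=True)
)
print(database)
```

If the origin of each row doesn't matter, `pd.concat(list_of_frames, ignore_index=True)` is enough.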

But I would like to have a module that I can import, with all my databases and their ETL configuration in it, so I could just do something like 'database = my_module.database' to load the database without going through that whole process every time.

Thanks for any help.


originalfrozenbanana@lemm.ee 3 points 7 months ago

This sounds kind of like a data warehouse. Depending on the size of the data and the number of connections, I'd say a script, a database, or a module could work, but this is a much bigger problem. Look into dbt (data build tool) and Airflow.

driving_crooner 2 points 7 months ago

I have a data warehouse, and some of the databases I get come from there, but it can only be accessed from the virtual machine.

odium@programming.dev 2 points 7 months ago

I would consider having a script that combines all these sources into a single data mart for your monthly reports. It could also be useful for the ad hoc studies, but idk how many of the same fields you're using across those studies.
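A minimal sketch of that idea, assuming an SQLite database as the data mart (the reply doesn't name a storage backend) and that each source has already been cleaned into a DataFrame with shared columns. `build_mart`, the `monthly_mart` table and the source names are hypothetical.

```python
import sqlite3

import pandas as pd


def build_mart(sources: dict[str, pd.DataFrame], con: sqlite3.Connection,
               table: str = "monthly_mart") -> int:
    """Stack already-cleaned sources into one table, tagging each row's origin."""
    frames = [df.assign(source=name) for name, df in sources.items()]
    mart = pd.concat(frames, ignore_index=True)
    # Rebuild the whole table on every run so each monthly report starts clean.
    mart.to_sql(table, con, if_exists="replace", index=False)
    return len(mart)


# Demo with made-up frames standing in for a warehouse extract and the flat files.
con = sqlite3.connect(":memory:")
n_rows = build_mart(
    {
        "warehouse": pd.DataFrame({"store_id": ["A00001", "A00002"], "amount": [150.25, 99.5]}),
        "flat_files": pd.DataFrame({"store_id": ["A00003"], "amount": [40.0]}),
    },
    con,
)
totals = pd.read_sql_query(
    "SELECT source, SUM(amount) AS total FROM monthly_mart GROUP BY source ORDER BY source",
    con,
)
print(n_rows)
print(totals)
```

In practice `con` would point at a file such as `sqlite3.connect("mart.db")`, so reports and ad hoc studies can query the same table without redoing the preparation.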