author: Diego Fernandez
Overview
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames. It lets you pull data from Google spreadsheets into DataFrames and push data from DataFrames into spreadsheets. It relies on gspread in the backend for most of the heavy lifting, but adds functionality specific to working with DataFrames as well as some extra nice-to-have features.
The target audience is Data Analysts and Data Scientists, but it can also be used by Data Engineers or anyone trying to automate workflows with Google Sheets and Pandas.
Some key goals/features:
Installation / Usage
To install, use pip: pip install gspread-pandas
Or clone the repo:
Before using, you will need to download Google client credentials for your app.
Client Credentials
To allow a script to use the Google Drive API, we need to authenticate ourselves with Google. To do so, we need to create a project that describes the tool and generate credentials. Open the Google console in your web browser and:
Thanks to the similar project df2gspread for this great description of how to get the client credentials. You can read more about it in the configuration docs, including how to change the default behavior.
Example
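A minimal sketch of a typical round trip, assuming credentials are already configured as described above. The function name and the spreadsheet and sheet names are hypothetical placeholders. The import is deferred so the function can be defined without gspread_pandas installed or network access available.

```python
def round_trip(spread_name, source_sheet, target_sheet):
    """Pull one worksheet into a DataFrame and push it to another worksheet."""
    # Imported lazily: gspread_pandas is a third-party package, and
    # constructing Spread needs valid Google credentials.
    from gspread_pandas import Spread

    # Open the spreadsheet by name (an ID or URL should also work).
    spread = Spread(spread_name)

    # index=0 tells sheet_to_df not to treat any column as the index.
    df = spread.sheet_to_df(index=0, sheet=source_sheet)

    # replace=True resizes and clears the target worksheet before writing.
    spread.df_to_sheet(df, index=False, sheet=target_sheet, replace=True)
    return df
```

A call would look like round_trip("My Spreadsheet", "Sheet1", "Sheet1 copy"), where all three names are made up for illustration.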
Troubleshooting
EOFError in Rodeo
If you’re trying to use gspread_pandas from within Rodeo, you might get an "EOFError: EOF when reading a line" error when trying to pass in the verification code. The workaround is to first verify your account in a regular shell. Since you’re only doing this to get your OAuth token, the spreadsheet doesn’t need to be valid. Just run this in a shell:
Then follow the instructions to create and store the OAuth creds.
This action would increase the number of cells in the workbook above the limit of 10000000 cells.
IMO, Google Sheets is not the right tool for large datasets. However, there are probably good reasons you might have to use it in such cases. When uploading a large DataFrame, you might run into this error. By default, Spread.df_to_sheet will add the rows and/or columns needed to accommodate the DataFrame. Since a new sheet contains a fairly large number of columns, if you’re uploading a DF with lots of rows you might exceed the maximum number of cells even if your data alone does not. In order to fix this you have 2 options:
There’s a strange caveat with resizing, so going to 1x1 first is recommended (replace=True already does this). To read more, see this issue.
Can Pandas read Google Sheets?
The gspread package makes it quick and easy to read Google Sheets spreadsheets from Google Drive and load them into Pandas DataFrames.
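The cell-limit error described under Troubleshooting comes down to simple arithmetic, which the following sketch works through. It assumes a new worksheet starts with a 1000 x 26 grid and that the sheet only grows to fit the data. The helper names are hypothetical and are not part of gspread_pandas.

```python
MAX_CELLS = 10_000_000  # workbook-wide limit quoted in the error message


def cells_after_upload(df_shape, sheet_shape, other_cells=0):
    """Estimate the workbook cell count if the sheet only grows to fit the data.

    df_shape and sheet_shape are (rows, columns) tuples; other_cells counts
    the cells held by the workbook's remaining worksheets.
    """
    rows = max(df_shape[0], sheet_shape[0])
    cols = max(df_shape[1], sheet_shape[1])
    return rows * cols + other_cells


def fits(df_shape, sheet_shape, other_cells=0):
    """Return True if the upload would stay within the cell limit."""
    return cells_after_upload(df_shape, sheet_shape, other_cells) <= MAX_CELLS


# Assumed default grid for a new worksheet: 1000 rows x 26 columns.
new_sheet = (1000, 26)
data = (500_000, 10)  # 5,000,000 cells of actual data (header row ignored)

print(fits(data, new_sheet))  # False: 500_000 * 26 = 13,000,000 cells
print(fits(data, (1, 1)))     # True: 500_000 * 10 = 5,000,000 cells
```

With a real DataFrame, df.shape can be passed as df_shape. The second call shows why shrinking the sheet to 1x1 before uploading avoids the error: the unused default columns no longer count.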
Does Google Sheets work with Python?
To work with Google Sheets in Python, first install gspread and oauth2client. In addition, we'll use Pandas to read our local data and upload it to Google Sheets. Now let's import the libraries and connect to Google Sheets.
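A minimal sketch of connecting and uploading with plain gspread, assuming a service-account JSON key file and a spreadsheet already shared with that account. It uses gspread.service_account rather than oauth2client. The function names, the credentials path, and the spreadsheet name are hypothetical, and keyword arguments are used for update because the positional order has differed between gspread releases.

```python
def build_values(columns, rows):
    """Return a header row followed by the data rows, as a list of lists."""
    return [list(columns)] + [list(row) for row in rows]


def upload_dataframe(df, spreadsheet_name, credentials_path):
    """Write a DataFrame to the first worksheet, starting at cell A1."""
    # Imported lazily so build_values can be used without gspread installed.
    import gspread

    # Authenticate with the service-account key file.
    client = gspread.service_account(filename=credentials_path)

    # Open the spreadsheet by title and take its first worksheet.
    worksheet = client.open(spreadsheet_name).sheet1

    # df.values.tolist() converts NumPy scalars to plain Python values.
    values = build_values(df.columns, df.values.tolist())
    worksheet.update(range_name="A1", values=values)
    return len(values)
```

A call might look like upload_dataframe(df, "My Spreadsheet", "service_account.json"). Missing values (NaN) may need to be replaced first, for example with df.fillna(""), since they are not valid JSON.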
How do I write a Pandas DataFrame to Google Sheets?
1. Create a project and service account.
2. Create a JSON credentials file.
3. Share your Google Sheet with the service account.
4. Access the Google Sheet using libraries.
5. Write or append the DataFrame to the Google Sheet.
Can I use Google Sheets as a database?
Google Sheets is a cloud-based spreadsheet app. It can also serve as a simple database for websites or small applications. Some organizations use it in place of a dedicated database such as PostgreSQL or MySQL for storing and managing data in real time.