GPT Prompts for Data Teams
Most of the prompts below are real-life prompts suggested by readers, used internally by our team at Castor, found on Reddit, or gathered from conversations at data events. Not all of them will be relevant to you, but the objective is to inspire you to try ChatGPT to boost productivity.
How to use this GPT Prompts guide?
- Explore the sections that are interesting to you.
- Do not copy/paste any proprietary data into ChatGPT, as this can be detrimental to your company. Always generate fake data or make sure what you are sending is non-sensitive.
- GPT-4 will give you a much better experience with coding scripts than GPT-3. It is still not perfect, but it will keep improving over time.
- Customize GPT to yourself before asking it anything. Write it a quick 10-line presentation about you, what you care about & what your goals are. This will drastically improve the output of the following prompts.
- You will probably need to tweak the code ChatGPT gives back, but it gets you 90% of the way in the right direction.
Want to help?
➡️ Give feedback in the chat, in the bottom right corner.
➡️ Share it so more data teams can increase their productivity with ChatGPT
GPT Prompts for Data Engineering
Prompt:
"I want you to act as a fake data generator. I need a dataset that has [x] rows and [y] columns: [insert column names]"
Example:
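As an illustration (a hedged sketch, not the article's original example), here is the kind of Python script a prompt like this might return. The column names, value ranges, and the `make_generators` and `generate_rows` helpers are all assumptions made up for this sketch. It uses only the standard library and a fixed seed so the output is reproducible.

```python
import random
from datetime import date, timedelta


def make_generators(rng):
    """Hypothetical column spec: column name -> function building one fake value."""
    first_names = ["Ada", "Grace", "Linus", "Margaret", "Alan"]
    countries = ["USA", "France", "Japan", "Brazil"]
    start = date(2023, 1, 1)
    return {
        "customer_id": lambda: rng.randint(1000, 9999),
        "first_name": lambda: rng.choice(first_names),
        "country": lambda: rng.choice(countries),
        "signup_date": lambda: (start + timedelta(days=rng.randint(0, 364))).isoformat(),
        "lifetime_value": lambda: round(rng.uniform(0, 500), 2),
    }


def generate_rows(n_rows, seed=42):
    """Return n_rows dicts of fake data; a fixed seed keeps runs reproducible."""
    rng = random.Random(seed)
    generators = make_generators(rng)
    return [{col: gen() for col, gen in generators.items()} for _ in range(n_rows)]


fake_rows = generate_rows(5)
for row in fake_rows:
    print(row)
```

Because nothing here is derived from real records, rows like these are safe to paste into a follow-up prompt.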
Prompt:
"Please help me generate sample data for the following SQL DDL table definition:
SQL DDL:[Provide your SQL DDL table definition, including table name, column names, and data types]
Based on the table definition, please generate a set of somewhat realistic sample data that can be used for testing and mock data generation. Ensure that the sample data is consistent with the meaning of the column names and adheres to the specified data types."
Prompt:
I want you to act as a data engineer and code in Python for me. I have two datasets, A and B. A is [explain A structure]. B is [explain B structure]. I need to join them on a foreign key [enter FK].
Example:
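A minimal sketch of what an answer to this prompt could look like, assuming dataset A is a list of orders and dataset B a list of customers linked by a `customer_id` foreign key. Both datasets and the `inner_join` helper are hypothetical. ChatGPT will often reach for pandas here; this version sticks to plain Python so it runs without extra dependencies.

```python
# Dataset A (hypothetical): orders pointing at a customer through customer_id.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 25.0},
    {"order_id": 2, "customer_id": 11, "amount": 40.0},
    {"order_id": 3, "customer_id": 99, "amount": 15.0},  # no matching customer
]

# Dataset B (hypothetical): customers, one row per customer_id.
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Grace"},
]


def inner_join(left, right, key):
    """Inner-join two lists of dicts on `key` (assumes `key` is unique in `right`)."""
    index = {row[key]: row for row in right}  # build a lookup on the foreign key
    return [
        {**left_row, **index[left_row[key]]}
        for left_row in left
        if left_row[key] in index
    ]


joined = inner_join(orders, customers, "customer_id")
for row in joined:
    print(row)
```

Order 3 is dropped because its `customer_id` has no match, which is the inner-join behavior; a left join would keep it with missing customer fields.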
Prompt:
"Act as a senior data engineer & provide a Python code sample demonstrating data engineering best practices to move data from a CSV file to BigQuery. Use the standard library when possible, but feel free to use external libraries if they significantly improve the process."
Example:
I had a list of immediate dependencies for jobs, which I wanted to reverse to find upstream sources.
Prompt:
"In python I have a dependency tree in a dict. Write a script to invert that dependency tree."
Prompt:
"I'm working on a data pipeline using Apache Airflow, and I need to create a DAG that performs the following tasks in sequence:
- Extract data from an API and save it to a CSV file.
- Load the CSV data into a PostgreSQL database.
- Run an SQL query on the database to aggregate the data and generate a report.
- Email the report to a list of recipients.
Can you help me write an Airflow DAG that accomplishes these tasks? Please include comments explaining each part of the DAG, and assume that I have the necessary Python functions to perform the data extraction, loading, querying, and emailing tasks."
Prompt:
"Help me solve this regex problem: I need to create a regular expression pattern that matches [specific requirement]. Can you provide a regex pattern and explain how it works?"
Prompt:
"Please help me identify any issues or potential problems in the following SQL code:
[Insert SQL Code]
Analyze the provided SQL code and point out any syntax errors, logical issues, performance concerns, or best practice violations that may be present. Additionally, suggest possible improvements or fixes for the identified problems."
Prompt:
"I want you to act as a coder and write SQL code for [DBMS 1]. What is the equivalent of [DBMS 2]'s DATE_TRUNC in [DBMS 1] (for example, MySQL)?"
Prompt:
"How do I make a Hive table persist in PySpark?"
Prompt:
"Please help me create an AWS CloudFormation stack using the AWS Cloud Development Kit (CDK) in Python:
AWS Services to Include: [List the AWS services you want to include in the stack, e.g., EC2, S3, Lambda, RDS, etc.]
Stack Requirements: [Provide specific requirements for the stack, such as the desired instance types, number of instances, storage capacities, or any other configuration details]
Please provide step-by-step instructions and Python code for creating the CloudFormation stack using the AWS CDK, along with any necessary prerequisites, imports, and dependencies. Additionally, include any tips or best practices for working with the AWS CDK and CloudFormation in Python."
Prompt:
"I want you to act as a software developer. Please help me catch edge cases for this function [insert function]"
Prompt:
"I want you to act as a software developer. Please help me write data quality tests for this data pipeline [insert code]"
Prompt 1:
"I have a SQL query that I'd like to optimize. Here's the query:
SELECT *
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE customers.country = 'USA';
Can you help me identify any potential performance issues with this query?"
Prompt 2:
"Thank you for the feedback. I've checked and there are no indexes on the 'customer_id' column in both the 'orders' and 'customers' tables, and there's no index on the 'country' column in the 'customers' table. Should I create any indexes to improve the query's performance? If so, which columns should I index?"
Prompt 3:
"I see, that makes sense. I also noticed that I'm using 'SELECT *' in the query, which selects all columns from both tables. However, I only need a few specific columns from each table. How should I modify the query to select only the columns I need, and will this improve performance?"
Prompt 4:
"Thanks for the advice. I'm also concerned about the number of rows returned by the query. It could potentially return a large number of rows. Is there a way to paginate the results so that I only retrieve a limited number of rows at a time? How can I implement pagination in the query?"
Prompt:
"I'm working with Apache Spark to process large datasets, and I'm looking for ways to optimize the performance of my Spark jobs. Specifically, I'm interested in improving the execution time, reducing the memory footprint, and minimizing data shuffling. Can you provide me with some practical ideas and best practices for optimizing Spark jobs? Additionally, if you have any tips for tuning Spark configurations, I'd appreciate hearing them."
Prompt:
"I want you to act as a code optimizer. Can you point out what's wrong with the following Pandas code and optimize it? [Insert code here]"
Prompt:
"I want you to act as a software developer. Please help me improve the time complexity of the code below. [Insert code]"
Prompt:
"I want you to act as a code optimizer. The code is poorly written. How do I correct it? [Insert code here]"
Prompt:
"I want you to act as a Python code simplifier. Can you simplify the following code? [Insert code here]"
Prompt:
"Assume you are a data engineer who has optimized data pipeline processes, code performance, or SQL query output. The result is the following: [provide ROI & metrics about the optimization]. Provide a non-technical explanation highlighting the importance and benefits of these optimizations for business stakeholders, and how they can contribute to the overall success of the company.
Structure the output in 3 bullet points and fewer than 150 words. Keep it data-driven."
GPT Prompts for Data Governance
Prompt:
"Assuming a team has no existing data governance framework in place, provide a step-by-step guide on how to implement data governance from scratch, prioritizing the most important aspects first."
Prompt:
"Act as a data governance leader. You work in a company in [industry]. Data is strategic in your company for [X, Y, Z reasons]. Your data governance practice has been running for [number of years]. You have already successfully implemented [Project 1, 2, 3]. You need to define data governance goals for the next quarter. You want to impact [Strategic Project A & B]."
Prompt:
"Act as a security engineer from Snowflake. You want to write the Access Control Privileges for your company. Here are the roles & access levels I want to create:
[Role 1: System Access 1, Schema Access 1, Object Access 1
Role 2: System Access 2, Schema Access 2, Object Access 2
Role 3: System Access 3, Schema Access 3, Object Access 3]"
Prompt:
"List data governance books to read"
Prompt:
"Can you give me an in-depth summary of the following book on data governance? I am already familiar with the data governance world.
[Insert Book Title & writer]"
Prompt:
"Design a data governance strategy for [add your industry] to [add context & use case] based on the principles in this book: [add book details]"
Prompt:
"I want you to act as a code analyzer. Can you improve the following code for readability and maintainability? [Insert code]"
Prompt:
"Here’s a table: [insert table sample] Can you write data quality tests in SQL/Python to make sure the output is consistent? Flag nulls & duplicates."
Prompt:
"Please describe the key data quality standards you would like to establish within your company. Consider including aspects such as accuracy, completeness, consistency, timeliness, and uniqueness. For each standard, provide a brief explanation and suggest appropriate metrics or methods to measure and ensure compliance. Additionally, mention any specific industry regulations or requirements that need to be adhered to.
1. Standard Name (e.g., Accuracy)
- Explanation: Briefly explain the importance of this standard.
- Measurement/Compliance Method: How will you measure and ensure compliance with this standard?
- Industry Requirements (if applicable): Any specific industry regulations to be considered.
2. Standard Name (e.g., Completeness):
- Explanation: Briefly explain the importance of this standard.
- Measurement/Compliance Method: How will you measure and ensure compliance with this standard?
- Industry Requirements (if applicable): Any specific industry regulations to be considered.
[Add more standards as necessary]"
Prompt:
"As a data governance expert, I am tasked with creating a training session for my company's employees on data quality best practices. The goal of this training is to educate employees on the importance of data quality, common data quality issues, and best practices for ensuring high-quality data. Please provide an outline for the training session, including key topics and explanations for each section. Make sure to cover the following areas:
- Introduction to data quality
- [add common data quality issues and their impact at your company]
- [best practices for data quality management at your company]
- Practical tips for maintaining data quality
- Conclusion and next steps"
Prompt:
"Please provide a summary of the [X compliance standard] and create a prioritized checklist to help organizations ensure their adherence to the requirements of this standard. Provide the answer in a table."
Prompt:
[insert output of prompt above]
Can you list all the columns that contain personal information?
Prompt:
"As an AI expert in data security, I am seeking advice on the best methods to encrypt data. My goal is to ensure the confidentiality and integrity of sensitive information. Please provide a list of recommended encryption methods, along with brief descriptions of each method and their use cases. Additionally, if there are any Python libraries that can be used to implement these encryption methods, please mention them as well."
Prompt:
Here are our data governance policies:
[insert policies] Can you answer the following questions based on what is written in these policies? [insert questions]
Prompt:
"Generate business tags for a table named: [table name]. With the following columns: [column names]. The query used to create the table: [insert query]. And for non-sensitive tables, you can add a data sample: [data sample]."
Prompt:
Organize & group this list of data tables by theme and business tags: [List Tables]
Copy/Paste Jira Ticket
Prompt:
Can you write a memo to summarize the issue in this ticket? Please structure the answer in the following format.
[Your Name]
[Your Title/Position]
[Your Department]
[Date]
TO: [Recipient Name(s)]
CC: [Optional - Other Relevant Parties to be Copied]
FROM: [Your Name]
SUBJECT: Data Quality Issue and Resolution
Dear [Recipient Name(s)],
I am writing to inform you of a recent data quality issue that was identified within our [data system/database] and to outline the steps taken to address and resolve the matter.
Issue Description: [Provide a brief and clear description of the data quality issue. Include details such as the nature of the problem, the data set(s) affected, and the potential impact on business operations or decision-making.]
Issue Discovery: [Explain how the data quality issue was discovered. If applicable, mention any tools or processes used to identify the issue.]
Resolution Steps: [Outline the steps taken to address and resolve the data quality issue. Include any corrective actions, data validation, or data cleansing processes that were implemented. If the issue has not been fully resolved, explain the ongoing efforts to address it.]
Preventive Measures: [Describe any preventive measures or process improvements that have been put in place to avoid similar data quality issues in the future. This may include changes to data validation rules, data governance policies, or staff training.]
Next Steps: [If applicable, outline any next steps or actions that need to be taken by the recipient(s) or other stakeholders. This may include reviewing updated data, providing feedback, or participating in meetings to discuss the issue further.]
I would like to thank [relevant team members or departments] for their prompt and diligent efforts in addressing this issue. Ensuring the accuracy and integrity of our data is a top priority, and we are committed to continuously improving our data management practices.
Please do not hesitate to reach out to me if you have any questions or require further information regarding this matter.
Thank you for your attention to this issue.
Sincerely,
Prompt:
"Please help me create a Jira ticket with the following details:
Title: [Short, descriptive summary of the issue or feature request]
Description:
- Background: [Provide context or background information about the issue or feature request]
- Issue/Feature: [Explain the problem or desired functionality in detail]
- Expected behavior: [Describe what the expected outcome should be]
- Steps to reproduce: [If applicable, list the steps required to reproduce the issue]
- Acceptance criteria: [Clearly define the criteria that must be met for the ticket to be considered complete]
- Additional notes: [Include any other relevant information, such as screenshots, logs, or potential solutions]"
Prompt:
Convert this code [insert code] into SQL. You can also guide me through what the code is doing.
Prompt:
"Please help me explain the technical data concept of [Technical data concept] to a non-technical business user, focusing on the [Add context about industry] industry.
Provide a clear and concise explanation of the concept, tailored to someone without a technical background, and include a relevant example from the specified industry to help illustrate the concept's application and importance in that context."
Prompt:
"Compose a persuasive message to leadership advocating for the investment in a [tool], outlining the reasons for the investment, who will benefit, the estimated cost, and the expected impact on the organization."
GPT Prompts for Data Science
Prompt:
"I am working on a project to build a predictive model for [insert specific problem or domain] and would like to showcase my expertise in [insert specific skills or techniques]. Can you recommend the top five datasets that would be most suitable for my use case, allowing me to effectively demonstrate my knowledge and skills?"
Prompt:
I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please write code for data visualisation and exploration.
Prompt:
I want you to act as a coder. Please write me a regex in python that [describe regex]
Prompt:
I'm working on a SQL task that involves creating a series of similar tables for different months. Each table should have the same structure, but the table names should include the month and year. The structure of each table is as follows:
- id (integer, primary key)
- name (varchar)
- amount (decimal)
- date (date)
I need to create tables for the months of January, February, and March 2023. The table names should be in the format "sales_YYYY_MM" (e.g., "sales_2023_01" for January 2023). I find this task a bit repetitive and boring, so I'm hoping you can help me generate the SQL code to create these tables. Thanks!
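One way to avoid the repetition entirely is to generate the statements from a template. The sketch below assumes generic SQL types: the `VARCHAR(255)` length and `DECIMAL(10, 2)` precision are placeholders, not values from the prompt, and the `monthly_ddl` helper is hypothetical.

```python
# Template for one monthly table; the type lengths and precision are assumptions.
TABLE_TEMPLATE = """CREATE TABLE sales_{year}_{month:02d} (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255),
    amount DECIMAL(10, 2),
    date DATE
);"""


def monthly_ddl(year, months):
    """Return one CREATE TABLE statement per requested month."""
    return [TABLE_TEMPLATE.format(year=year, month=month) for month in months]


statements = monthly_ddl(2023, [1, 2, 3])
print("\n\n".join(statements))
```

The `{month:02d}` format zero-pads the month, which gives the `sales_2023_01` naming pattern requested in the prompt.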
Prompt:
"I want you to act as a coder. I have trained a machine learning model on an imbalanced dataset. The target variable is the column [Insert column name]. In Python, how do I oversample and/or undersample my data?"
Prompt:
Please help me create a Sankey diagram with the following information:
- Number of stages or categories: [number_of_stages]
- Stage names: [stage_1_name], [stage_2_name], ..., [stage_n_name]
- Connections between stages and their flow quantities:
- From [stage_name] to [stage_name]: [quantity]
- From [stage_name] to [stage_name]: [quantity]
- ...
- From [stage_name] to [stage_name]: [quantity]
Thank you!
Prompt:
I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please build a machine learning model that predicts [target variable].
Prompt:
"As a data scientist, I have trained a decision tree model using [insert model details here, e.g., dataset, libraries, and settings]. Can you help me understand the results of this model and provide Python code to identify the most important features?"
Prompt:
I want you to act as a data scientist and code for me. I have trained a [model name]. Please write the code to tune the hyperparameters.
Prompt:
I have a [insert dataset type] dataset: [copy dataset sample]. Can you describe this dataset? I want to reuse this description in another ChatGPT prompt later on. Make sure you extract in a structured format:
- table name
- list of columns
- 3 associated business tags
- 5 first lines as data sample
Prompt:
I want you to act as a data scientist and code for me. I have a time series dataset [describe dataset]. Please build a machine learning model that predicts [target variable]. Please use [time range] as train and [time range] as validation.
Prompt:
I want you to act as a software developer. I would like to compare the efficiency of two algorithms that perform the same task in Python. Please write code that helps me run an experiment that can be repeated 5 times. Please output the runtime and other summary statistics of the experiment. [Insert functions]
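A minimal sketch of such an experiment using the standard library's `timeit` and `statistics` modules. The two functions being compared and the `benchmark` helper are placeholders invented for this example; swap in your own functions.

```python
import statistics
import timeit


def sum_with_loop(n):
    """Placeholder algorithm 1: explicit accumulation loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_with_builtin(n):
    """Placeholder algorithm 2: built-in sum over a range."""
    return sum(range(n))


def benchmark(func, arg, repeats=5, number=100):
    """Time `number` calls of func(arg), repeated `repeats` times; stats are in seconds."""
    timings = timeit.repeat(lambda: func(arg), repeat=repeats, number=number)
    return {
        "runs": len(timings),
        "min": min(timings),
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings),
    }


results = {f.__name__: benchmark(f, 10_000) for f in (sum_with_loop, sum_with_builtin)}
for name, stats in results.items():
    print(name, stats)
```

The minimum is often the most stable number to compare, since slower runs mostly reflect interference from other processes rather than the algorithm itself.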
Prompt:
I want you to act as a code analyzer. Can you improve the following code for readability and maintainability? [Insert code]
Prompt:
I want you to act as a data scientist. Please write code to test that my pandas DataFrame [insert requirements here]
Prompt:
I want you to act as a software developer. Please write unit tests for the function [Insert function]. The test cases are: [Insert test cases]
Prompt:
I want you to act as a software developer. Please compare the time complexity of the two algorithms below. [Insert two functions]
Python
Prompt:
I want you to act as a software developer. This code is supposed to [expected function]. Please help me debug this Python code that cannot be run. [Insert function]
SQL
Prompt:
I want you to act as a SQL code corrector. This code does not run in [your DBMS, e.g. PostgreSQL]. Can you correct it for me? [SQL code here]
Prompt:
"Please help me with using a Bayesian approach for hyperparameter tuning in Databricks:
- Dataset: [Provide details about the dataset, including its location, format, and features]
- Problem: [Specify the problem you are trying to solve, such as classification or regression]
- Hyperparameters: [List the hyperparameters you want to tune, such as learning rate, number of iterations, or regularization parameters]
- Search space: [Define the search space for each hyperparameter, e.g., ranges or specific values to be explored]
- Evaluation metric: [Mention the evaluation metric to be used for comparing model performance, such as accuracy, F1 score, or mean squared error]
Please provide step-by-step instructions on how to perform hyperparameter tuning using a Bayesian approach in Databricks, including any required code snippets and best practices."
GPT Prompts for Data Analyst
Prompt:
I want you to act as a fake data generator. I need a dataset that has [x] rows and [y] columns: [insert column names]
Prompt:
"Please help me generate sample data for the following SQL DDL table definition:
SQL DDL:[Provide your SQL DDL table definition, including table name, column names, and data types]
Based on the table definition, please generate a set of somewhat realistic sample data that can be used for testing and mock data generation. Ensure that the sample data is consistent with the meaning of the column names and adheres to the specified data types."
Prompt:
"Please help me perform a specific operation (x) on the following example DataFrame represented as a table in Markdown format:
[Insert Example DataFrame]
Operation (x): [Describe the desired operation, e.g., filter rows based on a condition, calculate a new column, sort the DataFrame, or group by a specific column]
Please provide the necessary Pandas code to perform the specified operation (x) on this example DataFrame, and show the resulting DataFrame after the operation is applied."
Prompt:
"Please provide a Python code snippet that demonstrates how to clean and preprocess a dataset, including handling missing values, removing duplicates, and standardizing data formats. Use a sample dataset with columns 'Name,' 'Age,' 'Gender,' and 'Email' for this demonstration."
Prompt:
"Please provide a Python code snippet that demonstrates how to merge two datasets using the Pandas library. Assume that the first dataset, 'df1,' contains columns 'ID,' 'Name,' and 'Age,' and the second dataset, 'df2,' contains columns 'ID,' 'City,' and 'Country.' Merge the two datasets on the 'ID' column, and show the resulting merged dataset."
Prompt:
"Please provide a Python code snippet that demonstrates how to scrape data from the homepage of 'www.castordoc.com' using the BeautifulSoup and requests libraries. Extract and display the page title and the text content of the main headings (e.g., h1, h2) on the page. Note: Ensure that your web scraping practices comply with the website's terms of service. Store the data in a pandas DataFrame."
Prompt:
"Please provide a Python code snippet that demonstrates how to collect data from a public REST API endpoint using the 'requests' library. As an example, use the following API endpoint that returns JSON data about users: 'https://jsonplaceholder.typicode.com/users'. Retrieve the data, parse the JSON response, and display the result in a readable format."
Prompt:
I want you to act as a data scientist and code for me. I have a dataset of [describe dataset]. Please build a machine learning model that predicts [target variable].
Prompt:
"As a data scientist, I have a table with two columns: [Insert column names]. I'd like to calculate a running average for [specify the desired value or column]. Can you provide the SQL code to accomplish this in PostgreSQL 14?"
Prompt:
"Please help me modify the following SQL query to achieve a slightly different result:
[Insert Original SQL Query]
Original Query Purpose: [Describe the purpose or goal of the original SQL query]
Desired Modification: [Explain the specific modification you want to make to the query, such as changing the filtering criteria, adding or removing columns, modifying the aggregation, or altering the sorting order]
Please provide the modified SQL query that achieves the desired result, along with an explanation of the changes made and how the new query differs from the original one."
Prompt:
What is the equivalent of the FUNC1 function in BigQuery?
Prompt:
"Please help me compare the following two similar SQL queries and explain the differences between them:
[SQL QUERY 1]
[SQL QUERY 2]
Analyze both SQL queries and provide a detailed comparison that highlights the differences in terms of structure, syntax, filtering criteria, columns selected, aggregation, and any other relevant aspects. Additionally, explain how these differences may impact the results returned by each query and any potential implications for performance or data accuracy."
Prompt:
“[insert schema & data sample]
As a senior data analyst, given the above schemas and data, write a detailed and correct [insert DBMS] SQL query to answer the following analytical question:
[question]
Comment the query with your logic.”
Prompt:
“Double check the Postgres query above for common mistakes, including:
- Remembering to add `NULLS LAST` to an ORDER BY DESC clause
- Handling case sensitivity, e.g. using ILIKE instead of LIKE
- Ensuring the join columns are correct
- Casting values to the appropriate type
Rewrite the query here if there are any mistakes. If it looks good as it is, just reproduce the original query.”
Prompt:
[insert query from previous prompt]
The query above produced the following error:
[insert query error]
Rewrite the query with the error fixed:
Prompt:
"Please help me create PySpark StructType and StructField schema definitions for the following dataset:
Dataset columns:
- Column Name: [Name of the first column]
  Data Type: [Data type of the first column, e.g., StringType, IntegerType, DoubleType, etc.]
  Nullable: [True/False, indicating if the first column can contain null values]
- Column Name: [Name of the second column]
  Data Type: [Data type of the second column]
  Nullable: [True/False, indicating if the second column can contain null values]
[Continue with further columns as needed]
Please provide the PySpark code for creating the StructType and StructField objects that define the schema for this dataset."
Prompt:
"As an expert in data visualization, I need your help to choose the best visualization method for the following problem:
[PROBLEM]
Please describe the problem in detail and recommend the most appropriate visualization method to effectively communicate the information. Explain why you think this method is the best choice."
Prompt:
"Write Python code to visualize [metric] using [choose viz method]"
Prompt:
"[Insert data sample] Can you do visualizations & descriptive analyses to help me understand the data?"
Prompt:
"[Insert data sample] Can you try regressions and look for patterns? Can you run regression diagnostics?"
Prompt:
I want you to act as a software developer. Please provide documentation for func1 below. [Insert function]
Prompt:
"Please help me extract the structure of the following data sample:
Data Sample:[Provide a sample of your data, either as a small dataset, a JSON snippet, or a few rows of a CSV file]
Based on this sample, please provide the inferred structure, including column names, data types, and any relationships or hierarchies that can be observed in the data. Additionally, provide any suggestions or best practices for storing and processing this data using appropriate tools and technologies."
Prompt:
Write OKRs for my [N]-person data team. The focus for this quarter is [X, Y, Z].
GPT Prompts for Head of Data
Prompt:
"Measure data team ROI. Use best practices from this article: https://www.castordoc.com/blog/how-to-measure-the-roi-of-your-data-team"
Prompt:
I am recruiting for [insert job title] to take over the following responsibilities: [insert responsibilities]. Can you draft a job description? Customize it to our company. Here’s an example of other job descriptions on our careers page: [insert other job desc]
Prompt:
Identify 15 key metrics for [insert industry]. Our objective for the year is to [insert strategic priority]. We are already following [X, Y, Z KPIs], please don’t add them but you can suggest complementary KPIs or ways to improve current ones.
Prompt:
As a data engineer, I am interested in benchmarking [list tools or category] to evaluate their performance and suitability for specific use cases. My goal is to identify the best tools for [X]. Please provide a step-by-step guide on how to conduct the benchmark, including the key criteria to consider, the metrics to measure, and any best practices to follow during the benchmarking process. Additionally, if there are any widely-used benchmarking frameworks or tools that can assist in this process, please mention them as well.
Prompt:
"Define the GDPR and HIPAA compliance processes that a data team must follow, including key principles, requirements, and best practices. Provide a step-by-step guide on how to implement and maintain a compliant data handling and processing environment, taking into account aspects such as data collection, storage, access, and processing. [add customization depending on the specific organization and data types involved]."
Prompt:
Explain the following data privacy regulations and requirements:
[insert policy]
Make sure my 15-year-old brother can understand this.
Prompt:
[describe your data team]
[describe your data maturity]
[add your timeline constraints]
Can you suggest the best roll out plan for a data catalog project?
Prompt:
I want you to act as a data science coach. I would like to train my team about [topic]. Please suggest 3 best specific resources. You can include [specify resource type]
Prompt:
Outline an internal team training on [X]. Include training objectives and outcomes.
Prompt:
"As an academic, please provide a simplified one-paragraph summary of the following research paper: [Insert paper title, author(s), and publication details]."
Prompt:
How does the job of a data team change in a recession? What are the key KPIs to follow?
Prompt:
I want you to act as a code explainer. What is this code doing? [Insert code]
Prompt:
I want you to act as a data science instructor. Can you please explain to me what this SQL code is doing? [Insert SQL code]
Prompt:
I want you to act as a Google Sheets formula explainer. Explain the following Google Sheets command. [Insert formula]
Level 1
Prompt: I want you to act as a data science instructor. Explain [concept] to a five-year-old.
Level 2
Prompt: I want you to act as a data science instructor. Explain [concept] to an undergraduate.
Level 3
Prompt: I want you to act as a data science instructor. Explain [concept] to a professor.
Level 4
Prompt: I want you to act as a data science instructor. Explain [concept] to a business stakeholder.
Level 5
Prompt: I want you to act as an answerer on StackOverflow. You can provide code snippets, sample tables and outputs to support your answer. [Insert technical question]
Prompt:
Awesome, now write Mermaid diagram code to explain these relationships.
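Mermaid output is plain text, so it helps to know what well-formed diagram code looks like before pasting ChatGPT's answer into a renderer. The sketch below builds a small Mermaid flowchart from a list of relationships; the table names and the `to_mermaid` helper are invented for illustration.

```python
def to_mermaid(relationships):
    """Render (source, label, target) triples as Mermaid flowchart code."""
    lines = ["graph LR"]  # LR = left-to-right layout
    for source, label, target in relationships:
        # "A -->|text| B" draws an arrow from A to B with a label on it
        lines.append(f"    {source} -->|{label}| {target}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical foreign-key relationships between three tables
    relationships = [
        ("orders", "customer_id", "customers"),
        ("order_items", "order_id", "orders"),
    ]
    print(to_mermaid(relationships))
```

If the diagram ChatGPT returns does not render, comparing it against a tiny known-good example like this one usually reveals the syntax problem quickly.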
GPT Prompts for Analytics Engineer
Prompt:
"Please help me create a Jinja macro for my dbt project:
Macro Purpose: [Describe the purpose of the macro, e.g., calculate the age of users, create a timestamp, or format a currency value]
Input Parameters: [List the input parameters required for the macro, including their names and data types]
Expected Output: [Describe the expected output of the macro, including its data type and any specific formatting requirements]
Please provide the Jinja macro code that meets the requirements and can be used in my dbt project, along with an example of how to use the macro in a dbt model SQL file."
Prompt:
"Please help me add a runtime session setting to a model in my dbt project:
Model Name: [Provide the name of the model you want to apply the runtime session setting to]
Session Setting: [Specify the session setting you want to apply, e.g., setting a specific database schema, changing the statement timeout, or adjusting the query priority]
Please provide step-by-step instructions on how to apply the desired runtime session setting to the specified model in my dbt project, including any required code snippets and best practices for implementing session settings in dbt."
Prompt:
"Write a dbt model configuration for [use case], including necessary configuration settings such as materialization, schema tests, and any other relevant configurations to optimize the model for the given use case. Make sure to include placeholders where customization is needed."
Prompt:
Convert this SQL code: [insert code] into a dbt model. Make sure you include necessary configuration settings such as materialization, schema tests, and any other relevant configurations to optimize the model for the given use case.
Prompt:
"Provide detailed explanations and examples of common dbt syntax and functions, focusing on their usage in analytics engineering projects. Include explanations of key concepts such as ref(), source(), materializations, incremental models, and schema tests. Make sure to cover both basic and advanced functions, as well as any relevant tips and best practices for their effective application."
Ask it general questions about data modeling. The key advantage over Google or StackOverflow is that you can ask follow-up questions and request examples.
Prompt:
"How can I design a data model for [YOUR USE CASE] that takes into account [DATA POINTS]? Please provide insights on entities, attributes, and relationships."
Prompt:
"Share dbt best practices for analytics engineers, including but not limited to using incremental materializations, adopting proper naming conventions, and organizing projects with packages. Provide explanations, examples, and tips to ensure that analytics engineers are following industry standards and optimizing their work in dbt."
Prompt:
"Explain the process of using dbt snapshots for data versioning, including the benefits, key concepts, and configuration options. Provide a step-by-step guide on how to create, configure, and manage snapshots in a dbt project, along with best practices for using snapshots effectively.
[Make sure to include placeholders for customization depending on the specific use case or dataset.]"
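To judge ChatGPT's answer to the snapshot prompt, it helps to have the underlying idea in mind: dbt snapshots keep a type-2 history, where each version of a row carries `dbt_valid_from` and `dbt_valid_to`, and the current version has an empty `dbt_valid_to`. The Python sketch below imitates that bookkeeping on in-memory rows. It is a simplified illustration, not dbt's implementation: `apply_snapshot`, its parameters and the sample data are hypothetical, and deleted rows are ignored.

```python
from datetime import datetime


def apply_snapshot(history, source_rows, key, check_cols, now):
    """Return a new history after comparing source rows to the open versions.

    A changed row closes its open version (dbt_valid_to = now) and appends
    a new open version (dbt_valid_to = None). Unchanged rows add nothing.
    """
    history = [dict(row) for row in history]  # copy so the caller's list is untouched
    current = {row[key]: row for row in history if row["dbt_valid_to"] is None}

    for src in source_rows:
        open_row = current.get(src[key])
        changed = open_row is not None and any(
            open_row[col] != src[col] for col in check_cols
        )
        if changed:
            open_row["dbt_valid_to"] = now  # close the previous version
        if open_row is None or changed:
            history.append({**src, "dbt_valid_from": now, "dbt_valid_to": None})
    return history


if __name__ == "__main__":
    t1, t2 = datetime(2023, 1, 1), datetime(2023, 2, 1)
    h = apply_snapshot([], [{"id": 1, "status": "pending"}], "id", ["status"], t1)
    h = apply_snapshot(h, [{"id": 1, "status": "shipped"}], "id", ["status"], t2)
    for row in h:
        print(row)
```

Comparing a fixed list of columns is loosely modeled on dbt's `check` strategy; the `timestamp` strategy relies on an updated-at column instead.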
Prompt:
"Explain the process of integrating dbt workflows with orchestration tools like Apache Airflow, including the benefits, key concepts, and best practices. Provide a step-by-step guide on how to set up and configure the integration between dbt and Apache Airflow, including creating DAGs, tasks, and any necessary scripts or configurations. Make sure to include placeholders for customization depending on the specific project requirements and use case."
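Whatever orchestrator is used, a dbt task usually comes down to running dbt CLI commands in order and failing the task on a non-zero exit code. The sketch below shows that step without depending on any orchestrator; `run_dbt_steps`, the `runner` parameter and the `staging` selector are hypothetical, and in Airflow this logic would typically sit inside one or more tasks of a DAG.

```python
import subprocess


def run_dbt_steps(steps, runner=subprocess.run):
    """Run dbt commands in order and stop at the first failure.

    `runner` is injectable so the function can be exercised without dbt
    installed. Returns the list of commands that completed successfully.
    """
    completed = []
    for args in steps:
        command = ["dbt", *args]
        result = runner(command)
        if result.returncode != 0:
            raise RuntimeError(f"{' '.join(command)} exited with {result.returncode}")
        completed.append(command)
    return completed


# Typical ordering: build the models first, then test them.
STEPS = [
    ["run", "--select", "staging"],   # "staging" is a placeholder selector
    ["test", "--select", "staging"],
]

if __name__ == "__main__":
    run_dbt_steps(STEPS)
```

Keeping `dbt run` and `dbt test` as separate steps makes it visible in the orchestrator's UI which of the two failed.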
Prompt:
"Propose a comprehensive data validation process for a dbt pipeline, including key steps, methodologies, and best practices. Cover aspects such as schema tests, custom data tests, using dbt assertions, and any relevant third-party tools or packages. Provide a step-by-step guide on how to implement and maintain an effective data validation process, making sure to include placeholders for customization depending on the specific project requirements and use case."
Prompt:
What is the process to add data quality tests to a dbt model?
Prompt:
"Write a data quality test for the following dbt model: [insert code]"
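When reviewing the tests ChatGPT proposes, it is useful to remember what dbt's built-in generic tests (`not_null`, `unique`, `accepted_values`) actually check: each one looks for failing records and passes when none are found. The Python sketch below mirrors that logic on in-memory rows. The function bodies and the sample `orders` rows are illustrative assumptions, not dbt code.

```python
from collections import Counter


def not_null(rows, column):
    """Return rows where `column` is missing (None)."""
    return [row for row in rows if row.get(column) is None]


def unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows, column, values):
    """Return rows whose non-null `column` value is outside `values`."""
    return [
        row for row in rows
        if row.get(column) is not None and row[column] not in values
    ]


if __name__ == "__main__":
    # Hypothetical sample of an orders model
    orders = [
        {"order_id": 1, "status": "shipped"},
        {"order_id": 2, "status": "unknown"},
        {"order_id": 2, "status": None},
    ]
    print("null status:", not_null(orders, "status"))
    print("duplicate ids:", unique(orders, "order_id"))
    print("bad status:", accepted_values(orders, "status", {"pending", "shipped"}))
```

Giving ChatGPT a few rows like these, alongside the model code, tends to produce more specific test suggestions than the model code alone.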
Prompt:
Explain this dbt model. [Insert Model]. Structure the answer in the following format:
- a one-line title about the model
- a step-by-step explanation of how the model works
Prompt:
[Insert dbt Code] Act as an analytics engineer & add inline comments to explain the most important parts of the code. Be concise.
Prompt:
[Insert dbt schema] Act as an analytics engineer & describe the schema above.
Prompt:
[Insert dbt code] Identify the gaps in the documentation of this dbt code. Make suggestions to improve it.
Prompt:
[insert data sample]
[insert dbt model]
Please document this data table based on the column values & dbt model.
Prompt:
Explain this dbt model in simple terms that a business user can understand: [insert dbt model]
Example: An analytics engineer analyzing user engagement data from a mobile app might consult ChatGPT for ideas on which visualizations and statistical tests would be most effective in uncovering insights about user behavior.
Prompt:
I want to do [X] with the following data [insert data]. Can you suggest statistical techniques that will help me do [X]? Provide a SQL code sample if possible.
Prompt:
If I am missing [X] data, what is the best way to measure [X]?
Prompt:
I want to do [X] with the following data [insert data]. Can you suggest the best data visualisation idea for my use case?
GPT Prompts for Business Analyst (COMING SOON)
GPT Prompts for Data Architect (COMING SOON)