AI for Everyday Data Analysis

Please note:

  1. If you're using Microsoft Copilot while logged in with your IC account, you can safely share your data because Copilot provides enterprise data protection. Otherwise, consider anonymizing your data before sharing it with AI tools to protect your privacy.
  2. Although AI tools can handle much of the technical work, human judgment should remain paramount.

Free AI tools and file types

The table below shows which free AI tools allow you to upload and analyze each of these file types.

| Types of files    | ChatGPT | Copilot | Claude | Gemini | Perplexity |
|-------------------|---------|---------|--------|--------|------------|
| CSV               | √       | √       | √      |        | √          |
| Excel (xlsx)      | √       | √       | √      |        | √          |
| Word (docx)       | √       | √       | √      | √      | √          |
| PDF               | √       | √       | √      | √      | √          |
| PowerPoint (pptx) | √       | √       | √      | √      | √          |
| Image             | √       | √       | √      | √      | √          |
| Video             | √       |         |        |        |            |

Prompts for data analysis

By utilizing AI for data analysis, college staff can automate repetitive tasks, uncover hidden patterns, and predict future trends, ultimately fostering a more responsive and effective educational environment.

Here are some examples of prompts you can use to analyze your data.

Data Exploration:

- Here is a dataset containing survey responses with numerical ratings for various questions. Can you provide a statistical summary, including mean, median, mode, standard deviation, and any noticeable trends?

- Summarize key characteristics in the dataset.

- Generate a concise summary of the dataset for a non-technical audience.

- Identify anomalous patterns in [column] and suggest possible causes.
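
If you'd like to sanity-check the statistical summary an AI tool returns, you can compute the same figures locally with pandas. A minimal sketch, assuming your responses live in a hypothetical file named survey.csv with numerical rating columns:

```python
import pandas as pd

# Load the survey responses (survey.csv is a hypothetical filename).
df = pd.read_csv("survey.csv")

# Mean, standard deviation, quartiles (the "50%" row is the median), etc.
print(df.describe())

# Mode is not included in describe(), so compute it separately.
print(df.select_dtypes("number").mode().iloc[0])  # first mode per column
```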

Data Cleaning:

- Suggest methods and best practices for cleaning and preprocessing this messy dataset.

- How can I extract meaningful features from [column]?

- Recommend approaches to identify and remove duplicate records from the dataset.
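
Some of this cleanup can also be done locally before any data leaves your machine. A minimal pandas sketch for the duplicate-record case, assuming a hypothetical file students.csv with an email column:

```python
import pandas as pd

df = pd.read_csv("students.csv")  # hypothetical filename

# Drop rows that are exact duplicates across every column.
df = df.drop_duplicates()

# Or treat rows with the same email as duplicates, keeping the first occurrence.
df = df.drop_duplicates(subset=["email"], keep="first")

df.to_csv("students_clean.csv", index=False)
```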

Data Visualization:

- What type of chart or graph is most suitable for displaying this data?

- Create a line graph to show the trend in enrollment over time.

- Based on my dataset, suggest relevant features for building a predictive model.
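
If you only want the AI's advice on chart type and prefer to draw the chart yourself, here is a minimal matplotlib sketch of the enrollment-over-time example, assuming a hypothetical enrollment.csv with year and enrollment columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("enrollment.csv")  # hypothetical filename

# Line graph showing the trend in enrollment over time.
plt.plot(df["year"], df["enrollment"], marker="o")
plt.xlabel("Year")
plt.ylabel("Enrollment")
plt.title("Enrollment over time")
plt.tight_layout()
plt.show()
```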

Analyze unstructured data

Unstructured data refers to data that doesn't fit into traditional rows and columns, such as images, text documents, emails, and social media posts. Below are some example prompts you can use to analyze unstructured data.

Data extraction:

  • Identify and list the names of people, places, organizations, and dates mentioned in the following text: [Text]
  • Extract the sender's name, email address, and the names and email addresses of all recipients from the following email: [Email]
  • Extract the main ideas and supporting details from the article below. Summarize each paragraph in one sentence: [Article]

Sentiment analysis:

  • Analyze the sentiment of the following text and categorize it as positive, negative, or neutral: [Text]
  • Determine the emotional tone of the following passage: [Text]
  • Determine if the text contains any underlying emotions (e.g., frustration, excitement). Support your analysis with specific examples: [Text]

Analyze images and videos with computer vision

AI can analyze images and videos by combining machine learning, deep learning, and computer vision. Computer vision is the field of artificial intelligence that enables computers to understand and interpret the visual world.

  • Analyze the image to recognize and classify any faces. Provide details on each face detected, including age, gender, and emotion: [Image]
  • Identify the transitions between scenes and describe the changes in context: [Video]
  • Detect and list significant events occurring in this video: [Video]

Analyze data from a survey

Here are some helpful prompt templates you can use to analyze a survey:

  • Categorize the following survey responses into relevant topics such as [column 1], [column 2], and [column 3]:
    • [Responses]
  • Compare the survey responses from Group A and Group B. What are the main differences or similarities?
    • Group A Responses: [Responses]
    • Group B Responses: [Responses]
  • Which responses, if any, seem to deviate significantly from the majority?
    • [Responses]
  • Prioritize the following survey feedback items based on their impact and urgency:
    • [Responses]

Protecting your data

Using a dummy data set

Real-world data is often limited, incomplete, or plagued with privacy concerns. Generative AI can address these issues by creating synthetic data that mimics the structure and characteristics of real data. Below is a prompt template that you can use to create a mock dataset.

Prompt template:

Create a mock dataset for [topic] for [company]. The dataset should comprise the following columns:

  • [column 1]
  • [column 2]
  • etc.

Please ensure that the mock dataset is realistic and representative.

Note: Here are a few examples of how you could fill out the placeholders:

[topic]: enrollment trends over time, room assignments, hiring data, etc.

[company]: colleges, a department specializing in finance, etc.

[column 1], [column 2]: Name, Gender, Age, etc.
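
If you'd rather build the mock dataset yourself instead of asking an AI tool, here is a minimal numpy/pandas sketch, assuming the hypothetical columns Name, Gender, and Age:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # fixed seed so the mock data is reproducible
n = 100                          # number of mock records

# Hypothetical columns; adjust the names and values to mirror your real schema.
mock = pd.DataFrame({
    "Name": [f"Student {i:03d}" for i in range(n)],
    "Gender": rng.choice(["Female", "Male", "Nonbinary"], size=n),
    "Age": rng.integers(18, 65, size=n),
})

mock.to_csv("mock_dataset.csv", index=False)
print(mock.head())
```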

Anonymizing data

Data anonymization is the process of transforming personal data so that individuals cannot be identified, either directly or indirectly. This process ensures privacy and is often used to protect sensitive information while still allowing the data to be analyzed or shared. Here is how you can anonymize data:

Remove Direct Identifiers

Direct identifiers are pieces of information that can directly reveal someone's identity, such as:

  • Names
  • Social Security numbers (SSN)
  • Phone numbers
  • Addresses

Simply remove or mask these columns from your dataset to reduce the risk of identifying individuals.
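
A minimal pandas sketch of this step, assuming a hypothetical employees.csv with name, ssn, phone, and address columns:

```python
import pandas as pd

df = pd.read_csv("employees.csv")  # hypothetical filename

# Drop columns that directly identify a person.
df = df.drop(columns=["name", "ssn", "address"], errors="ignore")

# Or mask a column instead of dropping it, keeping only the last 4 digits.
df["phone"] = "***-***-" + df["phone"].astype(str).str[-4:]

df.to_csv("employees_no_direct_ids.csv", index=False)
```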

Generalize Indirect Identifiers

Indirect identifiers are pieces of information that, when combined, could identify someone. Examples include:

  • Dates of birth
  • ZIP codes
  • Detailed job titles

Instead of completely removing this information, you can generalize it to make identification harder:

  • Convert a date of birth to just the birth year.
  • Generalize ZIP codes to broader regions (e.g., first 3 digits instead of the full ZIP).
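
A minimal pandas sketch of both generalizations, assuming hypothetical date_of_birth and zip_code columns:

```python
import pandas as pd

df = pd.read_csv("employees.csv")  # hypothetical filename

# Keep only the birth year instead of the full date of birth.
df["birth_year"] = pd.to_datetime(df["date_of_birth"]).dt.year
df = df.drop(columns=["date_of_birth"])

# Keep only the first 3 digits of the ZIP code (a broader region).
df["zip_region"] = df["zip_code"].astype(str).str[:3]
df = df.drop(columns=["zip_code"])
```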

Randomize or Perturb Data

Randomization involves slightly altering certain data points so they are less identifiable but still useful for analysis:

  • Add a small amount of noise (random variation) to numerical data.
  • Swap values between records (e.g., swapping age values between two individuals).
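
A minimal numpy/pandas sketch of both techniques, assuming hypothetical salary and age columns:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("employees.csv")  # hypothetical filename
rng = np.random.default_rng()

# Add random noise scaled to about 2% of the column's standard deviation.
noise = rng.normal(loc=0, scale=0.02 * df["salary"].std(), size=len(df))
df["salary"] = (df["salary"] + noise).round(2)

# Swap values between records by randomly permuting the age column.
df["age"] = rng.permutation(df["age"].to_numpy())
```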

Use Pseudonyms

Replace unique identifiers with pseudonyms (random codes or numbers) that can't be linked back to the original individual. For instance, you could replace an employee ID with a randomly generated number. This method allows you to still perform data analysis while protecting individual identities.
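
A minimal pandas sketch, assuming a hypothetical employee_id column. Keep the mapping in a separate, secured file (or discard it entirely if you never need to re-link the records):

```python
import uuid
import pandas as pd

df = pd.read_csv("employees.csv")  # hypothetical filename

# Build a one-to-one mapping from each real ID to a random 8-character code.
mapping = {real_id: uuid.uuid4().hex[:8] for real_id in df["employee_id"].unique()}

# Replace the real IDs with pseudonyms; store the mapping outside the shared file.
df["employee_id"] = df["employee_id"].map(mapping)

df.to_csv("employees_pseudonymized.csv", index=False)
```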

Document Anonymization Process

Keep a clear record of the methods used for anonymization to ensure transparency and compliance with regulations.

Limit Sensitive Data Use

Only keep the data you need for your analysis and remove any unnecessary information that might compromise anonymity. For example, if you don’t need someone’s exact location, you can drop the location data or generalize it to a broader area.

Check for Unintentional Re-identification

After anonymizing your data, it's a good practice to check if it’s possible to re-identify individuals. For example, if your dataset has rare combinations of information, such as a specific job in a specific town, someone could still be identified. You can mitigate this risk by removing or generalizing these rare combinations.
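
A minimal pandas sketch of this check, assuming hypothetical job_title and town columns. It counts how many records share each combination and flags the rare ones (an informal k-anonymity check):

```python
import pandas as pd

df = pd.read_csv("employees_anonymized.csv")  # hypothetical filename

# Count how many records share each job title / town combination.
counts = df.groupby(["job_title", "town"]).size().reset_index(name="count")

# Combinations held by fewer than 5 people are potential re-identification risks.
print(counts[counts["count"] < 5])
```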