1 Setting Up Environment
Data analysis in Python often requires setting up an environment that supports data manipulation, visualization, and coding efficiently. Two popular environments for Python-based data analysis are Visual Studio Code (VSCode) and Google Colab. This guide will help you set up both environments to get started with data analysis.
1.1 Setting Up VSCode for Data Analysis
Visual Studio Code (VSCode) is a free code editor developed by Microsoft and built on an open-source codebase. It supports many programming languages and a large ecosystem of extensions and integrations, making it an excellent choice for data analysis.
1.1.1 1. Install Visual Studio Code
Download and install Visual Studio Code from the official website.
1.1.2 2. Install Miniconda
Download and install Miniconda, a minimal installer for conda.
1.1.3 3. Set Up Python Environment
1.1.3.1 3.1 Launch Anaconda (Miniconda) Prompt
Open the Anaconda Prompt (or Miniconda Prompt) from your system.
1.1.3.2 3.2 Check Python Installation
Verify that Python is installed correctly:
python --version
1.1.3.3 3.3 Create a Virtual Environment
Create a new virtual environment named engtech_env with Python 3.13:
conda create -n engtech_env python=3.13
1.1.3.4 3.4 Activate the Virtual Environment and Install Packages
Activate the environment and install Jupyter and ipykernel:
conda activate engtech_env
conda install jupyter ipykernel
1.2 Setting Up Google Colab for Data Analysis
Google Colab is a free, cloud-based platform that provides Jupyter Notebook environments. It allows you to write and execute Python code in your browser, making it a great tool for data analysis, especially when dealing with large datasets and GPU-based computations.
1.2.1 Getting Started with Google Colab
Access Google Colab:
- Go to Google Colab and sign in with your Google account.
Creating a New Notebook:
- Click on “New Notebook” to create a new notebook. This will open a new page with a code cell ready to run.
Installing and Importing Libraries:
Use pip to install any additional libraries you need directly within a cell:
# Install pandas and numpy
!pip install pandas numpy
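Colab ships with many common libraries preinstalled, so it can be worth checking what is already available before running pip. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and chosen for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report which of the libraries used in this guide are already available.
for name in ("pandas", "numpy"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```

If a library is reported as not installed, the `!pip install` command above adds it to the current session.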
Writing and Running Code:
Write your code in the code cells and run it by clicking the play button on the left side of the cell or pressing Shift+Enter.
import pandas as pd
import numpy as np

# Creating a DataFrame
data = {'Product': ['Apples', 'Bananas', 'Cherries'], 'Price': [1.2, 0.8, 2.5]}
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)
1.2.2 Uploading Files and Connecting to Google Drive
Uploading Files: You can upload files directly to the Colab environment by using the file upload button on the left sidebar.
Connecting to Google Drive: If you have data stored on Google Drive, you can easily access it by mounting your drive.
from google.colab import drive
drive.mount('/content/drive')
This code will prompt you to authorize access to your Google Drive.
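Once the drive is mounted at `/content/drive`, its contents normally appear under the `MyDrive` folder. As a rough sketch, the snippet below lists CSV files there and falls back gracefully when the drive is not mounted; the helper name `find_data_files` and the `*.csv` pattern are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Default location of "My Drive" after drive.mount('/content/drive').
DRIVE_ROOT = Path("/content/drive/MyDrive")


def find_data_files(folder: Path = DRIVE_ROOT, pattern: str = "*.csv") -> List[Path]:
    """Return matching files in a folder, or an empty list if it is missing."""
    if not folder.is_dir():
        return []
    return sorted(path for path in folder.glob(pattern) if path.is_file())


csv_files = find_data_files()
if csv_files:
    for path in csv_files:
        print(path)
else:
    print(f"No CSV files found in {DRIVE_ROOT} (is the drive mounted?)")
```

Any path printed this way can be passed to `pd.read_csv` to load the file into a DataFrame.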
1.2.3 Using GPU or TPU for Accelerated Computing
Google Colab allows you to use GPUs or TPUs to speed up your computations, especially useful for machine learning tasks.
Changing Runtime Type:
- Go to Runtime > Change runtime type.
- Select “GPU” or “TPU” under “Hardware accelerator” and click “Save”.
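After switching the runtime, it helps to confirm that an accelerator is actually attached. The sketch below checks for an NVIDIA GPU by calling the `nvidia-smi` command-line tool, assuming it is on the path whenever a GPU runtime is active; it does not detect TPUs, and the helper name `gpu_summary` is hypothetical.

```python
import shutil
import subprocess


def gpu_summary() -> str:
    """Describe the attached NVIDIA GPUs, or explain why none were found."""
    if shutil.which("nvidia-smi") is None:
        return "No NVIDIA GPU detected (nvidia-smi not found)."
    # "nvidia-smi -L" prints one line per visible GPU.
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return "nvidia-smi is installed but could not query a GPU."
    return result.stdout.strip() or "nvidia-smi reported no GPUs."


print(gpu_summary())
```

On a CPU-only runtime this prints the "not found" message instead of raising an error.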
1.2.4 Saving and Exporting Your Work
Saving Notebooks: Colab automatically saves your work to Google Drive, but you can also save a copy manually by selecting File > Save a copy in Drive.
Exporting Notebooks: You can download your notebook as a .ipynb or .py file by selecting File > Download.
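To make the difference between the two export formats concrete: an `.ipynb` file is a JSON document containing a list of cells, while a `.py` file holds only code. The sketch below joins the code cells of a parsed notebook into a single script. It is an illustration of the file structure, not Colab's own exporter; the helper name `notebook_to_script` and the sample notebook are hypothetical, and notebook-only lines such as `!pip install` are copied as-is even though they are not valid Python outside a notebook.

```python
from typing import Any, Dict


def notebook_to_script(notebook: Dict[str, Any]) -> str:
    """Join the code cells of a parsed .ipynb document into one script."""
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # Skip text (markdown) cells.
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny hand-written notebook used only for illustration.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["import math\n", "print(math.pi)"]},
        {"cell_type": "code", "source": "x = 1"},
    ]
}
print(notebook_to_script(example))
```

For a downloaded notebook, the same dictionary structure can be obtained by reading the file with `json.load`.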
1.3 Summary
- VSCode is great for local development with rich support for extensions, virtual environments, and debugging.
- Google Colab is ideal for cloud-based development, especially when you need quick access to computational resources like GPUs and TPUs.