Priya R
Updated date Nov 18, 2023
In this blog, we will learn how to convert CSV data into YAML format using Python. This step-by-step guide covers multiple methods, complete with code examples and explanations.

Introduction:

CSV (Comma-Separated Values) and YAML (YAML Ain't Markup Language) are two widely used data formats, each with its own set of advantages. CSV files are commonly used for tabular data storage, while YAML is a human-readable data serialization format. In this blog, we will explore how to convert CSV data into YAML using Python. Whether you are a beginner or an experienced programmer, you will find this guide helpful.

Method 1: Using the csv and PyYAML Libraries

Python has libraries for both formats: the csv module ships with the standard library, and PyYAML (imported as yaml) handles YAML. A common way to convert CSV to YAML is to combine the two. If PyYAML is not already installed, you can install it using pip:

pip install PyYAML
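
The programs in this blog read a file named data.csv, which is not shown. As a minimal sketch, the snippet below creates one with the three columns (name, age, city) and the three rows that appear in the outputs of this guide; the exact contents are an assumption based on those outputs:

```python
import csv

# Sample rows, assumed from the outputs shown in this guide
rows = [
    {"name": "John", "age": 28, "city": "New York"},
    {"name": "Alice", "age": 32, "city": "Los Angeles"},
    {"name": "Bob", "age": 23, "city": "Chicago"},
]

# newline="" lets the csv module control line endings itself
with open("data.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=["name", "age", "city"])
    writer.writeheader()
    writer.writerows(rows)
```

Running it once in your working directory is enough for all three methods.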

Here's a simple Python program to convert a CSV file to YAML:

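import csv
import yaml

# Open the CSV file; newline='' is what the csv module expects for file objects
with open('data.csv', 'r', newline='', encoding='utf-8') as csv_file:
    # DictReader yields one dictionary per row, keyed by the header row
    csv_data = csv.DictReader(csv_file)
    # Collect the rows into a list of dictionaries
    data_list = list(csv_data)

# Write the YAML data; sort_keys=False (PyYAML 5.1+) keeps the CSV column order
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, sort_keys=False)

  • We start by opening the CSV file with Python's built-in open function.
  • Next, we read the data with csv.DictReader, which uses the header row as keys and yields one dictionary per row. Collecting those rows gives us a list of dictionaries, which is easy to work with as structured data.
  • We then open a YAML file in write mode and use the yaml.dump method to write the list of dictionaries into the YAML file. Passing sort_keys=False keeps the keys in column order instead of sorting them alphabetically.

Output:

The csv module reads every value as a string, so PyYAML quotes the ages to make sure they load back as strings rather than integers.

- name: John
  age: '28'
  city: New York
- name: Alice
  age: '32'
  city: Los Angeles
- name: Bob
  age: '23'
  city: Chicago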
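
Keep in mind that csv.DictReader returns every value as a string, so a number such as the age is written to YAML as a quoted string ('28'). If you would rather have YAML integers, you can convert the values before dumping. Here is a minimal sketch of that idea; the coerce helper and the inline sample text are illustrative assumptions, not part of the program above:

```python
import csv
import io
import yaml

# Inline sample standing in for data.csv (assumed layout: name, age, city)
csv_text = "name,age,city\nJohn,28,New York\nAlice,32,Los Angeles\n"


def coerce(value):
    """Hypothetical helper: return an int when the text is a whole number, else the text."""
    try:
        return int(value)
    except ValueError:
        return value


# Build one dictionary per row, converting numeric-looking values on the way
rows = [
    {key: coerce(value) for key, value in row.items()}
    for row in csv.DictReader(io.StringIO(csv_text))
]

# safe_dump only emits plain YAML types; sort_keys=False keeps the column order
yaml_text = yaml.safe_dump(rows, sort_keys=False)
print(yaml_text)
```

With this change the age values are emitted without quotes. Whether to coerce at all depends on your data: values such as ZIP codes or IDs with leading zeros are usually better left as strings.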

Method 2: Using Pandas

Another popular method for converting CSV to YAML in Python is by using the Pandas library. Pandas is a powerful data manipulation library that simplifies data handling, and it's particularly useful for dealing with tabular data like CSV files.

If you don't have Pandas installed, you can install it using pip:

pip install pandas

Here's a program using Pandas to convert CSV to YAML:

import pandas as pd
import yaml

# Read the CSV file using Pandas
df = pd.read_csv('data.csv')

# Convert the DataFrame to a list of dictionaries, one per row
data_list = df.to_dict(orient='records')

# Write the YAML data; sort_keys=False (PyYAML 5.1+) keeps the CSV column order
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, sort_keys=False)

  • We start by using Pandas to read the CSV file into a DataFrame. This allows us to easily manipulate and transform the data. Pandas also infers column types, so age is read as an integer rather than a string.
  • The df.to_dict(orient='records') method converts the DataFrame into a list of dictionaries, one per row.
  • Finally, we open a YAML file in write mode and use yaml.dump to write the list of dictionaries into the YAML file. Passing sort_keys=False keeps the keys in column order instead of sorting them alphabetically.

Output:

- name: John
  age: 28
  city: New York
- name: Alice
  age: 32
  city: Los Angeles
- name: Bob
  age: 23
  city: Chicago
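
One thing to watch for with Pandas is missing data: empty CSV cells are read as NaN, which PyYAML writes as .nan rather than as a YAML null. Here is a minimal sketch of one way to replace missing values with None before dumping; the inline sample text (with Alice's city left blank) is an assumption made for illustration:

```python
import io

import pandas as pd
import yaml

# Inline sample standing in for data.csv; Alice's city is deliberately empty
csv_text = "name,age,city\nJohn,28,New York\nAlice,32,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace missing values (NaN) with None so they are written as YAML null
records = [
    {key: (None if pd.isna(value) else value) for key, value in row.items()}
    for row in df.to_dict(orient="records")
]

# sort_keys=False keeps the CSV column order
yaml_text = yaml.safe_dump(records, sort_keys=False)
print(yaml_text)
```

In the resulting YAML, Alice's city appears as null, which most YAML consumers treat as "no value".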

Method 3: Using Custom Python Code

The previous methods let csv.DictReader or Pandas build the records for you. You can also build each record yourself with csv.reader, while still using PyYAML to write the output. This approach gives you more control over the conversion process and allows for specific customization if needed.

Here's a custom Python program for converting CSV to YAML:

import csv
import yaml

# Initialize an empty list to store the data
data_list = []

# Open the CSV file; newline='' is what the csv module expects for file objects
with open('data.csv', 'r', newline='', encoding='utf-8') as csv_file:
    csv_data = csv.reader(csv_file)

    # Read the header row to get the column names
    header = next(csv_data)

    for row in csv_data:
        # Create a dictionary for each row
        data_dict = {}
        for i, field in enumerate(header):
            data_dict[field] = row[i]

        # Append the dictionary to the list
        data_list.append(data_dict)

# Write the YAML data; sort_keys=False (PyYAML 5.1+) keeps the CSV column order
with open('data.yaml', 'w', encoding='utf-8') as yaml_file:
    yaml.dump(data_list, yaml_file, sort_keys=False)

  • We start by initializing an empty list data_list to store the data in the desired format.
  • The CSV file is opened, and we read the data row by row using the csv.reader object.
  • We read the header row first and keep it, as it contains the column names. This also means it is not treated as data.
  • For each remaining row, we create a dictionary where the keys are the column names and the values are the corresponding data from the row.
  • The dictionary is then appended to the data_list.
  • Finally, we use yaml.dump to write the data from data_list into the YAML file. Passing sort_keys=False keeps the keys in column order instead of sorting them alphabetically.

Output:

As in Method 1, every value comes out of the csv module as a string, so the ages are quoted.

- name: John
  age: '28'
  city: New York
- name: Alice
  age: '32'
  city: Los Angeles
- name: Bob
  age: '23'
  city: Chicago
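
As an example of the kind of customization this approach allows, the rows do not have to end up as a flat list. Here is a minimal sketch that nests each person under their name and converts the age to an integer; the top-level people key and the inline sample text are assumptions made for illustration:

```python
import csv
import io
import yaml

# Inline sample standing in for data.csv (assumed layout: name, age, city)
csv_text = "name,age,city\nJohn,28,New York\nAlice,32,Los Angeles\nBob,23,Chicago\n"

people = {}
reader = csv.reader(io.StringIO(csv_text))
header = next(reader)  # column names

for row in reader:
    record = dict(zip(header, row))
    name = record.pop("name")          # use the name as the mapping key
    record["age"] = int(record["age"])  # store the age as a number
    people[name] = record

# sort_keys=False keeps the insertion order of the rows and columns
yaml_text = yaml.safe_dump({"people": people}, sort_keys=False)
print(yaml_text)
```

This produces a single people mapping with one entry per name, which can be more convenient than a list when the YAML is used as a lookup table. Note that rows sharing the same name would overwrite each other in this layout.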

Conclusion:

In this blog, we have explored three methods to convert CSV data to YAML format using Python, each offering a different level of simplicity and control:

  • Using the csv and PyYAML Libraries: This method uses the standard-library csv module for CSV parsing and the PyYAML library for YAML generation. It's a straightforward way to convert data for those who prefer libraries that handle most of the heavy lifting. Every value is written as a string unless you convert it yourself.
  • Using Pandas: The Pandas library is a powerful tool for data manipulation, making it an excellent choice for handling CSV data. It infers column types for you and is a good fit when you want to filter or transform the data before writing it.
  • Using Custom Python Code: For those who prefer more control and customization, we built each record by hand with csv.reader. This approach allows you to tailor the conversion process to your specific needs.
