
A high-performance, standalone C library for CSV file handling, providing efficient reading and writing capabilities with minimal memory footprint.


csvtoolkit/FastCSV-C


CSV Library


A high-performance, memory-safe CSV parsing and writing library written in C with custom arena-based memory management. Designed for production use with zero memory leaks, comprehensive error handling, and enterprise-grade features including multi-encoding support and RFC 4180 compliance.

🚀 Features

  • 🛡️ Memory Safe: Zero memory leaks, validated with Valgrind
  • ⚡ Ultra High Performance: 7.6M+ operations/second with optimized parsing
  • 🎯 Custom Memory Management: Arena-based allocator for efficient memory usage
  • 🌐 Multi-Encoding Support: UTF-8, UTF-16, UTF-32, ASCII, Latin1 with BOM support
  • 📝 RFC 4180 Compliant: Proper quote escaping and multi-line field support
  • 🔧 Flexible Configuration: Customizable delimiters, quotes, strict mode, and field trimming
  • 📊 Advanced Reader Features: Navigation, seeking, header management, and position tracking
  • ✅ Comprehensive Testing: 60+ tests across 6 test suites with 100% pass rate
  • 🌐 Cross-Platform: Works on Linux, macOS, and other Unix-like systems
  • 📚 Library Ready: Designed for integration into larger projects and language bindings


🔧 Installation

Prerequisites

  • C99 compatible compiler (GCC, Clang)
  • POSIX-compliant system
  • Make build system
  • Valgrind (optional, for memory testing)

Build from Source

git clone https://github.com/csvtoolkit/FastCSV-C.git
cd FastCSV-C

# Build shared and static libraries
make

# Run tests to verify installation
make test

# Optional: Run memory safety checks
make valgrind

# Performance benchmarks
make benchmark

Build Targets

| Target | Description |
|--------|-------------|
| make | Build shared and static libraries |
| make shared | Build shared library (libcsv.so) |
| make static | Build static library (libcsv.a) |
| make test | Run all tests |
| make valgrind | Run tests with Valgrind |
| make benchmark | Run performance benchmarks |
| make clean | Clean build artifacts |
| make help | Show all available targets |

🚀 Quick Start

Reading CSV Files

#include <stdbool.h>
#include <stdio.h>
#include "csv_reader.h"
#include "arena.h"

int main(void) {
    // Initialize arena allocator and check the result code
    Arena arena;
    if (arena_create(&arena, 4096) != ARENA_OK) {
        fprintf(stderr, "Failed to create arena\n");
        return 1;
    }

    // Create configuration with encoding support
    CSVConfig *config = csv_config_create(&arena);
    csv_config_set_path(config, "data.csv");
    csv_config_set_has_header(config, true);
    csv_config_set_encoding(config, CSV_ENCODING_UTF8);

    // Initialize reader
    CSVReader *reader = csv_reader_init_with_config(&arena, config);
    if (!reader) {
        fprintf(stderr, "Failed to open data.csv\n");
        arena_destroy(&arena);
        return 1;
    }

    // Get headers
    int header_count = 0;
    char **headers = csv_reader_get_headers(reader, &header_count);
    printf("Headers: ");
    for (int i = 0; i < header_count; i++) {
        printf("%s ", headers[i]);
    }
    printf("\n");

    // Read records with navigation support
    while (csv_reader_has_next(reader)) {
        CSVRecord *record = csv_reader_next_record(reader);
        if (record) {
            printf("Record at position %ld:\n", csv_reader_get_position(reader));
            // Stop at header_count so a row with extra fields cannot index past headers
            for (int i = 0; i < record->field_count && i < header_count; i++) {
                printf("  %s: %s\n", headers[i], record->fields[i]);
            }
        }
    }

    // Cleanup
    csv_reader_free(reader);
    arena_destroy(&arena);
    return 0;
}

Writing CSV Files with Encoding

#include <stdbool.h>
#include <stdio.h>
#include "csv_writer.h"
#include "arena.h"

int main(void) {
    Arena arena;
    if (arena_create(&arena, 4096) != ARENA_OK) {
        fprintf(stderr, "Failed to create arena\n");
        return 1;
    }

    // Configure with UTF-8 and BOM
    CSVConfig *config = csv_config_create(&arena);
    csv_config_set_path(config, "output.csv");
    csv_config_set_encoding(config, CSV_ENCODING_UTF8);
    csv_config_set_write_bom(config, true);
    csv_config_set_strict_mode(config, true);

    // Initialize writer and check the result code
    CSVWriter *writer;
    char *headers[] = {"Name", "Age", "City"};
    if (csv_writer_init(&writer, config, headers, 3, &arena) != CSV_WRITER_OK) {
        fprintf(stderr, "Failed to initialize writer\n");
        arena_destroy(&arena);
        return 1;
    }

    // Write data with automatic quoting
    char *row1[] = {"John Doe", "30", "New York"};
    csv_writer_write_record(writer, row1, 3);

    char *row2[] = {"Jane Smith", "25", "Los Angeles"};
    csv_writer_write_record(writer, row2, 3);

    csv_writer_free(writer);
    arena_destroy(&arena);
    return 0;
}

📖 API Reference

Core Components

| Component | Description |
|-----------|-------------|
| Arena (arena.h) | Custom memory allocator |
| CSV Parser (csv_parser.h) | Low-level parsing engine with RFC 4180 support |
| CSV Reader (csv_reader.h) | High-level reading interface with navigation |
| CSV Writer (csv_writer.h) | CSV output generation with encoding support |
| CSV Config (csv_config.h) | Configuration management with encoding options |
| CSV Utils (csv_utils.h) | Utility functions |

Arena Management

// Initialize arena with specified size in bytes (size_t)
Arena arena;
ArenaResult result = arena_create(&arena, 4096);

// Allocate memory from arena; the pointer is returned through the last argument
void *ptr;
result = arena_alloc(&arena, 256, &ptr);

// Duplicate string in arena; the copy is returned through the last argument
char *copy;
result = arena_strdup(&arena, "example", &copy);

// Reset arena for reuse
arena_reset(&arena);

// Clean up arena
arena_destroy(&arena);
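
To make the arena idea concrete, the following is a minimal sketch of a bump allocator in plain C. It is not the library's implementation: the `SketchArena` type, the `sketch_arena_*` functions, and the fixed 16-byte alignment are all assumptions made for illustration. It shows why allocation is cheap (a pointer bump) and why cleanup is a single `free`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names: this is NOT the library's Arena type or API. */
enum { SKETCH_ALIGN = 16 }; /* assumed alignment, adequate for common types */

typedef struct {
    unsigned char *base; /* start of the single backing buffer */
    size_t capacity;     /* total bytes available */
    size_t used;         /* bytes handed out so far */
} SketchArena;

/* Acquire one backing buffer; returns 0 on success, -1 on failure. */
int sketch_arena_create(SketchArena *a, size_t capacity) {
    assert(a != NULL);
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out `size` bytes, rounded up to SKETCH_ALIGN; NULL when exhausted. */
void *sketch_arena_alloc(SketchArena *a, size_t size) {
    assert(a != NULL);
    if (size == 0 || size > SIZE_MAX - (SKETCH_ALIGN - 1)) return NULL;
    size_t rounded = (size + SKETCH_ALIGN - 1) & ~(size_t)(SKETCH_ALIGN - 1);
    if (rounded > a->capacity - a->used) return NULL;
    void *p = a->base + a->used;
    a->used += rounded; /* the "bump": no per-allocation bookkeeping */
    return p;
}

/* Forget every allocation at once, keeping the buffer for reuse. */
void sketch_arena_reset(SketchArena *a) {
    assert(a != NULL);
    a->used = 0;
}

/* Release the backing buffer; all pointers from the arena become invalid. */
void sketch_arena_destroy(SketchArena *a) {
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

Because individual allocations are never freed one by one, a parser built on such an arena can release all per-file memory with one reset or destroy call.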

Enhanced CSV Reading

// Initialize reader with configuration
CSVReader *reader = csv_reader_init_with_config(&arena, config);

// Navigation and positioning
int has_more = csv_reader_has_next(reader);
long position = csv_reader_get_position(reader);
int seek_result = csv_reader_seek(reader, position);  // position is a long
csv_reader_rewind(reader);

// Header management
int header_count;
char **headers = csv_reader_get_headers(reader, &header_count);

// Configuration updates
csv_reader_set_config(reader, &arena, new_config);

// Read records
CSVRecord *record = csv_reader_next_record(reader);

Advanced CSV Writing

// Initialize with encoding and BOM support
CSVWriter *writer;
CSVWriterResult result = csv_writer_init(&writer, config, headers, count, &arena);

// Write records with automatic formatting
csv_writer_write_record(writer, fields, field_count);

// Write with field mapping
csv_writer_write_record_map(writer, field_names, field_values, count);

// Utility functions
bool needs_quoting = field_needs_quoting(field, delimiter, enclosure, strict_mode);
bool is_numeric = is_numeric_field(field);

⚙️ Configuration

Basic Configuration

CSVConfig *config = csv_config_create(&arena);

// Customize delimiters and quotes
csv_config_set_delimiter(config, ';');        // Default: ','
csv_config_set_enclosure(config, '\'');       // Default: '"'
csv_config_set_escape(config, '\\');          // Default: '"'

// Configure parsing behavior
csv_config_set_trim_fields(config, true);     // Default: false
csv_config_set_skip_empty_lines(config, true); // Default: false
csv_config_set_strict_mode(config, true);     // Default: false
csv_config_set_preserve_quotes(config, false); // Default: false

Advanced Configuration

// Encoding and BOM support
csv_config_set_encoding(config, CSV_ENCODING_UTF8);
csv_config_set_write_bom(config, true);

// File handling
csv_config_set_path(config, "data.csv");
csv_config_set_has_header(config, true);
csv_config_set_offset(config, 100);  // Skip first 100 lines
csv_config_set_limit(config, 1000);  // Process only 1000 records

🌐 Encoding Support

Supported Encodings

| Encoding | Constant | BOM Support | Notes |
|----------|----------|-------------|-------|
| UTF-8 | CSV_ENCODING_UTF8 | ✅ | Unicode, default |
| UTF-16 LE | CSV_ENCODING_UTF16LE | ✅ | Unicode |
| UTF-16 BE | CSV_ENCODING_UTF16BE | ✅ | Unicode |
| UTF-32 LE | CSV_ENCODING_UTF32LE | ✅ | Unicode |
| UTF-32 BE | CSV_ENCODING_UTF32BE | ✅ | Unicode |
| ASCII | CSV_ENCODING_ASCII | ❌ | Single-byte, no BOM, no Unicode |
| Latin1 | CSV_ENCODING_LATIN1 | ❌ | Single-byte, no BOM, Western European |

  • ASCII and Latin1 are fully supported for both reading and writing. No BOM is written for these encodings. They are suitable for legacy systems and Western European text, but do not support Unicode characters outside their range.

BOM (Byte Order Mark) Writing

// Enable BOM for UTF encodings
csv_config_set_encoding(config, CSV_ENCODING_UTF8);
csv_config_set_write_bom(config, true);

// BOM bytes are automatically written:
// UTF-8: EF BB BF
// UTF-16LE: FF FE
// UTF-16BE: FE FF
// UTF-32LE: FF FE 00 00
// UTF-32BE: 00 00 FE FF
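
As an illustration of the byte sequences listed above, here is a small standalone lookup that maps an encoding to its BOM and writes it to a stream. The `SketchEncoding` enum and both function names are hypothetical and exist only for this sketch; the library's own type is `CSVEncoding`, and how it writes the BOM internally is not shown in this README.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical enum for this sketch; not the library's CSVEncoding. */
typedef enum {
    SK_UTF8, SK_UTF16LE, SK_UTF16BE, SK_UTF32LE, SK_UTF32BE,
    SK_ASCII, SK_LATIN1
} SketchEncoding;

/* Return the BOM bytes for enc and store their count in *len.
   Encodings without a BOM yield NULL and a length of 0. */
const unsigned char *sketch_bom_bytes(SketchEncoding enc, size_t *len) {
    static const unsigned char utf8[]    = {0xEF, 0xBB, 0xBF};
    static const unsigned char utf16le[] = {0xFF, 0xFE};
    static const unsigned char utf16be[] = {0xFE, 0xFF};
    static const unsigned char utf32le[] = {0xFF, 0xFE, 0x00, 0x00};
    static const unsigned char utf32be[] = {0x00, 0x00, 0xFE, 0xFF};

    assert(len != NULL);
    switch (enc) {
    case SK_UTF8:    *len = sizeof utf8;    return utf8;
    case SK_UTF16LE: *len = sizeof utf16le; return utf16le;
    case SK_UTF16BE: *len = sizeof utf16be; return utf16be;
    case SK_UTF32LE: *len = sizeof utf32le; return utf32le;
    case SK_UTF32BE: *len = sizeof utf32be; return utf32be;
    default:         *len = 0;              return NULL; /* ASCII, Latin1 */
    }
}

/* Write the BOM (if any) at the start of an output stream.
   Returns 0 on success, -1 on a short write. */
int sketch_write_bom(FILE *out, SketchEncoding enc) {
    size_t len;
    const unsigned char *bom = sketch_bom_bytes(enc, &len);
    if (len == 0) return 0;
    return fwrite(bom, 1, len, out) == len ? 0 : -1;
}
```

Note that the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM, so any code that detects encodings from a BOM has to test the longer sequence first.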

🔧 Advanced Features

Multi-line Field Support

// Automatic handling of quoted multi-line fields
// Use a writable array rather than a pointer to a string literal:
// in-place parsing may modify the buffer
char content[] = "name,description\n"
                 "\"Product A\",\"A great product\nwith multiple lines\"\n"
                 "\"Product B\",\"Another product\"";

// Parser automatically handles multi-line quoted fields
CSVParseResult result = csv_parse_line_inplace(content, &arena, config, 1);
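
The core of multi-line support is deciding whether a newline ends the record or belongs to a quoted field. The sketch below shows one common way to do that by tracking quote state. The function name `sketch_record_length` is hypothetical and this is not the library's parser; it only handles `\n` terminators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return the length of the first record in buf[0..len), not counting the
   terminating '\n'. Newlines inside an enclosure-quoted field do not end
   the record. If no terminating newline is found, returns len. */
size_t sketch_record_length(const char *buf, size_t len, char enclosure) {
    assert(buf != NULL);
    bool in_quotes = false;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == enclosure) {
            /* An escaped pair ("") toggles twice, so the state is unchanged. */
            in_quotes = !in_quotes;
        } else if (buf[i] == '\n' && !in_quotes) {
            return i;
        }
    }
    return len;
}
```

Applied to the sample content above, the first call would stop after `name,description`, and a second call starting at the next record would run past the embedded newline up to the closing quote of the description.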

RFC 4180 Quote Escaping

// Proper quote escaping: "" becomes "
// Writable array, since in-place parsing may modify the buffer
char input[] = "\"Say \"\"Hello\"\" World\",normal";
// Results in two fields: Say "Hello" World | normal

// Enhanced quote handling in parser
CSVParseResult result = csv_parse_line_inplace(input, &arena, config, 1);
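
The escaping rule itself is small enough to show in isolation. The following sketch removes the enclosing quotes from a single field and collapses each doubled quote into one, writing the result back into the same buffer. `sketch_unquote_inplace` is a hypothetical helper written for this README, not a function exported by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unquote one NUL-terminated field in place.
   Strips the surrounding quote characters and turns each doubled quote
   into a single one. Returns the new length, or (size_t)-1 if the field
   is not wrapped in quotes (in which case it is left untouched). */
size_t sketch_unquote_inplace(char *field, char quote) {
    assert(field != NULL);
    size_t len = strlen(field);
    if (len < 2 || field[0] != quote || field[len - 1] != quote) {
        return (size_t)-1;
    }
    size_t w = 0; /* write index, always behind the read index */
    for (size_t r = 1; r + 1 < len; r++) {
        if (field[r] == quote && r + 2 < len && field[r + 1] == quote) {
            r++; /* skip the first quote of an escaped pair */
        }
        field[w++] = field[r];
    }
    field[w] = '\0';
    return w;
}
```

Since the output is never longer than the input, no extra allocation is needed, which is the same property that makes in-place parsing attractive.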

Strict Mode Processing

// Enable strict mode for enhanced validation
csv_config_set_strict_mode(config, true);

// Strict mode features:
// - Fields with spaces are automatically quoted
// - Enhanced validation of field content
// - Stricter RFC 4180 compliance
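
One way to read the strict-mode rules above is as an extra condition on the "does this field need quoting?" decision. The sketch below is an assumption about how such a check could be composed, modeled loosely on the `field_needs_quoting(field, delimiter, enclosure, strict_mode)` signature listed in the API reference; the name `sketch_needs_quoting` is hypothetical and the library's actual rules may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Decide whether a field must be wrapped in enclosure characters.
   Always quote when the field contains the delimiter, the enclosure,
   or a line break. In strict mode (assumed behavior), also quote any
   field that contains a space. */
bool sketch_needs_quoting(const char *field, char delimiter,
                          char enclosure, bool strict) {
    assert(field != NULL);
    for (const char *p = field; *p != '\0'; p++) {
        if (*p == delimiter || *p == enclosure || *p == '\n' || *p == '\r') {
            return true;
        }
        if (strict && *p == ' ') {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, `New York` is written bare in the default mode and quoted in strict mode, while `a,b` is quoted in both.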

🧪 Testing

The library includes comprehensive test coverage:

| Test Suite | Tests | Coverage |
|------------|-------|----------|
| Arena Tests | 13 | Memory allocation, alignment, bounds, safety |
| Config Tests | 7 | Configuration management, encoding, flags |
| Utils Tests | 11 | String utilities, validation, trimming |
| Parser Tests | 7 | Core parsing, quotes, multi-line, edge cases |
| Writer Tests | 15 | Record writing, BOM, encoding, formatting |
| Reader Tests | 8 | Navigation, headers, seeking, positioning |
| Total | 60+ | All components with edge cases |

Running Tests

# Run all tests
make test

# Run specific test suite
make test-arena
make test-config
make test-utils
make test-parser
make test-writer
make test-reader

# Memory leak detection
make valgrind
make valgrind-all

# Performance testing
make benchmark
make stress-test

Test Results Summary

✅ Arena Tests: 13/13 passed
✅ Config Tests: 7/7 passed
✅ Utils Tests: 11/11 passed
✅ Parser Tests: 7/7 passed
✅ Writer Tests: 15/15 passed
✅ Reader Tests: 8/8 passed
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🎉 Total: 60+ tests passed

⚡ Performance

Benchmarks

| Operation | Performance | Memory |
|-----------|-------------|--------|
| Parse 1M records | 7.6M ops/sec | 90% less malloc |
| Write 1M records | 5.2M ops/sec | Zero fragmentation |
| Memory allocations | Arena-based | Predictable cleanup |
| Multi-line parsing | Optimized | Streaming support |

Performance Features

  • Zero-copy parsing where possible
  • In-place string modification to avoid allocations
  • Arena-based memory management for reduced malloc overhead
  • Optimized field parsing with minimal string operations
  • Streaming processing for large files
  • Enhanced quote handling without performance penalty
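
To illustrate what "zero-copy" and "in-place string modification" can mean in practice, here is a deliberately simplified splitter. It overwrites each delimiter with a NUL byte and records pointers into the original buffer, so no field is copied. `sketch_split_inplace` is a hypothetical name, and unlike the library's parser this sketch ignores quoting entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Split a writable, NUL-terminated line on `delimiter` without copying.
   Each delimiter is replaced by '\0' and fields[i] points into `line`.
   At most max_fields pointers are stored; once that limit is reached the
   last field keeps the unsplit remainder. Returns the number of fields. */
size_t sketch_split_inplace(char *line, char delimiter,
                            char **fields, size_t max_fields) {
    assert(line != NULL && fields != NULL);
    if (max_fields == 0) return 0;

    size_t count = 0;
    fields[count++] = line;
    for (char *p = line; *p != '\0'; p++) {
        if (*p == delimiter) {
            if (count == max_fields) break; /* no room for another field */
            *p = '\0';                      /* terminate the current field */
            fields[count++] = p + 1;        /* next field starts after it */
        }
    }
    return count;
}
```

The trade-off is that the input buffer is consumed: callers that still need the original line must keep their own copy.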

Stress Test Results

# 50,000 iteration stress test
✅ All iterations completed successfully
✅ Zero memory leaks detected
✅ Consistent performance maintained

🛡️ Memory Safety

Validated with Valgrind:

✅ Zero memory leaks
✅ Zero memory errors
✅ Proper allocation/deallocation balance
✅ No buffer overflows or underflows
✅ No uninitialized memory access

Detailed Test Results:

  • Arena Tests: 10 allocs, 10 frees, 8,384 bytes - ✅ Clean
  • Config Tests: 7 allocs, 7 frees, 25,600 bytes - ✅ Clean
  • Utils Tests: 1 alloc, 1 free, 1,024 bytes - ✅ Clean
  • Parser Tests: 14 allocs, 14 frees, 34,328 bytes - ✅ Clean
  • Writer Tests: 47 allocs, 47 frees, 12,661,592 bytes - ✅ Clean
  • Reader Tests: 6 allocs, 6 frees, 14,256 bytes - ✅ Clean

🔧 Error Handling

The library uses comprehensive error codes for robust error handling:

// Arena errors
typedef enum {
    ARENA_OK = 0,
    ARENA_ERROR_NULL_POINTER,
    ARENA_ERROR_INVALID_SIZE,
    ARENA_ERROR_OUT_OF_MEMORY,
    ARENA_ERROR_ALIGNMENT
} ArenaResult;

// Writer errors
typedef enum {
    CSV_WRITER_OK = 0,
    CSV_WRITER_ERROR_NULL_POINTER,
    CSV_WRITER_ERROR_MEMORY_ALLOCATION,
    CSV_WRITER_ERROR_FILE_OPEN,
    CSV_WRITER_ERROR_FILE_WRITE,
    CSV_WRITER_ERROR_INVALID_FIELD_COUNT,
    CSV_WRITER_ERROR_FIELD_NOT_FOUND,
    CSV_WRITER_ERROR_BUFFER_OVERFLOW,
    CSV_WRITER_ERROR_ENCODING
} CSVWriterResult;

// Parser errors with detailed information
typedef struct {
    bool success;
    const char *error;
    int error_line;
    int error_column;
    FieldArray fields;
} CSVParseResult;
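
Result codes are most useful when they are checked at every call and turned into a readable message. The sketch below shows one possible pattern for the arena codes. To stay self-contained it repeats the `ArenaResult` enum from the listing above (real code would include `arena.h` instead), and the helpers `sketch_arena_strerror` and `sketch_arena_check` are hypothetical, not part of the library.

```c
#include <assert.h>
#include <stdio.h>

/* Copied from the listing above so this sketch compiles on its own.
   In a real project, include "arena.h" instead of redeclaring it. */
typedef enum {
    ARENA_OK = 0,
    ARENA_ERROR_NULL_POINTER,
    ARENA_ERROR_INVALID_SIZE,
    ARENA_ERROR_OUT_OF_MEMORY,
    ARENA_ERROR_ALIGNMENT
} ArenaResult;

/* Hypothetical helper: map a result code to a short description. */
const char *sketch_arena_strerror(ArenaResult r) {
    switch (r) {
    case ARENA_OK:                  return "ok";
    case ARENA_ERROR_NULL_POINTER:  return "null pointer argument";
    case ARENA_ERROR_INVALID_SIZE:  return "invalid size";
    case ARENA_ERROR_OUT_OF_MEMORY: return "out of memory";
    case ARENA_ERROR_ALIGNMENT:     return "alignment error";
    }
    return "unknown arena error";
}

/* Hypothetical helper: returns 1 when r is ARENA_OK; otherwise reports
   which step failed on stderr and returns 0. */
int sketch_arena_check(ArenaResult r, const char *what) {
    assert(what != NULL);
    if (r == ARENA_OK) return 1;
    fprintf(stderr, "%s failed: %s\n", what, sketch_arena_strerror(r));
    return 0;
}
```

A call site could then read `if (!sketch_arena_check(arena_create(&arena, 4096), "arena_create")) return 1;`, keeping the happy path uncluttered.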

📚 Examples

Custom Delimiter Processing

CSVConfig *config = csv_config_create(&arena);
csv_config_set_delimiter(config, ';');  // Use semicolon
csv_config_set_enclosure(config, '\''); // Use single quotes
csv_config_set_strict_mode(config, true); // Enable strict validation

Large File Processing with Navigation

// Efficient streaming for large files
CSVReader *reader = csv_reader_init_with_config(&arena, config);

// Skip to specific position
csv_reader_seek(reader, 1000);

// Process with position tracking
while (csv_reader_has_next(reader)) {
    long position = csv_reader_get_position(reader);
    CSVRecord *record = csv_reader_next_record(reader);
    if (!record) {
        break;  // Stop on a failed read instead of passing NULL along
    }

    printf("Processing record at position %ld\n", position);
    process_record(record);  // process_record() is supplied by the caller

    // Arena automatically manages memory
}

Multi-Encoding File Processing

// Process files with different encodings, including ASCII and Latin1
CSVEncoding encodings[] = {
    CSV_ENCODING_UTF8,
    CSV_ENCODING_UTF16LE,
    CSV_ENCODING_LATIN1, // Now fully supported
    CSV_ENCODING_ASCII   // Now fully supported
};
size_t encoding_count = sizeof(encodings) / sizeof(encodings[0]);

for (size_t i = 0; i < encoding_count; i++) {
    csv_config_set_encoding(config, encodings[i]);
    // A BOM is written only for UTF encodings; none is written for ASCII/Latin1
    csv_config_set_write_bom(config, true);
    process_csv_file(config);  // process_csv_file() is supplied by the caller
}

Integration with Other Languages

The library is designed for easy integration:

  • Python: Use ctypes or cffi
  • Node.js: Use N-API
  • PHP: Direct C extension integration (optimized API)
  • Go: Use cgo
  • Rust: Use bindgen

🏗️ Architecture

┌─────────────────┐    ┌─────────────────┐
│   CSV Reader    │    │   CSV Writer    │
│  + Navigation   │    │  + Encoding     │
│  + Headers      │    │  + BOM Support  │
│  + Seeking      │    │  + Strict Mode  │
└─────────┬───────┘    └─────────┬───────┘
          │                      │
          └──────┬───────────────┘
                 │
         ┌───────▼───────┐
         │  CSV Parser   │
         │ + RFC 4180    │
         │ + Multi-line  │
         │ + Quote Esc   │
         └───────┬───────┘
                 │
         ┌───────▼───────┐
         │  CSV Config   │
         │ + Encoding    │
         │ + BOM Flags   │
         │ + Validation  │
         └───────┬───────┘
                 │
    ┌────────────▼────────────┐
    │     Arena Allocator     │
    │   + Memory Safety       │
    │   + Zero Leaks          │
    │   + Performance         │
    └─────────────────────────┘

🀝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

git clone https://github.com/csvtoolkit/FastCSV-C.git
cd FastCSV-C
make test
make valgrind

Code Style

  • Follow C99 standard
  • Use consistent indentation (4 spaces)
  • Add tests for new features
  • Ensure Valgrind clean runs
  • Update documentation for API changes

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📦 Releases

See Releases for downloadable packages and release notes.

Latest Release Features

  • Production-ready CSV library with enterprise features
  • Multi-encoding support with BOM writing
  • Enhanced RFC 4180 compliance with proper quote escaping
  • Advanced navigation APIs for CSV readers
  • Memory-safe with comprehensive Valgrind validation
  • High-performance with 7.6M+ operations/second
  • Cross-platform support (Linux, macOS)
  • Complete test suite with 60+ tests

🙏 Acknowledgments

  • Built with performance and safety in mind
  • Inspired by modern C library design principles
  • RFC 4180 compliant implementation
  • Tested extensively for production use
  • Optimized for integration with multiple programming languages

Made with ❤️ for the C community
