Alveus.FileType.Detection (0.1.0)

Published 2025-12-26 13:19:27 +01:00 by mike

Installation

dotnet nuget add source --name feed --username your_username --password your_token 
dotnet add package --source feed --version 0.1.0 Alveus.FileType.Detection

About this package

Utility library to detect a file type from its signature. Compatible with .NET Standard 2.0+ and AOT scenarios.

Alveus.FileType.Detection

A high-performance .NET library for detecting file types based on their binary signatures (magic bytes), not file extensions.

Features

  • Magic Byte Detection: Identifies files by their actual content, not just file extensions
  • Built-in Definitions: Includes signatures for common image, video, audio, document, and archive formats
  • Custom Signatures: Easily define your own file type signatures
  • Multi-Pattern Support: Match files requiring multiple byte patterns at different offsets
  • Masked Matching: Support for pattern masks to ignore certain bytes
  • High Performance: Uses ArrayPool<byte> to reuse buffers and minimize allocations
  • Thread-Safe: Safe for concurrent use across multiple threads
  • AOT Compatible: Works with Native AOT compilation
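
To illustrate what content-based detection means, here is a minimal standalone sketch that checks for the well-known 8-byte PNG signature. LooksLikePng is a hypothetical helper written for this example, not part of the library, and it says nothing about how the library matches signatures internally.

```csharp
using System;

// Hypothetical helper (not a library API): true when the data starts
// with the 8-byte PNG signature, regardless of what the file is called.
static bool LooksLikePng(ReadOnlySpan<byte> data)
{
    ReadOnlySpan<byte> signature = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    return data.StartsWith(signature);
}

// Content of a file that might be named "report.pdf" but is really a PNG.
byte[] mislabeled = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00, 0x00, 0x00, 0x0D };
Console.WriteLine(LooksLikePng(mislabeled)); // True
```

A check based on the ".pdf" extension would misclassify this data; the leading bytes identify it correctly.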

Installation

dotnet add package Alveus.FileType.Detection

Quick Start

Using the Default Detector

The simplest way to detect file types is to use the default detector with built-in definitions:

using Alveus.FileType.Detection;

// Use the default detector (includes all built-in definitions)
var detector = FileTypeDetector.Default;

// Detect from a file stream
using var stream = File.OpenRead("image.png");
var result = detector.Detect(stream);

if (result.IsMatch)
{
    Console.WriteLine($"MIME Type: {result.MimeType}");
    Console.WriteLine($"Extensions: {string.Join(", ", result.Extensions)}");
}

Custom Detector with Specific Formats

Create a custom detector with only the formats you need:

using Alveus.FileType.Detection;
using Alveus.FileType.Detection.Extensions;

var detector = new FileTypeDetector();
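detector.AddImageDefinitions();  // Register the built-in image formats
detector.AddAudioDefinitions();  // Register the built-in audio formats

// Detect from a file stream
using var stream = File.OpenRead("image.png");
var result = detector.Detect(stream);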

Defining Custom Signatures

Simple Single-Pattern Signature

Define a custom file type with a single magic byte pattern:

using Alveus.FileType.Detection;
using Alveus.FileType.Detection.Core;

var detector = new FileTypeDetector();

// Define a custom PDF signature
var pdfDefinition = new FileTypeDefinition(
    mimeType: "application/pdf",
    extensions: [".pdf"],
    patterns: [
        new BytePattern(
            offset: 0,
            pattern: [0x25, 0x50, 0x44, 0x46]  // %PDF
        )
    ]
);

detector.AddDefinition(pdfDefinition);

Multi-Pattern Signature

Some file formats require multiple patterns at different offsets. All patterns must match for the file type to be detected:

// WebP files have "RIFF" at offset 0 and "WEBP" at offset 8
var webpDefinition = new FileTypeDefinition(
    mimeType: "image/webp",
    extensions: [".webp"],
    patterns: [
        new BytePattern(0, [0x52, 0x49, 0x46, 0x46]),  // RIFF
        new BytePattern(8, [0x57, 0x45, 0x42, 0x50])   // WEBP
    ]
);

detector.AddDefinition(webpDefinition);
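
The "all patterns must match" rule can be sketched independently of the library. The helper below, MatchesAll, and its (Offset, Bytes) tuples are hypothetical stand-ins for illustration; the library's own matching code may differ.

```csharp
using System;

// Hypothetical helper (not a library API): every (offset, bytes) pair must
// be found at its offset, otherwise the whole definition is rejected.
static bool MatchesAll(ReadOnlySpan<byte> header, (int Offset, byte[] Bytes)[] patterns)
{
    foreach (var (offset, bytes) in patterns)
    {
        // A pattern that would run past the available data cannot match.
        if (offset < 0 || offset + bytes.Length > header.Length)
            return false;

        ReadOnlySpan<byte> expected = bytes;
        if (!header.Slice(offset, bytes.Length).SequenceEqual(expected))
            return false;
    }
    return true;
}

var webpPatterns = new[]
{
    (Offset: 0, Bytes: new byte[] { 0x52, 0x49, 0x46, 0x46 }), // RIFF
    (Offset: 8, Bytes: new byte[] { 0x57, 0x45, 0x42, 0x50 }), // WEBP
};

// Both headers start with "RIFF"; only the first has "WEBP" at offset 8.
byte[] webpHeader = { 0x52, 0x49, 0x46, 0x46, 0x24, 0x00, 0x00, 0x00, 0x57, 0x45, 0x42, 0x50 };
byte[] wavHeader  = { 0x52, 0x49, 0x46, 0x46, 0x24, 0x00, 0x00, 0x00, 0x57, 0x41, 0x56, 0x45 };

Console.WriteLine(MatchesAll(webpHeader, webpPatterns)); // True
Console.WriteLine(MatchesAll(wavHeader, webpPatterns));  // False
```

The second pattern is what separates WebP from other RIFF-based formats such as WAV, which is why a single pattern at offset 0 would not be enough.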

Using Pattern Masks

Pattern masks allow you to ignore specific bytes during matching. This is useful when certain bytes can vary:

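// JPEG 2000 signature box. Every mask byte here is 0xFF, so every byte must match;
// a 0x00 mask byte would make the matcher ignore that position instead.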
var jp2Definition = new FileTypeDefinition(
    mimeType: "image/jp2",
    extensions: [".jp2", ".j2k"],
    patterns: [
        new BytePattern(
            offset: 0,
            pattern: [0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20],
            mask: [0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF]  // All bytes must match
        ),
        new BytePattern(
            offset: 8,
            pattern: [0x0D, 0x0A, 0x87, 0x0A],
            mask: [0xFF, 0xFF, 0xFF, 0xFF]
        )
    ]
);

detector.AddDefinition(jp2Definition);
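
As a rough sketch of how masked matching typically works, the hypothetical helper below compares only the bits selected by the mask. It assumes the mask is applied as a bitwise AND to both the data and the pattern, which is the common convention; the source does not show the library's actual implementation.

```csharp
using System;

// Hypothetical helper (not a library API): bitwise masked comparison.
static bool MatchesMasked(ReadOnlySpan<byte> data, int offset, byte[] pattern, byte[] mask)
{
    if (pattern.Length != mask.Length)
        throw new ArgumentException("Pattern and mask must have the same length.");
    if (offset < 0 || offset + pattern.Length > data.Length)
        return false;

    for (int i = 0; i < pattern.Length; i++)
    {
        // Only bits set in the mask take part in the comparison;
        // a 0x00 mask byte ignores the position entirely.
        if ((data[offset + i] & mask[i]) != (pattern[i] & mask[i]))
            return false;
    }
    return true;
}

byte[] maskedPattern = { 0xFF, 0x00, 0x00, 0xFF };
byte[] maskBytes     = { 0xFF, 0x00, 0x00, 0xFF }; // only first and last bytes matter

Console.WriteLine(MatchesMasked(new byte[] { 0xFF, 0x12, 0x34, 0xFF }, 0, maskedPattern, maskBytes)); // True
Console.WriteLine(MatchesMasked(new byte[] { 0xFE, 0x12, 0x34, 0xFF }, 0, maskedPattern, maskBytes)); // False
```

Under this convention a mask of 0xF0 would compare only the high nibble of a byte, so masks can be finer-grained than whole bytes.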

Advanced Custom Format Example

Here's a complete example defining a custom binary format:

using Alveus.FileType.Detection;
using Alveus.FileType.Detection.Core;

var detector = new FileTypeDetector();

// Custom binary format identified by magic bytes, a version marker, and a masked field
var customFormat = new FileTypeDefinition(
    mimeType: "application/x-custom",
    extensions: [".custom", ".cst"],
    patterns: [
        // Magic bytes at start of file
        new BytePattern(0, [0x43, 0x55, 0x53, 0x54]),  // "CUST"

        // Version marker at offset 4
        new BytePattern(4, [0x01, 0x00]),  // Version 1.0

        // Required like every other pattern; the mask makes only the bytes at offsets 6 and 9 significant
        new BytePattern(
            offset: 6,
            pattern: [0xFF, 0x00, 0x00, 0xFF],
            mask: [0xFF, 0x00, 0x00, 0xFF]  // Only first and last bytes matter
        )
    ]
);

detector.AddDefinition(customFormat);

// Test the custom detector
using var stream = File.OpenRead("file.custom");
var result = detector.Detect(stream);

if (result.IsMatch)
{
    Console.WriteLine($"Detected: {result.MimeType}");
}

Built-in Format Categories

The library includes extension methods to add common format definitions:

detector.AddImageDefinitions();      // PNG, JPEG, GIF, WebP, BMP, TIFF, ICO, etc.
detector.AddVideoDefinitions();      // MP4, AVI, MKV, WebM, etc.
detector.AddAudioDefinitions();      // MP3, WAV, FLAC, OGG, etc.
detector.AddDocumentDefinitions();   // PDF, DOCX, XLSX, PPTX, etc.
detector.AddArchiveDefinitions();    // ZIP, RAR, 7Z, TAR, GZIP, etc.
detector.AddExecutableDefinitions(); // EXE, DLL, ELF, Mach-O, etc.

// Or add all at once
detector.AddDefaultDefinitions();

API Reference

FileTypeDetector

  • FileTypeDetector() - Create an empty detector
  • FileTypeDetector(IReadOnlyList<FileTypeDefinition>) - Create with initial definitions
  • static FileTypeDetector.Default - Pre-configured detector with all built-in definitions

Detection Methods

  • FileTypeResult Detect(Stream) - Detect file type from stream

Managing Definitions

  • void AddDefinition(FileTypeDefinition) - Add a single definition
  • void AddDefinitions(IEnumerable<FileTypeDefinition>) - Add multiple definitions
  • bool RemoveDefinition(FileTypeDefinition) - Remove a definition
  • void ClearDefinitions() - Remove all definitions
  • IReadOnlyCollection<FileTypeDefinition> GetDefinitions() - Get all current definitions
  • int DefinitionCount - Get the number of registered definitions

Performance Considerations

  • The detector orders definitions by specificity (more patterns first, longer patterns first)
  • Uses ArrayPool<byte> to minimize allocations
  • Thread-safe for concurrent reads
  • Stream must be seekable (position is restored after detection)
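
The points above can be sketched without the library. PeekHeader and OrderBySpecificity below are hypothetical helpers, and the tuple shape and the "application/x-riff" label are made up for the example. The sketch assumes the header is read from the start of the stream and that "specificity" means pattern count first, then longest pattern; the library's real behavior may differ in both respects.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Linq;

// Hypothetical helper: read up to maxBytes into a pooled buffer, let `inspect`
// look at them, then return the buffer and restore the stream position.
static T PeekHeader<T>(Stream stream, int maxBytes, Func<byte[], int, T> inspect)
{
    if (!stream.CanSeek)
        throw new NotSupportedException("Stream must be seekable.");

    long originalPosition = stream.Position;
    byte[] buffer = ArrayPool<byte>.Shared.Rent(maxBytes); // may be larger than requested
    try
    {
        stream.Position = 0; // assumption: signatures are relative to the stream start
        int total = 0;
        while (total < maxBytes)
        {
            // Read may return fewer bytes than requested; 0 means end of stream.
            int read = stream.Read(buffer, total, maxBytes - total);
            if (read == 0) break;
            total += read;
        }
        return inspect(buffer, total);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
        stream.Position = originalPosition;
    }
}

// Hypothetical helper: more patterns first, then the longest pattern first.
static (string Mime, byte[][] Patterns)[] OrderBySpecificity((string Mime, byte[][] Patterns)[] definitions) =>
    definitions
        .OrderByDescending(d => d.Patterns.Length)
        .ThenByDescending(d => d.Patterns.Select(p => p.Length).DefaultIfEmpty(0).Max())
        .ToArray();

using var stream = new MemoryStream(new byte[] { 0x25, 0x50, 0x44, 0x46, 0x2D });
stream.Position = 3;
bool isPdf = PeekHeader(stream, 4, (buf, count) =>
    count >= 4 && buf[0] == 0x25 && buf[1] == 0x50 && buf[2] == 0x44 && buf[3] == 0x46);
Console.WriteLine(isPdf);           // True
Console.WriteLine(stream.Position); // 3

var ordered = OrderBySpecificity(new (string Mime, byte[][] Patterns)[]
{
    ("application/x-riff", new[] { new byte[] { 0x52, 0x49, 0x46, 0x46 } }),
    ("image/webp", new[] { new byte[] { 0x52, 0x49, 0x46, 0x46 }, new byte[] { 0x57, 0x45, 0x42, 0x50 } }),
});
Console.WriteLine(ordered[0].Mime); // image/webp
```

Trying the more specific definition first keeps a generic container signature from shadowing a format built on top of it. The rented buffer is reused across calls instead of being allocated each time.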

License

MIT License - Copyright (c) Alveus Dev (https://alveus.dev). All rights reserved.

Dependencies

ID              Version  Target Framework
System.Buffers  4.5.1    .NETStandard2.0
System.Memory   4.5.5    .NETStandard2.0

Details

Type: NuGet
Published: 2025-12-26 13:19:27 +01:00
Author: Alveus Dev (https://alveus.dev)
Size: 43 KiB