Data validation with .NET Core C#: "mostly" functionality on a List<T>

use*_*245 6 .net c# etl package nuget

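I get my data as a List<T> by reading different formats (e.g. CSV, Parquet, Avro, JSON).

I want to validate key characteristics of the data. For example, the temperature should be within range 95% of the time; for the remaining rows the column value may be null or out of range.

Sample use case expectation: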

Expect_Column_Values_To_Be_Between(
    columnName = "temprature",
    minValue   =  60,
    maxValue   =  75,
    mostly     = .95
)

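Data annotations seem to solve this only partially (the "mostly" functionality is missing), because they work at the row level rather than on the whole table, i.e. at the object level.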

[Range(60, 75, ErrorMessage = "Thermostat value {0} must be between {1} and {2}.")]
public int Temprature { get; set; }  // a property, because Validator ignores fields
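
To make the row-level limitation concrete, here is a minimal sketch of what data annotations give you out of the box: a pass/fail verdict per row, which then has to be aggregated by hand. `Reading`, `RowLevelDemo`, `IsRowValid` and `ValidFraction` are hypothetical names invented for this illustration; only `Validator`, `ValidationContext`, `[Required]` and `[Range]` come from System.ComponentModel.DataAnnotations. `[Required]` is added on the assumption that a null reading should count as a failed row, since `[Range]` alone accepts null.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;

// Hypothetical row type for illustration only.
public class Reading
{
    // Assumption: a missing value counts as a failed row, hence [Required].
    [Required]
    [Range(60, 75)]
    public int? Temprature { get; set; }
}

public static class RowLevelDemo
{
    // Validates a single row in isolation; this is all the annotations know about.
    public static bool IsRowValid(object row)
    {
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(
            row, new ValidationContext(row), results, validateAllProperties: true);
    }

    // Hypothetical helper: fraction of rows that pass their own annotations.
    // The table-level "mostly" check is then a comparison against this value.
    public static double ValidFraction<T>(IReadOnlyCollection<T> rows) where T : class
    {
        if (rows.Count == 0)
            return 0.0;
        return (double)rows.Count(r => IsRowValid(r)) / rows.Count;
    }
}
```

With this in place, a check such as `RowLevelDemo.ValidFraction(rows) >= 0.95` would approximate the `mostly` parameter, but the threshold lives outside the annotations, which is the gap described above.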

Python package for reference: https://github.com/great-expectations/great_expectations contains similar data-level validations.

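I am looking for guidance on how to validate the data, either through an existing equivalent library in .NET or by creating a new helper class/extension method.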

use*_*245 1

I created a sample extension method that validates the data at the table level, i.e. the object level.

public class Room
{
    public int RoomId { get; set; }
    public string Name { get; set; }
    public double Temprature { get; set; }
}
Usage:
List<Room> rooms = new List<Room>();
rooms.Add(new Room() { RoomId = 1, Name = "Hall", Temprature = 65 });
rooms.Add(new Room() { RoomId = 2, Name = "Kitchen", Temprature = 75 });

bool result = rooms.Expect_Column_Values_To_Be_Between("Temprature", 60, 75, .95);
Extension method:
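using System;
using System.Collections.Generic;
using System.IO;
using System.Reflection;

public static class ValidationExtensions
{
    // Returns true when at least `mostly` (a fraction in [0, 1]) of the items
    // have a non-null `columnName` value within [minValue, maxValue].
    public static bool Expect_Column_Values_To_Be_Between<T>(this List<T> items,
                    string columnName, double minValue, double maxValue, double mostly = 1)
    {
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly,
                       "Mostly value can not be less than 0 or greater than 1");
        else if (mostly == 0)
            return true;

        if (items == null || items.Count == 0)
            return false;

        int itemsInRangeCount = 0;

        foreach (var item in items)
        {
            PropertyInfo? propertyInfo = item.GetType().GetProperty(columnName);
            if (propertyInfo == null)
                throw new InvalidDataException($"Column not found : {columnName}");

            // A null value counts as out of range
            // (Convert.ToDouble(null) would otherwise return 0).
            object? rawValue = propertyInfo.GetValue(item);
            if (rawValue == null)
                continue;

            var itemValue = Convert.ToDouble(rawValue);

            if (itemValue >= minValue && itemValue <= maxValue)
                itemsInRangeCount++;
        }

        // Cast to double: integer division would truncate the ratio to 0 or 1.
        return (double)itemsInRangeCount / items.Count >= mostly;
    }
}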
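
If the column is known at compile time, the same check can be expressed without reflection. The following is only a sketch under that assumption: `TypedValidationExtensions` and `ExpectValuesToBeBetween` are hypothetical names, and the column is passed as a selector delegate instead of a property name. It keeps the same conventions as the method above (`mostly == 0` always passes, an empty list fails, null values count as out of range).

```csharp
using System;
using System.Collections.Generic;

public static class TypedValidationExtensions
{
    // Hypothetical variant: the "column" is a selector, so a misspelled
    // property name becomes a compile-time error instead of a runtime one.
    public static bool ExpectValuesToBeBetween<T>(
        this IReadOnlyCollection<T> items,
        Func<T, double?> selector,
        double minValue, double maxValue, double mostly = 1.0)
    {
        if (items is null) throw new ArgumentNullException(nameof(items));
        if (selector is null) throw new ArgumentNullException(nameof(selector));
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(
                nameof(mostly), mostly, "mostly must be within [0, 1].");

        if (mostly == 0) return true;       // nothing is required to be in range
        if (items.Count == 0) return false; // same convention as the method above

        int inRange = 0;
        foreach (var item in items)
        {
            // Null values are never in range.
            double? value = selector(item);
            if (value is double v && v >= minValue && v <= maxValue)
                inRange++;
        }

        // Floating-point division, so the ratio is not truncated.
        return (double)inRange / items.Count >= mostly;
    }
}
```

With the `Room` class above, the call would look like `rooms.ExpectValuesToBeBetween(r => r.Temprature, 60, 75, .95)`. The trade-off is that a selector cannot be driven by a column name read from configuration, which the reflection-based version supports.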