Tags: .net, c#, etl, package, nuget
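I load data from several formats (e.g. CSV, Parquet, Avro, JSON) into a List<T>.
I want to validate key characteristics of the data as a whole. For example, the temperature should be within range 95% of the time; for the remaining rows the column value may be null or out of range.
Example expectation for this use case: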
```csharp
Expect_Column_Values_To_Be_Between(
    columnName: "Temprature",
    minValue: 60,
    maxValue: 75,
    mostly: 0.95
)
```
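The `mostly` argument amounts to a threshold on the fraction of rows that pass a row-level check. A minimal sketch, assuming the rows are already in memory as a collection; `MostlyExpectations` and `ExpectMostly` are hypothetical names, not part of any existing library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MostlyExpectations
{
    // Hypothetical helper: true when at least `mostly` (0..1) of the rows
    // satisfy `isExpected`. Rows that fail the predicate, including rows
    // whose value is null, form the tolerated remainder.
    public static bool ExpectMostly<T>(IReadOnlyCollection<T> rows, Func<T, bool> isExpected, double mostly)
    {
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly, "mostly must be between 0 and 1.");
        if (rows.Count == 0)
            return false;

        int passing = rows.Count(isExpected);

        // Cast before dividing so the ratio is not truncated by integer division.
        return (double)passing / rows.Count >= mostly;
    }
}
```

For example, with 8 of 10 temperatures inside [60, 75] (one null, one 80), `ExpectMostly(temps, t => t >= 60 && t <= 75, 0.95)` fails while the same call with `0.75` succeeds.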
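Data annotations only partly solve this (most of the needed features are missing), because they work at the row level rather than on the whole table, i.e. the collection of objects: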
```csharp
[Range(60, 75, ErrorMessage = "Thermostat value {0} must be between {1} and {2}.")]
public int Temprature { get; set; }
```
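One way to reuse existing row-level `[Range]` attributes is to run `Validator.TryValidateObject` (from `System.ComponentModel.DataAnnotations`) on every row and aggregate the pass rate yourself. A sketch under two caveats: `Validator` only inspects properties, so the attribute has to sit on a property rather than a field, and `RangeAttribute` treats null as valid, so nulls count as passing here. `Reading`, `AnnotationAggregation` and `FractionValid` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;

// Hypothetical row type; the double overload of [Range] avoids
// converting fractional temperatures to int before the comparison.
public class Reading
{
    [Range(60.0, 75.0, ErrorMessage = "Thermostat value {0} must be between {1} and {2}.")]
    public double? Temprature { get; set; }
}

public static class AnnotationAggregation
{
    // Hypothetical helper: runs every DataAnnotations check on each row and
    // returns the fraction of rows that pass all of them (0 for no rows).
    public static double FractionValid<T>(IReadOnlyCollection<T> items) where T : notnull
    {
        if (items.Count == 0)
            return 0;

        int valid = items.Count(item =>
            Validator.TryValidateObject(item, new ValidationContext(item), null, validateAllProperties: true));

        return (double)valid / items.Count;
    }
}
```

The table-level rule then becomes a comparison such as `AnnotationAggregation.FractionValid(rows) >= 0.95`.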
For reference, the Python package Great Expectations (https://github.com/great-expectations/great_expectations) provides this kind of dataset-level validation.
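I am looking for guidance on how to validate data like this in .NET, either with an existing equivalent library or by writing new helper classes or extension methods.
I created a sample extension method that validates the data at the table (i.e. collection) level: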
```csharp
public class Room
{
    public int RoomId { get; set; }
    public string Name { get; set; }
    public double Temprature { get; set; }
}
```
```csharp
List<Room> rooms = new List<Room>();
rooms.Add(new Room() { RoomId = 1, Name = "Hall", Temprature = 65 });
rooms.Add(new Room() { RoomId = 2, Name = "Kitchen", Temprature = 75 });
bool result = rooms.Expect_Column_Values_To_Be_Between("Temprature", 60, 75, .95);
```
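```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ValidationExtensions
{
    // Returns true when at least `mostly` (a fraction from 0 to 1) of the rows
    // have a value in [minValue, maxValue] for the property named columnName.
    public static bool Expect_Column_Values_To_Be_Between<T>(this List<T> items,
        string columnName, double minValue, double maxValue, double mostly = 1)
    {
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly,
                "Mostly value cannot be less than 0 or greater than 1.");
        if (mostly == 0)
            return true;
        if (items == null || items.Count == 0)
            return false;

        // Look the property up once instead of once per row.
        PropertyInfo? propertyInfo = typeof(T).GetProperty(columnName);
        if (propertyInfo == null)
            throw new ArgumentException($"Column not found: {columnName}", nameof(columnName));

        int itemsInRangeCount = 0;
        foreach (var item in items)
        {
            // Null rows and null values count as out of range; passing null to
            // Convert.ToDouble would silently turn it into 0.
            object? rawValue = item == null ? null : propertyInfo.GetValue(item);
            if (rawValue == null)
                continue;

            var itemValue = Convert.ToDouble(rawValue);
            if (itemValue >= minValue && itemValue <= maxValue)
                itemsInRangeCount++;
        }

        // Cast to double: int / int would truncate the ratio to 0 or 1.
        return (double)itemsInRangeCount / items.Count >= mostly;
    }
}
```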
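A strongly typed alternative avoids reflection by selecting the column with a lambda, so a misspelled column fails at compile time instead of at run time. It also returns counts rather than a bare bool, loosely inspired by the success flag and element/unexpected counts that Great Expectations reports. This is a sketch; `ExpectationResult`, `TypedExpectations` and `ExpectValuesToBeBetween` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type: overall verdict plus the numbers behind it.
public sealed record ExpectationResult(bool Success, int ElementCount, int UnexpectedCount, double ObservedFraction);

public static class TypedExpectations
{
    // Hypothetical extension: `column` picks the value to check from each row.
    public static ExpectationResult ExpectValuesToBeBetween<T>(
        this IReadOnlyCollection<T> rows, Func<T, double?> column,
        double minValue, double maxValue, double mostly = 1)
    {
        if (mostly < 0 || mostly > 1)
            throw new ArgumentOutOfRangeException(nameof(mostly), mostly, "mostly must be between 0 and 1.");

        int inRange = 0;
        foreach (var row in rows)
        {
            // Null values count as unexpected, just like out-of-range values.
            if (column(row) is double v && v >= minValue && v <= maxValue)
                inRange++;
        }

        double observed = rows.Count == 0 ? 0 : (double)inRange / rows.Count;
        return new ExpectationResult(rows.Count > 0 && observed >= mostly, rows.Count, rows.Count - inRange, observed);
    }
}
```

With the rooms list from the question, the call would read `rooms.ExpectValuesToBeBetween(r => r.Temprature, 60, 75, 0.95)`; both sample rooms are in range, so `Success` is true and `UnexpectedCount` is 0.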