bPr*_*tik 14 character-encoding azure azure-storage azure-storage-blobs
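We upload files to Azure Blob Storage under their original file names, and also assign the file name as metadata on the CloudBlob.
These characters are not permitted in metadata, but are accepted in a blob name: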
š Š ñ Ñ ç Ç ÿ Ÿ ž Ž Ð œ Œ « » éèëêð ÉÈËÊ àâä ÀÁÂÃÄÅ àáâãäå ÙÚÛÜ ùúûüµ òóôõöø ÒÓÔÕÖØ ìíîï ÌÍÎÏ
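Is there a way to store these characters in the metadata? Are we missing some setting that causes this exception? The documentation covers the naming conventions for blobs and metadata, but says nothing about the data itself!
var dirtyFileName = file.FileName;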
var normalizedFileName = file.FileName.CleanOffDiacriticAndNonASCII();
// Blob names accept almost all characters that are acceptable as file names in Windows
var blob = container.GetBlobReference(dirtyFileName);
//Upload content to the blob, which will create the blob if it does not already exist.
blob.Metadata["FileName"] = normalizedFileName;
blob.Attributes.Properties.ContentType = file.ContentType;
// ERROR: Occurs here!
blob.UploadFromStream(file.InputStream);
blob.SetMetadata();
blob.SetProperties();
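The illegal characters in file names are only the tip of the iceberg, magnified just for the purpose of this question! The bigger picture is that we index these files using Lucene.net, so we need a lot of metadata to be stored on the blob. Please don't suggest storing it all separately in a database, just don't! So far we have been lucky to hit only one file with diacritic characters!
So, for now, we are avoiding saving the file name in the metadata as a workaround!
Unless I get an answer that actually solves the problem, this workaround is the solution to the problem above!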
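To achieve this, I use a combination of the two methods below, which convert diacritic characters to their ASCII equivalents and then remove any non-ASCII characters that remain.
This is not ideal, though, because we are losing data!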
/// <summary>
/// Converts all Diacritic characters in a string to their ASCII equivalent
/// Courtesy: http://stackoverflow.com/a/13154805/476786
/// A quick explanation:
/// * Normalizing to form D splits characters like è into an e and a nonspacing `
/// * From this, the nonspacing characters are removed
/// * The result is normalized back to form C (I'm not sure if this is necessary)
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string ConvertDiacriticToASCII(this string value)
{
    if (value == null) return null;
    var chars =
        value.Normalize(NormalizationForm.FormD)
             .ToCharArray()
             .Select(c => new {c, uc = CharUnicodeInfo.GetUnicodeCategory(c)})
             .Where(@t => @t.uc != UnicodeCategory.NonSpacingMark)
             .Select(@t => @t.c);
    var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);
    return cleanStr;
}
/// <summary>
/// Removes all non-ASCII characters from the string
/// Courtesy: http://stackoverflow.com/a/135473/476786
/// Uses the .NET ASCII encoding to convert a string.
/// UTF8 is used during the conversion because it can represent any of the original characters.
/// It uses an EncoderReplacementFallback to convert any non-ASCII character to an empty string.
/// </summary>
/// <param name="value"></param>
/// <returns></returns>
public static string RemoveNonASCII(this string value)
{
    string cleanStr =
        Encoding.ASCII
            .GetString(
                Encoding.Convert(Encoding.UTF8,
                    Encoding.GetEncoding(Encoding.ASCII.EncodingName,
                        new EncoderReplacementFallback(string.Empty),
                        new DecoderExceptionFallback()
                    ),
                    Encoding.UTF8.GetBytes(value)
                )
            );
    return cleanStr;
}
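The question's snippet calls `CleanOffDiacriticAndNonASCII()`, but its body is never shown in the post. A minimal sketch of how the two steps above might be combined, assuming it strips the combining marks first and then drops whatever is still outside ASCII (the class name `FileNameCleaningSketch` and the method body are assumptions, not the author's code):

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class FileNameCleaningSketch
{
    // Hypothetical reconstruction: only the method name comes from the question.
    public static string CleanOffDiacriticAndNonASCII(this string value)
    {
        if (value == null) return null;

        // Step 1: decompose to form D and drop the nonspacing marks,
        // so that "é" (e + combining acute accent) becomes a plain "e".
        var withoutMarks = new string(
            value.Normalize(NormalizationForm.FormD)
                 .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                 .ToArray())
            .Normalize(NormalizationForm.FormC);

        // Step 2: remove every character that is still outside the ASCII range.
        // Characters such as "œ" or "«" have no canonical decomposition, so they are lost here.
        return new string(withoutMarks.Where(c => c <= 127).ToArray());
    }
}
```

With this sketch, `"Crème brûlée.pdf".CleanOffDiacriticAndNonASCII()` yields `"Creme brulee.pdf"`, while a name such as `"Œuvre.pdf"` loses its first letter entirely, which is the data loss mentioned above.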
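I am really hoping for an answer, because the workaround is clearly not ideal, and it makes no sense that this should be impossible!
To expand on bPratik's answer, we found that Base64-encoding the metadata works well. We use these extension methods for encoding and decoding: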
public static class Base64Extensions
{
    public static string ToBase64(this string input)
    {
        var bytes = Encoding.UTF8.GetBytes(input);
        return Convert.ToBase64String(bytes);
    }

    public static string FromBase64(this string input)
    {
        var bytes = Convert.FromBase64String(input);
        return Encoding.UTF8.GetString(bytes);
    }
}
Then, when setting the blob metadata:
blobReference.Metadata["Filename"] = filename.ToBase64();
When retrieving it:
var filename = blobReference.Metadata["Filename"].FromBase64();
For search, you have to decode the file name before presenting it to the indexer, or use the blob's actual name (assuming you still upload under the original file name).
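As a rough illustration of the decode-before-indexing step, the sketch below encodes and decodes a whole metadata dictionary at once. It uses a plain `Dictionary<string, string>` as a stand-in for the blob's metadata collection; the class `MetadataIndexingSketch` and both method names are hypothetical and not part of the storage SDK:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class MetadataIndexingSketch
{
    // Encodes every value as Base64 over its UTF-8 bytes. Base64 output uses only
    // A-Z, a-z, 0-9, '+', '/' and '=', so the stored values are plain ASCII.
    public static Dictionary<string, string> EncodeValues(IDictionary<string, string> plain)
    {
        return plain.ToDictionary(
            kv => kv.Key,
            kv => Convert.ToBase64String(Encoding.UTF8.GetBytes(kv.Value)));
    }

    // Reverses EncodeValues so the indexer sees the original text.
    // Assumes every value was written by EncodeValues; a value that is not
    // valid Base64 makes Convert.FromBase64String throw a FormatException.
    public static Dictionary<string, string> DecodeValues(IDictionary<string, string> stored)
    {
        return stored.ToDictionary(
            kv => kv.Key,
            kv => Encoding.UTF8.GetString(Convert.FromBase64String(kv.Value)));
    }
}
```

The decoded dictionary is what would be handed to the indexer; the encoded one is what would be copied into the blob's metadata before uploading.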