什么是昏暗,什么是事实?

Mat*_*ing 3 data-warehouse dimensions

我有一个应用程序,我知道它将成为一个伟大的多维数据集,并将比标准的平面Reporting Services报告更有用.我们准备和顾问一起跳进商务智能的东西,但我想在我们做之前试一试,主要是因为我知道我们要做什么.

该应用程序跟踪全国养老院的调查.它们可以是年度,投诉或其他几种类型的调查,它们与给定的标签相关的处罚,并且具有与它们相关的文档.

我想做的是想出一种方法,让我们可以利用我们拥有的数据 - 六月份佛罗里达州的标签数量是多少?有多少设施按时交付文件?与去年相比,今年第一季度发生了多少次年度(意外)调查?

我包括这些模式,希望有人能够告诉我不仅是什么是暗淡的,什么是事实,而是什么数据在哪里.我认为这将是一个很好的开始.

任何事都会有所帮助.我正在努力建立一个小型数据集市,同时我正在使用Kimball的数据仓库生命周期工具包.

谢谢!米@

实体表 - 我们所有设施的列表:主键是表示建筑物的五个字母代码

CREATE TABLE [dbo].[Entity](
 [entID] [varchar](10) NOT NULL,
 [entShortName] [varchar](150) NULL,
 [entNumericID] [int] NOT NULL,
 [orgID] [int] NOT NULL,
 [regionID] [int] NOT NULL,
 [portID] [int] NOT NULL,
 [busTypeID] [int] NOT NULL,
 [adpID] [varchar](50) NULL,
 [eHealthDataID] [varchar](50) NULL,
 [updateDate] [datetime] NULL CONSTRAINT [DF_Entity_updateDate]  DEFAULT (getdate()),
 [powProID] [int] NULL,
 [regionReportingID] [int] NULL,
 [regionPresEmail] [varchar](300) NULL,
 [regionClinDirEmail] [varchar](300) NULL,
 CONSTRAINT [PK_EntityNEW] PRIMARY KEY CLUSTERED 
(
 [entID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

调查主要

CREATE TABLE [dbo].[surveyMain](
 [surveyID] [int] IDENTITY(1,1) NOT NULL,
 [surveyDateFac]  AS (([facility]+'-')+CONVERT([varchar],[surveyDate],(101))),
 [surveyDate] [datetime] NOT NULL,
 [surveyType] [int] NOT NULL,
 [surveyBy] [int] NULL,
 [facility] [varchar](10) NOT NULL,
 [originalSurvey] [int] NULL,
 [exitDate] [datetime] NULL,
 [dpnaDate]  AS (dateadd(month,(3),[exitDate])),
 [clearedTags] [varchar](1) NULL,
 [substantiated] [varchar](1) NULL,
 [firstRevisit] [int] NULL,
 [secondRevisit] [int] NULL,
 [thirdRevisit] [int] NULL,
 [fourthRevisit] [int] NULL,
 [updated] [datetime] NULL CONSTRAINT [DF_surveyMain_updated]  DEFAULT (getdate()),
 CONSTRAINT [PK_tagSurvey] PRIMARY KEY CLUSTERED 
(
 [surveyID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

调查类型:

CREATE TABLE [dbo].[surveyTypes](
 [surveyTypeID] [int] IDENTITY(1,1) NOT NULL,
 [surveyTypeDesc] [varchar](100) NOT NULL,
 CONSTRAINT [PK_surveyTypes] PRIMARY KEY CLUSTERED 
(
 [surveyTypeID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

调查文件

CREATE TABLE [dbo].[surveyFiles](
 [surveyFileID] [int] IDENTITY(1,1) NOT NULL,
 [surveyID] [int] NOT NULL,
 [surveyFilesTypeID] [int] NOT NULL,
 [documentDate] [datetime] NOT NULL,
 [responseDate] [datetime] NULL,
 [receiptDate] [datetime] NULL,
 [dateCertain] [datetime] NULL,
 [fileName] [varchar](250) NULL,
 [fileUpload] [image] NULL,
 [fileDesc] [varchar](100) NULL,
 [updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFiles_updated]  DEFAULT (getdate()),
 CONSTRAINT [PK_surveyFiles] PRIMARY KEY CLUSTERED 
(
 [surveyFileID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

调查罚款

CREATE TABLE [dbo].[surveyFines](
 [surveyFinesID] [int] IDENTITY(1,1) NOT NULL,
 [surveyID] [int] NULL,
 [surveyFinesTypeID] [int] NULL,
 [dateRecommended] [datetime] NULL,
 [dateImposed] [datetime] NULL,
 [totalFineAmt] [varchar](100) NULL,
 [wasImposed] [varchar](3) NULL,
 [dateCleared] [datetime] NULL,
 [comments] [varchar](500) NULL,
 [updated] [datetime] NOT NULL CONSTRAINT [DF_surveyFines_updated]  DEFAULT (getdate()),
 CONSTRAINT [PK_surveyFines] PRIMARY KEY CLUSTERED 
(
 [surveyFinesID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 75) ON [PRIMARY]
) ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

调查标签

CREATE TABLE [dbo].[surveyTags](
 [seq] [int] IDENTITY(1,1) NOT NULL,
 [surveyID] [int] NOT NULL,
 [tagDescID] [int] NOT NULL,
 [tagStatus] [int] NULL,
 [scopesev] [varchar](5) NOT NULL,
 [comments] [varchar](1000) NULL,
 [clearedDate] [datetime] NULL,
 [updated] [datetime] NULL CONSTRAINT [DF_surveyTags_updated]  DEFAULT (getdate()),
 CONSTRAINT [PK_tagMain] PRIMARY KEY CLUSTERED 
(
 [seq] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
Run Code Online (Sandbox Code Playgroud)

Gil*_*anc 6

我想做的是想出一种方法,让我们能够利用我们拥有的数据 - 佛罗里达州六月份的标签数量是多少?有多少设施按时交付文件?与去年相比,今年第一季度发生了多少次年度(意外)调查?

尺寸是一个测量范围.测量范围可以是连续的,如日期,也可以是离散的,如设施.在您的问题中,维度分别是设施和日期,日期/时间和日期.

唯一的方法是你可以回答"佛罗里达州六月份有多少标签?"的问题.是将标签与设施和标签与日期相关联.

唯一的方法是你可以回答"有多少设施按时交付文件?" 是将文件交付与设施和设施的日期相关联.

您应该使用您期望数据仓库回答的其他问题或查询来执行相同的分析过程.

事实是一个实体或对象.标签是一个事实.文档传递是一个事实.一旦数据仓库加载,事实几乎总是不可变的.

至于您的架构,我必须更多地研究它以提供具体的建议,但一般来说,您希望使用星型架构.恒星的中心是你的事实,实体和物体.构成星形点的表格是维度表格.

您需要做的第一件事是分离您的事实和您的维度.您的实体表都不应包含日期,位置代码或您确定的任何其他维度.但是,事实表将包含日期表,位置表或其他维表的外键.

您可能还需要汇总表.汇总表包含与事实表相同的列,并在不同维度上添加一个或多个总和.例如,问题是"佛罗里达州六月份的标签数量是多少?" 如果您已经拥有2010年6月的佛罗里达州(或者更确切地说,佛罗里达州的每个设施)的标签总和,那么可以更快地回答.

您总结的时间取决于您期望的混合查询.在您的数据仓库中,日期可能太短.换句话说,在SQL中进行摘要和选择摘要行一样快.

你也需要一个日历表.日历表提出的问题是,"今年第一季度与去年(第一季度)相比发生了多少年(惊喜)调查?" 更容易查询.