感谢user3125280,DW和Evgeny Kluev,问题已更新.
我有一个网页列表,我必须经常下载它们,每个网页都有不同的下载频率.基于此频率,我们将网页分为5组:
Items in group 1 are downloaded once per 1 hour
items in group 2 once per 2 hours
items in group 3 once per 4 hours
items in group 4 once per 12 hours
items in group 5 once per 24 hours
Run Code Online (Sandbox Code Playgroud)
这意味着,我们必须在1小时内下载所有组1的网页,所有组2在2小时内等.
我正在尝试制作算法.作为输入,我有:
a)DATA_ARR=一个有5个数字的数组.每个数字代表该组中的项目数.
b)TIME_ARR=一个包含5个数字(1,2,4,12,24)的数组,表示项目下载的频率.
b)X=每小时下载的网页总数.这是使用items_in_group/download_frequently并向上舍入计算的.If we have 15 items in group 5, and 3 items in group 4, this will be 15/24 + 3/12 = 0.875 and rounded is 1.
每个小时我的程序必须在最大X站点下载.我希望算法输出如下:
Hour 1: A1 B0 C4 D5
Hour 2: A2 B1 C2 D2
...
Run Code Online (Sandbox Code Playgroud)
A1 =第1组的第2项
C0 =第3组的第1项
我的算法必须尽可能高效.这意味着:
a)模式必须可以扩展到至少200+小时
b)不需要创建可重复的模式
c)在可能的情况下需要空间以便使用绝对最小带宽
d)永远不会比更新频率更频繁地下载项目, 没有例外
例:
group 1: 0 items | once per 1 hour
group 2: 3 items | once per 2 hours
group 3: 4 items | once per 4 hours
group 4: 0 items | once per 12 hours
group 5: 0 items | once per 24 hours
Run Code Online (Sandbox Code Playgroud)
我们计算每小时可以采取的物品数量: 3/2+4/4 = 2.5. We round this upwards and it's 3.
使用铅笔和纸,我们可以找到以下解决方案:
Hour 1: B0 C0 B1
Hour 2: B2 C1 c2
Hour 3: B0 C3 B1
Hour 4: B2
Hour 5: B0 C0 B1
Hour 6: B2 C1 c2
Hour 7: B0 C3 B1
Hour 8: B2
Hour 9: B0 C0 B1
Hour 10: B2 C1 c2
Hour 11: B0 C3 B1
Hour 12: B2
Hour 13: B0 C0 B1
Hour 14: B2 C1 c2
and continue the above.
Run Code Online (Sandbox Code Playgroud)
我们采取C0,C1 C2和C3每4小时一次.我们还采取B0,B1并且B2每2小时一次.
问题:请解释一下,如何设计一个能够下载项目的算法,同时使用绝对最小下载次数?蛮力不是解决方案,算法必须是高效的CPU,因为元素的数量可能很大.
您可以阅读此处发布的答案:https://cs.stackexchange.com/a/19422/12497以及user3125280发布的答案.
use*_*280 11
您的问题是典型的调度问题.这些问题在计算机科学中得到了很好的研究,因此需要咨询大量的文献.
代码有点像Deficit循环法,但有一些简化.首先,我们通过添加data_to_process变量来自己提供队列.其次,队列只是迭代一个值列表.
一个区别是,除了数学错误之外,此解决方案将获得您想要的最佳值.
粗略草图:尚未编译(c ++ 11)基于unix的规范代码
#include <iostream>
#include <vector>
#include <numeric>
#include <unistd.h>
//#include <cmath> //for ceil
#define TIME_SCALE ((double)60.0) //1 for realtime speed
//Assuming you are not refreshing ints in the real case
template<typename T>
struct queue
{
const std::vector<T> data; //this will be filled with numbers
int position;
double refresh_rate; //must be refreshed ever ~ hours
double data_rate; //this many refreshes per hour
double credit; //amount of refreshes owed
queue(std::initializer_list<T> v, int r ) :
data(v), position(0), refresh_rate(r), credit(0) {
data_rate = data.size() / (double) refresh_rate;
}
int getNext() {
return data[position++ % data.size()];
}
};
double time_passed(){
static double total;
//if(total < 20){ //stop early
usleep(60000000 / TIME_SCALE); //sleep for a minute
total += 1.0 / 60.0; //add a minute
std::cout << "Time: " << total << std::endl;
return 1.0; //change to 1.0 / 60.0 for real time speed
//} else return 0;
}
int main()
{
//keep a list of the queues
std::vector<queue<int> > queues{
{{1, 2, 3}, 2},
{{1, 2, 3, 4}, 3}};
double total_data_rate = 0;
for(auto q : queues) total_data_rate += q.data_rate;
double data_to_process = 0; //how many refreshes we have to do
int queue_number = 0; //which queue we are processing
auto current_queue = &queues[0];
while(1) {
data_to_process += time_passed() * total_data_rate;
//data_to_process = ceil(data_to_process) //optional
while(data_to_process >= 1){
//data_to_process >= 0 will make the the scheduler more
//eager in the first time period (ie. everything will updated correctly
//in the first period and and following periods
if(current_queue->credit >= 1){
//don't change here though, since credit determines the weighting only,
//not how many refreshes are made
//refresh(current_queue.getNext();
std::cout << "From queue " << queue_number << " refreshed " <<
current_queue->getNext() << std::endl;
current_queue->credit -= 1;
data_to_process -= 1;
} else {
queue_number = (queue_number + 1) % queues.size();
current_queue = &queues[queue_number];
current_queue->credit += current_queue->data_rate;
}
}
}
return 0;
}
Run Code Online (Sandbox Code Playgroud)
现在应该在gcc上使用--std = c ++ 11编译该示例,并为您提供所需内容.
这里是测试用例输出:(对于非时间缩放的早期代码)
Time: 0
From queue 1 refreshed 1
From queue 0 refreshed 1
From queue 1 refreshed 2
Time: 1
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 3
Time: 2
From queue 0 refreshed 1
From queue 1 refreshed 4
From queue 1 refreshed 1
Time: 3
From queue 0 refreshed 2
From queue 0 refreshed 3
From queue 1 refreshed 2
Time: 4
From queue 0 refreshed 1
From queue 1 refreshed 3
From queue 0 refreshed 2
Time: 5
From queue 0 refreshed 3
From queue 1 refreshed 4
From queue 1 refreshed 1
Run Code Online (Sandbox Code Playgroud)
作为扩展,通过允许此调度程序仅完成第一个lcm(update_rate*lcm(...刷新率...),ceil(update_rate))步骤,然后重复该模式来回答重复模式问题.
另外:由于小时边界的要求,这确实有时无法解决.当我使用您无法解决的示例,并修改time_passed以返回0.1时,计划将每1.1小时更新一次(仅在小时边界处!).