Kor*_*gay 21 java arrays binary-search
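In summary, the challenge is as follows:
Hackerland is a one-dimensional city with n houses, where each house i is located at some x_i on the x-axis. The mayor wants to install radio transmitters on the roofs of the city's houses. Each transmitter has a range, k, meaning it can transmit a signal to all houses ≤ k units of distance away.
Given a map of Hackerland and the value of k, can you find the minimum number of transmitters needed to cover every house?
My implementation is as follows: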
package biz.tugay;

import java.util.*;

public class HackerlandRadioTransmitters {

    public static int minNumOfTransmitters(int[] houseLocations, int transmitterRange) {
        // Sort and remove duplicates..
        houseLocations = uniqueHouseLocationsSorted(houseLocations);
        int towerCount = 0;
        for (int nextHouseNotCovered = 0; nextHouseNotCovered < houseLocations.length; ) {
            final int towerLocation = HackerlandRadioTransmitters.findNextTowerIndex(houseLocations, nextHouseNotCovered, transmitterRange);
            towerCount++;
            nextHouseNotCovered = HackerlandRadioTransmitters.nextHouseNotCoveredIndex(houseLocations, towerLocation, transmitterRange);
            if (nextHouseNotCovered == -1) {
                break;
            }
        }
        return towerCount;
    }

    public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
        final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
        final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
        int towerIndex = houseNotCoveredIndex;
        int loop = 0;
        while (true) {
            loop++;
            if (towerIndex == houseLocations.length - 1) {
                break;
            }
            if (farthestHouseLocationAllowed >= houseLocations[towerIndex + 1]) {
                towerIndex++;
                continue;
            }
            break;
        }
        System.out.println("findNextTowerIndex looped : " + loop);
        return towerIndex;
    }

    public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
        final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
        int notCoveredHouseIndex = towerIndex + 1;
        int loop = 0;
        while (notCoveredHouseIndex < houseLocations.length) {
            loop++;
            final int locationOfHouseBeingChecked = houseLocations[notCoveredHouseIndex];
            if (locationOfHouseBeingChecked > towerCoversUntil) {
                break; // Tower does not cover the house anymore, break the loop..
            }
            notCoveredHouseIndex++;
        }
        if (notCoveredHouseIndex == houseLocations.length) {
            notCoveredHouseIndex = -1;
        }
        System.out.println("nextHouseNotCoveredIndex looped : " + loop);
        return notCoveredHouseIndex;
    }

    public static int[] uniqueHouseLocationsSorted(final int[] houseLocations) {
        Arrays.sort(houseLocations);
        final HashSet<Integer> integers = new HashSet<>();
        final int[] houseLocationsUnique = new int[houseLocations.length];
        int innerCounter = 0;
        for (int houseLocation : houseLocations) {
            if (integers.contains(houseLocation)) {
                continue;
            }
            houseLocationsUnique[innerCounter] = houseLocation;
            integers.add(houseLocationsUnique[innerCounter]);
            innerCounter++;
        }
        return Arrays.copyOf(houseLocationsUnique, innerCounter);
    }
}
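I am pretty sure this implementation is correct. But please look at the details of the functions findNextTowerIndex and nextHouseNotCoveredIndex: they walk the array one element at a time!
One of my tests is as follows: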
static void test_01() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    final File file = new File("input.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
    final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381);
    assert minNumOfTransmitters == 1;
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
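where input.txt can be downloaded from here. (This is not the most important detail in this question, but still..) So we have an array of 73382 houses, and I deliberately set the transmitter range so that my methods loop a lot:
Here is a sample output of this test on my machine: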
findNextTowerIndex looped : 38213
nextHouseNotCoveredIndex looped : 13785
Took: 359 milliseconds..
I also have this test, which does not assert anything and only measures the time:
static void test_02() throws FileNotFoundException {
    final long start = System.currentTimeMillis();
    for (int i = 0; i < 400; i ++) {
        final File file = new File("input.txt");
        final Scanner scanner = new Scanner(file);
        int[] houseLocations = new int[73382];
        for (int counter = 0; counter < 73382; counter++) {
            houseLocations[counter] = scanner.nextInt();
        }
        final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
        final int transmitterRange = ThreadLocalRandom.current().nextInt(1, 70000);
        final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
    }
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
I create 400 random transmitter ranges and run the program 400 times. On my machine I get run times like the following:
Took: 20149 milliseconds..
So now I thought: why not use binary search instead of walking the array? I changed my implementation as follows:
public static int findNextTowerIndex(final int[] houseLocations, final int houseNotCoveredIndex, final int transmitterRange) {
    final int houseLocationWeWantToCover = houseLocations[houseNotCoveredIndex];
    final int farthestHouseLocationAllowed = houseLocationWeWantToCover + transmitterRange;
    int nextTowerIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, farthestHouseLocationAllowed);
    if (nextTowerIndex < 0) {
        nextTowerIndex = -nextTowerIndex;
        nextTowerIndex = nextTowerIndex -2;
    }
    return nextTowerIndex;
}

public static int nextHouseNotCoveredIndex(final int[] houseLocations, final int towerIndex, final int transmitterRange) {
    final int towerCoversUntil = houseLocations[towerIndex] + transmitterRange;
    int nextHouseNotCoveredIndex = Arrays.binarySearch(houseLocations, 0, houseLocations.length, towerCoversUntil);
    if (-nextHouseNotCoveredIndex > houseLocations.length) {
        return -1;
    }
    if (nextHouseNotCoveredIndex < 0) {
        nextHouseNotCoveredIndex = - (nextHouseNotCoveredIndex + 1);
        return nextHouseNotCoveredIndex;
    }
    return nextHouseNotCoveredIndex + 1;
}
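The index arithmetic in the binary search version relies on the documented return value of Arrays.binarySearch: when the key is absent, it returns -(insertionPoint) - 1. The following is a minimal sketch of that conversion, assuming a sorted array without duplicates; the helper name lastIndexAtMost is hypothetical and not part of the code above.

```java
import java.util.Arrays;

public class BinarySearchConvention {
    // Hypothetical helper (not in the original post): index of the last
    // element <= key in a sorted, duplicate-free array, or -1 if there is none.
    static int lastIndexAtMost(int[] sorted, int key) {
        int result = Arrays.binarySearch(sorted, key);
        if (result >= 0) {
            return result;                // exact match
        }
        int insertionPoint = -result - 1; // documented encoding for a miss
        return insertionPoint - 1;        // element just before the insertion point
    }

    public static void main(String[] args) {
        int[] houses = {1, 4, 7, 12};
        System.out.println(lastIndexAtMost(houses, 7)); // key present
        System.out.println(lastIndexAtMost(houses, 9)); // key falls between 7 and 12
        System.out.println(lastIndexAtMost(houses, 0)); // key smaller than every element
    }
}
```

Here insertionPoint - 1 is the same value as the -nextTowerIndex - 2 computed in findNextTowerIndex.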
I was expecting a good performance boost, since each lookup now takes at most O(log N) steps instead of O(N). So test_01 outputs:
Took: 297 milliseconds..
Remember, it was Took: 359 milliseconds.. before. And for test_02:
Took: 18047 milliseconds..
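So I always get values around 20 seconds for the array-walking implementation and 18 to 19 seconds for the binary search implementation.
I was expecting a much better performance gain from Arrays.binarySearch, but obviously that is not the case. Why is that? What am I missing? Do I need an array with more than 73382 elements to see the benefit, or is that irrelevant?
Edit #01
After @huck_cussler's comment, I tried doubling and tripling my dataset (with random numbers) and running test02 again (increasing the array sizes in the test itself accordingly, of course). For the linear implementation the times are as follows: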
Took: 18789 milliseconds..
Took: 34396 milliseconds..
Took: 53504 milliseconds..
For the binary search implementation I get values like these:
Took: 18644 milliseconds..
Took: 33831 milliseconds..
Took: 52886 milliseconds..
pha*_*ers 15
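Your timing includes retrieving the data from the hard drive, which may be taking up the majority of the run time. Leave the data load out of your timing to get a more accurate comparison of the two approaches. Imagine the load takes 18 seconds: you are comparing 18.644 vs 18.789 (a 0.77% improvement) instead of 0.644 vs 0.789 (an 18.38% improvement).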
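The two percentages can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: the 18 seconds of shared load time is the hypothetical figure from the paragraph above, and the class and method names are made up for the example.

```java
public class RelativeImprovement {
    // Percentage by which the faster timing improves on the slower one.
    static double improvementPercent(double slowerSeconds, double fasterSeconds) {
        return (slowerSeconds - fasterSeconds) / slowerSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Timings that include a hypothetical 18 s of shared load time.
        System.out.printf("with load:    %.2f%%%n", improvementPercent(18.789, 18.644));
        // The same 0.145 s difference once the shared load time is excluded.
        System.out.printf("without load: %.2f%%%n", improvementPercent(0.789, 0.644));
    }
}
```

The absolute difference is identical in both cases; only the baseline it is divided by changes.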
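If you have a linear operation, O(n), such as loading the input data, and combine it with a binary search, O(log n), you end up with O(n). If you trust Big O notation, you should expect O(n + log n) not to be significantly different from O(2*n), since both reduce to O(n).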
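Also, depending on the density of houses between towers, a binary search can perform better or worse than a linear search. Say there are 1024 houses with a tower evenly spaced every 4 houses. A linear search will take 4 steps per tower, whereas a binary search will take log2(1024) = 10 steps per tower.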
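The density argument can be checked by counting loop iterations directly. The sketch below is an illustration under made-up assumptions: 1024 houses at even positions 0, 2, 4, ..., and a limit chosen so that each scan reaches exactly 4 houses and the binary search never finds an exact match. None of these names or values come from the original code.

```java
public class StepsPerTower {
    // Steps a linear scan needs from index `from` until the next value exceeds `limit`.
    static int linearSteps(int[] sorted, int from, int limit) {
        int steps = 0;
        int i = from;
        while (true) {
            steps++; // one comparison per step
            if (i + 1 >= sorted.length || sorted[i + 1] > limit) {
                return steps;
            }
            i++;
        }
    }

    // Loop iterations of a textbook binary search for `key` over the whole array.
    static int binarySteps(int[] sorted, int key) {
        int low = 0, high = sorted.length - 1, steps = 0;
        while (low <= high) {
            steps++;
            int mid = (low + high) >>> 1;
            if (sorted[mid] < key) low = mid + 1;
            else if (sorted[mid] > key) high = mid - 1;
            else break;
        }
        return steps;
    }

    // Illustrative layout (an assumption): houses at even positions 0, 2, 4, ...
    static int[] evenlySpacedHouses(int count) {
        int[] houses = new int[count];
        for (int i = 0; i < count; i++) houses[i] = 2 * i;
        return houses;
    }

    public static void main(String[] args) {
        int[] houses = evenlySpacedHouses(1024);
        long linearTotal = 0, binaryTotal = 0;
        int groups = 0;
        for (int from = 0; from < houses.length; from += 4) {
            int limit = houses[from] + 7; // reaches exactly 4 houses, never an exact hit
            linearTotal += linearSteps(houses, from, limit);
            binaryTotal += binarySteps(houses, limit);
            groups++;
        }
        System.out.println("linear steps per group: " + (double) linearTotal / groups);
        System.out.println("binary steps per group: " + (double) binaryTotal / groups);
    }
}
```

With this layout every linear scan stops after 4 steps, while each binary search over the full array takes 10 or 11 iterations, so the linear version does less work per tower.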
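One more thing: your minNumOfTransmitters method sorts the already sorted array passed to it from test_01 and test_02. That re-sorting step takes longer than the searches themselves, which further obscures the timing difference between the two search algorithms.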
======
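I created a small timing class to get a better look at what is going on. I removed the line of code in minNumOfTransmitters that re-runs the sort, and added a boolean parameter to select whether to use the binary version. It sums the times over 400 iterations, separating out each step. The results on my system show that the load time dwarfs the sort time, which in turn dwarfs the solve time.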
Load: 22.565s
Sort: 4.518s
Linear: 0.012s
Binary: 0.003s
It is easy to see why optimizing the last step makes little difference to the overall run time.
private static class Timing {
    public long load=0;
    public long sort=0;
    public long solve1=0;
    public long solve2=0;

    private String secs(long millis) {
        return String.format("%3d.%03ds", millis/1000, millis%1000);
    }

    public String toString() {
        return " Load: " + secs(load) + "\n Sort: " + secs(sort) + "\nLinear: " + secs(solve1) + "\nBinary: " + secs(solve2);
    }

    public void add(Timing timing) {
        load+=timing.load;
        sort+=timing.sort;
        solve1+=timing.solve1;
        solve2+=timing.solve2;
    }
}

static Timing test_01() throws FileNotFoundException {
    Timing timing=new Timing();
    long start = System.currentTimeMillis();
    final File file = new File("c:\\path\\to\\xnpwdiG3.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    timing.load+=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int[] uniqueHouseLocationsSorted = HackerlandRadioTransmitters.uniqueHouseLocationsSorted(houseLocations);
    timing.sort=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int minNumOfTransmitters = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, false);
    timing.solve1=System.currentTimeMillis()-start;
    start=System.currentTimeMillis();
    final int minNumOfTransmittersBin = HackerlandRadioTransmitters.minNumOfTransmitters(uniqueHouseLocationsSorted, 73381, true);
    timing.solve2=System.currentTimeMillis()-start;
    final long end = System.currentTimeMillis();
    return timing;
}
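Your time measurement includes operations that are much slower than the array search, namely file system I/O and array sorting. I/O in general (reading from or writing to the file system, network communication) is orders of magnitude slower than operations that only involve the CPU and RAM.
Let's rewrite your test so that it does not read the file on every loop iteration: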
static void test_02() throws FileNotFoundException {
    final File file = new File("input.txt");
    final Scanner scanner = new Scanner(file);
    int[] houseLocations = new int[73382];
    for (int counter = 0; counter < 73382; counter++) {
        houseLocations[counter] = scanner.nextInt();
    }
    scanner.close();
    final int rounds = 400;
    final int[] uniqueHouseLocationsSorted = uniqueHouseLocationsSorted(houseLocations);
    final int transmitterRange = 73381;
    final long start = System.currentTimeMillis();
    for (int i = 0; i < rounds; i++) {
        final int minNumOfTransmitters = minNumOfTransmitters(uniqueHouseLocationsSorted, transmitterRange);
    }
    final long end = System.currentTimeMillis();
    System.out.println("Took: " + (end - start) + " milliseconds..");
}
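Note that in this version of the test the file is read only once, and the timing starts after that. With the above, I get Took: 1700 milliseconds.. (give or take a few milliseconds) for both the iterative version and the binary search. So we still cannot see that the binary search is faster. That is because almost all of that time is spent sorting the array 400 times.
Now let's remove the line that sorts the input array from the minNumOfTransmitters method. We are sorting the array (once) at the beginning of the test anyway.
Now we can see that things are much faster. After removing the line houseLocations = uniqueHouseLocationsSorted(houseLocations) from minNumOfTransmitters, I get Took: 68 milliseconds.. for the iterative version. Obviously, since this duration is already very small, we will not see a significant difference with the binary search version.
So let's increase the number of loop rounds to 100000.
Now I get Took: 2121 milliseconds.. for the iterative version and Took: 36 milliseconds.. for the binary search version.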
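Because we have now isolated what we measure and focus on the array searches, instead of including operations that are much slower, we can see the big difference in performance in favor of the binary search.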
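The same idea can be tried without the input file. The sketch below is a self-contained approximation, not the code from the question: it generates synthetic house positions (an assumption standing in for input.txt), prepares the sorted, duplicate-free array once, and times only the two greedy counting routines. The class and method names are hypothetical, and a simple System.nanoTime loop like this is only a rough measurement.

```java
import java.util.Arrays;
import java.util.Random;

public class SearchOnlyBenchmark {
    // Greedy count using linear scans; expects a sorted array without duplicates.
    static int countLinear(int[] h, int k) {
        int count = 0, i = 0;
        while (i < h.length) {
            int limit = h[i] + k;
            while (i + 1 < h.length && h[i + 1] <= limit) i++; // farthest tower position
            limit = h[i] + k;
            count++;
            i++;
            while (i < h.length && h[i] <= limit) i++;         // skip covered houses
        }
        return count;
    }

    // Index of the last element <= key (key must be >= sorted[0]).
    static int lastAtMost(int[] sorted, int key) {
        int r = Arrays.binarySearch(sorted, key);
        return r >= 0 ? r : -r - 2;
    }

    // Same greedy count, but each lookup is a binary search.
    static int countBinary(int[] h, int k) {
        int count = 0, i = 0;
        while (i < h.length) {
            int tower = lastAtMost(h, h[i] + k);
            count++;
            i = lastAtMost(h, h[tower] + k) + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for input.txt (an assumption): distinct, sorted positions.
        int[] houses = new Random(42).ints(73_382, 0, 1_000_000).distinct().sorted().toArray();
        int k = 73_381;
        int rounds = 10_000;

        long sink = 0; // accumulate results so the calls are not optimized away
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) sink += countLinear(houses, k);
        long linearNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int r = 0; r < rounds; r++) sink += countBinary(houses, k);
        long binaryNanos = System.nanoTime() - start;

        System.out.println("linear: " + linearNanos / 1_000_000 + " ms");
        System.out.println("binary: " + binaryNanos / 1_000_000 + " ms");
        System.out.println("checksum: " + sink);
    }
}
```

Loading and sorting happen before the first call to System.nanoTime, so neither can mask the difference between the two search strategies.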
If you want to see how many times the binary search enters its while loop, you can implement it yourself and add a counter:
private static int binarySearch0(int[] a, int fromIndex, int toIndex, int key) {
    int low = fromIndex;
    int high = toIndex - 1;
    int loop = 0;
    while (low <= high) {
        loop++;
        int mid = (low + high) >>> 1;
        int midVal = a[mid];
        if (midVal < key) {
            low = mid + 1;
        } else if (midVal > key) {
            high = mid - 1;
        } else {
            return mid; // key found
        }
    }
    System.out.println("binary search looped " + loop + " times");
    return -(low + 1); // key not found.
}
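The method is copied from the Arrays class in the JDK; I only added the loop counter and the println.
When the array to search has length 73382, the loop is entered only 16 times. That is what we expect: log2(73382) ≈ 16.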
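The observed count can be compared with the general worst-case bound for binary search over n elements, which is floor(log2(n)) + 1 iterations. The sketch below computes both numbers; the class and method names are made up for the example.

```java
public class BinarySearchBound {
    // Worst-case loop iterations of binary search over n > 0 elements:
    // floor(log2(n)) + 1, which equals the number of bits needed to write n.
    static int maxIterations(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int n = 73382;
        double log2 = Math.log(n) / Math.log(2);
        System.out.println("log2(n) = " + log2);
        System.out.println("worst-case iterations = " + maxIterations(n));
    }
}
```

For n = 73382 the bound is 17, so a run that loops 16 times is within the expected range.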