Run them all through a min-heap of size 100: for each input number k
, replace the current min m
with max(k, m)
. Afterwards the heap holds the 100 largest inputs.
A search engine like Lucene can use this method, with refinements, to choose the most-relevant search answers.
Edit: I fail the interview -- I got the details wrong twice (after having done this before, in production). Here's code to check it; it's almost the same as Python's standard heapq.nlargest()
:
import heapq
def funnel(n, numbers):
if n == 0: return []
heap = numbers[:n]
heapq.heapify(heap)
for k in numbers[n:]:
if heap[0] < k:
heapq.heapreplace(heap, k)
return heap
>>> funnel(4, [3,1,4,1,5,9,2,6,5,3,5,8])
[5, 8, 6, 9]
Notes:
>>> help(heapq)
Help on module heapq:
NAME
heapq - Heap queue algorithm (a.k.a. priority queue).
FILE
c:\python26\lib\heapq.py
DESCRIPTION
Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for all k, counting elements from 0. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that a[0] is always its smallest element.
Usage:
heap = [] # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0] # smallest item on the heap without popping it
heapify(x) # transforms list into a heap, in-place, in linear time
item = heapreplace(heap, item) # pops and returns smallest item, and adds
# new item; the heap size is unchanged
This one differs from textbook heap algorithms as follows:
- use 0-based indexing. This makes the relationship between the index for a node and the indexes for its children slightly less obvious, but is more suitable since Python uses 0-based indexing.
- Our heappop() method returns the smallest item, not the largest.
These two make it possible to view the heap as a regular Python list without surprises: heap[0] is the smallest item, and heap.sort() maintains the heap invariant!
For all the function in heapq:
>>> heapq.__all__
['heappush', 'heappop', 'heapify', 'heapreplace', 'merge', 'nlargest', 'nsmallest', 'heappushpop']
nlargest(n, iterable, key=None) # Find the n largest elements in a dataset.
Equivalent to: sorted(iterable, key=key, reverse=True)[:n]
nsmallest(n, iterable, key=None) # Find the n smallest elements in a dataset.
# Equivalent to: sorted(iterable, key=key)[:n]
Some useful links:
http://stackoverflow.com/questions/2550624/retrieving-the-top-100-numbers-from-one-hundred-million-of-numbers
http://stackoverflow.com/questions/2486093/millions-of-3d-points-how-to-find-the-10-of-them-closest-to-a-given-point/2486341#2486341
♥ ¸¸.•*¨*•♫♪♪♫•*¨*•.¸¸♥
No comments:
Post a Comment