代码见 https://github.com/jiangzhuti/IRResolver
有一些区间数据:
$ head data.txt
3640-3610(s, sh) O-H stretch, free hydroxyl alcohols, phenols
3500-3200(s,b) O–H stretch, H–bonded alcohols, phenols
3400-3250(m) N–H stretch 1°, 2° amines, amides
3300-2500(m) O–H stretch carboxylic acids
3330-3270(n, s) –C≡C–H: C–H stretch alkynes (terminal)
3100-3000(s) C–H stretch aromatics
3100-3000(m) =C–H stretch alkenes
3000-2850(m) C–H stretch alkanes
2830-2695(m) H–C=O: C–H stretch aldehydes
2260-2210(v) C≡N stretch nitriles
要求给出一个数值,输出所有包含该数值的区间信息。
利用STL set构建查询树。需要注意两点:
这可以说是一个“很有用,却一直没有实现”的功能,即用户提供自定义比较函数,用来在关联容器中查询一个和Key不同类型的参数。好在从c++14开始,std::less
做了void
特化,能够十分方便的支持这个功能,在之前只能用boost的侵入式容器来实现,当然不嫌麻烦的话也可以手撸一个查找树。cppreference上有一个简单的例子:https://en.cppreference.com/w/cpp/container/set/find 。
利用这一特性,我们可以做出定义:
struct IRFatKey
{
unsigned min_freq;
unsigned max_freq;
std::vector<std::shared_ptr<std::string>> records;
IRFatKey() = default;
IRFatKey(const IRFatKey& other) = default;
};
friend bool operator<(const IRFatKey& k1, const IRFatKey& k2)
{
return k1.max_freq < k2.min_freq;
}
friend bool operator<(const IRFatKey& k1, const unsigned& k2)
{
return k1.max_freq < k2;
}
friend bool operator<(const unsigned& k1, const IRFatKey& k2)
{
return k1 < k2.min_freq;
}
std::set<IRFatKey, std::less<>> ir_set;
思路就是每一次插入新的区间之前,先find
查找一下,没有查到,则直接插入;如果有结果,说明待插入的数据和查找结果之间存在重叠,然后取出查找结果,跟待插入的区间取交集,分为(最大情况)三个区间;然后三个区间依次递归进行上述插入操作:
void IRResolver::do_insert(const IRFatKey& fat_key)
{
auto iter = ir_set.find(fat_key);
if (iter != ir_set.end()) {
auto k(*iter);
ir_set.erase(iter);
std::cout << "erased: " << k.min_freq << "-" << k.max_freq << "\n";
IRFatKey left, intersection, right;
intersection.min_freq = (fat_key.min_freq < k.min_freq) ?
k.min_freq : fat_key.min_freq;
intersection.max_freq = (fat_key.max_freq < k.max_freq) ?
fat_key.max_freq : k.max_freq;
for (const auto& r : fat_key.records) {
intersection.records.push_back(r);
}
for (const auto& r : k.records) {
intersection.records.push_back(r);
}
const IRFatKey* l = nullptr;
const IRFatKey* r = nullptr;
if (fat_key.min_freq < intersection.min_freq) {
l = &fat_key;
} else if (k.min_freq < intersection.min_freq) {
l = &k;
}
if (fat_key.max_freq > intersection.max_freq) {
r = &fat_key;
} else if (k.max_freq > intersection.max_freq) {
r = &k;
}
if (l != nullptr) {
left.min_freq = l->min_freq;
left.max_freq = intersection.min_freq - 1;
for (const auto& r : l->records) {
left.records.push_back(r);
}
do_insert(left);
}
if (r != nullptr) {
right.min_freq = intersection.max_freq + 1;
right.max_freq = r->max_freq;
for (const auto& r : r->records) {
right.records.push_back(r);
}
do_insert(right);
}
do_insert(intersection);
} else {
if (fat_key.records.empty()) {
std::cout << "!!! " << fat_key.min_freq << " " << fat_key.max_freq << std::endl;
}
auto ret = ir_set.insert(fat_key);
if (!ret.second) {
std::cerr << "fat_key failed!\n";
std::cerr << "fat_key.min = " << fat_key.min_freq << " "
<< "fat_key.max = " << fat_key.max_freq << "\n";
exit(0);
}
std::cout << "inserted: " << fat_key.min_freq << "-" << fat_key.max_freq << "\n";
}
}