Similarity-Enhanced Transfer and Live-Spins

Similarity-Enhanced Transfer (SET) looks like it could prove very useful for efficiently sharing collections of Live-Spins without having to re-download an entire ISO image for every desired Live-Spin.

What is Similarity-Enhanced Transfer?

After a brief skim of the paper, it seems that SET is a concept similar to BitTorrent but without arbitrary chunking of data. By using handprinting, both similar and exact-match chunks can be identified and used in the download process. The concept looks very interesting, and I'm hoping to set aside some time to work on proof-of-concept code in the near future. I would also like to invite the community to help develop and prove the viability of such a solution for mass-hosting of Live-Spins and Live-Spin collections, such as localized spins based on the same package set. We could easily set up an upstream git repo (likely on fedorahosted), or we could just add a branch to the existing pyJigdo repo and get right to work.
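To make the idea a bit more concrete, here is a rough Python sketch of how handprinting might work, based only on my reading of the abstract: split each object into content-defined chunks, hash every chunk, and keep the k smallest hashes as the object's "handprint". Two objects that share many chunks will very likely share at least one handprint entry, so a constant number of lookups is enough to find similar sources. The function names, the rolling-sum chunker, and the constants (`WINDOW`, `DIVISOR`, `HANDPRINT_SIZE`) are all my own assumptions for illustration; this is not SET's actual implementation.

```python
import hashlib
import random

# All three values are assumptions chosen to keep the demo small.
WINDOW = 16         # bytes in the rolling window used to pick boundaries
DIVISOR = 64        # a boundary is eligible at roughly 1 in 64 positions
HANDPRINT_SIZE = 8  # k: number of chunk hashes kept per object


def chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a simple rolling sum.

    A boundary depends only on the last WINDOW bytes, so an insertion
    early in the data does not shift every later chunk.
    """
    out, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        long_enough = i + 1 - start >= WINDOW
        if long_enough and rolling % DIVISOR == DIVISOR - 1:
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])  # trailing partial chunk
    return out


def chunk_hashes(data: bytes) -> set[str]:
    """Return the set of SHA-1 digests of every chunk in data."""
    return {hashlib.sha1(c).hexdigest() for c in chunks(data)}


def handprint(data: bytes, k: int = HANDPRINT_SIZE) -> set[str]:
    """Return the k smallest chunk hashes (a deterministic sample)."""
    return set(sorted(chunk_hashes(data))[:k])


def looks_similar(a: bytes, b: bytes) -> bool:
    """Guess that two objects share chunks if their handprints overlap."""
    return bool(handprint(a) & handprint(b))


if __name__ == "__main__":
    rng = random.Random(42)
    base_spin = bytes(rng.getrandbits(8) for _ in range(8192))
    # Stand-in for a localized spin: same data with a small insertion.
    localized_spin = base_spin[:4000] + b"localized!" + base_spin[4000:]
    shared = chunk_hashes(base_spin) & chunk_hashes(localized_spin)
    print("chunks shared:", len(shared), "of", len(chunk_hashes(base_spin)))
    print("handprints overlap:", looks_similar(base_spin, localized_spin))
```

In a real system each handprint entry would be published to a shared lookup table, so a client fetching one spin could discover peers holding a different but similar spin. The toy chunker above stands in for a proper rolling fingerprint and is only meant to show why the boundaries survive an insertion.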

Why should this concept even be considered?

Well, I'll quote the abstract and hope it's enough to encourage reading the entire paper:

"Many contemporary approaches for speeding up large file transfers attempt to download chunks of a data object from multiple sources. Systems such as BitTorrent quickly locate sources that have an exact copy of the desired object, but they are unable to use sources that serve similar but non-identical objects. Other systems automatically exploit cross-file similarity by identifying sources for each chunk of the object. These systems, however, require a number of lookups proportional to the number of chunks in the object and a mapping for each unique chunk in every identical and similar object to its corresponding sources. Thus, the lookups and mappings in such a system can be quite large, limiting its scalability.

This paper presents a hybrid system that provides the best of both approaches, locating identical and similar sources for data objects using a constant number of lookups and inserting a constant number of mappings per object. We first demonstrate through extensive data analysis that similarity does exist among objects of popular file types, and that making use of it can sometimes substantially improve download times. Next, we describe handprinting, a technique that allows clients to locate similar sources using a constant number of lookups and mappings. Finally, we describe the design, implementation and evaluation of Similarity-Enhanced Transfer (SET), a system that uses this technique to download objects. Our experimental evaluation shows that by using sources of similar objects, SET is able to significantly out-perform an equivalently configured BitTorrent."


Himabindu Pucha, David G. Andersen, Michael Kaminsky
Purdue University, Carnegie Mellon University, Intel Research Pittsburgh 

Read the whole thing.

