List-making software?

Diaze

New member
Jun 15, 2011
Hi guys,

I thought I would drop by and ask for advice.

I have a big list of URLs and I would like to make shorter lists out of this big file.

The source file has a lot of duplicate domains (not duplicate URLs):

Domain1.com/XX
Domain2.com/XX
Domain1.com/XX2
Domain2.com/XX2

And I want the small lists to be unique:

List 1:
Domain1.com/XX
Domain2.com/XX

List 2:
Domain1.com/XX2
Domain2.com/XX2

Is there any software that does that?
Thank you :banana_sml: :banana_sml: :banana_sml:

PS: Inb4 CCarter
 


Might be best to head to the BST and look for a developer who can do a custom script for you.

If this is for a bot, then maybe it would make sense to generate a list randomly when accessed instead of creating multiple files?
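
Something like this could work if you go the random route (just a sketch; 'urls.txt', the batch size, and the in-memory 'used' set are assumptions, persistence is up to you):

Code:
import random

def random_batch(all_urls, used, size=100):
    # Only consider URLs that haven't been handed out yet.
    fresh = [u for u in all_urls if u not in used]
    batch = random.sample(fresh, min(size, len(fresh)))
    used.update(batch)
    return batch

# Load once, then call random_batch() each time the bot needs a list.
urls = [line.strip() for line in open('urls.txt') if line.strip()]
used = set()
print(random_batch(urls, used, 10))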
 
Open in Excel -> click on the column -> click Data -> click Sort.

Sort by - Column A
Sort on - Values
Order - A to Z

-

This will break the data down domain-wise. You can cut/paste the result into separate columns or separate Excel sheets.
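
If you'd rather script it, the same sort takes a couple of lines of Python ('urls.txt' is a made-up file name):

Code:
# Sorting the full URLs groups each domain's entries together,
# same as sorting Column A in Excel.
urls = sorted(line.strip() for line in open('urls.txt') if line.strip())
for url in urls:
    print(url)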
 
Hey bro, you can make lists like this:

Code:
mylist = []
mylist.append("Domain1.com/XX")
 
What are you willing to pay for such a program? I could make a program to do this, but it's not worth the trouble if you only have $10.

Also, if this is a one-time deal, why don't you just do it in Excel?
 
Doing it by hand would take hours; there are 50K+ URLs on the feed list.

Also, the random solution isn't the best: I need the URLs to be unique, and there's a chance I'd get an already-used URL if I go the random route.

The mylist.append solution would just take the URLs from the same domain and make a list for that domain :/

Going to look into the solutions given and post back later.

Thank you guys, really appreciate the help.
 
ScrapeBox does that.
Edit: oh wait, didn't read the OP fully; it just filters duplicates.
 
Split each URL into two parts at the hostname.

If you know they are full URLs, you can split at the 3rd '/' it finds.

Sort and then partition the list by the 2nd part.
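
Rough Python sketch of that, assuming full http:// URLs, one per line ('urls.txt' is a made-up name):

Code:
from itertools import groupby

urls = [line.strip() for line in open('urls.txt') if line.strip()]

def split_at_host(url):
    # Split at the 3rd '/': parts[2] is the hostname,
    # parts[3] is everything after it (the 2nd part).
    parts = url.split('/', 3)
    return parts[2], parts[3] if len(parts) > 3 else ''

# Sort by the 2nd part, then partition on it: URLs that share
# the same path end up in the same list.
urls.sort(key=lambda u: split_at_host(u)[1])
lists = [list(g) for _, g in groupby(urls, key=lambda u: split_at_host(u)[1])]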
 
By the way, XX and XX2 don't mean the ends of the URLs are the same. I know you guys are not stupid, but I wouldn't want to get it wrong, my bad!

Matt, you rock, thank you for the code :o Just a quick question: must the ends of the URLs match?
 
The path must be the same, that is, everything before any "?" or "#". Your OP was a little ambiguous, so I guessed what you wanted.
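
For example, in Python 3:

Code:
from urllib.parse import urlparse

# .path is everything after the host and before any '?' or '#'.
print(urlparse('http://Domain1.com/XX?utm=1#top').path)  # /XX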
 
So you want a given domain to only appear in a certain list once? And to keep making new lists as long as there are remaining URLs?
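
If so, one way it could be scripted (just a sketch in Python 3; 'urls.txt' and full http:// URLs are assumptions):

Code:
from collections import defaultdict
from urllib.parse import urlparse

urls = [line.strip() for line in open('urls.txt') if line.strip()]

# Queue up each domain's URLs in their original order.
by_domain = defaultdict(list)
for url in urls:
    by_domain[urlparse(url).netloc].append(url)

# List N takes the Nth URL of every domain that still has one,
# so a domain never appears twice in the same list.
lists, n = [], 0
while True:
    batch = [q[n] for q in by_domain.values() if len(q) > n]
    if not batch:
        break
    lists.append(batch)
    n += 1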
 
Parlez-vous Python? ("Do you speak Python?")

https://gist.github.com/mattseh/5382797

Code:
['http://www.lemonde.fr/economie/article/2013/04/13/les-venezueliens-sont-las-des-penuries_3159283_3234.html', 'http://www.lefigaro.fr/international/2013/04/14/01003-20130414ARTFIG00061-palestine-la-demission-de-salam-fayyad-est-un-coup-dur-pour-les-etats-unis.php']
['http://www.lemonde.fr/politique/article/2013/04/14/commandes-de-sondages-fillon-vise-par-une-enquete-preliminaire_3159533_823448.html', 'http://www.lefigaro.fr/flash-eco/2013/04/14/97002-20130414FILWWW00063-fillon-ne-votera-pas-la-loi-de-moralisation.php']
 
That's it! You totally nailed it! Each list has unique URLs and unique domains, and once a URL is used in a list it disappears from the feed list.

Edit: Oh yeah, you rock bro. How can I thank you?

French: That's totally it! You rock, man!