I'm probably not the first, but I didn't see it mentioned when I searched for it, so here goes nothing.
I am trying to get this wonderful list into a Meraki (not mine), but alas it only accepts bare domains.
Stripping the 0.0.0.0 prefix was easy, but there are so many comments in there, especially some ancient ones, that so far I don't feel like I am making any headway.
Has someone already done this, or is there a nice tool for it, and I am just wasting my time here?
Hello! Thank you for opening your first issue in this repo. It's people like you who make these host files better!
Hello and welcome here,
There are many other tools to do the job, but here is how I would do it:
$ awk '{if(NF > 0 && substr($1,1,1) != "#") print $2 | "sort -u -i"}' file
To understand the awk:
if (NF > 0) # If the line is not empty
if (substr($1,1,1) != "#") # If the first character is not `#`
print $2 # Print the second column... So in our case the sub-domain.
| "sort -u -i" # Pipe the output through sort: -u drops duplicates, -i ignores non-printable characters
Cheers,
Nissar
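One thing to watch with the one-liner above: it prints the second column of every non-comment line, so a line such as `127.0.0.1 localhost` would put `localhost` into the output. A minimal sketch of a stricter variant follows, assuming the entries to keep all start with `0.0.0.0`. The function name `hosts_to_domains` is made up for illustration and is not part of the repo.

```shell
# Hypothetical helper (not from the repo): print the host name of every
# "0.0.0.0 <host>" entry, sorted and de-duplicated.
# Reads the files given as arguments, or stdin when there are none.
hosts_to_domains() {
  awk '
    { sub(/\r$/, "") }                  # tolerate CRLF line endings
    $1 == "0.0.0.0" && NF >= 2 &&       # only blocking entries
    $2 != "0.0.0.0" && $2 !~ /^#/ {     # skip the "0.0.0.0 0.0.0.0" line and bare comments
      print $2                          # second column; inline comments sit in later fields
    }
  ' "$@" | sort -u
}
```

Usage would be something like `hosts_to_domains hosts > domains.txt`, where both file names are placeholders.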
sed is probably the easiest way to do it:
https://www.gnu.org/software/sed/manual/sed.html
I haven't tried it, but this should get you pretty close:
sed -e 's/^0\.0\.0\.0 //' \
-e 's/^127\..*//' \
-e 's/^fe80::.*//' \
-e 's/^ff0[02]::.*//' \
-e 's/#.*//' \
-e 's/^ *//' \
-e 's/ *$//' \
-e 's/^www\.www\.//' \
-e '/^$/d' \
filename > newfilename
-e 's/^0\.0\.0\.0 //' \ removes the leading "0.0.0.0 "
-e 's/^127\..*//'
-e 's/^fe80::.*//'
-e 's/^ff0[02]::.*//' \ blanks any line that starts with "127.", "fe80::", "ff00::" or "ff02::"
(".*" matches everything to the end of the line)
-e 's/#.*//' \ removes the comment
-e 's/^ *//' \ removes leading spaces
-e 's/ *$//' \ removes trailing spaces
-e 's/^www\.www\.//' \ they started tacking "www" on the front of blacklisted hosts without checking if it already starts with "www." so remove all the "www.www." noise
-e '/^$/d' removes empty lines
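To check what each substitution does, the same script can be wrapped in a function and run on a few sample lines. This is only a sketch: the function name `strip_hosts` and the sample input are made up for illustration, and the sed expressions are copied unchanged from the script above.

```shell
# Hypothetical wrapper around the sed script from this thread; reads stdin.
strip_hosts() {
  sed -e 's/^0\.0\.0\.0 //' \
      -e 's/^127\..*//' \
      -e 's/^fe80::.*//' \
      -e 's/^ff0[02]::.*//' \
      -e 's/#.*//' \
      -e 's/^ *//' \
      -e 's/ *$//' \
      -e 's/^www\.www\.//' \
      -e '/^$/d'
}

# Made-up sample input, not taken from the real hosts file.
printf '%s\n' \
  '# Title: sample' \
  '127.0.0.1 localhost' \
  'ff02::1 ip6-allnodes' \
  '0.0.0.0 www.www.example.com # inline comment' \
  '0.0.0.0 ads.example.net' |
  strip_hosts
```

On this sample the comment, loopback and multicast lines are dropped and two names remain, `example.com` and `ads.example.net`. Note that the `www.www.` substitution removes both labels rather than leaving a single `www.`.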
And on a related note: I'd rather have all the "*.example.com" names together instead of sorting by host name. Does anyone have an easier way to sort by domain name?
$ cat fqdnsort
#!/usr/bin/gawk -f
# gnu awk script to sort by domain name
# requires the gawk extension PROCINFO["sorted_in"] for array sorting
#
{
sub("\r", "", $0) # get rid of trailing ^M
sub("#.*", "", $0) # remove inline comments
sub("^[ \t]*", "", $0) # remove leading whitespace
n = index($0, "/")
if ( n > 0 ) { # remove path info
url = substr($0, 1, n-1) # everything to the left of the /
} else {
url = $0
}
sub("[ \t]*$", "", url) # remove trailing whitespace
if ( url == "" ) next
n = split(url, a, "."); rev = ""; # build the sort key from the cleaned name, labels reversed
for ( i = n; i > 0; i-- ) { rev = rev a[i] "." }
fqdn[rev] = url
}
END {
PROCINFO["sorted_in"] = "@ind_str_asc"
# Order by indices in ascending order compared as strings
for (s in fqdn )
printf("%s\n", fqdn[s] )
}
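For systems without gawk, the same reversed-label idea can be sketched with plain awk, sort and cut. This assumes the input is already a clean list with one host name per line (no comments, paths or addresses); the function name `sort_by_domain` is hypothetical.

```shell
# Hypothetical helper: sort host names so that names under the same
# domain end up next to each other. Reads stdin, one host name per line.
sort_by_domain() {
  awk -F. '{
    key = ""
    for (i = NF; i > 0; i--) key = key $i "."   # www.example.com -> com.example.www.
    print key "\t" $0                           # reversed key, tab, original name
  }' |
    LC_ALL=C sort -u |                          # byte-wise sort on the reversed key
    cut -f2                                     # drop the key, keep the original name
}
```

For example, `printf '%s\n' www.example.com ads.other.net api.example.com example.com | sort_by_domain` lists `example.com`, `api.example.com` and `www.example.com` together, followed by `ads.other.net`.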
-e 's/^www\.www\.//' they started tacking "www" on the front of blacklisted hosts without checking if it already starts with "www." so remove all the "www.www." noise
I've noticed that too lately. Ping Steven @StevenBlack: would you mind discussing this with whoever wrote the code that adds www., so they can make the necessary adjustments and avoid creating double-www. domains? Thank you 👍
@dnmTX thanks for that, I was not aware.
Fixing that now, on 30 hosts in my data file, introduced in commit 36fab9ad8f6518024e2c4def34a3b1f1a73e2f59 on May 24th 2019.
I will attempt the two suggestions here, hopefully in the upcoming week. So far the other methods have left me rather annoyed, as there are so many instances where it doesn't follow the same inclusion policy for find-and-remove type operations.
Thanks Steve @StevenBlack. I was planning to open an issue about it but couldn't find the time, I guess, as it's been too busy lately.
So I'm glad that somebody else mentioned it (@ler762), a perfect opportunity to ping you 😉.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Closing.