October 28, 2019 · Basic Pen-Testing

1.3: Basic Bash scripting - curl, cut, grep & regex (Part I)

Example: get sub-domains from www.example.com/index.html

String matching approach

  1. download file

    # download www.example.com to the current dir
    wget www.example.com
    # show the index.html file: permissions, size, etc.
    ls -l index.html
    # show file content
    cat index.html

    # curl works too; -L follows redirects
    curl -L www.example.com > index.html
    
  2. analyse the file content and get all URLs

    # grep all lines containing href=
    grep "href=" index.html
    
  3. extract the 3rd "/"-separated field

    grep "href=" index.html | cut -d "/" -f 3
    

    A bit of explanation of the cut command.
    Take the following line as an example:

    <li><a href="//www.example.com/aboutus.html">aboutus</a></li>


    The "/" delimiter breaks the line into fields, counted from 1:
    1 - <li><a href="
    2 - [empty]
    3 - www.example.com
  4. clean up: keep only domain names by filtering for lines that contain a .

    grep "href=" index.html | cut -d "/" -f 3 | grep "\." # grep supports regex; \ escapes the dot so it matches literally
    

    return

    www.example.com/abc
    abc.example.com
    def.example.com
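
The backslash matters: an unescaped . in grep matches any single character, so it would pass every non-empty line. A quick check with two made-up sample lines:

```shell
# unescaped "." matches any character, so both lines pass:
printf 'localhost\nwww.example.com\n' | grep "."
# escaped "\." matches a literal dot, so only the domain line passes:
printf 'localhost\nwww.example.com\n' | grep "\."
# → www.example.com
```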
    
  5. use cut again, keeping the first "-delimited field, to get rid of a trailing "

    grep "href=" index.html |cut -d "/" -f 3 | grep "\." | cut -d '"' -f 1
    
  6. use sort -u to sort & get unique content

    grep "href=" index.html |cut -d "/" -f 3 | grep "\." | cut -d '"' -f 1 | sort -u
    

    return

    abc.example.com
    def.example.com
    www.appdynamics.com
    www.facebook.com
    www.instagram.com
    www.linkedin.com
    www.webex.com
    www.youtube.com
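
The whole string-matching pipeline can be tried out against an inline HTML sample, with no download needed (the HTML below is made up for illustration):

```shell
# run the full pipeline on a local sample instead of a downloaded index.html
html='<li><a href="//abc.example.com/x.html">x</a></li>
<li><a href="//def.example.com/y.html">y</a></li>
<li><a href="//abc.example.com/z.html">z</a></li>'
printf '%s\n' "$html" | grep "href=" | cut -d "/" -f 3 | grep "\." | cut -d '"' -f 1 | sort -u
# → abc.example.com
# → def.example.com
```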
    

Regex approach

grep -o prints only the matching part of each line; [^"] is a character class matching any character except "

cat index.html | grep -o 'http://[^"]*' | cut -d "/" -f 3 | sort -u > list.txt

return (these are the raw grep -o matches, before cut and sort -u; only XML namespace URIs use plain http:// on this page, so after the full pipeline list.txt holds just www.schema.org and www.w3.org)

http://www.w3.org/2000/svg\
http://www.schema.org
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/1999/xlink
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
http://www.w3.org/2000/svg
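
To see what -o changes, compare grep with and without it on a single namespace declaration (the sample tag is made up):

```shell
line='<rect xmlns="http://www.w3.org/2000/svg" width="10"/>'
# without -o, grep prints the whole matching line;
# with -o, it prints only the part that matched the pattern:
printf '%s\n' "$line" | grep -o 'http://[^"]*'
# → http://www.w3.org/2000/svg
```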

grep with an extended regex (-E); s? matches an optional s, so https?:// catches both http:// and https://

cat index.html | grep -o -E 'https?://[^"]*' | cut -d "/" -f 3 | sort -u > list.txt

return

abc.example.com
def.example.com
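
The optional-s pattern can also be verified locally; in this made-up sample one link uses http and the other https, and both survive the filter:

```shell
html='<a href="http://abc.example.com/a">a</a>
<a href="https://def.example.com/b">b</a>'
# -E enables extended regex; s? makes the s optional, so both schemes match
printf '%s\n' "$html" | grep -oE 'https?://[^"]*' | cut -d "/" -f 3 | sort -u
# → abc.example.com
# → def.example.com
```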