Hi
This block of HTML is from a web store that needs to be migrated. It shows the catalog categories:
<!-- Begin Breadcrumb --><div class="text"><a href="/Pages/default.aspx">Home</a>> <a id="ctl00_PlaceHolderTitleBreadcrumb_ProductsLink" href="search2.aspx#PNavDS=Np:All" href="javascript:__doPostBack('ctl00$PlaceHolderTitleBreadcrumb$ProductsLink','')">Products & Services</a> ><a href="search2.aspx#PNavDS=N:4294943529" id="ctl00_PlaceHolderTitleBreadcrumb_lnk_CatName" class="capitalize">communications</a> ><a href="search2.aspx#PNavDS=N:4294943529-4289402148" id="ctl00_PlaceHolderTitleBreadcrumb_lnk_SubCatName" class="capitalize">mobility devices</a> ><a href="search2.aspx#PNavDS=N:4294943529-4289402148-4289401514" id="ctl00_PlaceHolderTitleBreadcrumb_lnk_ProdLineName" class="capitalize">gps / navigation devices</a> >
I have been trying to get the categories out; just the text is fine. Ideally this would be part of a script that runs against every page of every product, and the result would be:
$category1= "communications"
$category2= "mobility devices"
$category3= "gps / navigation devices"
Across the website, this part of the HTML looks exactly the same on every product page, except for the category values. In the example above, I get stuck when I try to extract the text between specific strings, mostly because of the number of " , = and > characters. I know that I must precede each special character with \ but this still fails, and I suspect it is because I want the text that lies between:
"ctl00_PlaceHolderTitleBreadcrumb_lnk_ProdLineName"class="capitalize">
and
</a>
and I suspect that there are multiple instances of </a>
Looking at the code, can anybody suggest a way to extract that text, bearing in mind that it varies from page to page and, as can be seen, can legitimately contain the occasional / or \?
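To make the goal concrete, here is a minimal sketch of the kind of extraction described above. It assumes Python and its standard re module, which is a choice made only for illustration, and it assumes the three link ids (ending in lnk_CatName, lnk_SubCatName and lnk_ProdLineName) stay the same on every page, as they do in the sample. The function name extract_categories is hypothetical. The non-greedy (.*?) stops at the first </a> after each matching link, so the other </a> tags in the block do not get in the way, and a / or \ inside the text is matched like any other character.

```python
import re
from html import unescape

# Matches only the <a> tags whose id ends in _lnk_<Name>, and captures
# <Name> plus the link text. [^>]* keeps the match inside a single tag;
# (.*?) is non-greedy, so it stops at the first </a> that follows.
BREADCRUMB_LINK = re.compile(
    r'<a\b[^>]*\bid="ctl00_PlaceHolderTitleBreadcrumb_lnk_(\w+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def extract_categories(html):
    """Return [category1, category2, category3]; None where a link is missing."""
    found = {
        name: unescape(text).strip()  # decode entities such as &amp;
        for name, text in BREADCRUMB_LINK.findall(html)
    }
    return [found.get(key) for key in ("CatName", "SubCatName", "ProdLineName")]
```

Calling extract_categories(page_html) on each product page would give the three values in order, which could then be assigned to category1, category2 and category3. For pages whose markup is less regular than this sample, an HTML parser would likely be more robust than a regular expression.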
Any help is appreciated.
Many thanks
Gund