Re: [dev] Suckless web crawlers

From: Sagar Acharya <sagaracharya_AT_tutanota.com>
Date: Tue, 26 Sep 2023 19:53:42 +0200 (CEST)

It would not be as easy as that. One would also have to rank pages and index keywords, so that the relevant pages can be returned when words are typed into a search box.
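To illustrate the keyword-indexing part, here is a minimal inverted-index sketch in POSIX shell. The `index` and `search` helper names and the one-page-per-file layout are assumptions for illustration; ranking (e.g. by term frequency) is left out entirely:

```shell
#!/bin/sh
# Naive inverted-index sketch: map each lowercased word to the files
# that contain it. "index" and "search" are hypothetical helper names.

# index DIR: print "word<TAB>file" pairs for every word under DIR.
index() {
    for f in "$1"/*; do
        tr -cs '[:alnum:]' '\n' < "$f" |   # split into one word per line
            tr '[:upper:]' '[:lower:]' |   # normalise case
            grep . | sort -u |             # drop blanks, dedupe per file
            awk -v f="$f" '{ print $0 "\t" f }'
    done | sort
}

# search INDEXFILE WORD: list the files containing WORD.
# Ranking the results would be a further step, as noted above.
search() {
    awk -F '\t' -v w="$2" '$1 == w { print $2 }' "$1"
}
```

Crawled pages would be dumped into a directory, indexed once with `index pages > index.txt`, and queried with `search index.txt someword`.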


Thanking you
Sagar Acharya
https://humaaraartha.in/selfdost/selfdost.html



26 Sept 2023, 22:39 by d.tonitch_AT_gmail.com:

> I don't know exactly what you expect from your web crawler, but let's say you want to index every link there is on a page.
>
> You can just curl the page, then grep for any links, and for each link redo the operation...
>
> This can easily be done with a small bash script (or a C program if you want it to be [insert here why you would want that]).
>
> I can't personally recommend any crawler, as I would simply do it that way.
>
> Regards.
>
> Debucquoy Anthony (tonitch)
>
> On 9/26/23 14:13, Sagar Acharya wrote:
>
>> Which web crawlers and indexing tools does suckless suggest?
>>
>> Of the ones I searched for, the best I could find was Xapian, and it seems to require targeted indexing, i.e. per content type: HTML, documents, etc.
>>
>> Which crawlers and indexers do you suggest?
>>
>>
>> Thanking you
>> Sagar Acharya
>> https://humaaraartha.in/selfdost/selfdost.html
>>
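The curl-and-grep approach described in the quoted message can be sketched roughly as follows. The `extract_links` and `crawl` names and the href regex are assumptions; there is no visited-set, so on real pages with cycles this would loop until the depth limit, and a serious crawler would also need politeness delays:

```shell
#!/bin/sh
# Sketch of the curl-and-grep crawler from the quoted message.

# extract_links: print href targets found in HTML read from stdin
# (naive regex match; real HTML needs a proper parser).
extract_links() {
    grep -Eo 'href="[^"]+"' | sed 's/^href="//; s/"$//'
}

# crawl URL DEPTH: fetch a page, print its links, recurse DEPTH levels.
crawl() {
    url=$1 depth=$2
    [ "$depth" -le 0 ] && return
    curl -fsL "$url" | extract_links | while read -r link; do
        printf '%s\n' "$link"
        crawl "$link" $((depth - 1))   # redo the operation on each link
    done
}
```

Running `crawl https://example.com 2`, for instance, would print every link found up to two levels deep.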
Received on Tue Sep 26 2023 - 19:53:42 CEST

This archive was generated by hypermail 2.3.0 : Tue Sep 26 2023 - 20:00:09 CEST