Thursday 24 December 2020

Rekhta Content Scraper by Shakeeb Ahmad | For Programmers Only


Note: This is not yet available for non-programmers. Soon I'll make an easy-to-use version for all, iA.

This Node.js scraper works for both prose and poetry. See the GitHub repo for installation instructions.

You will need a text file containing all the links whose content you want to download. To build that list, you can either collect the links that interest you manually, or use the following to scrape all the links from an author's or poet's page.
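A minimal sketch of how such a links file could be read in Node.js. It assumes one URL per line (check the GitHub repo for the format the scraper actually expects), and parseLinkList and readLinkFile are hypothetical names, not functions from the repo.

```javascript
const fs = require("fs");

// Hypothetical helper: turn the contents of a links file into a clean array.
// Assumption: one URL per line; blank lines and duplicate URLs are dropped.
function parseLinkList(text) {
  const seen = new Set();
  const links = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || seen.has(line)) continue; // skip blanks and repeats
    seen.add(line);
    links.push(line);
  }
  return links;
}

// Hypothetical helper: read the file from disk (UTF-8) and parse it.
function readLinkFile(path) {
  return parseLinkList(fs.readFileSync(path, "utf8"));
}
```

For example, readLinkFile("links.txt") would return the list of URLs to scrape, in file order, with repeats removed.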

Bookmarklets - A One-Click Solution for Getting the Links and More

Rekhta loads 50 links at a time and adds more content to the DOM as the user scrolls. My code does not automate this extra fetch yet. (I did try, but parsing the fetched content wasted so much time that I preferred scrolling manually. Just let the page load, then press End on your keyboard. Wait a moment and it will add all the remaining links.)
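The manual "press End and wait" step could also be scripted from the browser console without any parsing. This is a hedged sketch, not part of the scraper: it assumes the page keeps appending links when scrolled to the bottom, as described above, and treats a round with no new links as "done". scrollUntilStable, scrollToEnd and countLinks are hypothetical names, and the 1500 ms delay is a guess you may need to tune.

```javascript
// Hypothetical helper: scroll, wait, and repeat until the number of links stops growing.
// scrollToEnd and countLinks are passed in, so the loop itself can be tested outside a browser.
async function scrollUntilStable({ scrollToEnd, countLinks, delayMs = 1500, maxRounds = 50 }) {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let previous = countLinks();
  for (let round = 0; round < maxRounds; round++) {
    scrollToEnd();
    await sleep(delayMs); // give the page time to fetch and render the next batch
    const current = countLinks();
    if (current === previous) return current; // nothing new appeared: assume the list is complete
    previous = current;
  }
  return previous; // safety cap reached
}

// Possible browser console usage (counts every <a> on the page, which is enough to
// detect when loading stops):
// scrollUntilStable({
//   scrollToEnd: () => window.scrollTo(0, document.body.scrollHeight),
//   countLinks: () => document.querySelectorAll("a").length,
// }).then((n) => console.log(`${n} links on the page`));
```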

Anyway, once you have the complete list on the page, you can use the bookmarklets below to copy all of them with a click.
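A sketch of what the link-copying step might look like. The URL filter is an assumption: Rekhta's exact URL patterns are not covered in this post, so keep is a placeholder predicate you would adjust, and collectLinks is a hypothetical name. The output is one URL per line, the same shape as the links file the scraper reads.

```javascript
// Hypothetical helper: keep the hrefs that pass `keep`, drop duplicates,
// and join them one per line, ready to paste into the links file.
function collectLinks(hrefs, keep = () => true) {
  const unique = [...new Set(hrefs.filter(keep))]; // Set keeps first-seen order
  return unique.join("\n");
}

// In the browser, the page's anchors would feed the helper, e.g.:
//   const text = collectLinks(
//     Array.from(document.querySelectorAll("a"), (a) => a.href),
//     (href) => href.includes("/some-path/") // placeholder filter, adjust to the page
//   );
//   navigator.clipboard.writeText(text);
```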

For a while I had been doing this from the browser console: open the console, paste the script, cut the page text down to only what I need, then select and copy it manually.

Later I decided to use the magic of bookmarklets to automate the tasks I had been doing repeatedly:

  • Copy all the links from a poet's/author's page.
  • For LitUrdu specifically, turn a poem/story page into an "object" with the required properties (title, author, link, description, text) and copy it.
  • Use the "object" to automatically fill in the text boxes of a new Blogger post.
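A sketch of the "object" from the second bullet. Only the property names (title, author, link, description, text) come from this post; makePostObject is a hypothetical name, and using the first line of the text as the description when none is given is an assumption. Reading the fields from the actual poem/story page depends on Rekhta's markup, which is not covered here.

```javascript
// Hypothetical helper: build the post object with the five properties named in the list.
// Assumption: when no description is supplied, the first line of the text is used,
// cut to maxLen characters with "..." appended if it is longer.
function makePostObject({ title, author, link, text, description }, maxLen = 150) {
  const clean = (s) => (s ?? "").trim();
  const body = clean(text);
  const firstLine = body.split(/\r?\n/)[0];
  const fallback = firstLine.length > maxLen ? firstLine.slice(0, maxLen) + "..." : firstLine;
  return {
    title: clean(title),
    author: clean(author),
    link: clean(link),
    description: clean(description) || fallback,
    text: body,
  };
}

// In the browser, the result could be copied as JSON, e.g.:
//   navigator.clipboard.writeText(JSON.stringify(makePostObject(fields), null, 2));
```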

The ultimate plan is to use the Blogger API and post directly, but the bookmarklet approach doesn't hurt much, because most of what I'm doing is now just a click away.
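Turning a console script into a bookmarklet is mostly packaging: a bookmarklet is a javascript: URL whose body runs on the current page when clicked. A minimal Node.js sketch of that step, where toBookmarklet is a hypothetical helper and not something from the repo. Wrapping the code in a function that returns nothing keeps the browser from replacing the page with the script's result, and encodeURIComponent keeps characters such as %, # and spaces intact inside the URL.

```javascript
// Hypothetical helper: package a console script as a bookmarklet URL.
// The wrapper function returns undefined, so the page is not replaced by a result value.
// The "\n" before the closing brace keeps a trailing // comment from swallowing it.
function toBookmarklet(code) {
  const wrapped = `(function(){${code}\n})();`;
  return "javascript:" + encodeURIComponent(wrapped);
}
```

The returned string can be pasted as the URL of a new bookmark, which then behaves like the drag-and-drop links below.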

Bookmarklets
Drag and drop the links to your browser's bookmarks bar (press Ctrl+Shift+B to toggle the bar).
  • Use on an author's/poet's page to copy the links to all their listed works.
  • Use on an individual poem/story page to copy an object with the properties (title, author, link, description, text). Modify it as per your needs.
  • Use on a new Blogger post after pasting the "object" from Rekhta into the console. This fills in all the required fields of the new post automatically.
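The Blogger editor's field selectors are not given in this post, so they are not guessed here. One part of the fill-in step can be sketched on its own: turning the object's text into safe HTML for the post body. Escaping &, < and > and turning line breaks into <br> are assumptions about how the text should appear, and textToPostHtml is a hypothetical name.

```javascript
// Hypothetical helper: convert plain poem/story text into HTML for a Blogger post body.
// Assumption: each line break in the source should stay a visible line break.
function textToPostHtml(text) {
  const escaped = text
    .replace(/&/g, "&amp;") // must run first so the entities below aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return escaped.split(/\r?\n/).join("<br>\n");
}
```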

Shakeeb Ahmad
Maharashtra, India

Shakeeb Ahmad is a blogger, poet, programming enthusiast, student of comparative religion and psychology, public speaker, singer, and Vedic Maths expert. He loves playing with numbers and, at the age of 16, invented a shortcut method for squaring numbers. In sports, football is the root of his happiness. He lives it.
