hackeRnews - specification

Hacker News

The hackeRnews package was created to simplify the process of getting data from Hacker News. Hacker News is a user-generated content website that focuses on stories related to computer science. The website is made up of user-submitted stories, each of which typically links to the original source. Users can upvote a story if they find it interesting, and each story has a comment section where users can discuss the subject. Besides news stories, Hacker News contains the following sections: Ask HN, Show HN and Jobs.

Hacker News API

The official Hacker News API documentation can be found at https://github.com/HackerNews/API. The API serves data in JSON format, and the hackeRnews package lets you retrieve this data as convenient R objects. Each object (story, comment, …) has a unique id and can be retrieved by that id. The API also provides lists of up to 500 top and new stories, as well as the best stories and the latest ask, show and job stories.
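The next sketch shows how the raw JSON endpoints of the official API are laid out. It is not needed to use the package and does not depend on hackeRnews. The API is version v0, served from https://hacker-news.firebaseio.com/v0/. The helper names `hn_base_url` and `hn_endpoint` are hypothetical and are not part of the package.

```r
# Hypothetical helper (not part of hackeRnews): build the URL of a raw
# Hacker News API v0 endpoint.
hn_base_url <- "https://hacker-news.firebaseio.com/v0"

hn_endpoint <- function(resource, id = NULL) {
  if (resource %in% c("item", "user")) {
    # Items and users are addressed by id, e.g. item/43561253.json
    stopifnot(!is.null(id))
    if (is.numeric(id)) id <- format(id, scientific = FALSE)  # avoid "1e+05"
    return(sprintf("%s/%s/%s.json", hn_base_url, resource, id))
  }
  # Story lists, the largest item id and the updates feed have fixed paths
  paths <- c(top = "topstories", new = "newstories", best = "beststories",
             ask = "askstories", show = "showstories", job = "jobstories",
             maxitem = "maxitem", updates = "updates")
  stopifnot(resource %in% names(paths))
  sprintf("%s/%s.json", hn_base_url, paths[[resource]])
}

hn_endpoint("item", 43561253)
#> [1] "https://hacker-news.firebaseio.com/v0/item/43561253.json"
hn_endpoint("best")
#> [1] "https://hacker-news.firebaseio.com/v0/beststories.json"
```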

Examples of using the hackeRnews package to retrieve data from the official Hacker News API are presented below:

hackeRnews usage

library(hackeRnews)

news stories

To fetch best, new or top stories, use the corresponding get_*_stories function. Each of these functions takes one optional argument, max_items, that limits the number of returned stories.

For example, to fetch the 5 best stories:

best_stories <- get_best_stories(max_items = 5)
best_stories[[1]]
#> List of 9
#>  $ by         : chr "belter"
#>  $ descendants: int 3574
#>  $ id         : int 43561253
#>  $ kids       : int [1:177] 43571614 43562363 43563625 43568725 43562878 43570277 43563538 43565994 43570551 43568692 ...
#>  $ score      : int 1871
#>  $ time       : POSIXct[1:1], format: "2025-04-02 22:39:06"
#>  $ title      : chr "US Administration announces 34% tariffs on China, 20% on EU"
#>  $ type       : chr "story"
#>  $ url        : chr "https://www.bbc.com/news/live/c1dr7vy39eet"
#>  - attr(*, "class")= chr "hn_item"
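A list of stories is often easier to work with as a data frame. The sketch below shows one way to flatten it in base R. `items_to_df` is a hypothetical helper, not a hackeRnews function. It assumes the field names and integer types shown in the output above. Fields that an item lacks, such as url on Ask HN posts, become NA.

```r
# Hypothetical helper (not part of hackeRnews): flatten a list of story
# items, such as the result of get_best_stories(), into a data frame.
items_to_df <- function(items) {
  # Return the field, or a typed NA when the item does not have it
  field <- function(item, name, na) {
    value <- item[[name]]
    if (is.null(value)) na else value
  }
  data.frame(
    id    = vapply(items, field, integer(1),   name = "id",    na = NA_integer_),
    by    = vapply(items, field, character(1), name = "by",    na = NA_character_),
    score = vapply(items, field, integer(1),   name = "score", na = NA_integer_),
    title = vapply(items, field, character(1), name = "title", na = NA_character_),
    url   = vapply(items, field, character(1), name = "url",   na = NA_character_),
    stringsAsFactors = FALSE
  )
}

# Usage (requires network access):
# best_df <- items_to_df(get_best_stories(max_items = 5))
```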

To fetch only the raw ids of best, new or top stories, use the corresponding get_*_stories_ids function:

best_stories_ids <- get_best_stories_ids()
best_stories_ids[1:5] # output truncated for legibility
#> [1] 43561253 43595269 43558671 43573156 43595585
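To page through a story list, for example to get stories 6 to 10, one approach is to slice the ids and fetch only that slice. The sketch below assumes that get_items_by_ids, which appears later in this document, accepts a vector of ids and returns their items. `page_ids` and `get_story_page` are hypothetical helpers. The fetcher is passed in as an argument, so you can try the logic offline with a stub.

```r
# Hypothetical helper (not part of hackeRnews): return the ids for one
# "page" of a ranked id vector such as get_best_stories_ids().
page_ids <- function(ids, page, page_size = 5) {
  start <- (page - 1) * page_size + 1
  if (start > length(ids)) return(ids[0])  # past the end: no ids
  end <- min(page * page_size, length(ids))
  ids[start:end]
}

# Hypothetical helper: fetch the items on one page. The default fetcher is
# assumed to be hackeRnews::get_items_by_ids; pass a stub to test offline.
get_story_page <- function(ids, page, page_size = 5,
                           fetch = hackeRnews::get_items_by_ids) {
  fetch(page_ids(ids, page, page_size))
}

# Usage (requires network access):
# ids <- get_best_stories_ids()
# second_page <- get_story_page(ids, page = 2)
```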

ask / job / show stories

These work like the news story functions: get_latest_*_stories returns the latest ask, job or show stories, and get_latest_*_stories_ids returns only their ids.

For example to fetch the 3 latest ask stories:

ask_stories <- get_latest_ask_stories(max_items = 3)
ask_stories[[1]]
#> List of 9
#>  $ by         : chr "thawawaycold"
#>  $ descendants: int 3
#>  $ id         : int 43618710
#>  $ kids       : int [1:2] 43619012 43618829
#>  $ score      : int 3
#>  $ text       : chr "Hi HN,<p>this is a follow-up post to https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=43282410<p>TLDR of tha"| __truncated__
#>  $ time       : POSIXct[1:1], format: "2025-04-08 07:51:42"
#>  $ title      : chr "Ask HN: Pigeonholed into role with no coding, what do I do?"
#>  $ type       : chr "story"
#>  - attr(*, "class")= chr "hn_item"
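The text field holds HTML, as the output above shows with its <p> tags and entities such as &#x2F;. The sketch below converts that HTML to plain text with base R regular expressions. `hn_text_to_plain` is a hypothetical helper. It handles only paragraph tags, other tags and a few common entities, so it is not a complete HTML decoder.

```r
# Hypothetical helper (not part of hackeRnews): turn the HTML in an item's
# `text` field into plain text. Handles <p>, strips other tags and decodes
# a few common entities only; it is not a complete HTML decoder.
hn_text_to_plain <- function(html) {
  out <- gsub("<p>", "\n\n", html, fixed = TRUE)  # paragraph breaks
  out <- gsub("<[^>]+>", "", out)                 # drop remaining tags
  entities <- c("&#x2F;" = "/", "&#x27;" = "'", "&quot;" = "\"",
                "&gt;" = ">", "&lt;" = "<")
  for (e in names(entities)) {
    out <- gsub(e, entities[[e]], out, fixed = TRUE)
  }
  gsub("&amp;", "&", out, fixed = TRUE)           # decode &amp; last
}

# Usage (requires network access):
# cat(hn_text_to_plain(ask_stories[[1]]$text))
```

The code decodes &amp; last so that an escaped entity such as &amp;gt; becomes the literal text &gt; rather than >.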

comments

The discussion in a story thread is represented as a tree of comments. Each story stores the ids of its top-level comments in its kids property. Each comment can in turn store the ids of its replies (sub-comments) in its own kids property, and so on. To retrieve all of the comments of a specific story, use the get_comments function:

top_story <- get_top_stories(max_items = 1)[[1]]
get_comments(top_story)
#> # A tibble: 181 × 7
#>          id deleted by           time                text           dead  parent
#>       <int> <lgl>   <chr>        <dttm>              <chr>          <lgl>  <int>
#>  1 43616592 FALSE   mrexroad     2025-04-08 00:22:34 "I clicked th… FALSE 4.36e7
#>  2 43616226 FALSE   pelagic_sky  2025-04-07 23:38:13 "Reminds me o… FALSE 4.36e7
#>  3 43616302 FALSE   zoogeny      2025-04-07 23:47:23 "I still reme… FALSE 4.36e7
#>  4 43618907 FALSE   blixt        2025-04-08 08:31:00 "I remember a… FALSE 4.36e7
#>  5 43619011 FALSE   chenhoey1211 2025-04-08 08:50:39 "“Japan has a… FALSE 4.36e7
#>  6 43616136 FALSE   NickC25      2025-04-07 23:26:29 "That&#x27;s … FALSE 4.36e7
#>  7 43616100 FALSE   1317         2025-04-07 23:23:11 "Original vid… FALSE 4.36e7
#>  8 43616097 FALSE   uneoneuno    2025-04-07 23:22:34 "Reminds me o… FALSE 4.36e7
#>  9 43616567 FALSE   mosura       2025-04-08 00:18:52 "This is a su… FALSE 4.36e7
#> 10 43618261 FALSE   xyzal        2025-04-08 06:07:00 "&gt; Kids ha… FALSE 4.36e7
#> # ℹ 171 more rows
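To make the tree structure concrete, the sketch below defines two hypothetical helpers that are not part of hackeRnews. `collect_comment_ids` walks the kids property depth-first using a stand-in `fetch_item` function. Here that function reads from an in-memory list, so the example runs offline. This is not how get_comments is implemented. `count_replies` works on a table with id and parent columns, like the one get_comments returns. It assumes each comment's parent holds the id of the item it replies to.

```r
# Hypothetical helper: collect the ids of all comments below `item` by
# following `kids` depth-first. `fetch_item(id)` must return a list with an
# optional `kids` field.
collect_comment_ids <- function(item, fetch_item) {
  ids <- integer(0)
  for (kid_id in item$kids) {
    ids <- c(ids, kid_id, collect_comment_ids(fetch_item(kid_id), fetch_item))
  }
  ids
}

# Hypothetical helper: number of direct replies to each comment, based on
# the `id` and `parent` columns of a get_comments()-style table.
count_replies <- function(comments) {
  counts <- table(factor(comments$parent, levels = comments$id))
  setNames(as.integer(counts), names(counts))
}

# A tiny in-memory thread: story 1 has comments 2 and 3; comment 2 has reply 4
thread <- list(
  "1" = list(id = 1L, kids = c(2L, 3L)),
  "2" = list(id = 2L, kids = 4L),
  "3" = list(id = 3L),
  "4" = list(id = 4L)
)
fetch_from_thread <- function(id) thread[[as.character(id)]]

collect_comment_ids(thread[["1"]], fetch_from_thread)
#> [1] 2 4 3

comments <- data.frame(id = c(2L, 3L, 4L), parent = c(1L, 1L, 2L))
count_replies(comments)  # comment 2 has one reply, comments 3 and 4 have none
```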

user

To fetch data about user ‘jl’ just use the get_user_by_username function:

user <- get_user_by_username("jl")
user
#> List of 5
#>  $ about    : chr "This is a test"
#>  $ created  : POSIXct[1:1], format: "2007-03-15 02:50:46"
#>  $ id       : chr "jl"
#>  $ karma    : int 4307
#>  $ submitted: int [1:850] 35686379 35675818 25172559 25172553 19464269 18498213 16659709 16659632 16659556 14237416 ...
#>  - attr(*, "class")= chr "hn_user"
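A few base R calls can summarise a user object. `user_summary` is a hypothetical helper, not part of hackeRnews. It assumes the field names shown above, with created as POSIXct and submitted as a vector of item ids. It also takes `now` as an argument so that the result is reproducible.

```r
# Hypothetical helper (not part of hackeRnews): summarise a user object as
# returned by get_user_by_username().
user_summary <- function(user, now = Sys.time()) {
  list(
    id = user$id,
    karma = user$karma,
    n_submitted = length(user$submitted),  # 0 when nothing was submitted
    account_age_days = as.numeric(difftime(now, user$created, units = "days"))
  )
}

# Usage (requires network access):
# user_summary(get_user_by_username("jl"))
```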

all items / latest items

To iterate over the latest items, fetch the id of the most recent item with the get_max_item_id function and then walk backwards through the ids. Continuing this walk far enough reaches every item on Hacker News.

For example, to fetch the 10 latest items:

max_item_id <- get_max_item_id()
latest_items <- get_items_by_ids(seq(max_item_id, max_item_id - 9)) # 10 ids, newest first
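Walking further back means requesting many ids, so it can help to fetch them in batches. The sketch below assumes that get_items_by_ids accepts any vector of ids and returns a list of items. `fetch_latest_items` and its `batch_size` argument are hypothetical. You can replace the fetcher with a stub to try the logic offline.

```r
# Hypothetical helper (not part of hackeRnews): fetch the `n` most recent
# items, walking backwards from `max_id` in batches of `batch_size` ids.
fetch_latest_items <- function(max_id, n, batch_size = 100,
                               fetch = hackeRnews::get_items_by_ids) {
  ids <- seq(max_id, length.out = n, by = -1)  # max_id, max_id - 1, ...
  ids <- ids[ids >= 1]                         # stop at the first item id
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  items <- list()
  for (batch in batches) {
    items <- c(items, fetch(batch))            # one request per batch
  }
  items
}

# Usage (requires network access):
# latest_500 <- fetch_latest_items(get_max_item_id(), n = 500)
```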

updates

Recently changed items and profiles can be retrieved using get_updates:

updates <- get_updates()
updates$profiles[1:5] # output truncated for legibility
#> [1] "phendrenad2" "Baeocystin"  "razakel"     "rightbyte"   "mrexroad"
updates$items[1:5]    # output truncated for legibility
#> [1] 43597359 43615682 43609878 43608492 43618892
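Because get_updates reports only which items and profiles changed, one use is deciding which locally stored objects to fetch again. The sketch below assumes `updates` is a list with items and profiles fields, as in the output above. `changed_since` is a hypothetical helper, not part of hackeRnews.

```r
# Hypothetical helper (not part of hackeRnews): given the result of
# get_updates() and the item ids / usernames you already track, return the
# ones that changed and should be fetched again.
changed_since <- function(updates, tracked_items = integer(0),
                          tracked_users = character(0)) {
  list(
    items = intersect(tracked_items, updates$items),
    profiles = intersect(tracked_users, updates$profiles)
  )
}

# Usage (requires network access):
# stale <- changed_since(get_updates(), tracked_items = my_ids, tracked_users = "jl")
```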