10. Methods of Evaluating Learning to Rank Algorithms and Recommender Systems

Intersection and Divergence: Personalized Learning to Rank and Recommender Systems

Despite having distinct objectives, Learning to Rank and Recommender Systems often intersect and can work together in specific use cases.

Learning to Rank algorithms primarily aim to order a list of items optimally for a particular input. The input can range from a user query to various contextual data, and the items can be anything applicable to the scenario.

On the flip side, Recommender Systems are focused on proposing a list of items likely to resonate with a user's interests. The recommended items are often tailored to individual user tastes and preferences, extrapolated from their previous engagements.

The convergence between these two systems surfaces when personalization is applied. A Personalized Learning to Rank algorithm, similar to a Recommender System, takes into account user preferences while establishing the ranking of items. On the other hand, a Recommender System incorporating ranking into its recommendations essentially executes a Learning to Rank task, with an added personalization layer.

Evaluating Personalized Learning to Rank algorithms would likely leverage the same evaluation metrics as those discussed in the following sections, albeit with the added complexity of accommodating different user profiles. For instance, we might gauge the NDCG (Normalized Discounted Cumulative Gain) for different user queries, or assess the precision and recall of the rankings produced for different users.

Decoding Evaluation Metrics for Learning to Rank Algorithms

Learning to Rank involves harnessing machine learning for ranking tasks, finding extensive application in systems like search engines. The effectiveness of these algorithms is gauged using several evaluation methods:

Precision@N and Recall@N

Precision@N measures the proportion of the top-N ranked items that are relevant. The formula is:

Precision@N = \frac{|\text{relevant items in the top-}N|}{N}

Recall@N measures the proportion of all relevant items that are included in the top-N ranked items:

Recall@N = \frac{|\text{relevant items in the top-}N|}{|\text{all relevant items}|}
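As a concrete illustration, here is a minimal Python sketch of both metrics. The function names and the representation of relevant items as a set are illustrative assumptions, not a fixed API:

```python
def precision_at_n(ranked_items, relevant_items, n):
    """Fraction of the top-n ranked items that are relevant."""
    hits = sum(1 for item in ranked_items[:n] if item in relevant_items)
    return hits / n


def recall_at_n(ranked_items, relevant_items, n):
    """Fraction of all relevant items that appear in the top-n."""
    hits = sum(1 for item in ranked_items[:n] if item in relevant_items)
    return hits / len(relevant_items) if relevant_items else 0.0


# Toy example: 3 of the top-5 items are relevant, out of 4 relevant overall.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "e", "f"}
print(precision_at_n(ranked, relevant, 5))  # 3/5 = 0.6
print(recall_at_n(ranked, relevant, 5))     # 3/4 = 0.75
```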

Cumulative Gain (CG)

Cumulative Gain (CG) is an evaluation metric for ranking and recommendation systems. It sums up the relevance scores of all items up to a given rank. This metric doesn't account for the positions of the items in the ranking, just their relevance.

The Cumulative Gain at rank p can be calculated using the following formula:

CG_p = \sum_{i=1}^{p} rel(i)

Here:
- p is the rank up to which items are considered.
- rel(i) is the relevance score of the item at rank i.

The relevance scores are typically binary (relevant or not relevant), but they can also take on other values, such as a user's rating of an item.
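A minimal sketch in Python, assuming the relevance scores are already listed in ranked order:

```python
def cumulative_gain(relevances, p):
    """Sum of the relevance scores of the first p items; positions are ignored."""
    return sum(relevances[:p])


# Graded relevance scores of a ranked list (e.g. 0-3 editorial judgments).
rels = [3, 2, 3, 0, 1, 2]
print(cumulative_gain(rels, 5))  # 3 + 2 + 3 + 0 + 1 = 9
```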

Normalized Discounted Cumulative Gain (NDCG)

NDCG is an essential metric that measures the effectiveness of a ranking algorithm. It's designed to evaluate the quality of a ranking list, considering not just the relevance of items but also their position in the list.

At its core, NDCG is based on two components: Discounted Cumulative Gain (DCG) and Ideal Discounted Cumulative Gain (IDCG).

Discounted Cumulative Gain (DCG)

DCG quantifies the value of a ranking list up to position p by accumulating the relevance scores of the items, with each score being discounted based on its rank. Essentially, DCG gives more weight to relevant items if they appear higher in the ranking list, reflecting the intuition that higher-ranked items are more important.

Mathematically, DCG is defined as:

DCG_p = \sum_{i=1}^{p} \frac{rel(i)}{\log_2(i+1)}

Here, rel(i) is the relevance score of the item at rank i, and p is the rank up to which we are considering the items.
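A direct translation of the formula into Python (the helper name and the list-of-scores input are assumptions for illustration):

```python
import math


def dcg(relevances, p):
    """DCG at rank p: each rel(i) is discounted by log2(i + 1), with i 1-indexed."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:p], start=1))


rels = [3, 2, 3, 0, 1, 2]
# 3/log2(2) + 2/log2(3) + 3/log2(4) + 0/log2(5) + 1/log2(6) ≈ 6.15
print(dcg(rels, 5))
```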

Ideal Discounted Cumulative Gain (IDCG)

IDCG represents the maximum possible DCG for a particular set of queries or documents. In other words, it's the DCG value we'd obtain if the items were ranked perfectly according to their relevance.

Normalizing DCG: The NDCG Metric

NDCG normalizes DCG by dividing it by IDCG, thereby constraining the score to a 0-1 range:

NDCG_p = \frac{DCG_p}{IDCG_p}

This normalization step makes NDCG a more versatile metric because it allows for comparisons between different queries, users, or contexts, where the number of relevant items may vary. Note that if there are no relevant items in the top-p results, both IDCG and NDCG are defined as zero.
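Putting the pieces together, here is a minimal sketch of NDCG; the dcg helper from above is repeated so the snippet runs on its own, and IDCG is obtained by scoring the relevance values sorted in descending order:

```python
import math


def dcg(relevances, p):
    """DCG at rank p, as defined above."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:p], start=1))


def ndcg(relevances, p):
    """NDCG at rank p: DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), p)  # IDCG
    return dcg(relevances, p) / ideal if ideal > 0 else 0.0  # 0 if nothing relevant


rels = [3, 2, 3, 0, 1, 2]
print(ndcg(rels, 5))  # < 1.0 because the list is not ideally ordered
```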

Kendall's Tau

Kendall's Tau is a statistic used to measure the ordinal association between two measured quantities. Applied to ranking, it measures the similarity of two ranking lists by comparing all pairs of items: a pair is concordant if both lists order it the same way, and discordant otherwise. Tau is then:

\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2}

where n is the number of items in the list. Tau ranges from -1 (one ordering is the reverse of the other) to 1 (identical orderings).
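A minimal pairwise implementation, cross-checked against SciPy's scipy.stats.kendalltau (with no ties, SciPy's tau-b coincides with the simple formula above; the SciPy call is optional):

```python
from itertools import combinations

from scipy.stats import kendalltau


def kendall_tau(rank_a, rank_b):
    """Tau over n(n-1)/2 item pairs, assuming no ties in either ranking."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        agreement = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Positions assigned to the same five items by two rankers.
a = [1, 2, 3, 4, 5]
b = [2, 1, 3, 5, 4]
print(kendall_tau(a, b))    # (8 - 2) / 10 = 0.6
print(kendalltau(a, b)[0])  # SciPy agrees: 0.6
```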