Discover and read the best of Twitter Threads about #yandexleak

Most recents (4)

#YandexLeak In Slice Web Production, 42 factors are used to evaluate URLs, including URL characteristics (length, slashes, numbers), geography, user engagement, type, and algorithms that calculate and predict relevance and quality (BM25, DSSM, GSK).
Curiosity: IsObsolete checks whether the URL is considered outdated because it has an old date. Old news is recognized as such. A factor value of 1 is applied if there is a year in the URL that is less than or equal to 2007.
#YandexLeak There are 3 factors used to detect AI-generated content. Two of them have Rank Coefficients of 0.08033186405 and 0.002431406823. They evaluate the unnaturalness of text from a Russian language perspective.
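To make the IsObsolete rule above concrete, here is a minimal sketch. Only the 2007 threshold comes from the tweet; the function name `is_obsolete`, the regular expression, and the choice to treat any standalone four-digit number starting with 19 or 20 as a "year" are assumptions for illustration, not the leaked implementation.

```python
import re

# Threshold taken from the tweet: a year <= 2007 marks the URL as obsolete.
OBSOLETE_YEAR_THRESHOLD = 2007

# Assumption: a "year" is a standalone 4-digit number beginning with 19 or 20,
# i.e. not embedded in a longer run of digits.
_YEAR_RE = re.compile(r"(?<!\d)(19\d{2}|20\d{2})(?!\d)")


def is_obsolete(url: str) -> float:
    """Return 1.0 if the URL contains a year <= 2007, else 0.0 (hypothetical sketch)."""
    years = [int(match) for match in _YEAR_RE.findall(url)]
    return 1.0 if any(year <= OBSOLETE_YEAR_THRESHOLD for year in years) else 0.0
```

Under these assumptions, `is_obsolete("https://example.com/news/2006/01/story.html")` yields `1.0`, while a URL with no year or with a later year yields `0.0`.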
Read 16 tweets
So many people are discussing the #YandexLeak ranking factors while missing out on (part 2):

Tag TG_DEPRECATED

Code clearly states what it means:

"the factor shall not be used anywhere, but it is still computed and may be present in existing formulas"

Yet, many discard these factors.
1/
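A small sketch of why discarding TG_DEPRECATED factors can mislead an analysis: a deprecated factor may still feed an existing formula. The tag name comes from the leak as quoted above; the `Factor` class, the factor names, and the formula input set are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    tags: frozenset = frozenset()


# Hypothetical factors; the names are invented for this example.
FACTORS = [
    Factor("ExampleFreshFactor", frozenset({"TG_DYNAMIC"})),
    Factor("ExampleOldFactor", frozenset({"TG_DEPRECATED"})),
    Factor("ExampleUnusedOldFactor", frozenset({"TG_DEPRECATED"})),
]

# Hypothetical: names of factors that an existing ranking formula still reads.
EXISTING_FORMULA_INPUTS = {"ExampleFreshFactor", "ExampleOldFactor"}


def deprecated_but_still_used(factors, formula_inputs):
    """List deprecated factors that an existing formula still consumes."""
    return [
        factor.name
        for factor in factors
        if "TG_DEPRECATED" in factor.tags and factor.name in formula_inputs
    ]
```

In this toy setup, filtering out everything tagged TG_DEPRECATED would hide `ExampleOldFactor`, even though the formula still depends on it.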
Then there are the additional factors.
Many just base their analysis on a single file shared by @searchmartin ca. 48 hours after the actual leak, and not on the leak itself.

You can't judge a software system with 100 GB of source code based on 1 text file :)


2/
Those 1923 factors are interesting.

But... there's a lot more.



#YandexLeak
3/
Read 5 tweets
So many people discussing the #YandexLeak ranking factors.

But I think what's missing is a clear separation of static vs. dynamic factors.

It's tagged nicely in there.

TG_DYNAMIC depends on query

TG_STATIC does not depend on query

#yandex #seo

1/3
The second most important criterion is the impact scope of any of those factors ("modifiers").

TG_DOC
depends on document

TG_HOST
depends on host

TG_OWNER
depends on owner

TG_QUERY_ONLY
depends on query only, not document

#scope

2/3
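The two tag dimensions listed above (query dependence and scope) can be modeled as combinable flags. This is only a sketch: the tag names are the ones quoted in the thread, while the `Tag` enum, the example factor names, their tag assignments, and the `select` helper are assumptions made for illustration.

```python
from enum import Flag, auto


class Tag(Flag):
    # Query dependence
    TG_STATIC = auto()      # does not depend on query
    TG_DYNAMIC = auto()     # depends on query
    # Scope
    TG_DOC = auto()         # depends on document
    TG_HOST = auto()        # depends on host
    TG_OWNER = auto()       # depends on owner
    TG_QUERY_ONLY = auto()  # depends on query only, not document


# Hypothetical factors and tag assignments, invented for this example.
FACTORS = {
    "ExampleUrlLength": Tag.TG_STATIC | Tag.TG_DOC,
    "ExampleHostAge": Tag.TG_STATIC | Tag.TG_HOST,
    "ExampleOwnerFootprint": Tag.TG_STATIC | Tag.TG_OWNER,
    "ExampleTextMatch": Tag.TG_DYNAMIC | Tag.TG_DOC,
    "ExampleQueryLength": Tag.TG_DYNAMIC | Tag.TG_QUERY_ONLY,
}


def select(factors, required):
    """Return the sorted names of factors that carry every tag in `required`."""
    return sorted(
        name for name, tags in factors.items() if (tags & required) == required
    )
```

For example, `select(FACTORS, Tag.TG_STATIC | Tag.TG_HOST)` picks out the static, host-level factors in this toy table, which is the kind of separation the thread argues is missing from most analyses.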
While the scopes seem clear, TG_OWNER sparks my interest.

It appears to relate to "footprinting" owners/operators of websites, by using WHOIS data, for example.

Then, based on that data, penalties are issued later in the detail factors.

#yandex #seo #factors

3/3
Read 4 tweets
The #YandexLeak is a "political" statement.

All the files in the 100 GB source code archive seem to have the same file date - Feb 24, 2022 - the day Russia started its invasion of Ukraine.

It's a subtle hint that Putin will not get control over Yandex.

#yandex #yandexleak

1/4
That leak was an inside job by a highly privileged user - someone as powerful as the legendary "Matt Cutts" of Google...

So "political" means, for me, some kind of "defense" against the Putin regime having too much say in Yandex...

#yandex #yandexleak #leak #Russia

2/4
Yandex has already successfully open-sourced, for example, the amazing ClickHouse database, and in a way they have now "open-sourced" many other quality Yandex products, which could be seen as a strong political statement in the future.

#yandex #yandexleak #leak #Russia

3/4
Read 4 tweets
